Throughout the day we have been concentrating on why Redis is great, so my talk will be a little bit about what lies beyond Redis: what is not the sweet spot of Redis, and what the alternatives are. I work for Aerospike, so I will be a bit biased towards Aerospike, and whenever I get a chance I will be talking about Redis, Aerospike, Couchbase, Cassandra, and VoltDB. I am sure there are some hardcore fans of Redis in this audience, so please come with an open mind: when you started out, you would have been using something other than Redis, and you fell in love with Redis once you started using it. So try other things; you may fall in love with them too.

Redis and Aerospike actually have an indirect relationship, like two-hop friends. Salvatore Sanfilippo is the main author of Redis, and there is a person called Russell Sullivan who developed AlchemyDB on top of the Redis code base. They worked very closely together, Aerospike acquired Alchemy, and Russell now works for us. So a lot of the influences from Redis actually made it into Aerospike. This morning Siddharth was talking about using Lua inside UDFs; Aerospike also has an implementation of user-defined functions using Lua. And we have the reverse effect as well: Redis is coming up with its clustering scheme, and Aerospike, which was built with clustering from day one, had influence on the current design of Redis clustering. It goes both ways. So my whole presentation is going to be centered around Aerospike and Redis; it's a different database, and it's both a friend and a foe for Redis.

Here is a short list of the features, and I am going to dig deeper into each of them. In many talks today it was said that Redis is very fast and that is why people want to use it; well, both Redis and Aerospike are very high performance.
Both of them are tuned for in-memory access; Aerospike goes one step further and is tuned for storage as well. Redis is mainly a data structure store (Redis literally means REmote DIctionary Server), so it defines a lot of operations up front that you can use right away, whereas Aerospike is a generic key-value store: you can define user-defined functions and it has other functionality. Redis also has optional persistence, whereas Aerospike has had persistence from day one. Clustering is currently in beta for Redis, whereas in Aerospike clustering comes by default: we do auto-clustering, auto-sharding, and auto-rebalancing when the cluster state changes. We will go deeper into each of these aspects.

On the very high performance: I believe one of the key ingredients is C. Pick the other NoSQL databases: Couchbase is written in Erlang, Cassandra is written in Java, and then there is VoltDB, which is actually very fast. One of the core principles we followed was the language choice, and it is C, so we can extract a whole lot of performance from the machine. Both systems are tuned for memory, and as I said, Aerospike is tuned for SSD storage as well; when we get to persistence I'll talk more about the storage model.

Redis is single-threaded whereas Aerospike is multi-threaded. Don't think that single-threaded is bad: single-threaded is actually very good in particular models, because you don't need any locks. If you have multiple processes getting blocked on locks, that can really hurt your performance when you are not using them properly. VoltDB is also effectively a single-threaded system, and it performs very well.
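To make the single-threaded point concrete, here is a toy sketch (not Redis code, just an illustration of the model): a dictionary server that processes commands one at a time on a single thread, so every command is atomic without any locks.

```python
# Toy model of a single-threaded dictionary server: one loop,
# one command at a time, so no locks are ever needed.
store = {}

def execute(command, *args):
    """Dispatch a single command; commands never interleave."""
    if command == "SET":
        key, value = args
        store[key] = value
        return "OK"
    if command == "GET":
        return store.get(args[0])
    if command == "INCR":
        # Read-modify-write is safe: nothing else runs concurrently.
        key = args[0]
        store[key] = int(store.get(key, 0)) + 1
        return store[key]
    raise ValueError(f"unknown command: {command}")

# Commands arrive from many clients but are executed serially.
execute("SET", "greeting", "hello")
execute("INCR", "counter")
execute("INCR", "counter")
```

Because only one thread ever touches `store`, the INCR read-modify-write needs no compare-and-swap or mutex; that is the trade described here, lock-free simplicity in exchange for using only one core.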
Single-threaded systems come with their own advantages, and they come with their own disadvantage: they cannot exploit multiple cores very easily. You have to run multiple instances of Redis to exploit the multiple cores on a machine. With Redis that actually becomes easy, because you can run each instance bound to a separate core and get the full performance, but there is a problem with that approach: if both master and slave are on the same machine and the machine goes down, you lose both copies of the data.

Aerospike is a multi-threaded system which exploits multiple cores, but any multi-threaded system comes with its own limitation in exploiting all the cores of a machine: switching a thread from one core to another has a context-switching overhead, and it can kill the performance of a multi-threaded system once you go beyond 8 or 16 cores. What we recommend for that is to run multiple instances of Aerospike on the same machine. If you have, say, a 32-core machine, you can run two instances of Aerospike, and Aerospike's replication model lets you declare that these two instances are on the same box, so that replication keeps track of that fact. This is called rack awareness. When you have big blade machines, you might have 10 of them sitting on a rack, and there is a real probability of the whole rack going down. If you have multiple racks in your data center, you can say that these 10 nodes belong to the same rack, so don't keep both the master copy and the replica copy on the same set of machines; Aerospike will choose a machine elsewhere for the replica.

Coming to the storage model: as I was saying, Redis's sweet spot is the set of predefined operations that it comes with.
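Before moving on: the rack-aware replica placement just described can be sketched in a few lines. This is a hypothetical helper, not Aerospike's actual placement code; the point is only that the replica for a master is preferentially chosen from a different rack.

```python
# Toy rack-aware replica selection: prefer a replica node whose
# rack differs from the master's rack, falling back to any other node.
nodes = {
    "node-a": "rack-1",
    "node-b": "rack-1",
    "node-c": "rack-2",
    "node-d": "rack-2",
}

def pick_replica(master, nodes):
    """Return a replica node for `master`, avoiding its rack if possible."""
    master_rack = nodes[master]
    # Candidates on a different rack come first; same-rack is the fallback.
    off_rack = [n for n in sorted(nodes) if n != master and nodes[n] != master_rack]
    same_rack = [n for n in sorted(nodes) if n != master and nodes[n] == master_rack]
    return (off_rack or same_rack)[0]

# Losing all of rack-1 still leaves a copy, because node-a's replica
# lives on rack-2.
replica = pick_replica("node-a", nodes)
```

With this policy, two Aerospike instances declared to be on the same box (or rack) will never hold both copies of the same partition, which is exactly the failure mode described above for colocated Redis master/slave pairs.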
I was actually going through the Redis commands, and the command set covers almost every letter from A to Z, with only about five letters missing; so it is your chance to build some API to fill that gap. Aerospike doesn't come with this whole set of predefined operations. It gives you basic data types like strings, integers, blobs, lists, and maps, and lists and maps can be nested, so you can have a map of lists or a list of maps, which I think is not possible in Redis, if I'm not wrong. So you can do nested data structures in Aerospike, but operations are not predefined: you have to write your own logic for appending to lists or removing entries from maps and so on. We have clients in different languages, and languages like Java or Python have their own list and map data types, so we map our internal list and map types to the types the language supports; in Java, for example, you can directly access an Aerospike map as a Java HashMap. We actually use the MessagePack protocol to do this conversion.

In Redis, one issue is that sharding has to be done from the application layer: the application has to know which node it is contacting based on the keys, based on the way you sharded, and so on. Aerospike was written from day one so that the whole cluster is one unit and the application should not know any details about the cluster. When you do any operation, you are talking to some abstract black box which could be multiple machines; you don't know which node it will hit. I think this came up in the discussion earlier: the max number of keys supported by Redis is 2^32, whereas in Aerospike all keys are mapped to a 20-byte hash, so we get up to 2^160 max keys for the whole namespace across the cluster. I don't know how this changes with Redis clustering.
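To illustrate the 20-byte digest idea, here is a sketch. Aerospike's digest is actually RIPEMD-160; since that algorithm is not always available in Python's hashlib, this sketch uses SHA-1 (also 20 bytes) as a stand-in, and the partition mapping shown is just one plausible scheme, not Aerospike's exact one.

```python
import hashlib

N_PARTITIONS = 4096  # the 4,096 partitions mentioned in the talk

def key_digest(set_name: str, key: str) -> bytes:
    """Hash a (set, key) pair to a fixed 20-byte digest.

    A 20-byte digest gives a 2**160 keyspace regardless of how long
    or short the original application keys are.
    """
    h = hashlib.sha1()  # stand-in for RIPEMD-160; both emit 20 bytes
    h.update(set_name.encode())
    h.update(key.encode())
    return h.digest()

def partition_id(digest: bytes) -> int:
    """Map a digest to one of 4,096 partitions using 12 bits of it."""
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)

d = key_digest("users", "alice")
pid = partition_id(d)  # always the same partition for the same key
```

Because the digest is computed client-side, any client can locate the owning node from the partition map without asking the cluster, which is what lets the application treat the cluster as a black box.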
I don't have much to say about pure in-memory: it's a well-known fact that Redis has very good in-memory performance, and one of the most common use cases for Redis is as an LRU cache, where you set a TTL and your keys expire as time goes on; same with memcached, and I think we were talking about LRU expiry earlier. Aerospike also comes with all of that: you can expire keys and do evictions based on your resource limits and so on. So I don't have a whole lot to add for the memory part on its own, and to some extent that's the point: this is not to run Redis down but to highlight what alternatives you have beyond Redis, to overcome its limitations.

Coming to persistence. (And yes, Mongo sucks in terms of performance.) Redis has optional persistence with snapshotting and the append-only file; I think that was covered. With the append-only file, when you restart Redis you have to replay the AOF log and reconstruct the data, which can take time during restart. The Aerospike storage model is totally different: it is tuned for SSDs, and it is also tuned for an in-memory-plus-disk configuration. Aerospike can run in three modes: purely in-memory, purely on SSD, or a model where you have the entire copy of the data in memory but a continuous incremental snapshot on hard disk. Aerospike implements its own log-structured file system to do this incremental snapshotting, so when you have to restart Aerospike we just scan the data files sequentially and load them back. And we have something called warm restart capability.
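The TTL/expiry behaviour both systems offer can be modelled in a few lines. This is a toy sketch with an injectable clock (which makes it easy to exercise), not either product's implementation; real stores also evict proactively in the background rather than only on access.

```python
import time

class TTLStore:
    """Toy key-value store where every key carries an expiry time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock     # injectable for testing
        self._data = {}         # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazy expiry on access
            return None
        return value

# Usage with a fake clock so time can be advanced by hand:
now = [0.0]
cache = TTLStore(clock=lambda: now[0])
cache.set("session", "abc123", ttl_seconds=30)
```

After `now[0]` is advanced past 30 seconds, `cache.get("session")` returns None: the key has expired exactly as an LRU-cache TTL would make it.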
With warm restart, the indices are kept in memory even if the process goes down, so when you restart the process you don't need to rebuild the whole index by scanning the disk again. Planned downtime for upgrades is a very common thing in operations: if you have to do a rolling upgrade of your entire cluster, you just bring a node down, upgrade the binary, and restart it, and it attaches to the previous state it was running with. What could take 30 minutes before warm restart now takes about 10 seconds.

Now, replication. In Redis you have to manually configure masters and slaves, whereas in Aerospike, if you define a set of machines, we automatically decide what is master and what is slave, and the same node can act as master for some partitions and slave for other partitions. This is actually the design being implemented for Redis clustering as well: a set of nodes can behave as both master and slave. In Redis the entire data set is sharded into 16,384 buckets, which they call hash slots if I am right, whereas in Aerospike it is split into 4,096 partitions, and these partitions are distributed between the nodes of the cluster.

Redis does asynchronous replication between master and slave, so if the master goes down there is a slight chance that some keys may be missing. Aerospike follows synchronous replication, because we give more importance to the guarantee that if one node goes down you shouldn't lose anything. As we were saying earlier, without clustering, if you have master and slave you need a lot more machines to get that redundancy and high availability; with Aerospike you need a smaller number of machines, but bigger machines, with more RAM, bigger hard disks, and things like that. That actually makes a difference from an operational point of view.
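The partition-to-node assignment and automatic rebalancing described above can be sketched like this. It is a toy scheme (round-robin ownership, masters only); Aerospike's real partition map also tracks replicas and migration state.

```python
N_PARTITIONS = 4096

def build_partition_map(nodes):
    """Assign each of the 4,096 partitions to a node round-robin."""
    nodes = sorted(nodes)
    return {pid: nodes[pid % len(nodes)] for pid in range(N_PARTITIONS)}

def rebalance(partition_map, surviving_nodes):
    """Reassign partitions that lived on departed nodes.

    Partitions already on a surviving node stay put, so only the
    departed node's share of the data has to move.
    """
    surviving = sorted(surviving_nodes)
    new_map = {}
    for pid, owner in partition_map.items():
        if owner in surviving_nodes:
            new_map[pid] = owner
        else:
            new_map[pid] = surviving[pid % len(surviving)]
    return new_map

pmap = build_partition_map({"node-a", "node-b", "node-c"})
# node-c leaves the cluster; its partitions are redistributed.
after_loss = rebalance(pmap, {"node-a", "node-b"})
```

The same mechanism runs in reverse when a node joins: a new map is computed and only the partitions whose owner changed migrate, which is the "automatic rebalancing on cluster state change" the talk refers to.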
If you go to your ops team and say, I have one solution with 10 nodes and another with 20 nodes, I think they will pick the 10 nodes, because it becomes easier to maintain. And as I was saying, the replication is rack-aware: if you specify that you don't want master and slave on the same set of machines, Aerospike will avoid that. Somebody asked me not to compare with Redis here, but other than Cassandra, nobody else has rack awareness: don't put both the master and the slave copy on the same set of machines. And do you need that many machines? Not if you have SSDs, and SSDs are becoming cheap, so very soon you won't need to.

Comparing replication: Cassandra and Couchbase don't do synchronous replication by default; you have the option to go for synchronous replication at the extra cost of latency. We have tuned our network stack so much that we can do better than anybody else when it comes to synchronous replication. One clarification: cross-data-center replication, meaning replicating across geographies, is not synchronous; across geographies it is always asynchronous. So if you have one cluster in Bangalore and one in Delhi, yes, that is asynchronous replication. And for cross-data-center replication, I think only Couchbase has it, plus a lesser-known database called Riak; I don't know what else has it, sorry.
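The synchronous-versus-asynchronous trade-off above can be made concrete with a toy model (an illustration only, not any product's code): a synchronous write is acknowledged only after the replica has applied it, so an acknowledged write survives the master dying, while an asynchronous write may still be sitting in the replication queue when the master goes down.

```python
from collections import deque

class Node:
    """A trivially simple storage node."""
    def __init__(self):
        self.data = {}

def write_sync(master, replica, key, value):
    """Apply to master AND replica before acknowledging."""
    master.data[key] = value
    replica.data[key] = value  # waiting for the replica costs latency
    return "ACK"

def write_async(master, replica_queue, key, value):
    """Acknowledge after the master write; replicate later."""
    master.data[key] = value
    replica_queue.append((key, value))  # shipped to the replica eventually
    return "ACK"

master, replica = Node(), Node()
write_sync(master, replica, "k1", "v1")   # replica already has k1 at ACK time

queue = deque()
write_async(master, queue, "k2", "v2")    # ACKed, but the replica may lag
# If the master dies now, "k2" is lost unless the queue was drained.
```

This is why the talk pairs synchronous replication with extra latency: the ACK for `k1` cannot be returned until the replica round-trip completes, whereas `k2` is acknowledged immediately at the price of a small loss window.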
As I was saying, the way we split master and slave is that the same node acts as master for some set of partitions (Redis calls them hash slots) and as slave for a different set of partitions. So if one node goes down, it takes down the master copies of some partitions and the slave copies of others, but you will still have one more copy of each. Yes, that's right; and even in a multi-master setup there is a quorum concept, where the system says: unless I have written to this many nodes, I don't reply back to the application, and when you write you say what your quorum is. Sorry, I was correcting myself: that is in Cassandra.

On sharding: again, with Redis clustering a lot of this changes. In Aerospike, as I was saying, we do hash-based sharding: we split the entire key space into 4,096 buckets called partitions, and partitions are assigned to nodes based on the size of the cluster. If a node goes down or a new node comes up, we do automatic rebalancing of the entire data across the cluster. Couchbase before 2.0 didn't have auto-rebalancing: the operator had to go and explicitly say, now do a rebalance, and the entire rebalancing phase in Couchbase used to be a blocking operation. With the latest 2.0-plus releases, Couchbase has automatic rebalancing in the background, so you can do reads and writes while it is rebalancing.

On clustering: okay, let's not talk about Redis here. A lot of other databases have clustering where people have to specify what is a master and what is a replica, so you have to define your masters and slaves; in Aerospike you don't need to define anything like that. The way Aerospike forms a cluster is a multicast heartbeat mechanism, whereas Redis uses TCP point-to-point communication: with n nodes, that means on the order of n×(n−1) connections between them. Aerospike uses multicast, with an option for mesh as well, because some environments, such as Amazon, won't allow multicast, and there you have no option other than forming these point-to-point connections. To the question: yes, it has to be in the same subnet, so this is not across availability zones and things like that. We were actually talking with Amazon, and they were considering enabling multicast in VPCs, not in normal instances, because a lot of other clustering solutions need multicast too. Not in 2-plus, I guess. Riak has auto-rebalancing. And Couchbase has two versions of their product, one open source and one enterprise; the open version does not have it, and though they say they are open source, the open version is almost two years behind their enterprise offering.

Clustering, I think, is one of the biggest differentiators for Aerospike compared to the other NoSQL companies. Customers who are used to Couchbase or Cassandra ask, how do I configure my cluster? It's actually a proud moment for us to say: for a four-node cluster, boot four nodes, that's it. You boot four nodes, they automatically form the cluster, you don't need to define masters and slaves, you don't need to define your sharding, and the moment the cluster is up you can start loading data.

Additional features: okay, again, forget Redis for a moment. Aerospike has an afterburner script which tunes Aerospike for the machine on which it is installed. If the hardware has multiple cores and multiple sockets, we pin Aerospike threads to a particular socket, so that threads doing the same job don't switch from one socket to the other, and within a socket we have affinity to the cores of that socket. And if you have advanced network cards which support multiple queues: one thing people don't pay enough attention to is the network part. People buy a very big machine with 16 cores but ignore the networking layer, and if the machine is doing a lot of network communication, all the software interrupts coming from the network layer land on a single core. Some Linux distributions, Scientific Linux for instance, do this IRQ balancing automatically, but this is one of the least exploited areas. The Aerospike afterburner script, if it detects multiple queues on the network card, will set up affinity so that the interrupts for a given flow keep going back to the same core, or it happens at the socket level.

We also have cross-data-center replication support across geographies, done asynchronously. We have support for secondary indexes, where you can define secondary indexes across sets, across different keys. Cassandra has secondary index support, but it builds the index asynchronously, and so does Couchbase: there is a background thread which keeps repairing the secondary index if anything falls out of sync. We make sure that we don't need a repair. User-defined functions are something actually borrowed from Redis; again, we have the common developer, Russell. You can define anything you want in the Lua programming language. And just to demystify Lua: it's not really bad, it's one of the fastest embedded interpreters there is. Its syntax looks somewhat Pythonish, it has its own constructs for loops, if conditions, all that stuff, so it's not hard to program in Lua. With UDFs you can write your own operations on a record, so you don't need to pull the entire record out, do the manipulation, and write it back: you just apply your UDF, modify the record there on the server, and return whatever result you want. First of all you save a lot on network transfers, and your programming model also becomes easier: atomic operations become very easy, because execution is single-threaded per UDF per key.

Streaming aggregations are an infrastructure built on top of the UDF infrastructure. It's like your map-reduce infrastructure, where you can define your map functions and your reduce functions. The first phase of this map-reduce happens on every node, so if you have one big map-reduce job it runs on all the nodes of the cluster; you aggregate the results at the client, and the final reduction phase happens at the client layer. This map-reduce syntax looks more like Python's map and reduce; it's not Google's MapReduce, it has a different style. The map-reduce concept existed before Google came along, and the form it was in then is the form we are using; Google came up with a different form of it. Then there is the other stuff: monitoring, GUI consoles, and all that.

What does this mean for people coming from Redis? One of our developers has actually ported some of the list and map APIs that Redis has into UDFs, so you don't need to write the UDF logic on your own; you just invoke the UDFs that are there. Of course there will be some impedance mismatch. For example, Redis has the blocking list operations, the blocking pops and pushes, and we cannot support those: we don't want to block anything, operations on a connection should be quick. So we don't have that support, but otherwise I think we have covered the basic list and hash-map implementations of Redis decently, and you are definitely welcome to contribute support for other APIs. I think that's pretty much it from
my side. We are also planning a meetup and a workshop on the 3rd; Kiran will send you the details. So yeah, that's it from my side.

On benchmarks: benchmarks always turn into an opinionated debate. Thumbtack, a third party, has done a benchmark. There is a tool called YCSB, the Yahoo! Cloud Serving Benchmark, which has plugins for different NoSQL databases, and they did the benchmark using YCSB. Aerospike performs much, much better than Mongo, and very well compared to Cassandra. The biggest advantage Aerospike has relative to Couchbase and Cassandra is the predictable performance it brings. Cassandra is written in Java: it performs well, then suddenly garbage collection kicks in and everything goes down for a while and comes back up; and when the whole cluster is getting rebalanced, there are small pockets where things get blocked. We do much better in terms of consistently delivering that high performance. If you want specific results, they are up on our website as well.

I think that is because of the additional features Aerospike brings, not because of pure single-node performance. Purely as a developer, Redis gives you all these predefined operations, so there I think Redis clearly wins over Aerospike. But Aerospike brings the operational needs of something which has to be continuously up: the clustering, the synchronous replication, the automatic rebalancing. That's where the real strength of Aerospike is; not for a pure developer, but for an application with a business need.

On a hosted service: no, Aerospike doesn't have a hosted service, but that's what many people keep coming back and asking for; they want to use Aerospike as a service. I think the intermediate plan is to offer a dedicated hosted service, so if you come and ask, I want Aerospike, we may manage the whole thing for you: you just do your reads, writes, analytics, whatever you want to do, and you don't need to bother about system maintenance; that part we will take. Amazon has actually spoiled the whole world: now everybody sees software as an API and everybody wants to go to that level, and there's nothing wrong with that. So we may eventually go to software as a service, but our first step will be to maintain individual clusters for people.

Right now it is a closed-source, paid product. We have three levels of offerings: a free edition, where you can run a two-node cluster with up to 400 GB of data forever, and you can go commercial and into production with it, no licensing needed; a startup edition, which has higher limits per cluster (I don't know the financial part of it); and the full enterprise package, where you get things like cross-data-center replication, which you don't get by default.

I cannot commit to anything now, but we are actually collecting feedback on how many people would want to use Aerospike if it were open source. Okay, I'd like to see a raise of hands; actually, after this talk, if you can come to me and just tell me your use case and your company, I am going to tell my CTO that this is the feedback I got. We are seriously considering it; we actually went as far as deciding which license to use, so if you have a preference for a particular license, let us know. Pure GPL is very restrictive: if you write anything on top of Aerospike you have to make that open source. Oh no, that is LGPL; LGPL is fine, GPL is fine even if it is network calls, yeah. So definitely, if you have a choice of particular licensing, you should let us know; it's like voting for Aerospike to be open source. And whatever I say, I don't think you will believe me 100%, so you should just try it; trying it out is also one step. Many of our customers' use cases are published on the website.

I don't know how many people here are from the ad-tech industry; our biggest sweet spot is in ad tech. AppNexus is our customer, and AppNexus is probably the third-largest RTB platform in the world; they do about a billion transactions every day on Aerospike. I have not talked much about how Aerospike is designed internally.

To the question, is it just a matter of Redis catching up with you on all these clustering things, or have you made decisions such that it can never catch up: I don't think what we achieved is impossible to achieve. We are ahead of the race, that's all I can say.

On memory: we don't do memory mapping. We have our own arena allocator, so we allocate in terms of 1 GB chunks up front and maintain the whole thing as our own map. And I was talking about this warm restart feature: for that we use shared memory, so the entire state is in shared memory, and even if the process goes down it stays around; we can just come up and reattach.

On Riak: actually, I don't hear much about Riak now. When we started, we were hearing a lot about it, and then, you know, there was the news that all their top management left, and all that. With Riak, again, cross-data-center replication and everything is paid; they have some free version apart from the cross-data-center piece. I don't know a feature-to-feature comparison with Riak, and we don't bother much about Riak now. We bother a lot about Redis, Cassandra, Couchbase, and Volt; we don't bother about Mongo.

On Mongo versus Aerospike: I think we each have a sweet spot. Snapdeal, who recently came to Aerospike, replaced 10 MongoDB servers with 2 Aerospike servers. They are using it not as a cache but for their inventory management. They have deals, say a hotel deal, and they cannot give a deal to more than 10 people; they cannot give one deal to 100 people. So they have to limit the deals as they hand them out, and this is very high-throughput, real-time traffic for them, so they can't keep hitting the database for all of it. They have their source of truth, which is MySQL, for their hard persistence needs, but they are using Aerospike for the caching side of this real-time inventory management.

On Elasticsearch: Elasticsearch is really in the analytics domain, and I think analytics is not our sweet spot. Though we have real-time aggregations, that is real-time analytics which you do on small portions of your data; it's not that you have a petabyte of data and can process it in seconds. Our sweet spot for analytics is in the seconds range, whereas the MapReduce or Elasticsearch sweet spot for analytics is in the minutes-to-hours range: they don't mind running a job on 1,000 machines for 10 hours, and generally they do it for end-of-day processing. Our analytics are real-time: you want to know how many ads you showed to this particular guy in the last hour, because if you already showed him 5 ads from, say, Reebok, you don't want to show him any more. For that you have to track how many ads you showed and how many visits he made, in real time, and react immediately. So I think they are in their separate sweet spots.

That's it from my side. Please come to the Aerospike workshop; our co-founder is coming and he is going to present. Thanks.