So, we'll first start at a high level about Aerospike, where it fits in your use case, and then we'll quickly dive into the architecture of Aerospike and why we think it is superior compared to other NoSQL databases. So, why should you even consider Aerospike? This is not because we think we are great, but because people have been giving us credibility. This is Gartner research, and they have a Magic Quadrant where they do analysis of software in a particular field. This particular Magic Quadrant is about operational DBMS systems, all DBMS systems, SQL, NoSQL, everything put together. The top half is about the proven guys, who have proven themselves beyond doubt. So, if you look at the leaders, you'll see IBM, Oracle, Microsoft, who are proven beyond doubt. And if you look at the lower left half, you'll see busy activity going on there. You can see all the popular names like MongoDB, Couchbase, Cassandra, all these guys, trying to execute and trying to figure out what needs to be done. But then you see one lone dot there, representing Aerospike, in the visionary quadrant, the ones who Gartner thinks got the equation of the future right. I think that's a big credibility that we got from Gartner. So, I just told you where the world thinks we fit, and now this is where we think we fit in the world. The top half is about your OLTP systems, like your MySQL, and the NoSQL databases which you use to serve data in real time, all the OLTP systems. And the lower half is about your analytics software. The traditional analytics software is your Teradata or Vertica type of systems, and on the right-hand side you have Hadoop systems. So, in general, the left-hand side is your monolithic single-node systems, and on the right-hand side you'll see clustered solutions.
So, we think Aerospike fits in the OLTP space, but we are slowly expanding into the analytics space, because in the newer version of the product we have aggregations, which are similar to a MapReduce kind of framework. Not the traditional Google or Hadoop MapReduce framework, but our own MapReduce type of framework, okay? So, I'll quickly skip these slides. In general, the way we try to think about big data is in three different categories: the volume, the variety, and the velocity of the data. Volume is where you keep collecting all the web hits or your mobile tracking data, all the information, and shove it into your data stores, and you analyze it in long-running jobs, right? Typically it is your logs, where you run your Hadoop MapReduce jobs and extract meaningful information out of this raw data, okay? And variety comes from the different sources of the data. You have data coming from web, mobiles, sensors, all these things, and all these different sources will have different representations of the data, okay? So, you have to transform all the data in a unified way, aggregate it, store it and use it later. And then comes velocity. The point of velocity is, whatever you collected through your volume, analyzing the data coming from your variety of sources, you have to extract meaningful information, store it somewhere and use it, right? Without using it, all the data that you're collecting and analyzing is useless. And because you have a lot of data and you want to target a lot of people, the systems which are serving velocity should be really, really fast. That's why your typical OLTP systems come into the picture, and that's where we think Aerospike fits. So, the high-level story is: Aerospike is flash-optimized, built for SSDs.
So, it's not that, now that SSDs are popular, Aerospike started supporting SSDs. Five years back, we decided that SSDs were going to be the future, and we never bothered about rotational disks, okay? Aerospike can be configured in three modes: totally in memory, totally on SSD, or in memory plus rotational storage. But part of the data on rotational storage and part of it in memory, we don't support that, and we think that model is going to be stale. So, these are throughput numbers. You don't need to believe any of this now, because you're going to see for yourself when you try Aerospike today. But just to quote it: this is a YCSB benchmark where we reached 1.6 million TPS on a four-node cluster in an in-memory configuration. And for single-node clusters, we have blog posts talking about how to reach 1 million TPS using Aerospike on a single node, again in an in-memory configuration. And everybody thinks that Amazon machines are slow, that they are all virtual machines and you cannot really extract very good performance from Amazon. We broke those barriers also; we were able to reach very close to 1 million TPS, about 960K TPS, on Amazon machines. And the last phase of this talk is going to cover how to achieve that on Amazon machines, what choices you have to make on Amazon to get the best performance on EC2 instances, all the TCO stuff. So, we'll quickly go into the architecture part. This is probably the most interesting part for the geeks. If I had to summarize the whole beauty of Aerospike in one phrase, it is: no hotspots in the whole system of Aerospike. So, my challenge for you in the next 20, 30 minutes is to find a hotspot in our system. There's no right answer, but there are a few answers which will come close. So, why do I say that there are no hotspots?
So, if you look at different levels, we have made sure that there are no hotspots anywhere. At the data distribution level, that means sharding: we don't do range sharding, which can typically cause hotspots. There is a default sharding mechanism, an auto-sharding mechanism, which is hash based, which will avoid any hotspots. And in node-to-node communication, the inter-node communication, there's no hotspot. There's no special node in our cluster at all. Okay, so every node is the same; it's like the Dynamo model. Other systems have a cluster manager or cluster coordinator, where for all the important decisions you go to the cluster manager, the coordinator, okay? So, you mean that's the hotspot? Sorry, not necessarily a failure. Let's say there's a cluster coordinator, and if that cluster coordinator fails, you immediately choose a different coordinator. That's not a single point of failure, but it's definitely a hotspot. Same with node-to-client communication. Typically, if you look at other NoSQL systems, they have a configuration where you talk to one node in the cluster, and that node will route your requests to the other nodes in the cluster. We don't have that. We'll talk in detail about all these things. Internally within the software, at a third level, again, all the load is distributed between all the worker threads that we have. In terms of exploiting the cores on the machine, we do a very good job of balancing the load across all the cores of the machine. Same at the network level. I think the network is one of the least exploited resources for most people. Because if you buy very expensive hardware, with 32 cores or whatever, but skimp on the network card, that's going to suck. So, what you should invest in is a network card which has multiple queues, where you can assign the interrupts coming from these multiple queues to the cores.
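The queue-to-core idea just described can be sketched as a simple round-robin assignment. This is a toy illustration, not the actual script: the function names are made up, though the `/proc/irq/<N>/smp_affinity` bitmask it computes is the standard Linux mechanism for pinning an interrupt to a core.

```python
# Hypothetical sketch: spread NIC RX-queue interrupts evenly over CPU
# cores, round-robin, and compute the hex affinity mask you would echo
# into /proc/irq/<N>/smp_affinity for each queue's IRQ.

def assign_queues_to_cores(num_queues, num_cores):
    """Return a mapping of NIC queue index -> CPU core index."""
    return {q: q % num_cores for q in range(num_queues)}

def affinity_mask(core):
    """Hex bitmask with exactly one core's bit set."""
    return format(1 << core, "x")

if __name__ == "__main__":
    mapping = assign_queues_to_cores(num_queues=8, num_cores=4)
    for queue, core in mapping.items():
        print(f"queue {queue} -> core {core} (mask {affinity_mask(core)})")
```

With 8 queues and 4 cores, each core ends up servicing interrupts from exactly two queues, so no single core becomes the network hotspot.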
So, Aerospike has an afterburner script which figures out how many cores you have and how many queues your network card has, and automatically assigns the network queues to the cores, so that the interrupts coming in are evenly distributed across your cores, okay? And at the SSD level, we have our own file system; we don't use the ext4 file system. You have that option, but we think it's not very fast. So, we have our own file system which does striping across SSDs. If you have multiple SSDs, you don't need to put them under a RAID controller to get good IO parallelism, to do striping, things like that, okay? Just before diving into the architecture: when you scale, things start to fail, and we have a contingency plan, or a mitigation plan rather, for how to handle these things at scale, right? Every small bottleneck gets amplified when you're talking about scale. Maybe when you're doing 10K TPS, these things are not so important, but when you're talking about 100,000 and a million transactions per second, every small thing matters a lot, okay? You cannot ignore even the smallest bottlenecks. So, machines start to fail, right? If you have 100-node clusters, the probability of a node failing grows with the size of the cluster. So, it's a nicer solution if the software or whatever system you're using can do the same job with a smaller number of nodes. Because of the vertical scaling that we have at Aerospike, the same job can be done with a much smaller number of machines. And networks start to fail. This is actually a hard problem, and Aerospike is an AP system.
So, what that means is, if your network gets partitioned, it sacrifices consistency, and we fall back to an eventually consistent model: when the network partition heals, we merge all the data back and make sure that it is consistent. Then, load expectations. You think, okay, for me 30K TPS is enough, but suddenly your app takes off and suddenly you are at one million TPS. So, expectations start to fail. You should have a lot of headroom, and we think Aerospike gives a lot of headroom. Operations is again an underestimated thing. If you have 100-node clusters, somebody has to manage all these machines, right? And typically they make mistakes. All these automation tools like Puppet and Chef make the job easier, but still mistakes happen; some wrong configuration gets pushed or whatever. So, what Aerospike does is remove most of the configuration needs. We do auto-clustering: you don't need to set up something like, this is my master, this is my slave; you don't need to say any of that. To bring up a four-node cluster, especially on bare-metal machines, we support multicast. To have a four-node cluster, you just put up four nodes. They will form a cluster on their own, and there is also automatic assignment of master and replica nodes. It will do auto-sharding, so you don't need to say how to shard the data; you can just start doing reads and writes. And auto-rebalancing: if one node goes down in your cluster, you don't need to wait for some manual intervention to come and say, okay, now rebalance this data. It automatically does the rebalancing. And this last one is very standard, nothing unique to Aerospike: we pick the best default configuration, yet retain the ability to change it when needed.
So, quickly about the data model. This is very typical of classical databases: you have namespaces, which are analogous to databases. A namespace carries properties saying where the data should go, which devices or files it should go to, whether it is an in-memory configuration or an on-disk configuration, things like that. And we have something called sets, which are similar to tables, just to have a logical organization of your data, nothing more than that. You don't need to create them in advance; you can just create them on the fly. Okay, so records. Records are key-value pairs, but the value part can have multiple bins, where you can set specific bins, read specific bins, or read all the bins of the record, things like that. The data types we support are strings, integers, maps, lists, and we have something called LDTs (Large Data Types), okay? Why LDTs? The restriction on the record size is the write block size that you define. In memory, we don't have that limit; you don't need to define a write block size, so in memory the limit is much higher. But when you use SSDs, what we have found is that if the write block size is around one MB, your SSD behavior is very good in terms of IO, both reads and writes. So, we recommend people go with about one MB, and so one MB becomes your record size restriction. To break that one MB barrier, we have LDTs, where we internally manage one record spanning multiple blocks, and we absorb the complexity. You can think of it as a big list: we just keep adding to the list and reading from the list. Okay. So, if the write block size is one MB, what implication does that have on IO performance? The throughput kind of reduces. Does that answer your question? I mean, you want to read block by block, right?
But your records may be much smaller than that. Yes. We don't read the entire block; we know the offset where we should start reading, and if your record is much smaller, let's say a hundred bytes, we pack a lot of records into this one block. So, we know where to read from and how much to read. So, the block size is not forcing you to read the whole block? No. Do you support counters? Yes, those operations we support: we have increment, decrement, append, all these things. Any questions about the data model? Okay, fine. So, in terms of architecture, I already mentioned that all the nodes in the cluster are equal for us, okay? Every node will be a master for some of the data and a replica for some part of the data. We'll cover that in detail later. And then the clients keep talking to the cluster; they will discover the cluster and they will talk. So, you just need to concentrate on your application logic. You don't need to do connection pooling or anything yourself; our client layer will do that for you, okay? And we have a cross-data-center product which will ship data from one data center to the other data center. The replication within a data center is synchronous replication: unless the write was written in two places, on two nodes, we don't return an acknowledgement back to the client. But the cross-data-center replication, which happens across WAN links, is done in an asynchronous way. So, this is cross-data-center replication. We tried to put no restrictions on the configuration between the source data center and the destination data center. The only restriction you have is that both of them should have the same namespace, the database name, that's it. So, your source can be an in-memory configuration and your destination can be an on-disk configuration.
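The two replication modes just described, synchronous within a data center and asynchronous across data centers, can be sketched in a few lines. This is a toy model of the ordering only, with made-up names; it is not the real replication protocol.

```python
# Sketch: within a DC, the client is acknowledged only after both the
# master and the replica hold the write (synchronous). Across DCs, the
# write is ack'd locally and queued, and a background shipper drains
# the queue over the WAN later (asynchronous, XDR-style).

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}

def synchronous_write(master, replica, key, value):
    master.data[key] = value
    replica.data[key] = value      # replica write happens first...
    return "ack"                   # ...only then is the client acked

def asynchronous_ship(xdr_queue, remote, key, value):
    xdr_queue.append((key, value))   # ack'd locally; shipped later
    while xdr_queue:                 # background shipper drains queue
        k, v = xdr_queue.pop(0)
        remote.data[k] = v

m, r, remote = Node("master"), Node("replica"), Node("remote-dc")
print(synchronous_write(m, r, "user:1", "alice"))  # ack
asynchronous_ship([], remote, "user:1", "alice")
print(remote.data)  # {'user:1': 'alice'}
```

The design trade-off is the one the speaker names: synchronous replication gives durability inside the data center, while the asynchronous queue keeps WAN latency out of the client's write path.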
So, how can you exploit this fact? Let's say you have temporal data, where all the writes coming in have an expiry period of, say, seven days or two days, whatever you want, but you want to keep more historical data on your other cluster. So, you can use XDR to keep streaming all the writes to your more persistent cluster, and use this one for your temporal needs. You can configure it in multiple ways: active-passive mode, active-active mode. You can have a chain, so A ships to B ships to C. You can have mesh modes, like A, B, C shipping to each other in parallel, all that stuff. Do you have rack awareness? Yes. Rack awareness comes within the cluster; we'll talk about rack awareness also. Yeah, so everybody keeps talking about disaster recovery, but I don't know how many people have actually faced it. We actually faced it at Aerospike. During Superstorm Sandy, one of our customers had a data center in New York which was getting flooded, and they had half an hour to change everything. They had this replication ring going on, and they just needed to take the New York data center out of it, so they short-circuited the New York data center. The rest of their three data centers just continued to work. We got a good quote from that CTO. So, okay, we'll go a little deeper now. How do we do auto-sharding? Let me step back a little bit, okay? We have partitions, okay? All the data will go into one of these 4,096 partitions, which is hard-coded in the product. Any key that comes in is hashed using RIPEMD hashing. RIPEMD hashing is one of the best hashing algorithms, used in the security domain, with the least number of collisions, okay?
So, any key coming into the database will be hashed into a 20-byte digest, we call it the digest, using this RIPEMD hashing. Based on the hash generated on the key, it will be assigned to one of these 4K partitions, okay? Now, these partitions are kind of virtual, right? They are not predefined or tightly bound to a node. Depending on the cluster size, the number of nodes in the cluster at that moment, we assign these partitions to the nodes. So, if you look at this whole sharding mechanism, the first level is static sharding, you can say, because it's purely based on the digest. And the next level, sorry, I should not call the second level sharding, it's just a distribution: we assign the partitions to nodes depending on the size of the cluster. And as the cluster size changes, we reorganize the partitions, and we'll go into details of how we do that part, okay? But the partition assignment is static? Not static, that's what I'm saying, it's dynamic. We take care of that. For every partition, there will be a master and a replica. Who is the master and who will be the replica is decided on the fly; that part is dynamic. But once it has been decided? Yes, it will stay like that. Unless the cluster size changes, it won't change. And the cluster changes only when a new node is added or an existing node goes away. So, within these 4K partitions, let's say you have four nodes. Every node will be owner, or master, of about a thousand partitions, right? So, every node will be master for about a thousand partitions, and every node will be replica for about a thousand partitions, more or less. You may see a little imbalance of five percent or ten percent. But then, won't this still create non-balanced data? If you mean an imbalance on the order of five to ten percent, yes.
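The digest-to-partition step described above can be sketched directly: hash the key to a fixed-size digest, then map a few digest bytes onto one of the 4,096 partitions. RIPEMD-160 is the algorithm the speaker names; the sketch below uses SHA-1 as a stand-in (also a 20-byte digest) because RIPEMD-160 support in Python's `hashlib` depends on the local OpenSSL build. The byte-selection detail is illustrative, not the product's exact scheme.

```python
# Sketch of hash-based partition assignment: key -> 20-byte digest ->
# one of 4,096 partitions. A good cryptographic hash spreads keys
# evenly, which is exactly the "no hotspot" property being claimed.
import hashlib

N_PARTITIONS = 4096  # hard-coded in the product, per the talk

def partition_id(key: bytes) -> int:
    digest = hashlib.sha1(key).digest()           # 20-byte digest
    # take a couple of digest bytes and reduce modulo 4,096
    return int.from_bytes(digest[:2], "little") % N_PARTITIONS

# check the distribution over 100,000 synthetic keys
counts = [0] * N_PARTITIONS
for i in range(100_000):
    counts[partition_id(f"user:{i}".encode())] += 1
print(min(counts), max(counts))  # both near the mean of ~24 per partition
```

Running this shows every partition receives close to the mean number of keys, which is why a five-to-ten-percent imbalance bound per node is achievable.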
So, our internal test suite has a test which says anything beyond five percent imbalance is not tolerated. The hashing is the trick here. RIPEMD hashing is one of the best at being both evenly distributed and having the least collisions. So, these 4,096 partitions, is that configurable, or pre-decided? No. Unless we change it in the product, where it is hard-coded, you cannot change that. And is this related to the number of nodes you have? That's right. And why did you decide on this number specifically? We think 4K is enough. If you had 4,000 nodes, then probably this would not be a good number. But we have another internal hard-coded number, 128, for the maximum cluster size. So, if you have 128 nodes, that's roughly 32 partitions each as master, plus about as many as replica, and we think that's not a bad number. And this hash function, is that confidential as well? No, this is a standard hashing algorithm; you can get it publicly. So, we'll go into clients. I was saying the clients are also intelligent. What you have to do when you want to talk to a cluster is give just one seed IP of the cluster. The client will figure out the rest of the cluster topology, because the cluster nodes keep talking between themselves and they discover the full cluster. They will communicate this information to the client, and the client will also go to every node and ask, which partitions are you master for and which are you replica for? So, if a request comes in for a read or a write of a particular key, the client knows exactly which node to go to to do that write or read. The write always goes to the master, but a read can go either to the master or the replica. Okay. Doesn't that become your single point? Again, not a hotspot, because there are 4,096 partitions and this is evenly distributed.
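The client-side routing just described can be sketched as a partition map the client builds from the cluster: for each partition, which node is master and which is replica, so every request is a single hop to the right node. The map-building rule and node names below are made up for illustration; the real assignment is decided by the cluster itself.

```python
# Sketch: a client-side partition map, partition id -> (master, replica).
# Writes always go to the master; reads may also be served by the replica.

N_PARTITIONS = 4096

def build_partition_map(nodes):
    """Toy assignment: spread masters round-robin, replica on the next node."""
    pmap = {}
    for pid in range(N_PARTITIONS):
        master = nodes[pid % len(nodes)]
        replica = nodes[(pid + 1) % len(nodes)]
        pmap[pid] = (master, replica)
    return pmap

def route(pmap, pid, op):
    master, replica = pmap[pid]
    if op == "write":
        return master          # writes always hit the master
    return master              # reads could equally go to the replica

nodes = ["node-A", "node-B", "node-C", "node-D"]
pmap = build_partition_map(nodes)
print(route(pmap, 1234, "write"))  # node-C
```

With four nodes, each node is master for exactly 1,024 of the 4,096 partitions in this sketch, which is the even spread the speaker describes.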
So, if you see, the load will be very evenly distributed. But who takes the request? You won't have just one client; you're talking about an app server layer and a database layer. Typically you will have 40, 50 web servers running in front of, say, a four-node cluster. Is the split of partitions fixed? Pre-defined; the moment the cluster starts, it is 4,096. And can one partition become much hotter than the others? It cannot, again because of the hash function. The hash function will evenly distribute all the keys across these 4K partitions. Out of the digest, we pick a few of the bytes and map them to one of these 4K partitions. So, the hash function does all the trick of load balancing here. And the hashing must be happening on one key? I mean, could it be a composite key or a single key? One key, right? It's a key-value store. Every record will have a key and a value part, and the hash is based on the key, not on a composite. That's the unique identity for any record. Can I define that key? Oh, yeah, you can define the key. So, if I craft my keys carefully, can I make related records end up together? No, we don't take your key as it is and assign a partition based on that. That would be a bad idea, because that would create imbalance. So, how do you handle data collocation, in case there are certain sets of keys that are typically accessed together? You shouldn't assume collocation, and you shouldn't plan in a way that those keys get collocated. But it could so happen that you have multiple data sets, right? One data set with one set of key-value pairs, and another data set with a different set of key-value pairs. Yeah.
So, then you can organize them into namespaces: you put everything into this namespace or a different namespace, or put them into sets, which are analogous to tables. But you should not plan your application to have data collocation across keys. That's range sharding, which comes with its own problems. But the bins are collocated? Bins are always collocated, yes. So that's the case where you can choose a key under which you know you can put things together. That's right. And we have these lists, maps, and LDT data types. If you want something which needs very tight coupling, that's what you should be using. Thank you. Okay, so we were talking about the client. For the client, you just need to give a seed IP; it will discover everything, it will discover the master nodes and directly send the reads and writes to the correct node. So, it's a single hop from the client to the cluster node. And again, we give all these semantics: you can generate a transaction ID from the client and know which request you are getting a reply for, for doing asynchronous operations. We have asynchronous client libraries, and we have auto-retry logic in our clients, which you can choose to disable. We have all those options. So, SSDs are one of the biggest strengths of Aerospike. As I was saying, we have our own file system and we can bypass the OS file system, but if you choose to use a file system, you still have that option. One important thing to note about Aerospike is that all our indices are in RAM, always. Irrespective of whether your data is on the disk or not, the index is always in RAM. Why do we do that? It's to have very predictable latency. If you have everything in RAM, and the index trees are balanced trees, the time taken for any key lookup is about the same. And after you do the index search, you go directly to the disk; you know the location on the disk and how much data to read, and you read it directly from the disk.
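The index-in-RAM idea just described can be sketched as a map from a key's digest to the record's exact location on disk, so a read is one in-memory lookup plus one direct disk read. A dict stands in for the balanced tree here, and all the names are illustrative.

```python
# Sketch: the in-memory primary index maps digest -> (device, offset,
# size). One lookup yields everything needed for a single, direct disk
# read; no file-system traversal, hence predictable latency.

class InMemoryIndex:
    def __init__(self):
        self._entries = {}   # digest -> (device, offset, size)

    def put(self, digest, device, offset, size):
        self._entries[digest] = (device, offset, size)

    def locate(self, digest):
        """One lookup gives the exact disk location of the record."""
        return self._entries.get(digest)

idx = InMemoryIndex()
idx.put(b"\x01" * 20, device="/dev/ssd0", offset=1_048_576, size=300)
print(idx.locate(b"\x01" * 20))  # ('/dev/ssd0', 1048576, 300)
```

Note the index stores only locations, not data, which is why it fits in RAM even when the records themselves live on SSD.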
So, you may start thinking that if my process crashes, all these indices are gone, so I have to rebuild my index. We have a trick there. We use something called shared memory. If you have used shared memory, there's an anonymous mode of setting up shared memory which doesn't need a process to stay attached to it. So, even if your process goes down, maybe because you're doing an upgrade or anything like that, when you start the process again, we know exactly where to attach in the shared memory. We directly attach, quickly do a sanity check on whether the index is in sync with the data, just the basic checks, and we are ready. So, what used to take about half an hour before this particular feature now takes about 10 seconds, to restart your process. So, your indexes are not persisted? Every node will have its own index, so in a way the index is also distributed. The index is not persisted to disk, if that's what you're asking, yes. And it is never persisted to disk. So, in the event that the whole machine goes down, which is much rarer than a process going down, we have to rebuild the index; there's no shortcut there. And there's no point in shipping the index to another machine, right? Anyway, you'd have to ship the whole record and redo the write operation. Okay, the index is a B-tree, right? And what if there is a lot of random data coming in, say a large fraction of the data gets inserted and deleted every day? What is the impact of that kind of operation? Yeah, so, to qualify it more precisely than a B-tree, it's a red-black tree. We keep it as a red-black tree because it is a good data structure to support a balance of reads, writes, and deletes.
If you do writes on a fully balanced binary tree, and the root node has to split or merge, you end up reorganizing a whole bunch of your data. Whereas the beauty of a red-black tree is that only a subtree gets reorganized. So, it's a very good data structure to balance reads, writes, and deletes. You never have one bad phase where the entire tree is rebalancing, which could take a second maybe. We never have that situation, okay? So, talking about SSDs. If you take any software and replace your rotational disk with SSDs, it will go very fast, no doubt about it. But if you do that, the software will very quickly burn out your SSDs if it is not handling them in the right way. And we think we handle SSDs properly, because we never do in-place updates. Writes can come at a very high rate, so imagine you keep updating the same block. An SSD is not like your rotational disk, which writes exactly at the same block where the original data was. An SSD has its own blocks, and because it is flash memory, it cannot update unless it erases the whole block. So, what does it do? It has some internal block size, 1K, 4K, whatever. Even if you update 100 bytes in that, it will copy whatever is in the block, modify the 100 bytes that you are trying to modify, and write it all to a new block. So, you are very quickly going to burn your SSDs if the software is not aware of this. Okay, so we never, ever do in-place updates. Sorry? So you append to a log? Yeah, we have a log-structured kind of file system, where we read the original content and then do a buffered write in memory. Once this write block, which is configured to maybe 1MB, is full, we write the whole thing to disk in one shot. Okay?
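The buffered, copy-on-write path just described can be sketched as follows. This is a minimal illustration of the accumulate-then-flush behavior, with made-up names; the real system's block layout and flush policy are more involved.

```python
# Sketch: records are never updated in place. They accumulate in an
# in-memory write buffer, and when the buffer would exceed the
# configured write-block size (1 MB here, per the talk), the whole
# block is flushed as one large sequential write at a new location.

WRITE_BLOCK_SIZE = 1 * 1024 * 1024  # 1 MB

class WriteBuffer:
    def __init__(self):
        self.buf = bytearray()
        self.flushed_blocks = []

    def write_record(self, payload: bytes):
        if len(self.buf) + len(payload) > WRITE_BLOCK_SIZE:
            self.flush()
        self.buf.extend(payload)

    def flush(self):
        # in the real system: one big sequential write to the SSD
        self.flushed_blocks.append(bytes(self.buf))
        self.buf = bytearray()

wb = WriteBuffer()
for _ in range(15_000):           # 15,000 x 100-byte records ~ 1.5 MB
    wb.write_record(b"x" * 100)
wb.flush()
print(len(wb.flushed_blocks))     # 2: one nearly full 1 MB block, one partial
```

Large sequential writes like this are what keep SSD wear and write amplification low compared to many small in-place updates.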
Because of this, you have to mark some of the old records as deleted, so it keeps generating some garbage, fragmentation. We have a background job which keeps running to clean up this garbage. I already mentioned the fast restart using shared memory when the process has to restart. I also covered that you don't need to put multiple SSDs under a RAID controller to get striping; we do that ourselves. So, if you attach four devices, again based on the digest, we choose which SSD the record will go to, and we do the striping ourselves. So, again, no hotspot even at the SSD level. You don't need to put all the data onto one SSD. You can attach 10 SSDs to your box and say, I want this namespace to use all these 10 SSDs. What we do is, based on the same hash that is used for partition assignment and everything else, we decide which disk this record will go to. So, again, because of the even distribution of the hash, all the SSDs attached to this namespace are evenly used. No hotspot at the disk level either. So, there are no more in-place updates at all? Yeah. It doesn't write it back immediately; it does a buffered write. All the writes happening in that period are buffered, and when the block gets full it is flushed. So, what is the probability of a failure in between, right? Your concern is durability. Okay. The durability story of Aerospike is that you have synchronous replication. So, losing data which Aerospike says is committed can only happen if two nodes go down in the system at exactly the same time. If at least one of the nodes flushed its memory to the disk, you are not going to lose that data, right? Does that answer your question? So that means changes are being written in multiple places?
The same change, on multiple nodes, yeah. And we do synchronous replication, so unless it is written on the replica, the server will not respond back to the client that the write is done. But if you choose to relax this consistency guarantee, maybe for performance reasons, you have an option to relax it. By default, it's synchronous replication. And that applies to every operation that is done? That's right. Can you choose it per write? No, no. You have to set it in the namespace configuration. Namespace configuration, or the entire server configuration? At the namespace level, that's right. And the in-place updates, is that specific to the SSD? In-place update is just a technique used by different software; it could be any device. Not on the other replicas either? Not on the other replicas either, yeah. We don't do in-place updates. There is some software which assumes a rotational device, where you can directly go and update at the same location on the device. We don't do that, because if you do that, it will burn SSDs. So, again, as I was saying, SSDs are the strength of Aerospike. It's not just because we know these techniques; we have a very exhaustive knowledge of which SSDs are good and which are not. We have a certification tool, which is also open source, which certifies the performance of SSDs, okay? And if you go to our website, we have a recommended list of SSDs which qualify under our ACT tool, the Aerospike Certification Tool. These are proven SSDs, but there are people who say, I may go with different SSDs for cost reasons and all. So, a good story around this: when we were starting out with SSDs, four or five years back, SSDs were not reliable; they were just maturing at that stage.
AppNexus, one of our biggest customers, has about 100 nodes in deployment across four different clusters, with four SSDs on each machine. They wanted to replace their entire set of SSDs on all the machines after a year, because at that time SSDs didn't have much lifetime. So they brought in some new Intel SSDs and said, we want to use these. We said, let us try our ACT tool on them — and they didn't qualify, so we said they won't hold up. AppNexus cancelled the order for 400 SSDs, and at that time an SSD cost about $500, so an order worth a couple of hundred thousand dollars got cancelled. Intel asked why the order was being cancelled, and AppNexus said, because they're not passing this certification. Intel was skeptical, so they came to us, and we showed them the behavior of the SSDs: you have to run them for a long duration. When you buy new SSDs, they have a lot of free space and the defragmentation at the controller level works very nicely; over time it degrades. After about a day, the performance suddenly falls off. We showed this to Intel, and they realized they had a bug in their firmware. They fixed the firmware, gave us the same SSDs again with the fix, and this time they passed, so we went ahead. Now, every time Intel releases a new SSD, they give us free SSDs — same with Samsung, OCZ, all these guys. Okay, I already covered this briefly: self-configuring clusters. There are two modes in which you can form a cluster. One is multicast mode: to start a four-node cluster, you just bring up four nodes with the multicast IP specified. They talk over that multicast address, discover each other, and form a cluster. You don't need to do anything else.
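Multicast cluster formation is driven by the heartbeat section of the server configuration. A hypothetical fragment is below — the key names and defaults shown here match common Aerospike versions, but check your version's documentation:

```
# aerospike.conf (illustrative fragment)
network {
    heartbeat {
        mode multicast          # nodes discover each other via multicast
        address 239.1.99.222    # the shared multicast group
        port 9918
        interval 150            # milliseconds between heartbeats
        timeout 10              # missed beats before a node is declared gone
    }
}
```

Every node started with the same multicast group and port finds the others and forms the cluster, with no per-node peer list to maintain.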
Then, master-node assignment. Typically this is a blocking phase which lasts about 100 milliseconds. We cannot take writes during this period, because the master and partition ownership is changing at that moment. Writes which were already initiated may fail with a retryable error, and they will succeed after those 100 milliseconds. [Q] What happens during the rebalancing phase? [A] No — the 100 ms is only while deciding the new master and replica nodes. The rebalancing itself can go on for a long duration, while data migrates from the old master to the new master or a new replica, and during that phase we accept reads and writes. [Q] Is read and write performance impacted by the rebalancing? [A] It does have an impact: first of all, there is a lot of network traffic going around, and it's equivalent to doing the same volume of writes on the nodes you are shipping data to. So it does take a hit during that period. Again, there are tunables: you can set the speed at which migration runs, how many threads it uses, and so on, depending on how much of a hit you can take. The important point is that during the whole rebalancing, we continue to take reads and writes. [Q] What about writes that race with the migration? [A] Good question. All the writes that happen during rebalancing are journaled, so that the migrating data and the new incoming writes are not lost. Once we migrate all the data, we go through the journal and make sure all those writes are applied to the partition. [Q] Is your cluster management protocol homegrown, or do you use something like ZooKeeper? [A] It is homegrown, but based on the standard Paxos algorithm. We wrote it ourselves — we don't use ZooKeeper. We don't even rely on other people's file systems; we wrote our own.
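The journal-and-replay behaviour during migration can be sketched as a toy model (all names here are made up for illustration):

```python
class Partition:
    """Toy partition: bulk data is shipped from a snapshot, while
    writes arriving mid-migration are journaled and replayed."""
    def __init__(self):
        self.records = {}
        self.migrating = False
        self.journal = []
        self.snapshot = {}

    def start_migration(self):
        # The bulk transfer works off a snapshot; writes that arrive
        # while it runs are journaled so they are not lost.
        self.migrating = True
        self.snapshot = dict(self.records)

    def write(self, key, value):
        # Writes keep being accepted even mid-migration.
        self.records[key] = value
        if self.migrating:
            self.journal.append((key, value))

    def finish_migration(self, new_master):
        # Ship the snapshot, then replay the journal on the new master.
        new_master.records.update(self.snapshot)
        for key, value in self.journal:
            new_master.records[key] = value
        self.migrating = False
        self.journal.clear()

old, new = Partition(), Partition()
old.records = {"k1": 1}
old.start_migration()
old.write("k2", 2)           # arrives while data is being shipped
old.finish_migration(new)
print(new.records)           # {'k1': 1, 'k2': 2}
```

Without the journal, the write to `k2` would be missing from the new master, since it arrived after the migration snapshot was taken.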
Consistency during writes I think I already covered: all data gets synchronously written to the replica, you can relax this if you want, writes are buffered and queued to the disk — all that standard stuff. We have secondary indexes which are built synchronously, unlike some other software which builds its secondary indexes asynchronously. Secondary indexes are pretty standard: say you want all the records about people living in the city Bangalore — you create a secondary index on your city bin. We make sure we update these indexes in a consistent way, preserving the ACID properties of the write. So, how do you query the secondary indexes? Because we distribute the primary keys evenly across all the nodes, every secondary-index query has to go to all the nodes. It's kind of a MapReduce, but you don't need to write any logic for it: the query executes on every node, and all the nodes stream their data back to the client, all in parallel. Same with aggregations. Aggregation exposes this MapReduce framework to you. If you want to run more complex queries — say, the average age of people in Bangalore, but with some other clauses — you can write whatever complex logic you want in the MapReduce framework. Once you trigger the job, the map and reduce phases happen at every node; the per-node results are streamed to the client which triggered the aggregation, and the final reduction happens at that client. The MapReduce job is written in Lua.
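The scatter-gather shape of such an aggregation — map and a partial reduce on every node, final reduce at the client — can be illustrated in plain Python. (In Aerospike the per-node logic would be a Lua stream UDF; these are just ordinary functions standing in for it.)

```python
# Two "nodes", each holding a slice of the records.
nodes = [
    [{"city": "Bangalore", "age": 30}, {"city": "Pune", "age": 40}],
    [{"city": "Bangalore", "age": 20}],
]

def node_phase(records, city):
    # Runs on each node: filter/map on the indexed bin, then a
    # partial reduce to (sum, count) so only tiny results travel.
    ages = [r["age"] for r in records if r["city"] == city]
    return (sum(ages), len(ages))

def client_reduce(partials):
    # Final reduction at the client that triggered the job.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count if count else None

partials = [node_phase(recs, "Bangalore") for recs in nodes]  # in parallel
print(client_reduce(partials))  # 25.0 -- average age in Bangalore
```

Shipping `(sum, count)` pairs instead of raw records is what keeps the network cost of the aggregation low, regardless of how many records match on each node.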
Now some internals, like the threading model: we follow a staged pipeline internally, passing the workload from one set of threads to the next and making sure that no single thread gets overloaded. Again, we use the same digest function to distribute the load to one of the worker threads and get the job done. We also pin threads to cores so they don't keep jumping from one core to another, or from one socket to the other. Handling the network I already covered: our auto-tuning scripts take care of assigning NIC queues to cores, interrupt handling, CPU affinity — and everything is cache-friendly in L1 and L2. All our index entries are 64 bytes, exactly one cache line, so the index is very cache-friendly. That's the theory part — any questions on the theory? [Q] What do you use for data serialization? [A] We use MessagePack for lists and maps; strings and integers are stored natively. [Q] How do you handle locks — at the record level? [A] Yes, locking is at the record level, and in ACID terms we give read-committed isolation, so you cannot read data which is mid-change. [Q] What if your transaction spans records? [A] We don't have multi-record transactions at the moment. All the key-value interactions are single-record: the transaction starts internally and ends internally, and we don't expose control over it. [Q] This is all so hardware-dependent — SSDs and everything. If you have to offer it in a virtualized mode, is there a virtualized mode?
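The digest-based distribution of work across worker threads can be sketched like this — again sha256 stands in for the real digest, and the structure is illustrative, not Aerospike's actual thread code:

```python
import hashlib
from queue import Queue

NUM_WORKERS = 4
queues = [Queue() for _ in range(NUM_WORKERS)]

def dispatch(user_key: str):
    # The same digest used for partitioning picks the worker queue, so
    # operations on one record always land on the same thread and the
    # load spreads evenly across workers.
    d = hashlib.sha256(user_key.encode()).digest()
    queues[d[0] % NUM_WORKERS].put(user_key)

for i in range(1000):
    dispatch(f"user-{i}")
print([q.qsize() for q in queues])  # roughly 250 items in each queue
```

Hashing the key rather than round-robining keeps per-record ordering trivially correct while still balancing the load.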
For the rest of today we'll actually be working on virtual machines, and you'll see that's not really a problem, because you can run in in-memory mode, which doesn't have any specific hardware needs — and cloud instances with SSDs behind the scenes are popping up anyway. [Q] What about EBS? [A] SSD is optional. You can set your namespaces to pure in-memory mode, or you can say, I want my data in memory but stored on a rotational device for persistence — and it's continuous persistence, not snapshots. As for EBS: its latencies are not guaranteed. Direct-attached rotational devices give you an average of five-millisecond latencies, but EBS gives you anywhere from 10 to 40 milliseconds. So you can't really depend on it, but we are building our story around EBS — that's in the making. [Q] Do you support joins? [A] No. This is not like a traditional SQL database where you can join. What people do is keep these tables in memory at the client layer: you run one secondary-index query, build the table, run another secondary-index query, and do the join yourself. The database doesn't give you that, but there are people already doing joins at the client level. [Q] Do bins have a specific type? [A] No. [Q] So bins are like columns? [A] You can equate a record to a set of columns — but it's a flexible schema, which is typically what people need. You can say, this record has only three bins, while that record has many more. Nothing is predefined. [Q] So a record is a key and bins? [A] Yes — a key, and bins. [Q] And how do you look up bins?
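The storage choices just described — pure in-memory versus in-memory with continuous persistence — are made per namespace in the server configuration. A hypothetical fragment; parameter names follow Aerospike's config format, but treat the values as illustrative:

```
# aerospike.conf (illustrative fragment)
namespace cache {
    replication-factor 2
    memory-size 4G
    storage-engine memory          # pure in-memory namespace
}

namespace userdata {
    replication-factor 2
    memory-size 8G
    storage-engine device {        # data in memory, persisted to a device
        device /dev/sdb
        data-in-memory true
        write-block-size 128K
    }
}
```

A single server can host both kinds of namespace at once, which is how a cache namespace and a durable namespace coexist on the same cluster.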
With the client API you can say, I want all bins of this record, or you can say, I want bin one, bin two, bin three, and it will fetch just those. [Q] So I can look up based on bin names? [A] Yes, just bin names — let's not call them keys, or we'll confuse them with the record key. If you use the Java client, the record maps automatically to a Java map, and you can simply say, from this map, give me bin one. [Q] And how does that map to the storage layer — say, for time series? [A] For time-series data, which typically grows beyond the write-block size, you should use LDTs — large data types — where you can keep appending past the write-block size. There's an API for the large-list data type, which we call LList, where you can ask for entries from an offset: from 15, the next 20. Earlier I mentioned there are some near-hotspot things you could ask about — good that nobody did. [Q] I think the SSD controller itself could become a hotspot, because most SSDs today are SATA-based, and of late the newer breed of SSDs is PCIe. The controller needs to change, because the controller itself becomes the bottleneck. [A] These controllers now have something like 32 queues at the controller level, so controllers are getting smart too. But if you want to overcome that bottleneck, you can always add multiple SSDs: instead of buying one 1 TB SSD, buy four 250 GB SSDs. That's the workaround. [Q] Do you currently support the newer PCIe breed? [A] Yes, you can attach PCIe SSDs. The Linux raw-device interface abstracts that out for us, so we don't need to access it directly as a PCIe device.
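The record-as-a-map-of-bins idea, with all-bins versus named-bin projection, can be modelled in a few lines. (The real client call is roughly `client.select(key, ["bin1", "bin2"])`; the toy `get_bins` below is purely illustrative.)

```python
# A record is a map of bin name -> value; the schema is flexible, so
# different records can carry different bins.
record = {"name": "asha", "city": "Bangalore", "age": 30}

def get_bins(rec, bin_names=None):
    # bin_names=None means "give me all bins"; otherwise project only
    # the named bins, silently skipping ones the record doesn't have.
    if bin_names is None:
        return dict(rec)
    return {b: rec[b] for b in bin_names if b in rec}

print(get_bins(record, ["name", "age"]))  # {'name': 'asha', 'age': 30}
```

Projecting bins at the server keeps the response small when records carry many bins but the caller needs only a couple.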
We don't do that; we access it as a raw device. Coming back to hotspots: a possible hotspot is a skew in your application. The YCSB benchmark I was mentioning generates a Zipfian distribution, which means something like 50% of the read requests go to one key, 25% go to the next two keys, the next 12% to four keys, and so on. The load is very skewed, which is not good — you should try to address that at the application level too. For that kind of case, I don't think any database doing just two or three levels of replication can solve the problem. If you had a 10-way replica, the incoming requests could be distributed across those 10 replicas — but then it's a trade-off between that and whether you can maintain 10 replicas synchronously. That's probably the closest answer to the hotspot question. [Q] What's your preferred client language? [A] We have clients in almost all popular languages — C, Java, C#, and so on. If you're asking for a recommendation from our side, any event-driven model is pretty good; they're very easy on the CPU. [Q] For an RDBMS there's SQL, and you can go look at the contents of a collection. Do you have that? [A] We don't have a SQL of our own; the whole query mechanism is API-based, and we have query clients in different languages. Today we'll use the Node.js client to query, and you'll see the same pattern in all the languages that we support. At the application level the operations are simply connect, get, put, delete, secondary-index query, all that stuff — all through APIs. We're in the process of probably coming up with some SQL support, but we don't have our own SQL yet.
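The skew just described — half the requests to one key, a quarter to the next two, an eighth to the next four — is easy to generate and see. A small sketch (the function name and structure are made up for illustration):

```python
import random

def skewed_key(levels=4):
    # Level i is chosen with probability 1/2**(i+1): half the requests
    # hit 1 hot key, a quarter hit the next 2 keys, an eighth the next
    # 4, and the remainder the next 8.
    level = 0
    while level < levels - 1 and random.random() >= 0.5:
        level += 1
    start = 2 ** level - 1              # key ranges: 0 | 1-2 | 3-6 | 7-14
    return random.randrange(start, 2 ** (level + 1) - 1)

random.seed(1)
hits = {}
for _ in range(100000):
    k = skewed_key()
    hits[k] = hits.get(k, 0) + 1
print(hits[0] / 100000)  # about 0.5 -- one key absorbs half the load
```

With this access pattern, whichever node owns key 0 serves half of all reads, no matter how evenly the keys themselves are partitioned — which is why replication depth, not partitioning, is the only database-side lever against it.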
But we do have a SQL-like shell, AQL, which we'll use to show some operations. So, who is using Aerospike? I'll skip the slide you may hate as marketing. Across verticals: mobile, and advertising. Advertising is one of the strongest domains for Aerospike, because these are the guys who hit scale problems first — every other industry is only waking up to that. Mobile and advertising have the biggest scale problem because they want to track everybody visiting any website in the world; you can imagine the scale at which they operate. Then marketing, search, retail, and product listings, and we have Indian customers across several verticals as well. Indirectly, a lot of big companies use us: Oracle bought Responsys, which is a customer, so Oracle has a lot of our deployments, if you want me to put it that way. And Twitter is our customer, because they bought MoPub. If you look at this slide, one interesting thing is that Google is probably the top player in web traffic and advertising — but if you leave out Google, nine of the top 15 companies in advertising are powered by Aerospike. [Q] There was one slide, but I couldn't read it from here. Did you see the recent blog from Google? [A] You want to see the numbers? While I'm moving the slides: you might have seen a recent blog from Google Compute Engine where Cassandra achieved one million transactions per second using 330 nodes. We can do that with probably three nodes. So, this is the graph: this is Aerospike from probably two years back, this is 2.0, this is 3.0, which is now the default — and these are Cassandra, Mongo, and Couchbase.
So again, the important point is not just throughput — it's latency. All the way up to 200,000 transactions per second, our latency stays under one millisecond, whereas for the others, the latency keeps climbing as the throughput goes up. [Q] Was this test performed by you? [A] It's the YCSB benchmark, done by Thumbtack. Cassandra does really badly there. Any other questions? [Q] You said nine out of the top 15 — did the others ever use Cassandra, or some other system? [A] No, never. Actually, to answer your question, the others started before we existed. Turn built their own; Rocket Fuel has something they patched together over time. They all built their own systems, just like Google did — and we completely understand those solutions. Some of them have been switching to us recently: two of the top ones are switching over, and we hope to get all of them. [Q] The difference between Cassandra and what you claim is huge — they needed 330 nodes. How can you have so much more efficiency over them? That's very difficult to believe. [A] You'll see it today — we'll show it live. [Q] I'm sure they're not so dumb that... [A] No, they're not dumb at all; it's entirely possible they're smarter than the people at Aerospike. That's not why we're better. We're better because of our philosophy: we've taken the hardware as it has evolved, and written the database against it.
You know, I'm a database person — I worked on IBM systems 25 years ago, did a PhD in databases — and I know how those systems are written. They're written to optimize a rotational disk. The analogy I use, since we're visiting Bangalore: the rotational disk is in California. Main memory, even with Bangalore traffic, is 10 meters away. SSDs are maybe half a mile — one or two kilometers — away. When SSDs came in, everybody just slapped in an SSD, as Sunil mentioned. You slap an SSD into Oracle, it goes five times faster. We go 100 times faster — not because we benchmarked ourselves against Oracle, but because we took SSDs and, as Sunil pointed out, wrote the database storage system for them. That's number one, the SSD part. Then, in memory, there's multi-core. I don't know how many of you are database experts, but some of you at least must have seen the paper by Michael Stonebraker, the inventor of Postgres, from around 2005-2006, where he said it's time to rewrite the database: the old architecture just doesn't work anymore, because hardware has changed so much — more main memory, more cores. That's the other thing Aerospike did. We wrote the entire database from scratch, in C, as fast as it could go against the hardware that exists today. They can't even compete — how can you, using Java? Come on — it goes to lunch every time it does garbage collection. We're talking about one millisecond; Java doesn't do anything in one millisecond, it just sits there. And that's nothing against Java — I'm a Java programmer myself; use it for applications, but don't use it for real-time systems. That's the philosophical point. And anybody can do what we did.
It would just take you four or five years, and more people like Sunil and Raj and the other guys sitting there — you'd get about one million transactions per second on Amazon too, if you work at it. But the point is, you have to work on the hard problems, and these are the results to show for it. That's why we're making it open source — and now you guys are here. [Q] Are there any comparisons with HANA? [A] HANA is a SQL-based system, so it's probably not a fair comparison to do single-key lookups in a key-value store versus single-key lookups in HANA; their strength is joins and all that. [Q] But they do the same thing... [A] Try doing updates in HANA while you run a query, and come back and tell me what happens. That's the difference: we are a read-write system. You can run queries on Aerospike, you can take a backup of Aerospike, and it will still run with high-throughput reads and writes. HANA is typically used to analyze cash-register transactions — you go to a store, there's a transaction on the register. Internet traffic is one or two orders of magnitude bigger than cash-register transactions, and that's the scale Aerospike runs at; HANA is far from that. They stick everything in memory, and it's still enterprise-priced. The truth is, they're solving a problem at a scale where there aren't that many updates, so it works. And not having updates makes the problem an order of magnitude easier: if you don't have updates, it's actually an easy problem; with updates, it's a lot of problems. That's the thing to think about. [Another speaker] If I may add — I was working with Sybase, which these days is acquired by SAP, so I've looked at HANA.
HANA still has two problems. The first is that it's entirely in memory. What that means is, when you run a real HANA system for your application, your cost of power is more than your HANA license cost — the amount of RAM it needs is humongous. Secondly, they still cannot take updates at the pace they'd want. So they have two stores: one which handles only the update side, and the main one everybody talks about, taking the live reads. And the systems they serve are mostly ERP and CRM, which are mostly manually driven. They still haven't reached the point of internet-of-things workloads — actually tracking all your trucks and equipment at that level. And they're rewriting their stuff for SSDs anyway, so they're still growing, still learning. In comparison, we're a little ahead in terms of technology, in terms of SSDs. Of course they're smart people, and they do really good work in memory — and they're solving a genuinely difficult problem, because NoSQL doesn't do everything relational does; they have that hard problem and they solve it pretty well. But they have their own set of problems. Startups like us cannot run it at all: an electricity bill like that is not something you can afford. [Q] You talked about in-memory systems and their drawbacks — do you recommend using Aerospike as a cache as well? [A] There are customers who use it as a cache, yes. And there's one slide I didn't spend much time on: if your data size is too big...
...then instead of having, say, 20 machines with 100 GB of RAM each, you can put the whole thing on a three-node cluster with SSDs. It boils down to the case study I skipped, where we replaced 192 in-memory machines with 18 Aerospike machines with SSDs. We could store the same amount of data on a much smaller number of machines — and machines carry a lot of cost: power, maintenance, all of it. So we handled the same workload with roughly 10x fewer machines, and it was just as fast end to end. [Q] If I understood, you've basically exploited two facts: the cost of SSDs is falling really fast, and their access latency is much closer to main memory. [A] That's exactly right. [Q] So I'd probably use it like a cache in between — and as the cost drops, the cache just gets bigger. [A] Exactly right. So, we'll take a... [Q] Similarly, what about Big SQL, at the biggest scale? [A] I'm not aware of Big SQL — anybody know about Big SQL? I don't. I know slow SQL; I've written it myself. [Q] Can you use parts of your system in isolation — maybe just the storage layer? I'm guessing you've taken into account that flash has different write characteristics from a rotational disk. [A] If you want to store on a file system, you can do that as well — but I didn't get your question completely. He's asking if he can take just your storage layer and use it for entirely different things. That should be possible. Let me give you the story behind that.
There's one company called Fusion-io, where we actually ran the entire Aerospike storage path right against the flash in the drives. [Q] That's exactly what I'm getting at. For example, Linux has a raw block interface — no, not ReiserFS, ReiserFS is predominantly for rotational disks, and I don't remember the name — but there are certain open-source file systems that Linux supports which are specially designed for flash. I just wanted to know if I can use your storage layer like that, or whether there's a problem with that. [A] We don't ship it as a standalone file system. But given that we're open source now, you could strip it out — though if you hit problems, that's on you. [Q] Is it so tightly coupled that it would be hard? [A] It's not too tightly coupled, because we have to support multiple storage models, so there's a clean interface that says, I want to store this record. In the code, the transaction layer says, I want to write this record to disk, and internally it's like an overloaded function which goes either to memory, or SSD, or the persistent store. So you could probably strip out that layer and put your own storage or commit logic underneath it. Because our storage layer has to support different types of storage, we wrote it with a clean interface — so I think it should be possible. Ping me offline. [Q] On a plain file system, do you have any optimizations? For example, ReiserFS, I know, has a lot of optimizations for small files. Do you have anything like that?
We kind of do, and I was answering that earlier: if my record is just 100 bytes and my write block is one MB, what do I do? We pack all the records into one block and write out the whole block, and when a record is updated, we just mark the old copy deleted and write to a different location. So we have that optimization, but it's tuned specifically to key-value-store needs. [Q] You face much the same chunk-size problem as distributed file systems, right? If you choose a really small chunk size, you store a lot of metadata for a large dataset; if it's large, you waste space. Is it tunable, or pre-optimized? [A] If you're talking about Hadoop-style chunks, there you have to remember where the replicas are at the chunk level. We don't have those needs. [Q] You don't, but as a general example: if the chunk is really large and you're not packing into it, you lose a lot of disk space; if it's really small, you're at the other end of the spectrum with a lot of metadata per chunk. [A] That's true. But for us, the metadata is just a bitmap over the blocks saying how much of each is free — that's it. So it's tailored to our needs, but I don't see a reason why you couldn't adapt it. [Q] Do you have any references from the financial services industry? [A] Oh, we have pamphlets. I don't want to hand them out in a deep-dive session, but if you're coming to the conference tomorrow, yes, we have all the collateral to distribute. [Q] Do you also cater to hybrid workloads — both transactional and warehousing?
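The packing-plus-garbage-accounting scheme just described can be sketched as a toy model — small records buffered into one big block, updates marking the old copy dead, and a per-block garbage ratio that the background defrag pass would watch. (Illustrative only; names and sizes are made up.)

```python
WRITE_BLOCK = 4  # records per block; in reality ~1 MB worth of records

class Storage:
    def __init__(self):
        self.blocks = []   # flushed blocks: dicts of key -> value or None
        self.buffer = {}   # the write buffer currently being filled
        self.live = {}     # key -> index of the block holding its live copy

    def put(self, key, value):
        old = self.live.pop(key, None)
        if old is not None:
            self.blocks[old][key] = None   # old copy is now garbage
        self.buffer[key] = value
        if len(self.buffer) == WRITE_BLOCK:  # block is full: flush it
            for k in self.buffer:
                self.live[k] = len(self.blocks)
            self.blocks.append(self.buffer)
            self.buffer = {}

    def garbage_ratio(self, i):
        # Fraction of a flushed block that the defrag pass could reclaim.
        block = self.blocks[i]
        return sum(v is None for v in block.values()) / len(block)

s = Storage()
for i in range(4):
    s.put(f"k{i}", 0)      # fills and flushes block 0
for i in range(3):
    s.put(f"k{i}", 1)      # updates: old copies in block 0 become garbage
s.put("x", 1)              # fourth buffered record flushes block 1
print(s.garbage_ratio(0))  # 0.75 -> block 0 is a defrag candidate
```

The "metadata is just a bitmap" point above corresponds to `garbage_ratio`: the server only needs to know how much of each block is free, not where each chunk's replicas live.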
Like I was saying, our core strength a year back was pure OLTP. Now we're moving into the analytics space, where we have our own MapReduce framework in which you can write your jobs. So we're slowly stepping into analytics — yes, we're already there. [Q] Have you done any comparisons with in-memory data grids, like GemFire? [A] We've compared against the traditional NoSQL players. [Another speaker] To add to that: Rocket Fuel is one of the companies using HBase, and they've tuned it into something very much optimized for their workload. The kind of problem you face there is lots of random reads — that's where you hit trouble; with sequential writes you really don't. In production everything has to be fast, and at the same time you need a persistent store, so there's a lot of optimization you end up doing, some of it at the application layer: you design a partitioning scheme, you pre-partition, you make sure your partitions don't have to split, you do your own hashing, things like that. These are the things companies have had to figure out when they started with systems like HBase and put their own model on top. And it works — but you have to spend about two years understanding how the underlying system actually behaves. You can get the same effect with HBase or MongoDB, but it takes around two years and a lot of engineering work to customize it for your needs — whereas here, you get it out of the box.
Thank you. We'll take a quick 10-minute break and then get started hands-on with Aerospike. Okay, let's get going. Each one of you will get one EC2 instance to log in to. I'll share a collabedit page — everybody please open it: collabedit.com, document GQXYT. Whatever I type, I'll put on that page, and everybody can just copy-paste the commands from there; you only need to view the document, not edit it. Is everybody seeing the page? The commands are there. The page asks you to pick a programming language for highlighting — ignore that; the page is only there to share my commands with you. Please don't edit it — you can use the chat box if you want to make fun. Now, each of you was handed a small slip with an IP on it; that will be your EC2 instance. [Q] Can we access these later? [A] Yes — and don't worry, we have more than enough; we have a hundred of them.
First we need to download the PEM file to log in to Amazon, so I'll show you that. The PEM URL is on the shared page — it's your secret key. There's a curl command there; just copy-paste it into your shell and it will download the PEM file for you. Some people are using Windows and may not have curl installed. [Audience: Is somebody purely on Windows?] We have virtual machines you can use; otherwise, searching for "how to log into EC2 using a PEM file" with PuTTY should help. The user is ec2-user. Once you download the PEM file — the private key — you just run the ssh command, with your own IP: the IP that was given on your slip. [Audience: It's not able to log in — permission denied.] You have to change the mode: do chmod on the PEM file, and only 400 will work — ssh is strict that the private key should be read-only. [Audience: I've done chmod.]
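The login sequence just described, collected as commands. The key URL and the instance IP below are placeholders — use the curl command from the shared page and the IP printed on your slip:

```shell
# 1. Download the private key (copy the actual curl command from the shared page)
curl -O https://example.com/path/to/workshop.pem     # placeholder URL

# 2. ssh insists the private key be readable only by you
chmod 400 workshop.pem

# 3. Log in as ec2-user, using the IP printed on your slip
ssh -i workshop.pem ec2-user@<your-instance-ip>      # placeholder IP
```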
That "permission denied" is exactly the error you get if you didn't change the permissions on the private key. And yes, you have to use the IP address that is given on your slip. Some people are running the second command first — run the first one first; that is what gets your private key. Is this clear now? Step one, step two, step three — just those commands. [room chatter while everyone logs in omitted] Okay, I think we'll proceed. People who are stuck with Windows and PuTTY — sorry, I don't know PuTTY well; the PEM file is already available on the machines, so use that. So, once you're logged in: Aerospike has a monitoring console, which we call AMC. In it you can see different statistics of the server running on that machine or that cluster. We'll start the Aerospike server and look at the values in AMC and so on. We'll also generate load and see what TPS you are able to get on those boxes. Remember that right now the boxes are raw — nothing is tuned. Later we'll talk about how to tune them to get even better performance on the same hardware. Somebody posted the PuTTY instructions for using the PEM file on the shared page — please use that. Next, you need to know the public IP of the box you are logged in to.
You can run the instance metadata command to get the public IP of the EC2 instance you have — everybody will have their own. Actually, the slip that you have is your public IP, so you can just use that. Okay. So, the AMI instances you have already come with the latest Aerospike installed, along with the AMC software and a load-generation tool we'll use to generate load. Before starting, let's see what the configuration looks like. [Adjusts terminal colors — light background, dark foreground — for visibility. Is this better? Yes.] To see the config, open /etc/aerospike/aerospike.conf. The service section is applicable across the node, across all namespaces: it defines the user and group to run as, the PID file, the number of threads, transaction queues, and so on. The logging section tells where you want the Aerospike log file to go. The network section — we have to touch on this — tells how to form the cluster. EC2 doesn't allow multicast in its network, so we have to fall back to the mesh model, where you set up the cluster nodes explicitly. That's only because we're doing everything hands-on here: one workaround we have is a CloudFormation script, which I'll quickly demo later, where you don't need to do any of these steps.
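For reference, here is a minimal aerospike.conf along the lines read out above — service, logging, a mesh network section (since EC2 has no multicast), and an in-memory test namespace. Values mirror the session; the seed IP is a placeholder, and directive names vary slightly between Aerospike versions, so treat this as a sketch:

```
service {
    user root
    group root
    pidfile /var/run/aerospike/asd.pid
    service-threads 4
    transaction-queues 4
}

logging {
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}

network {
    heartbeat {
        mode mesh                                 # EC2 does not allow multicast
        port 3002
        mesh-seed-address-port 10.10.10.10 3002   # placeholder; your leader's private IP goes here
    }
}

namespace test {
    replication-factor 2        # no effect on a single node
    memory-size 54G
    default-ttl 30d             # records expire after 30 days unless the client overrides
    storage-engine memory
}
```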
With the CloudFormation script you just say what size of cluster you want; it will spin up all the instances, edit the config file accordingly to create the mesh, and form the cluster. Okay. So that's how the network is configured. If you do have multicast, the mesh lines won't be there — it's just three lines: mode multicast, the multicast address you want to use, and the port. Quick check: are there still people who haven't logged into EC2? I have added a username and password to the shared page; you can log into that machine and work from there. Then comes the namespace configuration. Right now we didn't configure disks. There's one namespace configured, the test namespace, which says replication factor two — which of course has no effect on a single node. The memory size is configured as 54 GB. default-ttl is the expiry for all records, which can be overridden from the clients; the default configured here is 30 days. And it says storage-engine memory. That's all the configuration you need to get started, and it comes by default when you install Aerospike — you don't even need to write this file. So just do sudo /etc/init.d/aerospike start. It should say OK. Now, how do you check from the terminal that your server is really up and running? Do asinfo -v build — it will tell you the build version it is running. (It takes some time to start, so give it a moment.) If asinfo responds, your server is up. And a quick peek at what stats can be seen: people ask how to know the health of the cluster — asinfo -v statistics dumps a whole bunch of stats.
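The start-and-verify sequence above, in one place (a sketch assuming the init-script package install used on the workshop AMIs):

```shell
# Start the Aerospike daemon
sudo /etc/init.d/aerospike start

# Verify the server is up: ask for its build version
asinfo -v build

# Dump the full server statistics (quick health check)
asinfo -v statistics

# Start the monitoring console (AMC), then browse to http://<public-ip>:8081
sudo /etc/init.d/amc start
```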
You don't need to go through all of those stats — just know they're there. Okay. Now, before the next step: we started the Aerospike instance with /etc/init.d/aerospike start; now we should start the AMC instance — the Aerospike monitoring console — the same way: amc start. Once it's started, go to your browser and open your public IP on port 8081 — 8081 is where AMC runs. It will ask for a seed IP of the cluster; you can simply say 127.0.0.1 and connect. All the stats we just dumped are presented nicely there. It shows the cluster disk size configured — our node currently has no disk, so 0 bytes; RAM — 54 GB configured but 0 used; the number of nodes in the cluster; and any alerts for the last 30 minutes. Currently it's a single node. Then the namespace configuration, and then XDR. [Audience: What is XDR?] XDR is the cross-datacenter replication I talked about earlier; it isn't enabled here. Note: 127.0.0.1 works only inside AMC on the node itself — when the browser asks for the seed, you give the private IP, not the public one. There's a URL on the shared page — use that. Okay. So let's insert one record, just as a test. We have a very basic tool called cli for basic things.
You run something like cli -o set -k mykey -b bin1 -v val1 — I'll paste the exact command on the shared page. That inserted one record into the local node — just a test, one record with a key and one bin; you should see one object. We'll do more complex stuff in a moment. If you go back to your AMC it will show that there is one object inserted into the cluster. Okay, so let's do more complex things. We have a benchmark tool which can generate load with different types of data — you can simulate your application profile. This command inserts one million records. You can see the TPS here — about 150,000 transactions per second. One million records inserted; we didn't even time it, so let's time it again — pure writes of one million records took about seven to eight seconds. [Audience: What is the instance type?] This is a c3.8xlarge. So now that we've inserted the one million records, we can try to read them all. Where initially I said -w I, meaning pure inserts, now we run a read-update workload with a read ratio of 90. [Audience: What is the size of each record?] It's a string of 50 bytes. Okay — this command will keep continuously reading and writing data; I'll copy it to the shared page. (Somebody is editing the page — please don't change it.) It generates a continuous load of reads and writes and shows the TPS you are getting.
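A sketch of the two commands described above — the single-record insert and the benchmark runs. The flag names follow my recollection of the old Aerospike cli tool and Java benchmark wrapper and may differ in your build, so check each tool's --help; the parameters (one million keys, 50-byte strings, 64 threads, read ratio 90) are the ones used in the session:

```shell
# Insert a single test record (key "mykey", one bin) with the basic cli tool
cli -o set -k mykey -b bin1 -v val1

# Load phase: one million 50-byte string records, 64 client threads, pure inserts (-w I)
./run_benchmarks -h 127.0.0.1 -p 3000 -n test -k 1000000 -o S:50 -w I -z 64

# Run phase: continuous read/update load, 90% reads / 10% writes (-w RU,90)
./run_benchmarks -h 127.0.0.1 -p 3000 -n test -k 1000000 -o S:50 -w RU,90 -z 64
```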
Remember that this is not the saturation of the server — it is the saturation of the client. The transactions per second you see here are generated by a single client; with more client instances, or more machines generating load, the server can absorb much more. Later, when we form clusters of four nodes each, we can run the client from all four instances and see that. [Audience: My load seems to have stopped — I got only about 65 objects.] The job might have exited; how many threads did you run with? We ran with 64 threads of parallelism — try rerunning it. So, you can see in AMC the split of reads and writes: about 150,000 transactions per second of pure reads, and about 17k writes. You can see the same in your own AMC. You can also see the different stats here: replicated objects — a little more than a million — and the total number of client connections currently active. Because we are running with 64-way parallelism, there are about 64 threads connected in parallel to that node, plus a connection or two that keep discovering the cluster state, so you may see this count go up and down. [Audience: What is the size of RAM occupied?] About 128 MB. Anybody having trouble getting here? You can run your own load and see for yourself how much you can generate. In this workload I ran only with strings, but the tool is capable of generating many different types of data.
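The throughput figures quoted above are easy to sanity-check: TPS is just operations divided by elapsed seconds. A quick calculation with the numbers from the run above — one million inserts in roughly seven seconds:

```shell
# Rough TPS estimate: records inserted / elapsed wall-clock seconds
records=1000000
seconds=7
tps=$((records / seconds))
echo "$tps TPS"   # roughly 140k inserts/sec, in line with the ~150k the tool reported
```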
The tool's help explains what each parameter means — not examples, but an explanation of everything it can do. Okay, so that's load generation. Now, if you dump the partition information, it prints a lot of internal detail which you generally don't need: what all the partitions are and how many objects are in each. Somebody asked earlier how you know the data isn't imbalanced — this is your proof. You'll see everything sits between roughly 15K and 18K objects per partition, and the skew is only that large because the amount of data we inserted is very small; with more data, the skew becomes much, much smaller. So that's the information for all 4096 partitions — how much data went into each. Okay, let's try to form a cluster. Pick groups of about four people — partner with whoever near you could log in; three or four is good. [Audience arranges into groups.] [Audience: Should we stop the load?] Once we change the configuration — yes, you can stop the load. So, the way mesh works: multicast is like a gossip protocol, where everybody can talk to everybody. Mesh is slightly different: you have to nominate one node in the cluster that I will talk to and tell about myself. Every node goes and tells that node its IP, and that node shares the information with the other nodes. So when I go and talk to this mesh principal, I give my IP to him, and he says: I already know these three guys — here is the full list.
Then I go and directly talk to those nodes. Okay. So what we have to do in the mesh configuration is say whom to talk to to get the full list of cluster members. In your group of three, four, five — whatever it is — pick one person as your leader and use his private IP. How do you find it? My IP, for example, is 172.31.40.163 — but don't use mine; use your own leader's. Now, in the config, the mesh address and mesh port are currently written with a placeholder, 10.10.10.10, just as a fake value — that's why your node is standalone right now. So I'm picking him as the leader of my cluster: whichever group of three, four, or five you are forming, take that leader's private IP and put it in your config file. [Audience question] Whoever is the representative of your whole cluster — pick his private IP and enter it as the mesh address in the config file. The leader himself doesn't need to do anything special; whether or not he edits his own file, it's still fine. So, everybody modify the config file — the file is /etc/aerospike/aerospike.conf — and what you are entering is the leader's address. In the leader's own case it doesn't matter, because he would only be contacting himself.
Everyone in the group uses the same config change — that's the main thing. Okay, so the simple rule: everybody change the mesh address to the leader's IP. Once you edit the config file, you just have to restart your node: sudo /etc/init.d/aerospike restart. (The leader doesn't need to restart.) So I restarted my node, and here you can see my node count increasing — AMC already shows three nodes. There is no manual sharding, and rebalancing is automatic once you have a cluster — you can see the rebalancing happening here. Can I get your attention: there is a small panel in AMC where you can see this information. I just added his IP as my leader, and you can see I formed a three-node cluster, and you can see the migrates, incoming and outgoing. This signifies that the data is being rebalanced as we speak: the object count keeps changing — it's refreshed every five seconds — and the replicated objects will balance out again. Okay, so let's do an interesting experiment: while the rebalancing is happening, let's do reads and writes. [Audience: Why is the RAM usage changing a lot?] Because the rebalancing is happening — the data is held in RAM, so as partitions move, RAM usage moves with them.
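The edit-restart-verify loop each group member runs, as commands (a sketch; paths match the default package install):

```shell
# After putting the leader's private IP in /etc/aerospike/aerospike.conf,
# restart the node so it joins the mesh
sudo /etc/init.d/aerospike restart

# Confirm the node sees its peers: cluster_size should equal your group size
asinfo -v statistics | tr ';' '\n' | grep cluster_size
```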
When you were a single node, only that one node could serve the reads; now the data has to be redistributed across the cluster, and reads keep working while that migration is in flight — you can see the rebalancing happening here. I took a node down — by mistake I restarted mine — and AMC shows when a node is down; now it's green again, it's back up. Anyone still on a single-node cluster, who hasn't formed a multi-node cluster yet? Okay. You can see this process is fairly tedious, right? Everybody starts an instance, changes the configuration, forms the cluster, all of that. So I'm going to quickly take you through CloudFormation. Amazon has this nice CloudFormation service [network trouble; brief pause] where you can define your entire cluster up front, and Aerospike has a template for it so you don't need to do any of this by hand. [Two attendees discover they were given the same IP address — they must have logged into the same machine; one of them switches instances.] Okay. So, how do you use the CloudFormation script? You just have to pick a template file: download the Aerospike CloudFormation script and select it. [Room settles down.]
So in the next stage, the CloudFormation script basically asks just three questions: what instance type do you want, what is the key pair that is personal to you, and how many instances you want in the cluster. I can say four and simply hit next; it asks whether you want to add tags, then asks me to review — and that's it. You don't need to do any of the steps we just did by hand. And on bare-metal machines, if you have your own local LAN, Aerospike will automatically form a cluster using multicast — you don't even need to edit the config, and there's no special per-node configuration. So: if you are deploying on Amazon, use the CloudFormation script to make it easy; if you have bare-metal machines, you shouldn't need mesh at all. This will take a while to run, so I won't keep watching it. Let's quickly go into coding — how to write an application using Aerospike. I'm picking Node.js as the example. We have published our Node.js client in the npm registry, so to start using it, all you have to do is run one command: npm install aerospike. [Audience: What are the alerts shown in AMC?] Any time a new node joins, anything goes down — anything notable happening in the cluster — it shows alerts; some yellow events need immediate attention. With all this activity, nodes going in and out, you'll be seeing plenty of them. [Venue network degrades.] Okay — people who are not actively using their AMC should shut it down, because every individual's AMC keeps polling EC2 over this same internet connection; even if you are not viewing it, it continuously generates traffic.
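Getting the client really is that one command — the package name on npm is aerospike:

```shell
# Install the Aerospike Node.js client from the npm registry
npm install aerospike

# Quick sanity check that the module loads
node -e "require('aerospike'); console.log('ok')"
```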
Let's say each team keeps just one AMC running and the others close the tab. Actually, I think we are done with the AMC part, so everybody can close their AMC browser — with 100 people it generates a lot of traffic. Please close the AMC consoles you have open; hopefully the internet will get better. I'm closing my browser too. My CloudFormation stack has completed, by the way, so I can simply say "delete stack" to dismantle my cluster. Okay. We already downloaded the Node.js client on this box — there's a node_modules directory and you'll see aerospike in it — so I can start writing my Node.js application. [Gestures to a colleague] She's the one who wrote the entire Node.js client; I'm just acting like I know it. Since it's taking time to type it out live, let's keep it simple: I have a very basic example already written, and I'll walk you through the code. First, you say require('aerospike'). Then you tell it the parameters — the seed IP. The config has a hosts field, which is an array of addresses; you can supply several. One is enough, but that machine has to be up and running, so what people do is keep four or five IPs — even if one instance is down, the client connects through the others and discovers the current cluster. Then you see the policies, like what the timeout should be when attempting a get or a put — how long the client should wait. If you say zero, it waits indefinitely; here I'm just giving a timeout of zero.
Once you do that, you create the client from that config and call connect, and you specify what you want to do on connect — a callback for when the connect job is done. Here I'm not doing anything special: whatever comes out as error and client, I'm just dumping to my console. Okay. Now, the data model: it's a key with a set of bins. You define your key as a JSON-style JavaScript object with three fields: namespace, set, and key — the namespace is test, my set, and the key I want to put is my key. My record in this case has two bins. If you simply give a bin the value 1, the client automatically detects it's an integer and stores it as an integer; put it in quotes and it becomes a string; give an array and it becomes a list; give an object and it becomes a map. It's detected automatically — you don't need to declare "this is a string, this is an integer," and so on. Every record can also have metadata associated with it. This is optional, but you can set a TTL for the record: if I want it to expire in an hour, I say 3600. And generation — again optional — says that when I write, I want to write with generation 1. If you don't give a TTL here, it picks up the namespace configuration — the default-ttl we saw earlier; if I want to override it, I can, and this override is at the record level. Then all I do is call put on the client, passing my key, my record, the metadata, and the action to take when the operation returns — here again I'm not doing anything special,
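A minimal sketch of the flow just described — require, configure hosts and policies, connect, then put a two-bin record with optional TTL/generation metadata. Exact option and callback signatures differ between client versions, and the host address is a placeholder, so treat this as a shape rather than a definitive program:

```javascript
var aerospike = require('aerospike');

// Seed hosts: one reachable node is enough; the client discovers the rest.
var client = aerospike.client({
  hosts: [ { addr: '127.0.0.1', port: 3000 } ],   // placeholder seed IP
  policies: { timeout: 0 }                        // 0 = wait indefinitely
});

client.connect(function (err) {
  if (err) { return console.error('connect failed:', err); }

  // Key = namespace + set + user key
  var key = { ns: 'test', set: 'demo', key: 'mykey' };

  // Bin types are detected automatically: integer, string, list, map
  var record = { bin1: 123, bin2: 'abc' };

  // Optional metadata: TTL in seconds (else the namespace default-ttl applies)
  var metadata = { ttl: 3600, gen: 1 };

  client.put(key, record, metadata, function (err) {
    console.log('put returned:', err);            // just dump the status
  });
});
```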
just dumping everything to the console. Okay. I hard-coded the IP here — you should use your own box's address. So it's just a simple put. [Audience: So there are different callbacks, right — connect has a callback, and put also has one. Could the put run before the connect callback fires?] Right — the calls are asynchronous, so you have to sequence things through the callbacks rather than assume ordering; this is Node.js, not C. Let me run the get at least. The get example is equally simple: the same standard steps of connecting, I define the same key, and simply call get — I don't need to specify the data here. Okay. Now, to go a little deeper, here's a URL-shortener example, again written by her, running here. It has an HTTP server listening; you say "here is my long URL, shorten it for me"; it stores it in Aerospike; and it keeps a thousand counters so there's no hotspot — every URL that comes in picks one of the counters, generates the next ID from it, and hands the short URL back to you. If you click the short URL, the same Node server automatically redirects to the long URL — the standard bit.ly-style thing. I hope it connects now... it's asking for a URL, so let me take this whole search string and say "generate short URL." I think it's not connecting to the backend... got it — it generated this URL.
Again, this is stored in Aerospike. Now, if I copy-paste this short URL in a different window, the request hits Node.js, which looks up the long URL stored under that key — the short URL is the key you see here — fetches it, and automatically redirects the request, and you see the same Google search page. (The earlier failure, now that I know it, was the hard-coded address.) So it says AEROSPIKE_OK, and if a call fails it gives the reason why and where it failed in the code — function, file, line, and so on. Just for printing's sake, it dumped that too: this is the callback output from connecting, and this is for the put. If I simply do a get, it gives me back my key — we got back the integer bin and the string bin, plus the return status of the call. Okay — any questions on the clients, like how to connect and things like that? If you pick any of our clients you'll see the same structure: a connection stage, then you use that object to get and put, and you can define your policies. I'm not going into all the details, but the structure is the same across all our clients. [Audience: About TTLs — why would you not want a record to live forever?] Typically in the ad business nobody is interested in a record after, say, 30 days — it's too old for them. Different people have different use cases; some want data to live forever, in which case you simply say TTL = 0 — we're just giving an option. When you set a TTL of 30 days, the system automatically keeps cleaning up that data, so you don't need to do it yourself. Think of cookie data: the session keeps changing every time you log in, so there's no point storing the stale values — the cookie itself, yes; old cookie values, no.
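The get side, as described, reuses the same key and hands back the bins plus metadata. Same caveats as the put sketch — signatures vary by client version, host is a placeholder:

```javascript
var aerospike = require('aerospike');
var client = aerospike.client({ hosts: [ { addr: '127.0.0.1', port: 3000 } ] });

client.connect(function (err) {
  if (err) { return console.error('connect failed:', err); }

  // Same key as the put example — no data needs to be specified for a get
  var key = { ns: 'test', set: 'demo', key: 'mykey' };

  client.get(key, function (err, record, metadata) {
    if (err) { return console.error('get failed:', err); }
    console.log('bins:', record);            // the integer bin and the string bin
    console.log('ttl remaining:', metadata && metadata.ttl);
  });
});
```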
Any other questions on the client side — the client APIs? No? One more thing that's good to know, then — I'll quickly point you to it. Beyond get and put, we have an operate call. You build a list of operations — op.append, op.increment, op.read — and submit them together. [Audience: Is that a read-update, or is it atomic?] It is atomic — the beauty of operate is that the whole set of operations executes under one lock. So if you want to update a record and read it back, you can do that in one call. The only restriction is that you shouldn't put two writes to the same bin in one operate call; two writes to two different bins are fine. Okay, I think that concludes the coding part. Now let's talk about how to tune, in general, any system on AWS. [Audience: You've quoted 1 million TPS, right?] We have been talking about bare metal — that's where we do it. So, on the cloud: how many nodes do you think it will take on EC2 to do 1 million TPS? Three nodes? Okay, that's what he said. One of the things we do is always underestimate ourselves when talking to the public — you say three nodes, and in reality I can show it to you on one node. The reason is that if you promise three, you'll definitely deliver. So yes, we can do it on one node. But the main thing — it's not just about Aerospike; what we are going to talk about right now is AWS itself. Normally we say virtualization is not where you want to do high-performance things. It's fine for running your Apache, fine for your WordPress, but if you want a really, really high-performance application — I mean, I have always been a bare-metal guy; even at work, when people talk about virtualization, I say, man, that's not going to work.
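A sketch of the operate call described above, bundling an increment, an append, and a read-back under one record lock. The operation-builder namespace has been exposed under different names across client releases (aerospike.operator in older ones), so verify against your client version; host and key are placeholders:

```javascript
var aerospike = require('aerospike');
var op = aerospike.operator;   // operation builders; name varies by client version

var client = aerospike.client({ hosts: [ { addr: '127.0.0.1', port: 3000 } ] });

client.connect(function (err) {
  if (err) { return console.error('connect failed:', err); }

  var key = { ns: 'test', set: 'demo', key: 'mykey' };

  // All three operations execute atomically under one lock on the record.
  // Restriction noted above: at most one write per bin in a single operate call.
  var ops = [
    op.incr('bin1', 1),             // increment an integer bin
    op.append('bin2', '-suffix'),   // append to a string bin
    op.read('bin1')                 // read the updated value back in the same call
  ];

  client.operate(key, ops, function (err, record) {
    if (err) { return console.error('operate failed:', err); }
    console.log('bin1 after increment:', record.bin1);
  });
});
```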
The reason this entire conversation, this entire exercise, came up was that we were closed source earlier, and we always supported virtualized environments, but we never guaranteed performance on them. What we said was: yes, it will work, but we won't guarantee something like 98.5% of requests within some millisecond latency, because Amazon doesn't give us guarantees, so how can we? So one of the things was that we did not know how Aerospike would perform on AWS. Once we became open source, people were going to try it out, so before anyone else did, it was high time we went ahead and tried it ourselves. Earlier I just used to use AWS casually: spin up an instance, SSH in, do this, do that, shut it down, done. I never tried it from a performance point of view. So once we started, the first thing was: spin up an instance, SSH in, install Aerospike and just run it. The first number that we got was 85k TPS. Now that could sound very high to the rest of the world, but for us 85k TPS is nothing; on my Mac I get some 120k TPS. So the question was, what is the bottleneck? And because these were benchmark numbers, we had started on a c3.8xlarge, which is a good machine; there was no bottleneck on the machine itself. We looked into it, and what we found was that the network was what was blocking us. When I started, I was using something called the PV image, the paravirtualized image, which is what Amazon was offering by default so far. Spin up a c3.8xlarge with a PV image, and all it gave us was 85k TPS. We used iperf as well to figure out what was going on: again the same story, about 1.86 Gbps, whereas a c3.8xlarge is supposed to give us 10 Gbps. So it was simply not working. A quick search on the net showed everyone else was getting some 100k-120k TPS on Amazon; that was simply not going to work for us.
So then we looked around a bit more, and we figured out that Amazon has also improved lately: sometime last year they came up with something called enhanced networking, and one of the requirements for enhanced networking is HVM. That's hardware virtual machine, in other words hardware-assisted virtualization. The way it works is that your virtual machine has better, more direct access to the underlying hardware. For example, on the higher-end machines, the network cards that Amazon uses actually have queues on the order of around 40 queues per card. You are never going to get access to that many queues on any of the EC2 instances, but because it uses HVM — Amazon still doesn't tell you, but we kind of figured it out through various experiments — you get much closer to the hardware. So let's quickly try it out. I'm logging into a machine... the network died. The tool I'm mentioning is AMC, one of our newer tools that you can use, but this is the terminal way of figuring out how much we are doing. Basically what I'm going to do right now is run the same inserts and reads that we did earlier and see what kind of numbers we get. I'm doing reads at about 215, 220, 250k. If we go to AMC it will show similar numbers; we show 10-second time slices, so that's why the first one came out at 199 and the second will catch up. But okay, the main thing we are talking about is TPS. By default with HVM, without doing anything else, each NIC will give you about 250k TPS. We needed at least 1 million TPS on the NIC. So what do we do about it? One way of increasing your instance performance, especially on the smaller instances, basically anything up to a 4xlarge, is to use something called RPS, that is receive packet steering.
So what happens out here is that by default all the packets go only to the first core. If you run top and press 1, it will show you all 32 cores (a c3.8xlarge has 32 cores), and by default all the packets will go only to the first and second cores, while the rest of your machine is just lying around idle. So how do we improve that? Let me quickly reconnect; I can't see anything because of the low resolution, so I'll just use my notes. Sorry about that. So this is a system which has RPS enabled. How do we know that? This is the default configuration of `rps_cpus`. A value of 0 means that no matter from where, on whichever queue a packet arrives, it is handled on the 0th core, so everything goes to a single core. What do we want instead? We want the packets spread over all the cores, all the processors we have available. The value is a bitmask of cores: say you want to spread it over the first 4 cores, that's bits 1 + 2 + 4 + 8 = 15, which in hex is f. So that's what we have done out here: a value of f means send it out to the first 4 cores, so any packet that comes in will be distributed among the first 4 cores of the system. Again, top is kind of stuck because of the low resolution; the terminal is not big enough. Okay, so you see this, right? The network load, these soft interrupts, they are distributed over 4 cores right now. This happened after we switched the value over to f. If you switch it back to 0, which is the default, you see that where earlier it was on 4 cores, it has now come down to 2 cores. So this is the default configuration, and this is the place where your instance will get bottlenecked. Out here we are using a c3.8xlarge, which has 2 queues per NIC. On lower instances — 2xlarge for sure, and I think even 4xlarge — there is a single queue per NIC, so the max you can do out there is f.
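The bitmask arithmetic above can be sketched in a few lines. The helper name `rps_mask` is illustrative; on a real Linux box you would echo the resulting hex string into the per-queue `rps_cpus` file under `/sys/class/net/`.

```python
# Compute the hex bitmask for rps_cpus: bit N set means core N may
# process received packets. Cores 0-3 -> 1 + 2 + 4 + 8 = 15 -> "f".

def rps_mask(cores):
    """Return the rps_cpus hex string enabling the given core numbers."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return format(mask, "x")

print(rps_mask(range(4)))   # first 4 cores -> "f"
print(rps_mask(range(8)))   # first 8 cores -> "ff"
print(rps_mask([0]))        # the default single-core setting -> "1"

# On a live system the value goes into a sysfs file per NIC queue, e.g.
#   /sys/class/net/eth0/queues/rx-0/rps_cpus
# (the exact path varies with the interface and queue number).
```

This makes the talk's point concrete: "0" (or "1") pins all receive processing to one core, while "f" fans it out over four.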
What we have figured out is that even on an 8xlarge, the best performance comes when you use f. If you increase the number of cores in the mask, then even though RPS by definition is not supposed to increase hardware interrupts on a bare-metal machine, on this particular virtualization layer what we see is a very, very high increase in hardware interrupts. So beyond that point you are not able to get any more out of RPS. The limits we found are these: without RPS we get about 250k TPS on a single NIC; with RPS it goes to about 750-800k TPS; and if you increase it further, it just gets stuck, or rather it comes down, because again the hardware interrupts have increased. So the max that you can get with RPS is about 800k TPS. Then how do we achieve 1 million TPS? Our bottleneck is that each NIC queue is tied to a core and its traffic funnels through it. What is the easy way out? Increase the number of NICs, because while you can't increase the number of queues on a virtualized layer, you can increase the number of NICs. AWS has something called ENIs, Elastic Network Interfaces. The good thing about ENIs is that, unlike most AWS things, they are free; you don't pay anything for them. With Elastic IPs you don't pay as long as you are using them, but you pay if you hold them unused; that's not the case with ENIs. Whether you just create them and keep them around or actually use them, there is no charge. The only thing is there are limits on how many ENIs, how many interfaces, you can attach to each instance: for an 8xlarge you can attach about 8, for a 4xlarge 4, for a 2xlarge 3; it's documented. On the 8xlarge we tried attaching all 8 NICs, but what we figured out was that we needed only 4 more NICs.
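The sizing logic above is simple division over the observed per-NIC ceiling; a tiny helper (the name `nics_needed` and the default throughput figure are taken from the talk's numbers, not from any AWS API) makes the reasoning explicit:

```python
# How many NICs are needed to hit a target TPS, given the per-NIC
# throughput observed in the talk (~250k TPS per ENI without RPS).

import math

def nics_needed(target_tps, per_nic_tps=250_000):
    """Round up: a partial NIC's worth of traffic still needs a NIC."""
    return math.ceil(target_tps / per_nic_tps)

print(nics_needed(1_000_000))           # 4 NICs for 1M TPS at 250k each
print(nics_needed(1_000_000, 800_000))  # 2 if each NIC did 800k with RPS
```

This matches what the speakers did in practice: four extra ENIs at roughly 250k TPS each to reach the 1 million TPS target.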
Each NIC was giving us about 250k TPS; we attached 4 more, and yes, we could get 1 million TPS. I was initially planning to give a live demo, but what happened is that I ended up creating 107 instances for us in a single zone. How could I do that? Amazon itself has a default limit of 20 on-demand and 5 spot instances, so thankfully we ended up calling Amazon and they increased the limit. But the side effect is that I don't have any ENIs left in that region, so I cannot attach any ENIs right now. I do have another region, but we are running short of time as well. So the easy way out is this: all of you have the c3.8xlarge for the time being, or even if you don't, you can do it on your own. 1 million TPS for an hour, if you want it, costs $1.68 at most at on-demand prices; go for spot prices, and for a c3.8xlarge you will pay about $0.25 to $0.26, that is about 15 rupees per hour. So that's what you will pay for 1 million TPS, for the server. For the clients you will need to spin up maybe a few more machines, but you can use smaller instances; r3.2xlarge is enough to push that load. So the basic takeaways: for any instance type where you are facing this packet-steering restriction, use RPS; if you are using PV for your existing stuff, move over to HVM as soon as possible, and move over to enhanced networking as soon as possible. Amazon itself is pushing towards HVM. If you go to the EC2 console and launch an instance, earlier the top of the list used to be PV images and you would have to search for HVM images; now, starting sometime around the 1st of July, I guess, whenever they launched the T2 micros, they have reversed it. PV images are completely gone from the default page; now you have to search for the PV images, and HVM images are what is there by default. One caveat about enhanced networking: it's not available on all the existing or older generation instance types. It is available on the latest R3s, C3s and I2s; these are the only types where enhanced
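The pricing arithmetic quoted above works out as follows. The dollar figures are the 2014-era numbers from the talk, not current AWS prices, and the exchange rate of 60 INR/USD is an assumption chosen to match the "about 15 rupees" figure.

```python
# Rough cost of a 1M TPS hour on a c3.8xlarge, using the prices
# quoted in the talk (2014-era; check current AWS pricing yourself).

ON_DEMAND_USD_PER_HR = 1.68   # c3.8xlarge on-demand, from the talk
SPOT_USD_PER_HR = 0.25        # typical spot price, from the talk
INR_PER_USD = 60              # assumed rate matching "~15 rupees/hour"

spot_inr = SPOT_USD_PER_HR * INR_PER_USD
print(f"on-demand: ${ON_DEMAND_USD_PER_HR:.2f}/hr")
print(f"spot:      ${SPOT_USD_PER_HR:.2f}/hr (~{spot_inr:.0f} INR/hr)")
print(f"spot is about {ON_DEMAND_USD_PER_HR / SPOT_USD_PER_HR:.1f}x cheaper")
```

So the server side of a 1M TPS demo costs pocket change per hour on spot, which is the speaker's point about just trying it yourself.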
networking is available. M3s have HVM images, but they do not support enhanced networking yet; as Amazon is gradually moving towards HVM, do expect that support to come, but for now you should be using R3, C3 or I2 to make use of enhanced networking. Another thing, quickly, about enhanced networking and VPC: there is something called placement groups, which I'll come to. First, let me quickly list the requirements for enhanced networking. You should be using HVM; that's the topmost requirement. And the requirement for HVM is that you should be using VPC: you can only launch an HVM instance in a VPC, you cannot launch it in EC2-Classic. Starting 1st of January 2014, if you created your account after that date, your account will only have the VPC option; you will not have the EC2-Classic option. Older accounts will still have that option, but there you can go into VPC and create your instances. So VPC and HVM are the prerequisites for enhanced networking. Another thing which is there but not widely talked about — it is documented, but not many people know about it or use it — is something called placement groups. What does a placement group mean? Your instances will be clubbed together. Fine, but what does that mean to me, why do I care? You care because in Amazon, even though it doesn't tell you explicitly, it is documented for placement groups that you are placed in a specific 10G network. So even though you might still be on an r3.2xlarge or r3.4xlarge, what you get is a 10G network. You won't get the entire bandwidth, but the quality of your network will improve. That's what happened with us: initially we were using r3.2xlarge for our benchmarks and we were getting about 1.86 Gbps and we were like, wow; later we came back to it without using a placement group and we came down to 1 Gbps, and we were like, what happened? That's when we figured out: always use placement groups. Again, in the case of
Amazon, it's not guaranteed that you will always get good performance. Okay, I am going to take another 5 minutes, less than 5 minutes, to wrap up; I am being hinted at. So yeah: HVM, enhanced networking, placement groups, I guess that's pretty much it. Do you guys have any performance questions about EC2, about what you are going to do, any problems with it, or any other questions about what we have talked about so far? [Question about the workshop AMI.] Yes, but right now it's only in a single region; I will distribute it across regions later. This particular image was in the East region; just search for aerospike and you will find a couple of AMIs; this one is called "aerospike fifth elephant workshop". The way you should actually be doing it, though, is to go to the AWS Marketplace, because this one was created just for the purpose of the workshop, so we put the latest stuff out there, but it will definitely not be kept updated. The AWS Marketplace is the place where we keep updating all the time, and again, that is the free edition of Aerospike: you don't pay for Aerospike, the only thing you pay for is the instance charges. So you should be using the AWS Marketplace. [Question: have you benchmarked other types of storage?] Again, we support them, but we have not benchmarked them here; so far what we have benchmarked is the in-memory setup, but we are working on it. [Question about other cloud providers.] We are in the process right now; expect a blog post soon enough about performance on EC2. Right now we are comparing it with Google Compute Engine, that's the next cloud provider, and we have some comparisons with bare-metal providers as well, so there are a couple of comparisons out there. Okay, so I guess I am being told very gently to get out of the room, but we are available outside; if you have any questions, feel free to reach out, or you can mail me at my aerospike.com address. I will be there tomorrow as well, so feel free to reach out to us over there.