and for Leslie Lamport especially, and we use his algorithms in our product; I'll talk a little bit about that. What I'm really going to focus on today is giving you an idea of the kinds of extreme use cases people have that need to be solved, especially in fraud detection and risk management for payment systems, and why and how we have used a different kind of database architecture that takes SLAs to a whole new level. One of the things in the previous talk was about SLAs: the 99th percentile of latency being less than 10 milliseconds. Aerospike actually solves problems where the 99th percentile has to be well under one millisecond, so 10 times more extreme in terms of the SLAs we have to achieve. That's the general high-level setting for the talk. Let's talk about some of the use cases and what people are trying to do, or have been doing, in this area. All the cases I have here are real deployments. I will not mention any customer names, but these are pretty well-known companies in the world. First, the business challenge for a large retail bank, in terms of risk management; they hit this a couple of years ago. What happened to them, and I'll say more about it later, is that a relational database becomes very slow when it has to absorb a lot of changes while also providing low read latency for a large number of requests. That is essentially because of the way data is brought into cache from persistent storage using buffering and so on. There are papers by Mike Stonebraker on this from maybe 10 or 15 years ago, on why this happens and why database systems needed a rewrite. That is one of the reasons the NoSQL movement started. It was driven by the internet, where I have some experience; I have experience in databases as well as in running internet systems. Caching is a very common way to solve such scaling problems, but it comes with a bunch of costs. At some point this particular bank needed to scale up by about 6x, and they were already running 150 caching servers. With a hybrid memory architecture they ended up using fewer than 10 servers. It's all about hardware: how you leverage flash, and what kinds of algorithms enable real-time access without using a lot of DRAM. Another kind of use case is fraud detection. This is again one of the largest payment companies in the world, and the business challenge is very similar. These folks were already at scale, using a certain relational database with a caching solution in front of it. In their case they again reached about 200 servers, because all of the data had to be in DRAM, and the whole operations management nightmare came with that. They were achieving their SLAs at the scale they were at; this was again a few years ago, after which they were growing explosively and needed to size the system for 10 times more. Once again, a 200-node cluster got reduced to 20 nodes using the kind of hybrid memory architecture I'm talking about today. Those are the kinds of changes, while maintaining the same SLAs of about a millisecond for these requests. So if you look at how these systems are set up, essentially you have something on the edge.
And that is a hybrid memory database; Aerospike is essentially the only one I know of that operates at this extreme level. Essentially you've got a bunch of machine learning and various algorithms at the front edge which enable these systems to make real-time decisions based on models. The models themselves are not necessarily computed at the front edge. They're computed using traditional technologies you know of, like Hadoop and HBase, pick your favorite analytics stack, which generate models on lots of data and feed them into a front-edge database built on hybrid memory. Then, in real time, you can achieve amazing throughput with very low latency and high uptime. Again: low latency, heavy write load, 100% uptime, all the things the Cosmos DB folks talk about are applicable in this area. There's a whole bunch of use cases here, even though I'm focusing on the payments side. So what exactly is the problem? Why do you need a million-transactions-per-second database? It's very clear why the Cosmos DB folks need it: they're doing a global database, replicating something across the world, and of course they're going to have trillions of transactions. But Aerospike also delivers trillions of these transactions a day across all our customers. Some of them run in the cloud, some on Azure, some on Amazon, and many in their own data centers. So what are these trillions of transactions that people are doing? Why is it even happening? Everything I'm talking about here is not about old relational use cases at all. These are really new use cases driven by internet usage. For example, the Indian internet market is growing by leaps and bounds. About six months ago, all the cash was sucked out of the system with demonetization; that was an interesting shock, and it immediately vaulted India into a leadership position in new kinds of payments. People like Paytm and Freecharge, especially Paytm, have become household names in India because of that. And that's fundamental to some of the things going on. In the payments ecosystem, normally when you pay with cash, you just go somewhere, you pay, and you're done. If you're using a Paytm account, there's the person who pays and the person who receives the payment. But is that enough? Because payment comes through all kinds of channels. You have people ordering something on a particular government website, and a simple model would suggest you just do fraud analysis based on just this information. But that's totally inadequate, because you're moving around. For example, the Wi-Fi here was working well early in the morning, and as soon as everybody showed up it glitched a little, as some of you might have noticed. At that moment, if I made a Paytm transaction, it would go not through the Wi-Fi but through my carrier network.
Now, depending on where the fraud is happening, somebody could have hijacked the Wi-Fi network or the carrier gateway or whatever it is, so we need to be able to figure out where all of these transactions are coming from. You can't just focus on the sender or the receiver; that's fundamentally the point. And it gets even more complicated, because there are myriad channels through which all of these requests come in. So what do you have to track in order to detect fraud in a payment system? You have to really understand, and have great visibility into, every actor in your network. An actor is not just the person paying or the person receiving the payment. There are a zillion other actors: intermediate gateways, or a site that is spoofing something on your network, so you think you're going to one website but you're really landing on some kind of scaffolded page. All of that needs to be understood by the people on the other side; they get all this information through the network, and they can actually track it. That's fundamentally the point. There's a bunch of detail indicated here, but the thing all of these systems have in common is that they want to detect fraud based on what's going on in the network, and they have a fairly small number of business transactions per second. We've talked about billions and millions, but the number of people actually paying per second in India, or for that matter in any place like North America, is not a million people buying things every second. But in order to detect fraud you can't just do reads. You have to record the fact that somebody came in and did or did not make a payment, so you're doing a lot of writes. And for each of those, you have to look at the transaction history of every actor in the system. That can be thousands of reads and writes to the database for each business transaction, and that is where traditional databases simply will not cope, or will cope only if you do a lot of work on top of them to get the low latency. Because if you can't detect fraud within seconds, if you let it go on for hours, you're going to lose a lot of money. Of course you can correct it later, but the business risk is immense. You can make all of that go away if you have a database that can handle millions of transactions per second at the edge with very low latency and high uptime. That's fundamentally what a hybrid memory architecture allows you to do; a rough sketch of that read/write fan-out follows.
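To make that fan-out concrete, here is a minimal Python sketch. It is illustrative only: the in-process dict standing in for the database, the actor fields on the transaction, and the looks_anomalous rule are all hypothetical, not anything from a real fraud pipeline.

```python
from collections import defaultdict

store = defaultdict(list)   # actor_id -> that actor's transaction history

def looks_anomalous(history, txn):
    # Placeholder rule: flag a sudden burst of activity on any actor.
    return len(history) > 1000

def record_payment(txn):
    # One business transaction touches *every* actor on its path: sender,
    # receiver, device, Wi-Fi/carrier gateways, intermediate sites...
    actors = [txn["sender"], txn["receiver"], *txn["gateways"]]
    suspicious = False
    for actor in actors:
        history = store[actor]      # read: per-actor transaction history
        if looks_anomalous(history, txn):
            suspicious = True
        history.append(txn)         # write: update that actor's history
    return suspicious

print(record_payment({"sender": "u1", "receiver": "m7",
                      "gateways": ["wifi-7", "carrier-gw-2"]}))
```

With hundreds of actors on a payment path, even a modest stream of business transactions becomes thousands of database reads and writes per second, which is the load being described here.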
I just want to say a couple of words here on the actual Indian market, since I'm speaking in Bangalore. I've been observing this market, visiting maybe every three months, for almost 15 years; I work in technology, basically in these kinds of areas. What I'm seeing here is no different from what I saw earlier in the US and what happened in China maybe 10 years ago. All these companies that are household names here right now haven't fully executed yet, but they're going to be huge global companies, like the whole Alibaba ecosystem that China has produced. There are going to be many such things coming out of India, and the scale is only going to increase; it's going to double. If you think trillions a day is enough today, you're going to need 10 trillion within probably a few years. That's how this market is evolving, and I think that's a great opportunity for the kind of work the Cosmos DB folks are doing, as well as what we are doing, to help companies get there. There's also another interesting argument: that we can just solve all these real-time problems with a streaming system. It's true that you can do really sophisticated things with a streaming system. But you have a very vast database here. India has a billion people, and at some point, after the Reliance Jio announcements, everybody is going to have a device, all of them smartphones. And once they get a phone, why would they stop at one? So you'll end up with 10 to 20 billion things you have to store in your database. Try running that on MySQL. It's not designed for that, and that's okay; there's nothing wrong with that. When you want to access the 100,000 of these 10 billion things that are on your website at a particular moment in time, a streaming solution is brilliant. Even a million or so it will handle really well. But if you want all the history for those million objects over the last four months, for each of them you've got to fetch a hundred or a thousand more objects. You can't store the 20 billion objects plus all their history in DRAM. Yet you can actually build the systems all the startups have talked about; they need to solve the problem today, and they can solve it with a database that can do a very good indexed join on this data and still hit those SLAs, at a fraction of the cost of storing all of it in memory. That's game-changing for many of the smaller players, because it gives somebody like Paytm the wherewithal to stand up to players like Jio Money and so on. You're going to see this happen. Of course they won't all do it right, but technology is their friend; the disruption in technology is what's going on with flash, which I'll talk about in a second. Does that motivate the general problem? So what exactly is the work we have done at Aerospike? Again, there was a slide in the previous talk which is pretty much the same thing I put up here; the only difference is that the 99th percentile is less than one millisecond instead of 10 milliseconds. Ten milliseconds is fine for the applications they're addressing, but not for the applications we are focused on. This is a narrow slice of the market, but it's the most high-end, mission-critical class of cases you could ever care about. It's a Paytm or a PayPal or a Visa or a Mastercard, whatever; I'm not naming these as customers, just as players who have to solve these problems. One thing I want to point out: the speakers so far have talked about consistency. I'll share a few thoughts on that later, but real deployments that have happened at scale over eight years are really the topic of this talk. And the future has to have consistency, much like what the earlier speaker said.
And how much consistency can you get without compromising the performance I'm talking about here? That is a fairly sophisticated problem. So what is the traditional architecture everybody uses today? Essentially, sticking a cache in front of a database. There are a lot of problems with it. First of all, it's very complex. Caches fail, and when they fail, reloading them creates a huge amount of trouble, and it takes time. What do you show a user who comes to your website when some cache has failed? The user doesn't care. They made a trade, they made a payment, their account changed, they saw the new value. Great; then the cache failed. They go to the website again and see the old value. This freaks them out. They're going to call the bank and say, hey, I made the transaction, the money was in my account, and now you can't show it to me? And the bank says, oh, don't worry, the cache failed, just come back in an hour. That is not a good user experience, and I think governments will end up with regulations against this kind of thing; in some countries people could go to jail for it. So the consistency of the view a user sees is extremely important, and caching is extremely limited for these kinds of applications, even though it works well elsewhere. A toy sketch of that stale-read failure mode follows.
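Here is a toy Python sketch of that failure mode, assuming a cache-aside pattern where the cache is updated first and persistence to the backing database is asynchronous; the two dicts are stand-ins for the cache tier and the database.

```python
db, cache = {"balance": 100}, {}

def write(key, value, db_write_lands=True):
    cache[key] = value            # fast path: the cache is updated first
    if db_write_lands:            # async persistence may lag or be lost
        db[key] = value

def read(key):
    if key not in cache:          # after a cache failure, repopulate from DB
        cache[key] = db[key]
    return cache[key]

write("balance", 250, db_write_lands=False)  # payment lands in cache only
assert read("balance") == 250                # user sees the new balance...
cache.clear()                                # ...then the cache node fails
assert read("balance") == 100                # and the old balance reappears
```

The user who saw 250 and now sees 100 is exactly the person calling the bank in the story above.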
Scalability is another issue. With everything in DRAM, if you want to store petabytes, or even terabytes, of data, a company like BigBasket will just not be able to compete today. But they need to compete, and there's a solution, which I will talk about. The work we've done in Aerospike essentially eliminates this dual nature of cache and database, and just puts a database that is as fast as a cache at the edge. You don't have to worry about persistence, because we put everything in flash and read directly off of flash. We use massively parallel architectures to make sure you can push through millions of transactions; we've tested up to 5 to 10 million per node on a 56-core machine, but really we can push as many transactions as you can move through memory, and we make sure there is enough flash storage attached to the machine for vertical scaling, so that we can now deliver millions of transactions per second using SSDs, at sub-millisecond latency. What that gives you is very high scale and very low latency, and of course you need a distributed system; that's a given. The other important thing is fast development; again, the Cosmos DB folks talked about this. You don't want the traditional model where you have to predefine the whole schema, so fast application development is really important. We have a shared-nothing architecture in Aerospike. I'm not going to dwell too much on the technology here, because the talk is only 35 minutes, but there's a VLDB paper; you'd expect that by now. We have two of them, but read the one from last year. It describes the architecture in gory detail and explains how Aerospike achieves the kind of performance I'm talking about.

We use a very specific distributed hash table algorithm with no hotspots and a shared-nothing architecture, and the smart client caches some of the partition mapping so that you can scale very well. Transactions and long-running tasks have to be prioritized differently; this is a fundamental point. Whenever a node dies in a distributed system, you have to actively redistribute the data. Redistributing sometimes takes a long time, but normal transactions have to continue, because we're talking about sub-millisecond response times. So we need enough copies, and we need to be able to continue those transactions while things like rebalancing run in the background. We also have XDR, which is basically a replication scheme across data centers; it gives you global distribution, though I'll focus less on it. Fundamentally, what you should take away from this talk is: indexes are in DRAM, data is on SSD. That is the fundamental aspect of the hybrid memory architecture, and it results in predictability. Using parallelism, we can provide extremely predictable performance and very high uptime, with low management overhead. Again, this slide is a repeat, so I'll move on. One thing I'd like to focus on: I've been talking about the hybrid memory architecture, shown on the left of this picture; we also have a pure in-memory configuration in Aerospike. Some of the techniques hybrid memory uses are pretty classic: a log-structured file system with a copy-on-write mechanism, and large-block writes, like the group commit you're familiar with from relational databases. Indexes, again, are in DRAM. It's highly parallelized: you can attach a whole bunch of SSDs to each node and access data from them directly, in parallel. Cluster formation, essentially, is all about Paxos, which again was invented by Leslie Lamport, whom you heard about in the Microsoft talk. Paxos is an optimal algorithm for consensus in a distributed system in the presence of failures. That's really important, because when you have a running cluster... here is a very simple example where three nodes are trying to form a cluster. The basic idea is that each node has to see all of the other nodes. How fast can you come to consensus that the cluster is exactly N1, N2, and N3? We use Paxos for that, and only for cluster formation; we do not use Paxos for the actual data syncing and so on, because that would not give you the one-millisecond response time. So that's what happens as nodes come in and go out. In terms of the partition map, the basic idea in Aerospike is that we run the RIPEMD-160 algorithm on every key, which produces a 160-bit (20-byte) digest, and we take 12 bits of it to pick a partition. So a given key space is split into 4096 partitions. Each of those partitions is then deterministically mapped to nodes, so you have a mapping of partitions to nodes with a first copy, a second copy, a third copy, and so on; if you keep two copies, there are two columns in this mapping table. And this mapping table is computed from the cluster list I showed earlier, which Paxos gives you.
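As a rough illustration of that keying scheme, here is a Python sketch: RIPEMD-160 yields a 20-byte digest, and 12 bits of it select one of 4096 partitions. Which 12 bits are used and the exact shape of the mapping table are assumptions here; the VLDB paper has the real layout.

```python
import hashlib

N_PARTITIONS = 4096  # 2**12

def partition_id(key: bytes) -> int:
    # ripemd160 may require the OpenSSL "legacy" provider on some builds.
    digest = hashlib.new("ripemd160", key).digest()          # 20 bytes
    # Illustrative bit selection: low 12 bits of the first two bytes.
    return int.from_bytes(digest[:2], "little") % N_PARTITIONS

def replica_nodes(pid: int, partition_map: dict, copies: int = 2):
    # partition_map: pid -> ordered node list; the first `copies` entries
    # are the master/replica columns of the mapping table.
    return partition_map[pid][:copies]

# Deterministic: every client computes the same partition for a key,
# so a read or write goes straight to the right node with no lookup hop.
print(partition_id(b"user:42"))
```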
One of the things we do with our partition mapping, and I want to focus on the second line here, partition P2: there's a whole algorithm behind this; again, it's in the VLDB paper, you can look at it. We also have all of this code in our open source version. Everything I talk about here is actually implemented; more is implemented in the enterprise version, but the community edition essentially has all of these things, so you can download it and try it out at your convenience. So take that second line. The way our partition map works, we assign nodes to each partition in a random order, but we also preserve that order. What the lower portion of the picture shows is N5 going down. If you notice, N5 has been removed from the list and the rest of the nodes stay in the same order. Because we are showing three copies here, N3, which did not have a copy of partition 2, will now get a copy of partition 2. And when N5 comes back up, it will reinsert itself in the same place. This is how we reduce the amount of migration when nodes come in and out of the cluster; a small sketch of that succession idea follows.
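Here is a minimal sketch of that succession-list behavior, using the N5/P2 example just described; the fixed per-partition node ordering and the replica count are the only inputs, and the names are from the slide, not real code.

```python
def replicas(succession, live, r=3):
    # The first r *live* nodes in the fixed, randomized ordering hold copies.
    return [n for n in succession if n in live][:r]

succession_p2 = ["N4", "N5", "N1", "N3", "N2"]   # fixed random order for P2

live = {"N1", "N2", "N3", "N4", "N5"}
print(replicas(succession_p2, live))   # ['N4', 'N5', 'N1']

live.discard("N5")                     # N5 goes down...
print(replicas(succession_p2, live))   # ['N4', 'N1', 'N3'] -> N3 gains a copy

live.add("N5")                         # ...and comes back in the same slot
print(replicas(succession_p2, live))   # ['N4', 'N5', 'N1'] -> same as before
```

Because the ordering never changes, only the partitions that actually lost or regained a node migrate, which is the point being made about reduced data movement.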
We already talked about Paxos and how the gossip protocol figures out the new cluster organization, so I'll skip that. Here is an important thing to note: a massively parallel architecture matters because, in Aerospike, we shard data across the nodes of the cluster, but within each node we also spread it across disks. In this particular case we have five disks per node and four nodes, so it's a many-way split of the actual data, which means many SSDs can be read directly in parallel, and that raises the ceiling on the throughput you can achieve. Writes are synchronous within a cluster; reads are simpler and can go to any copy. And all of this has to complete within one millisecond. We can do much better than that, but it depends on your hardware: on a cloud provider it will be a little slower, maybe 0.5 milliseconds; on your own bare metal, fully tuned, it will be more like 0.2 milliseconds. Those are the kinds of numbers you can achieve. The next slide is just a summary of what we do; I've already talked about it. In terms of data correctness, we focus on returning the latest copy of the data, no matter what's happening in the cluster; we eliminate caches, as I pointed out earlier; and mixed workloads should also run well.

In terms of distribution, Aerospike allows asynchronous replication across clusters. When you deploy Aerospike, you always deploy it with a cluster in each data center, if you will, with an asynchronous replication link between them. You can do two-way replication, or a hot standby; it's an option in how you set up your system. The idea is that since we are focused on low latency, if we straddled a cluster's nodes across a wide area network, we could not preserve the predictable latency you need. The trade-off is that if one of those clusters disappears, you have a little exposure in the replication lag between the first cluster and the second. This lag is proportional to the speed of light, in some sense: how far apart the clusters are determines the lag to a large extent. So many of our customers run clusters close to each other, in the same data center or a nearby DR site, where the lag is quite low. We also have to focus heavily on high uptime, because these systems are only useful if they run all the time.

Another thing we have spent a lot of time on, which is hard to go through here (once again, you can read the paper), is optimization in C, in terms of how the code is written. If you look at the traditional databases like Oracle, Sybase and so on, they're all written in C; there's a reason for that. We also work a lot on how the cores behave on these modern architectures: we support CPU pinning, we assign threads to the right cores, and we watch our cache lines; our index entries are exactly 64 bytes, which matches the cache line size. All of that makes a really big difference.

A little bit about the data model and how data is organized. Aerospike is a row-oriented database. What you have are columns, which we call bins, except bins are schemaless: you don't have to fix the schema, and you can put various types in them. The important ones to point out are lists and maps, as well as GeoJSON, which is how you run spatial queries on Aerospike; but you can also keep lists, maps, key-value pairs, and so on. This slide just explains how we use MessagePack as the internal format. Again, a lot of Aerospike is about efficiency: how much can you get out of the hardware? The interesting thing is that we work on commodity hardware. If you're running on bare metal, we get really, really great performance; on cloud, we still get impressive performance, depending on the instance type you use. It's all about squeezing as much performance as possible out of whatever hardware we are given, for the benefit of the application using the database.

A little bit about secondary indexes. We allow indexing on the base types, of course: string, integer. We also allow indexing one level deep into lists and maps. The idea in Aerospike is to do only the things that let us keep the performance as it is, so the features you see in Aerospike will be deliberately limited; every part of this architecture is about scaling Nx higher with an Nx cheaper footprint in TCO. With that kind of requirement, we can't do arbitrary indexing, because some kinds of indexing make the system really slow, not just for that index but for all access to the database generally, and you want to stay away from that. Our secondary indexing is based on a scatter-gather algorithm, which means it is better for low-selectivity predicates: when a secondary-index request returns a bunch of data, it makes sense to send the query to all the nodes and gather the results back at the parent, and that works really well. If you're only going to get a single entry back, it will still work, but there's some wasted work, so you're better off using the primary index in Aerospike as much as you can.
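For a feel of that data model, here is a short example using the Aerospike Python client (pip install aerospike). The namespace, set, bin names, and index name are made up, and the exact client API can vary by version; treat this as a sketch.

```python
import aerospike
from aerospike import predicates as p

client = aerospike.client({"hosts": [("127.0.0.1", 3000)]}).connect()

key = ("test", "payments", "txn-0001")           # (namespace, set, user key)
client.put(key, {
    "amount": 1500,                              # integer bin
    "actors": ["sender:9", "gw:wifi-7"],         # list bin
    "meta": {"channel": "carrier", "cc": "IN"},  # map bin (schemaless)
})

# Secondary index on an integer bin; queries against it are scatter-gathered
# across the cluster, so they pay off when many records match.
client.index_integer_create("test", "payments", "amount", "amount_idx")

query = client.query("test", "payments")
query.where(p.equals("amount", 1500))
for _key, _meta, bins in query.results():
    print(bins)

client.close()
```

Note how nothing above declares a schema: bins appear the first time you write them, which is the fast-development point made earlier.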
So what does all of this architecture buy you in an actual deployment? That's the interesting, even surprising, part. This is from a real deployment, which cut over maybe 18 months ago. The previous system was missing its SLA, which was something like two milliseconds: they were meeting it only 98.5% of the time, missing 1.5% of requests. Once they cut over to the hybrid memory architecture, they met it 99.95% of the time, missing only 0.05%. That's 1.5 / 0.05 = 30 times better; you're missing 30 times fewer requests. That's huge. Why is it huge? Because when the fraud check can't finish in time, you still have to complete the transaction; you can't wait. If somebody is transferring money, you're not going to say, hold on, we're waiting for the fraud check. If your fraud algorithm cannot finish in time, you have to deal with the fallout later. And this is very important to understand: the improvements you get from the techniques I'm describing are not 30% improvements; you're going to get 10x, 20x. Otherwise, all of this work I talked about would not be worth deploying. That's fundamental. The other important thing is that you can do all of this at low cost. This is another actual deployment, which went live probably six months ago. Look at the savings compared with the previous system. These are not discounts from Aerospike; the savings are enabled by Aerospike, and they come simply from the size of your cluster: how much DRAM you put on it, versus putting almost everything in flash with only a small amount of very hot data in DRAM. At that level, you are taking advantage of the billions of dollars of investment by Intel, Samsung, Micron, and all the manufacturers who are driving flash forward. No other database in the world today lets your application use flash at these levels. One of our earliest slogans at Aerospike was that we give startups the ability to grow to Google scale without spending the kind of money Google can spend. Google has a lot of money; a startup has to figure out another way, and essentially this is the way: something that scales like Google at 10 times lower cost. We ended up with something like 10 companies over the last eight years who used this technology and grew to a billion-user kind of scale, like Google's. So you can see how this supercharges an internet economy. It will not supercharge a staid, slow economy; enterprise software works fine for that. But somebody who wants to take over the world, like a Paytm or an InMobi, needs to grow fast, and they don't have the resources the big players have. The good news is that those resources are produced by Moore's law on the hardware side. A little bit on the various kinds of... So Aerospike is not a general-purpose technology. We don't go in saying, like Cosmos DB for example, that this is a general-purpose database. We don't compete with that, and we don't compete with Oracle. What we compete for is the use case where nothing else really works.
So if you look at the kinds of replacements we have here, every one is different. Why? Because people go with what they know best and try to make it work.