This is Dave Vellante of Wikibon.org, and this is theCUBE, where we extract the signal from the noise and bring you the data that helps you make better decisions. Adam Fuchs is here. This is the third series that we've done of Chalk Talks. In the first one, we looked at big data lessons learned at the NSA. In the second, we took a look at the security ecosystem around Accumulo, the database that Adam helped build while he was at the NSA. And today, we're gonna look at how Accumulo is optimized to maximize performance. Adam, take it away.

Thanks, Dave, good to be here. Yeah, so we're gonna cover some of the secrets of how Accumulo really gets all of its performance in real time. A lot of this derives from Google's Bigtable and things that we extended in my previous career at NSA, as well as what I'm doing now with Sqrrl. So to start out, we'll look at one of the key elements here, which is the log-structured merge tree, which really backs a lot of the key-value operations inside of Accumulo. So Accumulo stores key-value pairs, and it stores them in sorted order. We take tables, which are really just collections of key-value pairs, and we break them up into partitions, which are known as tablets. Focusing in on just one particular tablet, we can talk about a couple of operations specific to that data. So in particular, if we're just looking at the tablet data flow here, we have data coming in from the left, and these are key-value pairs coming in in random order within the boundaries of that tablet. On the right, we have key-value pairs coming out, but they come out in sorted order. So data goes in in random order and comes out in sorted order, and in between there's a mess of operations to perform that sort. And we're going to optimize two different elements as we perform that sort operation. The first is to minimize latency. So as data streams in, we want that data to be able to contribute to a query with as little latency as possible.
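The table-to-tablet partitioning Adam describes can be sketched with a few lines of Python. This is just an illustration of the idea, not Accumulo's actual code; the function name and split points are made up. Each tablet owns a contiguous key range, bounded by sorted split points, so any key maps to exactly one tablet.

```python
# Sketch (not Accumulo's implementation): a table is a sorted collection of
# key-value pairs, split into contiguous key-range partitions ("tablets").
import bisect

def tablet_for(key, split_points):
    """Return the index of the tablet responsible for `key`, given sorted
    split points. Tablet i covers the range (split_points[i-1], split_points[i]]."""
    return bisect.bisect_left(split_points, key)

splits = ["g", "n", "t"]             # 4 tablets: (-inf,"g"], ("g","n"], ("n","t"], ("t",+inf)
print(tablet_for("apple", splits))   # -> 0
print(tablet_for("horse", splits))   # -> 1
print(tablet_for("zebra", splits))   # -> 3
```

Because the split points are sorted, locating the right tablet is a binary search, and every key-value pair within one tablet falls inside a single contiguous, sorted range.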
At the same time, we want to minimize our impact on disk resources. Accumulo has a disk-based architecture; we store data on spinning disks, so we don't want to do a whole lot of random I/O. We want to perform sequential I/O on disk whenever possible. That's accomplished through this mechanism, and I'll take you through it here. Key-value pairs coming in in random order immediately go into an in-memory map, which is a balanced binary tree. That's random I/O, but it's in memory, so it's efficient, and as soon as a pair is in that in-memory map, it can contribute to a query. So we've minimized latency there. However, the in-memory map is volatile memory. So in order to preserve durability, at the same time, we write the data to a write-ahead log. This write-ahead log, for Accumulo version 1.5 and beyond, is stored in HDFS, so it's actually replicated on multiple nodes, and it's synced to disk by the time we acknowledge a write. Another issue with just storing things in memory is that memory is much smaller than disk. So at some point, that memory will fill up, and we'll need to perform what's known as a compaction operation, or minor compaction, also known as a flush operation. We flush that in-memory map to a file, which is stored, again, in HDFS. We want that write to disk to be sequential, so we buffer a fair amount of data in memory and write it all as one sequential stream to an RFile. Memory is much smaller than disk drives, so it'll fill up again, and we'll do another compaction operation, writing to an additional RFile. Each of these RFiles then participates in queries. As we stream data out, we'll merge data from each of these independent containers through a mechanism here to provide a single sorted stream of key-value pairs to the query. The more RFiles we have, the more sources of data we have, and the more seeks we have to do on disk. So query latency is impacted by the number of these files.
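The write path just described — write-ahead log, in-memory map, minor compaction to sorted files, and a merged read — can be modeled in a few dozen lines. This is a deliberately simplified sketch with assumed names (`Tablet`, `memtable`, `rfiles`), not Accumulo's actual classes, and the "files" here are just in-process lists standing in for HDFS RFiles.

```python
class Tablet:
    """Toy log-structured merge model of one tablet's data flow."""

    def __init__(self, memory_limit=3):
        self.memtable = {}            # in-memory map (stands in for the balanced tree)
        self.wal = []                 # write-ahead log (in Accumulo, synced to HDFS)
        self.rfiles = []              # each flush produces one sorted, immutable "file"
        self.memory_limit = memory_limit

    def put(self, key, value):
        self.wal.append((key, value))   # durability first: log before acknowledging
        self.memtable[key] = value      # immediately visible to queries (low latency)
        if len(self.memtable) >= self.memory_limit:
            self.minor_compact()

    def minor_compact(self):
        # One sequential write of the whole buffer, then reset memory and log.
        self.rfiles.append(sorted(self.memtable.items()))
        self.memtable.clear()
        self.wal.clear()

    def scan(self):
        # Merge every sorted source into one sorted stream; on duplicate keys,
        # newer sources win (simplified to last-write-wins here).
        merged = {}
        for rf in self.rfiles:
            merged.update(rf)
        merged.update(self.memtable)
        return sorted(merged.items())
```

For example, with `memory_limit=2`, writing `("b", 1)` then `("a", 2)` triggers a flush that produces one sorted file `[("a", 2), ("b", 1)]`, while a later `("c", 3)` sits in memory — yet `scan()` still returns all three pairs in sorted order. Note how random-order writes never touch "disk" randomly: the only file writes are whole sorted buffers.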
So in order to minimize query latency, we do an additional background operation here, known as a major compaction, where we take multiple files and merge them together to create one single globally sorted file across all of them. That operation happens as a background processing operation. So we have background threads doing the minor compactions, background threads doing the major compactions, and then threads attached to the query or the write performing those operations. All of this is great, but this is basically a standard log-structured merge tree design; the technology goes back to the mid-'90s. One of the things that we've done with Accumulo is to put another mechanism in here, known as the iterator tree. As we stream data from these multiple sources and merge them together, it actually goes through a series of operators, and those operators take place not only on the query path but also on the compaction paths. So there's a series of operators that these sorted streams of key-value pairs go through. If we focus in on what those operators are, there's another view here of that iterator tree. You can think of a series of RFiles and the in-memory map contributing to some following process here. The first operation is really a merge: we merge all of these individually sorted key-value sources together to form a single sorted stream of key-value pairs. But we're not done yet. Operations like deleting key-value pairs, or the cell-level security that you're familiar with inside of Accumulo, all go through iterators and are performed in iterators. That's a set of system iterators. On top of that, we can extend our capabilities to application-specific processing. So you might consider the case of: I'm inserting key-value pairs, and I insert the same key twice with two different values. What do I do with the values? Do I keep the most recent one? Do I keep all versions?
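A major compaction is, at its core, a k-way merge of already-sorted files written back out as one sequential stream. Here is a minimal sketch of that step, assuming a toy convention (not Accumulo's) that newer files appear later in the input list, so on duplicate keys the later file's value wins — roughly what a versioning policy of "keep most recent" would produce.

```python
# Sketch of a major compaction: several sorted files merged into one globally
# sorted file in a single sequential pass, so queries seek into one file.
import heapq

def major_compact(rfiles):
    """Merge sorted (key, value) runs. On duplicate keys, keep the value from
    the newest file (assumed to be last in the list in this toy model)."""
    out = []
    # heapq.merge streams the inputs in sorted order without loading them all;
    # with a key function, ties are broken by input order, so later files win.
    for key, value in heapq.merge(*rfiles, key=lambda kv: kv[0]):
        if out and out[-1][0] == key:
            out[-1] = (key, value)   # later duplicate overwrites earlier one
        else:
            out.append((key, value))
    return out

old = [("a", 1), ("c", 3)]
new = [("a", 9), ("b", 2)]
print(major_compact([old, new]))   # -> [('a', 9), ('b', 2), ('c', 3)]
```

The important property is that both reads and writes stay sequential: each input file is consumed front to back, and the output is emitted in order, with no random seeks.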
One of the operations that we do inside of the application-specific iterators is versioning. We can decide to keep the most recent value, the 10 most recent values, or the most recent 90 days of values. Those are basic, what we would call, filtering operations. In addition to that, we can do aggregation operations. Instead of just filtering out some of the values, we can calculate a function of all of the values that we've seen. And if that function is associative and commutative, then we can reason about it, and we can say that really any online statistic fits naturally inside of that iterator tree. One example of that is word count. Anybody who's ever done the MapReduce tutorial has gone through a series of documents, pulled out the words, mapped those words to a count, and then, inside of the reduce phase, collapsed all of those counts, adding them up to provide the final word count. We can do the same thing, but in a streaming fashion, using the iterators' combining capability, or aggregation capability, to perform a very efficient merge. For example, if I have this corpus of documents, I generate a set of terms, and each term is associated with the number of times I see it. I'm going to see the term A many times inside of the iterators. I may see two of those key-value pairs; I can add those up and get another version. Maybe that's one of my compaction operations. I do another compaction operation on the next two. Maybe I do a major compaction, merging those two together to get my final, or at least most recent, count. So the term A being seen four times would happen as an aggregation inside of the iterator tree. When we look at the performance of that, essentially what we're trying to do here is avoid read-modify-write.
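The word-count example can be sketched as a combining step layered on the merge. This is a Python model of the idea behind a summing combiner, not Accumulo's Java iterator API: because addition is associative and commutative, partial counts can be collapsed at any merge point — minor compaction, major compaction, or query time — and the result is the same.

```python
# Sketch of a combining iterator: as sorted streams merge, duplicate keys are
# collapsed with an associative, commutative function (here, addition).
import heapq
from itertools import groupby
from operator import itemgetter

def summing_combine(*sorted_runs):
    """Merge sorted (term, count) runs, summing counts per term."""
    merged = heapq.merge(*sorted_runs, key=itemgetter(0))
    for term, group in groupby(merged, key=itemgetter(0)):
        yield term, sum(count for _, count in group)

run1 = [("a", 2), ("b", 1)]   # e.g. the output of one minor compaction
run2 = [("a", 2), ("c", 1)]   # and another
print(list(summing_combine(run1, run2)))   # -> [('a', 4), ('b', 1), ('c', 1)]
```

This is Adam's "term A seen four times" example: two partial counts of 2 are added during the merge, so the final count emerges without ever reading the previous value back off disk.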
So as I'm inserting randomly ordered key-value pairs, if I need to look up the previous value, I'm essentially reading that off of disk, causing a seek on disk, and then writing a new value, which could also result in a write to disk. That type of randomized disk access is very inefficient. Since we're piggybacking on top of this log-structured merge tree mechanism, we can get much more efficient writes of online statistical data, where we're keeping that aggregated footprint instead. Now, Accumulo has this thing called a compaction ratio, and that actually allows us to shift between extremes here. On one extreme, I might want to keep only one file at any point, to really minimize the read latency that I see. The cost of that is that every time I write new data, every time it flushes to disk, I need to compact it with that existing file. And as such, I'm going to be doing a large number of copies of each of the key-value pairs that I write, where those copies are essentially rewriting that key-value pair on disk. The flip side is maybe I don't want to do a lot of copies; I want to minimize my impact on disk, so I keep the copies down. But the cost of that is I get a lot of intermediate RFiles, and that has an impact on read latency. So what we've offered is a ratio that you can set in between there. We've tried to optimize it for the general use case, where there's a mix of reads and writes. But essentially, we can bring that number of copies down, which makes our writes much more efficient, and at the same time keep our read latency fairly low. So there you have it. That's the secret to Accumulo's online real-time performance for a number of applications.

Thank you, Adam.
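The trade-off the compaction ratio controls can be made concrete with a toy simulation. This is not Accumulo's actual file-selection algorithm — it simply merges all files whenever their combined size reaches the ratio times the largest file — but it shows the two extremes Adam describes: a low ratio rewrites data many times to keep one file, a high ratio rewrites little but leaves many files for queries to merge.

```python
def simulate(flushes, flush_size, ratio):
    """Toy model: count how many times data is rewritten ("copies") and how
    many files remain, as a function of a compaction-ratio-like knob."""
    files, copies = [], 0
    for _ in range(flushes):
        files.append(flush_size)       # a minor compaction adds one new file
        copies += flush_size           # the initial write is the first copy
        files.sort()
        # Merge everything whenever the combined size is at least `ratio`
        # times the largest file (a simplification of Accumulo's rule).
        while len(files) > 1 and sum(files) >= ratio * files[-1]:
            merged = sum(files)
            copies += merged           # a major compaction rewrites every pair
            files = [merged]
    return copies, len(files)

for r in (1, 3, 100):
    copies, nfiles = simulate(flushes=10, flush_size=1, ratio=r)
    print(f"ratio={r}: copies={copies}, files={nfiles}")
```

With 10 unit-sized flushes, a ratio of 1 compacts after every flush (one file, but each pair rewritten repeatedly), while a very large ratio never compacts (minimal writing, ten files to seek into); a middling ratio lands in between on both counts.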
So you're seeing some of the innovations that Accumulo built on top of Bigtable. And, of course, Sqrrl just announced Sqrrl Enterprise, which is an application development environment that allows programmers to take greater advantage of Accumulo and dramatically simplifies that environment. You can go to sqrrl.com to see more information about that and some more of Adam's work. Go to youtube.com/siliconangle to see this and other videos relevant to Accumulo and the big data space. This is Dave Vellante of Wikibon.org. This is theCUBE. Thanks for watching, everybody. We'll see you next time.