Good afternoon. Hopefully you have gotten some coffee so you can stay awake. My name is Lorina Poland. I'm a tech writer for DataStax. I'm also an Apache committer, one of the first non-coder committers that the Apache Cassandra project got, and I've spent about 11 years with Cassandra now. I was supposed to talk about another topic, but I'm going to talk about the unified compaction strategy. So this is compaction for all.

I'll give a really brief reminder of how compaction works, in case anybody here hasn't had a lot of experience with it. Then: why you would want to use the unified compaction strategy. We'll talk a little about the innards of leveling and sharding and the improvements that have been made with this compaction strategy, and finally a comparison to the legacy compaction strategies.

Cassandra has a pretty straightforward way of dealing with writes. It writes to a commit log, so that if you have any interruption of service, you can read that commit log back and recover any writes that might not have made it to memory or disk before the node went down. It also writes to a memtable, in-memory storage that is a representation of the table you're writing to. Periodically, when a threshold value is hit, the memtable flushes to SSTables on disk. Those SSTables are immutable. That's a really big part of the story: you can't overwrite them, they don't get rewritten. You just get new SSTables when you write new data.

Because of that, you run into two problems: write amplification and read amplification. They both stem from the same cause, immutable data written to disk. Write amplification means you can end up with multiple copies of the same data, or even outdated data. That data takes up disk space, and it has the potential to be rewritten multiple times through the compaction process. Read amplification means that when you go to read, you might have multiple copies of the same data, so you have to read all of them and figure out which one holds the valid value you actually want to return.

So you would think: we're going to have all this immutable data written to disk; all we have to do is compaction and everything's good, right? Easy. You take all these values you find over here on the right, compact them down to a single record that has the most recent data, and Bob's your uncle. Well, you might want to check the box and say done, but it's not that easy. It turns out you can't really compact all the SSTables at once. It's extremely slow, it's resource-intensive, it uses up memory and disk, and parts of the data may not need to be compacted at all. So, in fact, what you need to do is operate on a subset of SSTables, and the question becomes: how do I pick which subset to work on?

Until a few weeks ago, we had two strategies. The first is the size-tiered compaction strategy, or STCS. In this method, as SSTables get written to disk, you group them basically by size. You're looking for SSTables that are approximately the same size, and those SSTables can then be compacted and pushed up a level.
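To make that grouping concrete, here is a minimal sketch of the size-bucketing idea in Python. It is illustrative only, not Cassandra's actual code: the 50% tolerance and the four-table minimum are made-up stand-ins for STCS's bucketing options.

```python
# Minimal sketch of size-tiered bucketing (illustrative only, not Cassandra's
# code). Tables whose sizes fall within a tolerance of a bucket's average size
# are grouped; a bucket with enough members becomes a compaction candidate.

def stcs_buckets(sstable_sizes, tolerance=0.5, min_threshold=4):
    buckets = []  # each bucket is a list of SSTable sizes
    for size in sorted(sstable_sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            # Group SSTables that are approximately the same size.
            if avg * (1 - tolerance) <= size <= avg * (1 + tolerance):
                bucket.append(size)
                break
        else:
            buckets.append([size])
    # Only buckets with enough similarly sized tables get compacted.
    return [b for b in buckets if len(b) >= min_threshold]

print(stcs_buckets([100, 110, 95, 105, 900, 950]))  # -> [[95, 100, 105, 110]]
```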
These levels are a way to keep SSTables grouped, and this approach works pretty well. It's a pretty simple way to do compaction, but you end up with a few problems. The highest levels end up not getting touched, because you don't write really huge SSTables all the way up to the top all that often. Now you have these really large SSTables up there in the higher levels, which will take a massive amount of space to compact, because this technique needs double the space of whatever it's compacting in order to do the compaction. It also doesn't really care what's inside an SSTable, so you can have overlapping data in an SSTable that doesn't get sorted out when the compaction runs. And there is essentially no parallelism you can exploit here.

So folks came up with another way, the leveled compaction strategy (LCS), and said, well, let's give this one a try. Here, an SSTable is split into similarly sized tables and promoted to another level, so what you end up with is a lot of small tables as you go from level to level (there's a sketch of this leveled shape below). The space overhead is low, because you're splitting the data into small pieces, but you may have problems with triggering the compaction, and there are even more parallelism problems here.

So to date, which do people usually use STCS for? Come on, Patrick, tell me: reads or writes? [Audience: write-heavy.] Right: STCS for write-heavy workloads, and the leveled compaction strategy for read-heavy ones. You have to take your pick, one or the other. I should have gone to my next slide; I told myself: write-heavy workloads on STCS, read-heavy on LCS.

There are some problems with this, though. It's not really easy to tell whether you have a write-heavy or a read-heavy workload. What if you have a mixed workload? What are you going to do? Before UCS, you basically had to pick ahead of time, and you had to set it. And once you had set a compaction strategy, it was hard to change, because you had to do a major, full recompaction of everything that was there. The other thing is that workloads can change over time. You might have one kind of workload when you start up your cluster, but as time goes on, it may turn into a different workload than you had.

A lot of people have thought long and hard about these topics and done a lot of research, and one of the ideas that came out of it was a unified compaction strategy. So here we go: unified compaction strategy to the rescue. UCS is all the things, okay? Basically, you can do size-tiered, you can do leveled, or you can do anything in between. And although it can simulate those older styles, it is actually a new compaction strategy in and of itself, one that has improved on them immensely in a number of different ways. What is great is that because it can simulate the older styles, you can put it into place in your system even if you've been running that system for years and years. You can switch between strategies at any time. Literally, you just change the values that set the compaction strategy; that will sometimes kick off a compaction right away, but in any case, when compaction comes around the next time, it will use the new strategy you asked for.
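About that leveled shape: here is a rough sketch of how LCS lays data out, again illustrative Python rather than Cassandra code. The 160 MiB table size and fan-out of 10 are LCS's usual defaults.

```python
# Rough sketch of the leveled shape (illustrative only, not Cassandra's code).
# Each level holds roughly fan_out times more data than the one below, split
# into many small, similarly sized, non-overlapping SSTables.

SSTABLE_SIZE_MIB = 160  # LCS's usual default target SSTable size
FAN_OUT = 10            # each level is ~10x larger than the previous one

def level_capacity_mib(level):
    # L1 holds ~10 tables, L2 ~100, and so on; L0 is where fresh flushes land.
    return SSTABLE_SIZE_MIB * FAN_OUT ** level

for level in range(1, 5):
    print(f"L{level}: ~{FAN_OUT ** level} tables, ~{level_capacity_mib(level)} MiB")
```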
The other thing is you can configure the strategy at every level. So you can have a compaction behavior for your L0 level that is different from your L4 level, and a lot of times that's a real advantage: you might want write-friendly behavior at a low level but read-friendly behavior at a higher level. UCS reduces the space overhead dramatically by how it works. And, as we'll talk about, you can parallelize the operation (my computer kept saying that wasn't a word, but it just seems like one, doesn't it?) so you can get faster completion of your compactions. If you've ever watched compactions grinding on, especially in a production environment, you'll be happy about that. Lastly, this is a stateless process, and that has its advantages as well.

I had to pick a level of detail for this talk, and hopefully I picked the right mix of heavy stuff and not-heavy stuff. By the way, the guy who wrote this code, Branimir, is in the room, and if you want to know the heavy details, I'm going to tell you to talk to him.

What you can see here is a graph. The bottom axis is a factor we'll call W for the moment, a scaling parameter. As you move from negative numbers on the left toward positive numbers on the right, read amplification starts low and climbs, while write amplification starts high and drops. (I said it the other way around at first; I should get this right.) So the negative end is what you want if you're trying to do reads, and the positive end is what you want if you're trying to do writes.

There's a whole bunch of words here. Levels are determined by the SSTable size; it's not quite size, as we'll see, but that's a good first approximation. The amount of data per level grows faster as you use a higher fan-out factor F. That fan-out factor can be applied in a tiered way, where the threshold T equals F, or in a leveled way: if you want this to look like the leveled compaction strategy, you set T equal to 2. You get up to T - 1 SSTables per level at rest, and results move up a level when they grow large enough (again, I'm saying size; you'll see in a minute it's not quite size). The relationship of F and W is given by two equations: for tiered behavior, F = W + 2, and for leveled behavior, F = 2 - W.

So that scaling parameter W lets you dial yourself all the way from LCS-like behavior, through a mixed workload, to STCS-like behavior. And it turns out that a lot of times it's much easier to relate W to an L or T value that tells you directly whether you mean leveled or tiered. When you set scaling parameters in the actual configuration, you can use the integer values. So, for instance, W = -8 is the equivalent of legacy LCS; that's the kind of behavior you're looking for, and that setting means your F value will be 10, so L10 is the equivalent of W = -8. When W = 2, you have the equivalent of the legacy size-tiered compaction strategy, or T4. Neutral is W = 0. And because you can set the scaling parameters per level, you have the option to set different values for each level.
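Here is a small sketch of that W-to-fan-out mapping in Python, following the two equations above. The helper name is made up; this is not a Cassandra API, just the arithmetic.

```python
# Sketch of the W-to-fan-out mapping described above (the helper is made up,
# not a Cassandra API). W >= 0 behaves tiered: T = F = W + 2.
# W <= 0 behaves leveled: T = 2 and F = 2 - W.

def describe_scaling_parameter(w):
    if w >= 0:
        fanout = w + 2
        label = f"T{fanout}"   # tiered: up to F - 1 SSTables per level at rest
    else:
        fanout = 2 - w
        label = f"L{fanout}"   # leveled: T = 2, compaction triggers eagerly
    return fanout, label

for w in (-8, 0, 2):
    fanout, label = describe_scaling_parameter(w)
    print(f"W = {w:>2}: fan-out {fanout}, notation {label}")
# W = -8: fan-out 10, notation L10 (legacy LCS-like)
# W =  0: fan-out  2, notation T2  (neutral)
# W =  2: fan-out  4, notation T4  (legacy STCS-like)
```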
And by using T and L, it keeps in mind for you which strategy you're looking at for each level. The last level value that you set will be used for that level and all the levels above it. So, for instance, if you wanted STCS-like behavior throughout, you could just set T4 as the scaling parameter, and you'd have T4 at all of the levels. [Audience question] Yes, the number after the L or T is the fan-out factor.

I simplified in a couple of slides here by calling this size, and in fact size is what was used in version one of UCS. But all of you are super lucky: about two or three weeks ago, version two got released in Cassandra 5.0. So you can pull down Cassandra 5.0; there's a beta 1 Docker image if you'd like to use that, which is a handy thing to test with. What is actually used now is not size but density. Density is computed as the size of the input SSTable divided by V, the fraction of the total token space represented by that SSTable.

Why do you care about using density instead of a size or a number? Well, if you think about size and number, you can probably realize without too much work that you could get some imbalances. You might have a whole bunch of tables come in, and now you're having to do a bunch of work because you have all these tables that need to be moved up, and they're of really disparate sizes. What are you going to do about that? Density basically allows you to even that problem out. As SSTables grow denser, they move up the levels, but density also controls the size of the SSTables, so you don't get really large SSTables anywhere in the system. The density per level grows by that fan-out factor F, which in this case needs to be greater than or equal to 2. The scheme also reproduces the progression of data in the LCS and STCS cases, so even if you're running what is basically the equivalent of those older legacy styles, you're still getting some advantage from the density calculations. And the overlap of the data in your SSTables will trigger compactions at the correct points if you happen to be in a mixed state as well.

Okay, so what happens with density and overlap? Density, again, is the size of the table divided by its token share. Now, this is kind of an interesting thing. When I was getting into this and looking at the calculations behind all of it, at one point I was going over a calculation and I asked Branimir: I don't understand, how do you get from here to here? This one you're dividing by a quarter, and this one you don't divide at all. What's going on? Well, when you're writing into level zero, you're covering 100% of the token range, so my fraction was simply one. Ah, that's what happened.

As for overlap: you only count overlapping SSTables toward the compaction threshold, and overlap is what usually drives read amplification, right? You have to read from more SSTables if you have a lot of overlap. When compaction is actually going to be performed, what happens is that you lump some of these SSTables together in a bucket, and with UCS the buckets are formed from overlapping sections, transitively extended. So you can see here three different examples.
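As a quick sketch of that density calculation in Python: the helper is hypothetical, and the token fraction is the share of the ring the SSTable covers.

```python
# Sketch of the density idea: size alone can mislead, so UCS normalizes by how
# much of the token ring an SSTable covers. Hypothetical helper, sizes in MiB.

def density_mib(size_mib, token_fraction):
    # token_fraction: share of the total token space this SSTable covers,
    # e.g. 1.0 for a freshly flushed L0 table, 0.25 for a quarter-range shard.
    return size_mib / token_fraction

print(density_mib(100, 1.0))    # 100.0: a 100 MiB table over the whole ring
print(density_mib(100, 0.25))   # 400.0: same size over a quarter range is 4x denser
```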
You might have four SSTables come in, each of which is 100 MiB. They can be compacted into one large table that has a density of 400 MiB. They could instead be compacted into four tables that are still 100 MiB each, but each covering a quarter of the token range, so with that same higher density. Or, in fact, depending on how they get put together and sharded, you could end up with four tables of slightly different sizes that still have the higher density.

The second part of this magic is sharding, and this is what really leads to the parallelization of compaction. You can set token ranges. In the initial build of UCS, the shard boundaries were fixed; the sharding was a set value. As you'll see, there's been a bit of a change to that as things have gone on. Because you can parallelize, you can now deal with each of these vertical slices concurrently: they're not dependent on another token range in order to be doing their compaction. The way the sharding works is that reasonable breakpoints are found, rather than arbitrary breakpoints that might leave you with a lot of overlap because of how they fall. This also means that compaction doesn't have to proceed at the same rate across the token ranges, and it doesn't have to proceed at the same rate on every level.

Ooh, let's get some math. In its most basic form, only two things determine what's going on for sharding: a base shard count b, which a lot of times is set to something like 4, and a target SSTable size t. You evaluate a condition that uses the density, the base shard count, and that target SSTable size you're trying to reach. If that value, the density divided by b times t, is less than one, then you shard into the base number of shards. If it's larger (there's a little more calculation here; if you want to read about it in more detail, you can go look at the docs that I wrote), the number of shards you get is the base count multiplied by a power of two. So when compaction starts, the first thing to do is calculate the resulting density. You use that in the condition to decide how you're going to split the SSTables to land close to the target SSTable size. The split lands close to the center of a band: each shard ends up between the target divided by the square root of 2 and the target times the square root of 2.

You can see here in this example that if you set the target size to 1 GiB, you're really not going to have any change in the number of shards or the size of the files until you get to that 1 GiB size, and in this particular case, the shard count then steps up while the size of the files stays the same.

That was an interesting way to do it, but it turned out that a slightly better way had a bit more nuance, and that had to do with adding a minimum SSTable size and a growth factor. By specifying a minimum SSTable size and that growth factor, you can affect the shard count as the data set grows, so the data on each level is evened out a little. And you can see that now you have one shard at a really small size, and three more steps based on the math.
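Here is my reconstruction of that sharding rule in Python, built from the description above rather than from the actual implementation, so treat the exact rounding and boundary handling as assumptions. It includes the minimum size m and the growth factor lambda that were just introduced.

```python
import math

# Reconstruction of the sharding rule described above (a sketch, not the exact
# Cassandra implementation). b = base shard count, t = target SSTable size,
# m = minimum SSTable size, lam = the growth factor lambda. Sizes in MiB.

def shard_count(density, b=4, t=1024, m=100, lam=0.0):
    if density < m:
        return 1                  # tiny data stays in a single shard
    x = density / (b * t)
    if x < 1:
        return b                  # below the target band: base shard count
    # The (1 - lam) factor slows shard growth so SSTable size can grow too:
    # lam = 0 keeps shard sizes near t, lam = 1 fixes the shard count,
    # lam = 0.5 grows shard count and SSTable size at the same rate.
    return b * 2 ** round((1 - lam) * math.log2(x))

for d in (50, 500, 4096, 16384, 65536):
    print(d, "->", shard_count(d), "shards;", shard_count(d, lam=0.5), "with lam=0.5")
# 50 -> 1; 500 -> 4; 4096 -> 4; 16384 -> 16 (8 with lam=0.5); 65536 -> 64 (16)
```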
The one thing I will point out is that the growth factor comes into the last case, which is where many of the calculations are going to end up based on the SSTable size, and you have a factor of (1 - lambda) in the exponent that determines how many shards you're going to have. Who thought they were going to see math today?

Okay, so here's that case. We still have the target size set to 1 GiB, and we've set the minimum size to 100 MiB. In this particular case, for the graphs on the right-hand side, the lambda value is set to 0.33, and you'll see that now, as the shard count grows, the SSTable size grows commensurately. If you use a lambda of 0, the SSTables don't grow. If you use a lambda of 1, the shard count is fixed. And it turns out that if you use a lambda of 0.5, the shard count and the SSTable size grow at the same rate.

Okay, let's go through an example. Here we set a base SSTable size of 50 MiB. For the scaling factors, you'll notice it's tiered at the lower levels and then switches to leveled. We have a target size of 100 MiB, which is twice that base size, and we set the base shard count to 1.

So here we go. We have four tables, and compaction is triggered. We calculate the shard count: we have a density of 400 MiB with a token range of 1, that is, 100%, so we're going to get four shards. Let's go ahead and write those in; they're each going to have a density of 400 MiB. And then we come along and delete the sources. Okay, we've compacted into new tables, and the shard boundaries don't matter now that we're done with compaction.

A new set of sources comes in. So here come some more SSTables, and this time they're 60 MiB each. That means our density works out to be 240 MiB with a token range of 1 again, so we're going to end up with two shards of 120 MiB each. Delete the sources, get rid of the shard boundaries, go on.

All right, now we've hit a condition that is going to cause us to do a compaction in L1, because we've set that level to T3. As you'll see over here, if you just look at the SSTables on the right-hand side, you've already gone over that number. So what we have to do is look at the overlap sections that occur between those SSTables, and in this picture, if they sit over each other, they have overlap. It's really easy to see that A and E have overlap, B and E have overlap, C and F have overlap, and D, F, and G have overlap. D, F, and G are what actually trip the threshold here. And because F is also involved with C, we transitively extend the bucket to include C, D, F, and G. That means, if you add up all those numbers, you have 430 MiB in half the token space, so we end up with a density of 860 MiB, and if you did the calculation, you'd get eight shards. So here we go: we end up with two SSTables that are 135 MiB and two SSTables that are 80 MiB. Remember, each can be within a factor of the square root of 2 smaller or larger than the target, and that's in fact what happens here to fit those four in.
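Plugging the example's numbers into the same reconstructed rule from the earlier sketch (still an assumption-laden sketch, here with base shard count 1 and target 100 MiB) reproduces the shard counts above.

```python
import math

# The reconstructed sharding rule from the earlier sketch, re-run with the
# example's parameters: base shard count b = 1, target t = 100 MiB.

def shard_count(density, b=1, t=100):
    x = density / (b * t)
    if x < 1:
        return b
    return b * 2 ** round(math.log2(x))

print(shard_count(4 * 100 / 1.0))  # four 100 MiB tables, full range: 4 shards
print(shard_count(4 * 60 / 1.0))   # four 60 MiB tables, full range: 2 shards of ~120 MiB
print(shard_count(430 / 0.5))      # bucket C, D, F, G: 430 MiB over half the range: 8 shards
```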
And we delete the sources; the shards and the buckets no longer matter. We could continue that, of course, but I think this is where I finish that example. I just want to remind you of the thing that's so important here for those of you who are operators and are actually running this stuff: because the strategy is stateless, you can switch at any time to a different target. The work that was already done remains beneficial, and the splitting and the sharding are not invalidated when you make changes.

So how do these compare? With STCS, the buckets and levels end up with similarly sized SSTables, but UCS uses a predefined band of sizes, so the selection is more stable and predictable, whereas with STCS you end up with odd selections of buckets spanning SSTables of wildly different sizes. They do have similar triggers, but with that additional work that was done, UCS is more efficient about acting on them. When there are multiple choices for picking SSTables within a bucket, STCS will always group by size, but UCS will group by timestamp, which can have some advantages. And UCS efficiently tracks time order and whole-table expiration, which STCS doesn't; in fact, there are people who have been experimenting with using the unified compaction strategy in place of the time-windowed compaction strategy as well.

Compared to the leveled compaction strategy, UCS ends up with a very similar effect, as I said, but in the unified compaction strategy the SSTables are structured such that they can easily be switched to the UCS flavor of tiered, or re-run with different parameters. And whereas LCS SSTables are based on size only, the SSTables in UCS handle the problem of space amplification by sharding on specific token boundaries. LCS splits SSTables based on a fixed size, with boundaries that usually don't line up with the SSTables on the next level, which kicks off compaction more often; the unified compaction strategy gives you much tighter write amplification control. So really, UCS ends up using a combination of both of those techniques, but does it in such a way that it is much more effective for you and uses fewer resources. That's really the important part.

You don't get to say this too often about Cassandra, about an upgrade, all right? Anybody who's ever upgraded from one version to another knows. But in this case, it really is easy. All you need to do is go in and change the parameters in the cassandra.yaml, and you're good to go. The parameters you are changing are the scaling parameters, the target SSTable size, the minimum SSTable size, the base shard count, and (I can't believe I have a typo there) the SSTable growth. And I can tell you there is way more detail, like I said, in the docs if you want to go look at this topic.
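Gathered in one place, those knobs look roughly like this. This is a sketch written as a Python dict mirroring the parameter names just listed; treat the exact option spellings, value formats, and defaults as assumptions to verify against the Cassandra 5.0 documentation.

```python
# Sketch of the UCS knobs named above, roughly the shape you would pass in an
# ALTER TABLE ... WITH compaction = {...} statement. The option names and value
# formats mirror the talk's list; verify them against the Cassandra 5.0 docs.

ucs_options = {
    "class": "UnifiedCompactionStrategy",
    # One scaling parameter per level; the last one covers all higher levels.
    # T4 is STCS-like (W = 2), L10 is LCS-like (W = -8).
    "scaling_parameters": "T4, T4, L10",
    "target_sstable_size": "1GiB",   # center of the sqrt(2) size band
    "min_sstable_size": "100MiB",    # below this, data stays in one shard
    "base_shard_count": "4",         # b in the sharding calculation
    "sstable_growth": "0.333",       # lambda: 0 = fixed size, 1 = fixed shard count
}

print(ucs_options)
```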
So thank you very much. I hope that gave you some idea of how to use it. Do I get at least a B, Branimir? Yeah. Does anybody have any questions? Has anybody tried it? Oh, yeah, go ahead. [Question about migrating an existing cluster to 5.0 from, say, 4 or 3.] Well, you do have to be using 5.0 to get this compaction strategy; that's where it's introduced. Yeah. Good answer from the audience, too: when you're testing, always try one node first, then do the rest. Has anybody tried UCS? Yeah? And what do you guys think so far? The numbers, right. So just to reiterate and summarize: basically, it wasn't difficult to change from LCS to UCS, and it was a little bit better. They wish they had some sort of auto-ranging to set, and I know that's been talked about as a future enhancement. [Question about tombstones.] Yeah. Good. Okay, well, thank you all for coming today.