Good morning. Glad to see you, and glad you could come. I'm going to talk about how to estimate what you're going to get from VDO. Does everybody know what VDO is? VDO is a device mapper target, dm-vdo, that maps a logical address space onto physical storage on an underlying device, and while it's doing that it provides inline, block-by-block deduplication and compression. As a result, it can be thinly provisioned. But then the question is: how thinly provisioned? What kind of storage savings am I going to get if I put my data set on a VDO device? So I'll talk about how VDO actually works, so you'll be able to understand how to run the tool that estimates the storage savings, and what its results mean.

The VDO device consists of two major pieces. The first is the block map, which manages the allocation of storage on the underlying device and the mapping from the logical block address space to the physical storage. Since VDO is deduplicating and compressing as it goes, that's a many-to-one map. The block map and its associated machinery live in the KVDO module. The other big piece is the deduplication index, which keeps track of where duplicate data actually lies; that's the UDS module, a separate module.

So conceptually, how does this work? Greatly simplified: when a block is written to a logical address, the first thing VDO does is compute a signature for that block of data, so it can recognize it. It uses MurmurHash3, a very fast but very effective hash function. Then it queries the UDS index for that signature to see whether we've seen that block of data before.
If the block is found, that is, it's one we've already seen, VDO adjusts the block map to indicate that the data for that logical block is already on a physical block. If the signature is not found, it allocates a new block from the underlying storage, writes the data to that block, possibly compressing it on the way with LZ4, adjusts the block map to record where that logical block is stored, and updates the index.

So what's in that index? At heart, the UDS index is a fixed-size key-value store. The keys are the signatures computed for the data blocks, and the values are hints to the block map about where to find each block on the underlying storage. It also has an extremely efficient index into the key-value store, the master index, and that's the real magic of it. The master index and the most recently used part of the key-value store are memory resident; the rest of the key-value store is on a backing store. I'm being deliberately vague about what the most recently used part is, and I'm not going to give percentages or numbers, but the memory footprint as well as the backing store are fixed size. The size is fixed at creation time and can't be changed, not easily anyway.

The index is sized by its memory footprint: each gigabyte of memory footprint represents roughly 10 gigabytes of key-value store on the backing store, and that records up to about 10 terabytes of 4K blocks per gigabyte of memory footprint. The memory footprint is set when the UDS index is created, so one of the options when creating a VDO device is to set the memory footprint, and hence the size of the index. That determines your deduplication window.
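To make that write path concrete, here is a toy sketch of it in Python. This is not VDO's actual C implementation: the dictionaries, the function names, and the blake2b hash (standing in for MurmurHash3, which is not in the Python standard library) are all illustrative.

```python
import hashlib

# Hypothetical in-memory stand-ins for the UDS index, the block map,
# and the underlying storage. blake2b stands in for MurmurHash3.
uds_index = {}   # signature -> physical block number (advisory hint)
block_map = {}   # logical block number -> physical block number
physical = []    # stored data blocks

def signature(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=16).digest()

def write_block(lbn: int, data: bytes) -> str:
    sig = signature(data)
    pbn = uds_index.get(sig)
    if pbn is not None and physical[pbn] == data:
        # Duplicate: point the logical block at the existing physical block.
        block_map[lbn] = pbn
        return "deduplicated"
    # New data: allocate a physical block, store it, record the hint.
    physical.append(data)
    pbn = len(physical) - 1
    block_map[lbn] = pbn
    uds_index[sig] = pbn
    return "stored"
```

Because the index only stores hints, the sketch re-checks the stored data before deduplicating; a stale hint just costs an extra fresh write, never correctness.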
The deduplication window is the number of hashes you can store in the index, and it tells you roughly how far back in time you can look for duplicates. The store is kept in MRU order; I'll get back to that in a moment. As used by VDO, the fixed-size backing store lives on a contiguous piece of the underlying storage that VDO allocates, along with the block map's own metadata.

Some key features of the UDS index: it's very decoupled from the block map; it's even in a separate kernel module. It's advisory rather than definitive, so the block map doesn't depend on it closely. Among other things, that means that if the system is heavily loaded, the index won't become a bottleneck in the write path. It also means the index has a very skinny API: only a handful of calls, basically to start and stop indexing and to feed it hashes to look up. And it has very minimal interfaces to the system services it needs. It reads and writes its backing store, because only part of the index is in memory at any given time; it manages some threads for performance; and it does some timekeeping. The interfaces to those things are very slim, and the mathematical part in the middle is written in very standard, portable, vanilla C. The bottom line, literally, is that with very minor changes it can be built either as a kernel module or as a user-space library. And of course, that's what we're going to exploit in a moment.
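Using the round numbers from earlier (each gigabyte of memory footprint covers roughly 10 gigabytes of backing store and a window of about 10 terabytes of 4K blocks), a quick back-of-the-envelope calculation shows why the master index has to be so efficient. The function below is just that arithmetic, not anything from the UDS code.

```python
def uds_sizing(memory_gib: float, block_size: int = 4096) -> dict:
    """Rule-of-thumb sizing: 1 GiB of memory footprint maps to ~10 GiB
    of backing store and a deduplication window of ~10 TB of 4K blocks."""
    window_bytes = memory_gib * 10 * 10**12    # ~10 TB of data per GiB
    entries = window_bytes / block_size        # one signature per 4K block
    mem_bytes = memory_gib * 2**30
    return {
        "backing_store_gib": 10 * memory_gib,
        "window_tb": 10 * memory_gib,
        "entries": entries,
        "memory_bytes_per_entry": mem_bytes / entries,
    }
```

At one signature per 4K block, a 1 GiB footprint is tracking on the order of 2.4 billion blocks, which leaves well under half a byte of memory per entry.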
To back up for a second: when you create the UDS index, in particular when you're creating a VDO device, there are several things you can specify to size the deduplication index for that VDO volume. The sizing depends on some convenient properties of real-world data sets.

The first is temporal locality: a block that's written is much more likely to be a duplicate of a block that was written recently than of one written a long time ago. If a block in a backup set is a duplicate, it's much more likely to be a duplicate of something backed up yesterday than of something backed up last year, deleted since, and never seen in a backup set again. That's one reason a fixed-size index works well. Also, as I mentioned, the key-value store is kept in MRU order: whenever a hash is added or found, it moves to the front of the index. So it doesn't matter very much if entries roll off the end, because those are the entries that are extremely unlikely ever to be seen again anyway. And since the index is merely advisory to KVDO, there's no harm to VDO or the block map in writing another copy of the data; you might miss a tiny bit of deduplication, but there's no correctness harm in it. So temporal locality of duplication is one factor in setting the index size.

The other property often found in real-world data sets is sequential, or spatial, locality: if one block is a duplicate, chances are a whole lot of blocks around it are also duplicates.
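The MRU behavior just described can be sketched with an ordered dictionary: a hit moves an entry to the front, and when the fixed capacity is exceeded, the least recently used entry simply falls off the end. This is a toy model, not UDS's actual on-disk layout; the class and field names are made up.

```python
from collections import OrderedDict

class MRUIndex:
    """Toy fixed-size key-value store kept in most-recently-used order,
    mimicking how stale signatures roll off the end of the UDS index."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()   # signature -> block-map hint

    def lookup(self, sig):
        if sig in self.entries:
            self.entries.move_to_end(sig)   # a hit refreshes the entry
            return self.entries[sig]
        return None

    def insert(self, sig, hint):
        self.entries[sig] = hint
        self.entries.move_to_end(sig)
        if len(self.entries) > self.capacity:
            # The oldest entry falls out; losing it only costs a missed
            # deduplication, never correctness, since the index is advisory.
            self.entries.popitem(last=False)
```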
Think of big backup sets, container images, VM images, things like that: if you find one duplicate block, chances are there's a whole slew of duplicate blocks nearby. What that means is that not every hash has to be in the master index. If only a fraction of the hashes are in the master index, then once one duplicate is found through the master index, its neighborhood in the key-value store becomes part of the most recently used, memory-resident portion, and a lot of the nearby duplicate blocks will be found in memory without having to go through the master index at all. So one of the options when you create the UDS index, in other words when you create a VDO volume, is to specify a sparse index, and that's what it means: a sparse index has all of the key-value pairs in the key-value store, but only a selection of them are in the master index. You might get ten times the deduplication window for the same memory footprint, or alternatively the same deduplication window with a smaller memory footprint. But bear in mind that the backing store is still going to be the size of the whole deduplication window.

So what can we do to estimate the storage savings we're going to get, and maybe tune some of these parameters a little? Well, I said UDS can be built as a user-space library. MurmurHash3 is also standard mathematical stuff, so it compiles nicely as either a kernel object or a user-space object, and similarly with the LZ4 compression that VDO uses. So the estimator is built against the UDS library, the user-space MurmurHash3 object, and the user-space LZ4 library. It's using the exact same code that VDO is using.
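A toy model of the sparse-index behavior described above: only a sample of signatures goes into the master index, but a master-index hit faults that signature's whole neighborhood (called a "chapter" here; the names and the sampling rule are illustrative, not UDS's real ones) into the memory-resident cache, so runs of nearby duplicates still hit.

```python
class SparseIndex:
    """Toy sparse index: one signature in every `sample` is entered into
    the master index; a hit pulls its whole chapter of neighbors into
    the in-memory cache."""
    def __init__(self, sample: int = 4):
        self.sample = sample
        self.master = {}     # sampled signature -> chapter id
        self.chapters = {}   # chapter id -> {signature: hint}
        self.cache = {}      # memory-resident signatures

    def record(self, chapter: int, sig: bytes, hint):
        self.chapters.setdefault(chapter, {})[sig] = hint
        if sig[0] % self.sample == 0:    # crude sampling rule
            self.master[sig] = chapter

    def lookup(self, sig: bytes):
        if sig in self.cache:            # neighbor already in memory
            return self.cache[sig]
        chapter = self.master.get(sig)
        if chapter is not None:
            # Fault the whole chapter into the memory-resident cache.
            self.cache.update(self.chapters[chapter])
            return self.cache.get(sig)
        return None                      # missed: a tiny bit of lost dedupe
```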
What the estimator does is basically what KVDO does, up to the point of writing to storage and keeping a block map; we don't need those for this. You can run it over a directory tree. For each directory it reads each file; for each file it reads each block; and for each block it does just what VDO does. It computes a MurmurHash3 signature and queries the index. If the hash is not found in the index, it's a new block, so we treat it the way VDO would treat a new block and check whether it's compressible by LZ4. While it's doing this, it's tallying up the results: a running count of files read, blocks read, duplicate blocks, and how much compression is available. Then it prints a summary at the end.

So here's the command line. When I implement a command-line utility, the first thing I implement is the help option, because I can't remember what I'm doing from one minute to the next. The first two options allow it to compute the results from compression alone or from deduplication alone, because those can sometimes be hard to tease apart. The index option is mandatory: in the kernel, VDO hands UDS its backing store, but in user space you have to give the utility a file to use as its backing store. And please don't put that file on the file system you're estimating, because that will just cause confusion. Then there are the two more interesting options: the memory size, which again specifies the size of the index by its memory footprint, and the sparse option, which says whether to use a sparse or a dense index. So here are a couple of examples.
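The scan loop just described can be sketched like this. It's a toy version, not the real estimator: zlib stands in for LZ4 and blake2b for MurmurHash3 (neither is in the Python standard library), and an unbounded set stands in for the fixed-size UDS index.

```python
import hashlib
import os
import zlib

def estimate(root: str, block_size: int = 4096) -> dict:
    """Walk a directory tree, hash every 4K block, count duplicates,
    and test compressibility of new blocks, tallying as we go."""
    seen = set()
    stats = {"blocks": 0, "duplicate": 0, "compressible": 0}
    for dirpath, _, files in os.walk(root):
        for name in files:
            with open(os.path.join(dirpath, name), "rb") as f:
                while block := f.read(block_size):
                    stats["blocks"] += 1
                    sig = hashlib.blake2b(block, digest_size=16).digest()
                    if sig in seen:
                        stats["duplicate"] += 1
                    else:
                        seen.add(sig)
                        # VDO only keeps a compressed copy if it shrinks
                        # below half a block; mirror that rule here.
                        if len(zlib.compress(block)) < block_size // 2:
                            stats["compressible"] += 1
    return stats
```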
In my /u2/workspace I have about 50 checked-out source trees, mostly UDS and KVDO source, plus all their test harnesses and test infrastructure, and probably more log files than my management thinks I should keep there. I ran the estimator over it and bolded the interesting numbers: almost half of it was duplicate, because even if you have different branches of VDO checked out, chances are most of the files are the same. And then there are all those text files. Does the percentage mean how many files are replicated? No, it means how much the data would shrink by deduplication, how much reduction you get; it's a little confusing. So if I moved that workspace to a VDO device, I could fit it in a little over one-fourth of the storage, and probably make my management happier.

So I tried that. I copied it onto a VDO device and then asked vdostats; what's shown is a selection from the vdostats output. And I'll be darned: if you multiply the logical blocks used by 4K, it comes out to just a little more than the 86 gigabytes that were scanned. That's because the estimator doesn't count the file system metadata, but the VDO device obviously does, because the metadata is written on it. It came out to just about the 73 percent savings the estimator guessed. So that big pile of sources and log files and text files is probably a good candidate for VDO.

I tried another thing, too. /u2/exchange is a bunch of backups of a mail server, about two-and-change terabytes of backups. It doesn't compress, because the contents are evidently already compressed. But a lot of the backups are apparently backing up the same files day after day, because deduplication shrinks it by almost a factor of ten. Then I tried the same thing with a sparse index.
I estimated the exchange backups with a sparse index, and notice that I only lost about 2 percent of the deduplication. These backup files are huge, so they probably have very good spatial locality; the index doesn't need everything in the master index to find most of the duplication.

The Red Hat Enterprise Linux storage guide has some really good guidelines for configuring VDO, including configuring the UDS index size. So I'd suggest starting with those guidelines, using them as the arguments to the estimator, and running the estimator. Then you can adjust the sparseness and the size of the index and see whether you can get the same deduplication with a smaller index, or get more deduplication; either way: the same deduplication with a smaller index, or the same, or nearly the same, with a sparse one. It's on GitHub; it's open source. Any questions? Sir?

[Question about getting VDO into the mainline kernel.] I can't speak to when VDO will be in a mainline kernel; right now it's part of the Red Hat kernel, and for Fedora the link is right there. So the question is about getting into the mainline kernel, and I don't know the answer to that. We're working on it; we have a parallel development effort going on to make it more kernel-friendly. But right now it can be kind of a pain: if you want to run it on Fedora, you have to go get it from GitHub and build it. At the risk of revealing too much, we've had meetings, we're having meetings, about this, but beyond that I can't say much. [Audience: it's in a Copr as well.] Pardon? Oh, yes. Thank you, Andy; it's in a Copr repository as well. Was there a question in the back?
[Question: Is there any chance you could, long term, integrate VDO into LVM, so that when you create an LV or add a physical volume to a VG, there'd be an option to optimize it for VDO and LVM would just take care of it for you?] So the question is about adding it to LVM. I'm not sure about the optimizing part, but there is an ongoing effort to integrate VDO with LVM.

[Question: So VDO operates on fixed-size blocks?] Yes, 4K blocks. VDO is highly optimized for deduplicating and compressing 4K blocks, and let's say it can go less than wonderfully if you try to use it with things that have a different block size.

[Question: Can it scan whole devices as well as directory trees?] Yes; as a matter of fact, I didn't mention that. If you call the estimator with something like /dev/sda as the argument, it will in fact scan the block device block by block. So if you have storage that's whole-device based, it will scan that as well.

[Question: If you compress a 4K block to a 3K block, what does it do?] The compression is done at the 4K-block level. (They're going to throw us out in a moment.) VDO doesn't bother with compression unless a block compresses to less than half a block, and the estimator takes that into account; it does the same thing. If a block compresses to 2K, VDO will pack those compressed fragments together, and the block map accounts for that: a block map entry may point to a whole physical block or to a compressed fragment of a physical block.
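That half-a-block rule from the last answer, as a sketch (again with zlib standing in for LZ4, and the function name made up for illustration):

```python
import zlib

HALF_BLOCK = 2048   # VDO only keeps a compressed copy under half a 4K block

def place_block(data: bytes):
    """Decide how a 4K block would be stored: as a compressed fragment
    sharing a physical block with others, or as a full uncompressed block."""
    compressed = zlib.compress(data)
    if len(compressed) < HALF_BLOCK:
        return ("compressed_fragment", len(compressed))
    return ("full_block", len(data))
```

Two fragments under 2K can then share one 4K physical block, which is what the block map's fragment pointers account for.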
So, thank you very much. I'll be around, and I can answer anything else. Thank you.