All right, hello everyone. I'm Aleksa, I work at SUSE on container runtime stuff, but I also work on OCI, specifically the OCI image spec, which has been mentioned a couple of times yesterday. Today I'll be talking to you about why container images in their current form are, to borrow a phrase, "considered harmful", and about some things we can do about it.

First of all, where can you get the stuff I'm talking about? umoci, which is a tool I wrote that does OCI image building, was mentioned yesterday; you can get it from here, and I'm going to give a demo later which hopefully will work. There's a branch with all the code, so you can play around with it yourself and tell me when it breaks. I also have a blog post that I published earlier this year about the problems with tar, specifically in relation to container images. I'm going to go through a fair bit of it today, but it's about 20 pages long and I can't really cover all of that in 30 minutes, so if you would like to read me going slowly insane, you can look it up there. And you can get the slides later from my GitHub.

The first point is a somewhat theoretical question: what is the best possible image format we can imagine? What are the features we want it to have? This is a list I've come up with which I think summarises what most people want.

The first one is deduplication. This is the big one, and it applies to both transfer and storage: you don't want to be downloading stuff you already have on your machine or on your server, and you also don't want to waste disk space on copies of files you already have a representation of on disk. You want it to be parallelisable, meaning you can download many different things at the same time, and also extract them, or otherwise represent them on the filesystem, in parallel. You want it to be reproducible, because if two different tools build the same image, or build images that are very similar, you want the deduplication to deduplicate between them even though they've never seen each other. You want it to be non-avalanching; I'm borrowing a term from cryptography here, as in the avalanche effect of hash functions. In other words, if you make a small change to a single file in a container image, you shouldn't have to make an entire copy of the whole thing all over again. And you want transparency, which is related: you should be able to tell what is actually inside a container image.

So, this is what a container image looks like. This is an OCI container image, and the same applies to Docker images; the formats are basically identical minus a few changes. The main thing we're going to talk about today is this part here, the layers. I'm sure you're all aware that container images have layers in them; it's the first thing you see when you run docker build or docker run and it downloads all these different things. Every one of these layers is a separate tar archive of a root filesystem, and they get applied in order.
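To make that concrete before getting into the problems: the OCIv1 layout I just described boils down to roughly this shape. This is a simplified sketch, not the real spec types (those live in the opencontainers image-spec Go module); the point is just that everything is a digest-addressed blob, and each layer blob is a whole tar archive.

```go
package sketch

// Simplified sketch of the OCIv1 / Docker image layout -- not the real
// spec types, just the shape. Every object is a blob addressed by its
// digest, and each layer is an entire tar archive applied in order.
type Descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"` // e.g. "sha256:..."
	Size      int64  `json:"size"`
}

type Index struct {
	Manifests []Descriptor `json:"manifests"` // tags point at manifests
}

type Manifest struct {
	Config Descriptor   `json:"config"` // working dir, user, env, ...
	Layers []Descriptor `json:"layers"` // each one a full tar archive of a rootfs diff
}
```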
So, what's wrong with tar? There are a couple of things wrong with it. If we go back to that ideal image format, the list of everything we want, this is what tar gives us: basically none of it. None of the features we would like are actually present in tar archives, especially in the way that we use them. This is the part where I tell you about all the various problems you run into with tar archives.

The first one: imagine you have a container with a large file in it, a 10-gig file. You touch the file, so you update its metadata, but you haven't touched the data. You would assume that, okay, the new layer has to contain this metadata change, but surely you wouldn't copy the entire file into both layers. Well, you do, actually, because tar does not support the concept of a metadata-only entry; it only supports full files. You have to make an entire copy of the 10 gigs of data, so now you've doubled the size of your image, or at least of that file, in the second layer, even though you haven't changed it. The other thing is that when you extract these layers (well, currently we don't have this optimisation, and it would be quite difficult to do) you extract the 10-gig archive and then you re-extract it a second time. Completely pointless: you already have the whole thing, so why would you need to extract it twice? Now, this is an argument you can get into: if you look at all the various tar extensions, there is an extension you could in theory use to solve this particular problem. But then you run into the second problem.

If you delete a file, you don't modify the original layer, because that would be bad. Instead, you put a tombstone, a whiteout entry, into the new layer. Which means that if you delete a file in a container image, not only does the image not get smaller, it actually gets slightly bigger, by whatever the size of a tar header section is. I'm not pranking you; that's actually how it works. As an aside, this means you can't create a file whose name starts with ".wh." inside an OCI image, or inside a Docker image for that matter. Go try this after the talk: run docker, touch a file called ".wh.foo", commit the image, and run it again. You'll see that your file has disappeared, because this just isn't supported. Which is very fun.
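To make that whiteout point concrete, here's a minimal sketch using Go's archive/tar of what "deleting" /etc/foo.conf looks like in the next layer: the old data stays untouched in the lower layer, and a new, empty ".wh."-prefixed entry gets added on top, so the image only ever grows.

```go
package main

import (
	"archive/tar"
	"os"
)

func main() {
	// A deletion in a layered image is just a new entry in the next
	// layer's tar archive, named ".wh.<name>" in the same directory.
	// The original file is never removed from the lower layer.
	tw := tar.NewWriter(os.Stdout)
	defer tw.Close()

	hdr := &tar.Header{
		Name:     "etc/.wh.foo.conf", // whiteout marker for /etc/foo.conf
		Typeflag: tar.TypeReg,
		Mode:     0o644,
		Size:     0, // no data, but the ~512-byte header still costs space
	}
	if err := tw.WriteHeader(hdr); err != nil {
		panic(err)
	}
}
```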
Okay, the other problem: imagine you modify a file slightly. It's a 10-gig file, you flip a bit, you add a little bit of data to it. Now, even with the metadata extensions that some tar formats support, there's no way to solve this problem within tar, because there is no way to express "listen, I already have this file, just add this little bit to it; here's the diff from the previous version". You have to include the entire thing again.

Another problem: imagine we have two versions of some image. You've downloaded this one already, and then you download the other one. You have to redownload the entire thing, because the whole thing is one tar blob. There's not really much you can do about it: even though only a couple of binaries have been added, you still have to fetch a whole copy of the entire thing. And when you extract these two different archives, they get extracted separately, so there's no deduplication on the storage side either. That was duplication on the transfer side, downloading stuff you already have; but then, when you finally extract it, you don't even get deduplication on the storage side at the file level. So if you have two copies of bash; I mean, how often does bash get updated in Ubuntu? How often does ping get updated in Ubuntu? Not very often. It's a bit silly to have to redownload ping every time you download a new Ubuntu update, and once you've done the extraction, on the storage side you've now stored twice as much, needlessly. Another thing is that different distributions ship similar things. I'm being a little optimistic here; in reality, when you compile the same binaries on different distributions they actually come out differently. But there are files shared between distributions, and it makes little sense to redownload them every time you pull a different image.

And then, also, tar is not reproducible. Aside from the fact that there is actually no real spec for tar, at least no spec for the tar that everyone uses in modern systems, even if you get past that and use the same version, there are all sorts of ways you can mess it up. For instance, you can end up with the wrong order of entries inside a tar archive. And even if you work around that: in the Go archive/tar implementation, extended attributes are a map, and in Go, maps are not deterministically ordered when you iterate over them, so when the archive actually gets spat out, the xattrs aren't in a stable order. Effectively, if you redo this a couple of times you'll end up with different archives even though nothing has changed, even if you iterate through the filesystem in order and all the rest of it.

So what's the alternative? Those are the problems with tar archives; there are many, many more, and if you'd like to ask me later I'd be very happy to tell you about them over several beers, because I need to express the anguish somehow. But if we get past all of that, what is our alternative?

We take the existing structure. We have the index, which I forgot to explain: the index is effectively what stores the tag, so when you have something like ubuntu:18.04, it links a tag to a manifest. The manifest points to your configuration (the configuration includes things like the working directory, what user you're running as, and so on), and then it has these layers. These are all content-addressable, all referenced by hashes, which would make you think you get deduplication. And you do get deduplication per blob, but you don't get it anywhere else.

So the idea is to go from this to this: we still keep the index, we still keep the config, but rather than having these layers, we instead expose all of the inode structures, all of the files and directories, directly to OCI.
Each file then gets effectively chunked up (I can go into the whole chunking topic later), but effectively you end up with a blob for each file, or if the file is really big, a bunch of different blobs, such that, going back to the 10-gig use case, if you change one byte inside a 10-gig file you don't end up with another needless 10-gig blob inside your image store. And this design is non-avalanching: if you change a single chunk then, yes, you have to regenerate the root blob, which is currently a JSON document (we can have a discussion about that later) that has pointers to all of this stuff. The chunk gets updated, the root gets updated, a few things in between get updated, but you don't have to have an entire copy of zsh and ping and everything else inside your image. And this helps on the download side as well.

Inside this inode structure we have the type, the metadata, and then we have inline data. Inline data is for things like symlinks: it wouldn't make sense to have an entirely new blob in the content store just to hold the target of a symlink, so we inline it; it's just a map of strings to strings. The same thing applies to devices, character devices and so on, where you'd store the major and minor numbers there. Indirect data is only used for regular files, and that's where you point to chunks inside the store.
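There is no spec document for any of this yet (I'll get to that), but roughly the kind of per-inode record I'm describing looks like the following. The field names here are made up for illustration, not anything standardized.

```go
package sketch

// Hypothetical sketch of a per-inode record in the experimental OCIv2
// root blob; the names are invented for illustration, there is no spec.
type ChunkRef struct {
	Digest string `json:"digest"` // content-addressed blob in the store
	Size   int64  `json:"size"`
}

type Inode struct {
	Type string            `json:"type"` // "file", "dir", "symlink", "chardev", ...
	Meta map[string]string `json:"meta"` // mode, uid/gid, xattrs, timestamps, ...

	// Inline data for small things: a symlink's target, a device's
	// major/minor numbers. Just a map of strings to strings.
	Inline map[string]string `json:"inline,omitempty"`

	// Indirect data, only for regular files: the digest of the whole
	// file (used for the inode-cache lookup later) plus its chunk list.
	FileDigest string     `json:"fileDigest,omitempty"`
	Chunks     []ChunkRef `json:"chunks,omitempty"`
}
```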
Okay, and with this structure we can start playing around with some interesting optimisations. The current method is that, when you have an OCIv1 image or a Docker image, you have all these different layers, and the only way to extract them into a root filesystem is to extract each tar archive one by one and build the root filesystem up. Which means, as I mentioned earlier, that if two layers contain the same file, same hash, you end up making two copies of it on the storage side; you've doubled the storage used. What we can do, because this structure is transparent rather than opaque, is have an intermediate inode cache: you have your root blob with all your inodes in it, you have all these chunk blobs, and you reconstruct from them. Because the inode entry for a file stores the digest of the entire file, when you're extracting a file you can first check: do I already have this data on disk, in the inode cache? And if you do, you can reflink it into the root filesystem.

That gives you several things. The first is file-based deduplication on the storage side, which currently does not exist within Docker at all; you can't get it unless you were to create a layer for each file, which some people might think of doing, but as far as I'm aware nobody really does, and here you get it for everything. And also, because reflinks don't require privileges, this works for rootless containers: unprivileged and privileged users alike can extract these filesystems and get deduplication, even though they can't mount overlayfs. The traditional way of getting this kind of deduplication is with overlayfs, and unprivileged users (unless you're on Ubuntu) can't mount overlayfs. So this gets past that problem, and it gives you file-based dedup, which is pretty neat.

So, it's time for me to give you a demo. Just as a disclaimer, in case it doesn't work: I did write this code last night, so let's hope this works. Can everyone see that? No? My god, okay; you're all free for the next five minutes while I fight with the terminal. Okay, I'll switch to bash. Right, everyone can read that, I hope? Good.

So here we have an OCIv1 image that has a couple of Ubuntu images inside it. These are actually two versions of 19.04 with effectively a single package update between them, which I downloaded on two different days. What I'm going to do is show you what it looks like when you do this snapshotting. So I run sudo umoci unpack against the OCIv1 store (never mind, doesn't matter). What I'm doing here: umoci is a tool that lets you extract images and do image manipulation; you can use it to build build tools on top of it. This command takes the image called 19.04-new from the OCI image store and puts it into an OCI runtime bundle with that name. The bundle contains the configuration, which you would use with runc, and a root filesystem, which is what I've extracted. As an aside, umoci doesn't set the mtime of /, so we're in the 70s now; it also has some other stuff in there that isn't really interesting for this.

So we have these two images, and now we're going to snapshot them. I create a new OCIv2 store, which is a separate content-addressable store, and then I run the snapshot command against it, pointing at the 19.04-new image and snapshotting its extracted root filesystem.
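Roughly, what a snapshot like this has to do under the hood is walk the unpacked rootfs, record an entry for every inode, and write one content-addressed blob per regular file. The sketch below is not umoci's actual code, just an illustration; the real thing also chunks large files rather than storing them whole.

```go
package sketch

import (
	"crypto/sha256"
	"encoding/hex"
	"io/fs"
	"os"
	"path/filepath"
)

// snapshot is a hypothetical sketch: walk the rootfs and write each
// regular file's data into a content-addressed blob store keyed by its
// digest. Directories, symlinks and devices only contribute metadata.
func snapshot(rootfs, blobDir string) error {
	return filepath.WalkDir(rootfs, func(path string, d fs.DirEntry, err error) error {
		if err != nil || !d.Type().IsRegular() {
			return err
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		sum := sha256.Sum256(data) // the real tool chunks big files first
		blob := filepath.Join(blobDir, "sha256", hex.EncodeToString(sum[:]))
		return os.WriteFile(blob, data, 0o644)
	})
}
```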
So what that just did, if we go back to the diagram, is construct this entire root tree. It constructed all these chunks and put them in the image store, and all the rest of it. If you look inside the OCIv2 store, there are about 2,000 blobs in there now, because each file has been broken out into its own blob. Now we can do the same thing for the old image, and because the two share a lot of blobs (it should be noted these are uncompressed, I'm not compressing any of these blobs, I didn't have time to land compression) the whole store takes up 86 megs. A single Ubuntu image, when you extract the tar archive, is about 70 megs, so there has only been about a 10-meg change across this entire image. With OCIv1, if it weren't compressed, if you don't take compression into account, it would have doubled in size, because you'd have a copy of the entire thing twice.

So now I can restore these images, I can unpack them. Actually, first I should probably show you what it looks like. I'll run jq over one of the blobs under blobs/sha256 and pipe it to less. So this is basically the root that we have here, the root you can see in the diagram. It's a directory, with the metadata, the inode mode and the rest of it. For symlinks (this one is a symlink) the target is obviously inlined. And if we look at a file, the file has the digest of the entire file stored inside it, which is what allows us to do this caching with file stores. This particular file is empty, so let me look at one that isn't: Perl. Perl isn't empty. So we have the digest, and then the contents are chunked: these are two separate chunks that you concatenate to get the final file, and then you have all the rest of the metadata.

So now I can restore this: I run the restore command against the OCIv2 store. What restore does, and this is kind of neat (I'm surprised it worked; well, I shouldn't say that out loud, I'll say it after it works), is that I've added support for this reflinking via a file store. So you can restore into rootfs1, and this is going to take a second, because it has to reconstruct everything, it has to fetch all these blobs and rebuild everything. But the file store is now populated with all of the blobs from this image, so if I now create a second rootfs, it finishes almost instantly. First of all, it's quicker because you don't have to open all these files, read them all and so on; you can just look at the digest and say, oh wait, I already have this inode over here.
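That "check the digest, then clone" step is just the FICLONE ioctl, the same mechanism cp --reflink uses. Here is a minimal sketch, assuming a hypothetical inodeCachePath helper that maps a whole-file digest to its path in the cache, and a filesystem with reflink support (btrfs, or XFS with reflinks enabled).

```go
package sketch

import (
	"os"
	"path/filepath"

	"golang.org/x/sys/unix"
)

// inodeCachePath is a hypothetical helper mapping a whole-file digest
// to its location inside the inode cache.
func inodeCachePath(digest string) string {
	return filepath.Join("cache", digest)
}

// restoreFile clones a file out of the inode cache into the rootfs
// instead of copying it. FICLONE shares extents, so no data is copied
// and no privileges are required.
func restoreFile(dest, fileDigest string) error {
	src, err := os.Open(inodeCachePath(fileDigest))
	if err != nil {
		return err // cache miss: fall back to assembling the chunks
	}
	defer src.Close()

	dst, err := os.Create(dest)
	if err != nil {
		return err
	}
	defer dst.Close()

	return unix.IoctlFileClone(int(dst.Fd()), int(src.Fd()))
}
```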
So it just gets reflinked, without having to copy everything. Which is nice because it's faster; I don't know about you, but I find pulling images to be the slowest thing in the world, especially in Australia, where you also have to download the thing and a hundred-meg image takes forever. But secondly, it gives you deduplication on the storage side of things: if I did this a bunch of times, the storage overhead would effectively just be the overhead of the inodes, because all of the data is being deduplicated. And if I then extract the old image, which only differs by that single package update, it also finishes pretty quickly, because the inodes are largely the same; more than half, something like 90% of the stuff, isn't different between two versions of Ubuntu. So even though it had new inodes, the file store dealt with it all. And inside the file store is yet another content-addressable store, it's just a bunch of content-addressed bits, and those bits are being reflinked in. So this already gives you a lot of benefits; obviously, once this is both polished and used elsewhere, it gives you a lot of benefits over the existing tar-based solutions.

But the next question is: what is the next thing we can do? One of the things that has been a very, very long-standing problem with container images is that you don't know what's inside them. With layers you have these tar archives, and yes, you could open the tar archive, parse through it, read out the contents, hash them and all the rest of it, and then in principle compare against something else. Image scanners do this. It's quite intensive, because you have to read through the tar archives every single time, and there's also no ecosystem of distributions giving you "this file has this hash" information.

So how do we get around this? Because this structure is entirely transparent to OCI, and because the digest of each file is stored inside it, in principle (and this is where we get into hand-wavy territory, because this is all still in my head, but I'm hoping it will work out) we can start looking at having distributions or vendors ship manifests, preferably signed ones, saying that libfoobar version X contains these files with these digests. There is a slight complication with config files, and there are ways we can try to work around that, but at least for binaries, as well as base config files, you could do this very easily: vendor X says libfoobar contains these files. Then you can very cheaply iterate over that list (it's just a list, currently JSON, but it could be whatever serialisation you like) and check: does this file exist, does it have the right mode, does it have the right digest? And you never have to touch any of the chunks.
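This part is entirely hand-wavy, as I said, but the kind of check I mean really is just a cheap metadata walk, something like the sketch below. The manifest format and field names are completely made up; digestOf here would just read the whole-file digest already recorded in the image's inode metadata rather than re-hashing anything.

```go
package sketch

import (
	"fmt"
	"os"
	"path/filepath"
)

// BOMEntry is a made-up "bill of materials" record: a vendor-signed
// manifest would ship a list of these for each package it provides.
type BOMEntry struct {
	Path   string `json:"path"`
	Mode   uint32 `json:"mode"`   // expected permission bits
	Digest string `json:"digest"` // expected whole-file digest
}

// verify checks an unpacked image against a manifest without ever
// reading file data: digestOf is expected to return the digest already
// stored in the image's inode metadata.
func verify(rootfs string, entries []BOMEntry, digestOf func(path string) (string, error)) error {
	for _, e := range entries {
		full := filepath.Join(rootfs, e.Path)
		fi, err := os.Lstat(full)
		if err != nil {
			return fmt.Errorf("%s: missing: %w", e.Path, err)
		}
		if uint32(fi.Mode().Perm()) != e.Mode&0o777 {
			return fmt.Errorf("%s: unexpected mode %v", e.Path, fi.Mode())
		}
		d, err := digestOf(full)
		if err != nil {
			return err
		}
		if d != e.Digest {
			return fmt.Errorf("%s: digest mismatch", e.Path)
		}
	}
	return nil
}
```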
Which means that scanners would improve, and it also becomes possible for us to think about having an ecosystem of distributions and other vendors shipping these kinds of manifests and linking it all up, which I think would be quite exciting to see as future work. It will obviously take even more work, but we'll see.

So, what are the next steps? The first thing is that we need to reduce the size of transfers, and this is where I slightly cheated. If you look at the OCIv2 image, it's only 86 megs even though it has two images in it, and if you compare that to the OCIv1 one, the OCIv1 is bigger. But remember, the OCIv1 one is compressed, so if we did compression on this it would come out better. The main problem, though, is that the deduplication you get from chunk-based dedup at the file level, especially when you're downloading images from different vendors and different distributions, is disappointing; it's actually quite surprising how little similarity there is between distributions in a lot of cases. This is something I ran into last week, and I was desperately trying to come up with a solution for it; obviously I'm putting it in next steps because I didn't come up with one. One of the possible solutions, and this is something I hope to speak to Lennart about afterwards, is that we could think about decoupling the way the storage works from the way transfers work. This store, where you have a single file holding all the data along with this digest information, is very useful for the optimisations I mentioned, but it's not ideal for transfers. Aside from anything else, when you're transferring images, one Ubuntu image has something like 2,000 blobs in it, and image registries are already struggling with one blob for the entire thing. Having 2,000 round trips every time you want to download an image is a little bit extreme. So we'd need to think about whether we can come up with a different protocol for doing the transfers.

The other thing is that, yes, the bill-of-materials stuff is very much hand-wavy. I have no idea how it would actually work in real life or what kind of concessions we'd have to make, so we need to sit down and design it; hopefully folks will come up with ideas afterwards that they can tell me about. And all of this is an experimental branch of umoci; there is absolutely no spec document whatsoever, I threw this together at about 10 p.m. yesterday, so I can't really show you exactly what the spec would look like, especially because there are probably more things that people want to put in, and questions about how we handle them.
Currently it's all in JSON, and JSON has its problems, but these are things we can discuss.

And finally: getting everyone to switch. This is honestly the biggest problem. All the other stuff is just work that someone could do, and it could get done, but switching is the really painful part, because right now people aren't even using OCI images in most cases; they're still using Docker images, which, even though they're very similar, aren't exactly the same. And while these improvements would, at least in my opinion, help a lot of people improve the efficiency of their container runtimes when dealing with images, you have to get people to switch to OCIv2. If it's difficult to get people to switch to OCIv1, which has basically the same structure almost down to the last detail, I can only imagine how complicated it's going to be to get OCIv2 going.

I think I've finished a bit early, so do I have any questions? I can go through the chunking stuff if anyone's interested in that. Yes? Okay, I'll go through the chunking stuff then; we'll do questions in a second.

So, the problem with storing whole files, and this is actually interesting, is the different choices you can make to solve it. As you might have seen when we were looking at Perl earlier, those two chunks are not the same size; they're basically random sizes, and if you look at the other files it's the same. The point is that the chunks are different sizes, and the reason you don't want to do fixed-size chunking is the following. Imagine a file, and imagine you do fixed-size chunking: you split it up at 4K boundaries. As an aside, if we could do this easily, the reflinking stuff I was doing would give even better deduplication, because you could deduplicate not just at the file level but at the 4K block level. But on Linux there's a limitation in the underlying mechanism that reflinks use: because filesystems are extent-based, you can't deduplicate half an extent, it has to be extent-sized, and extents are at least 4K on Linux. I looked into it, and you can't make them one byte; that was an idea, it would have been awful to do, and it's not actually possible anyway, in case any of you were wondering.

So, if you were to cut the file up into 4K blocks, that would help with deduplication, but you run into this problem. Say we have "foo" somewhere inside a large file, and we swap the "foo" for a "bar". All good, right? Great, it lands in the same block, no big problem. The problem comes when you have an insertion or a deletion. If you insert "baz" in the middle of the file, not only does that block get invalidated, but everything after it gets invalidated as well. So you immediately run into the problem that any insertion (and most of the time, when you add to or modify a config file, those are insertions, not in-place replacements) means that, yay, we deduplicated, but we only deduplicated the first third of the file. In most cases this just doesn't work.
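Here is a tiny worked example of that shift problem, using toy 8-byte chunks for readability: one small insertion changes the hash of every chunk from the insertion point onwards, so almost nothing deduplicates.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// fixedChunks splits data into fixed-size pieces and returns a short
// hash of each piece, to show how an insertion shifts every later chunk.
func fixedChunks(data []byte, size int) []string {
	var out []string
	for i := 0; i < len(data); i += size {
		end := i + size
		if end > len(data) {
			end = len(data)
		}
		sum := sha256.Sum256(data[i:end])
		out = append(out, hex.EncodeToString(sum[:4]))
	}
	return out
}

func main() {
	before := []byte("the quick brown fox jumps over the lazy dog")
	after := []byte("the quick brown WAT fox jumps over the lazy dog")
	fmt.Println(fixedChunks(before, 8)) // first two chunks match...
	fmt.Println(fixedChunks(after, 8))  // ...every chunk after the insertion differs
}
```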
So the solution is actually a very, very old solution to this problem, which is content-defined chunking: rather than chunking at fixed sizes, we chunk based on the contents themselves. There's a paper by Michael Rabin on Rabin fingerprints (guess where the name came from) where you fingerprint the data and then do content-defined chunking. The way this works is basically a mix of rolling hashing and Bitcoin, or Numberwang, whichever you prefer. You take the entire file, and you determine where the chunk boundaries are by checking whether the hash of the last 64 bytes, or whatever the window size is, computed with Rabin fingerprinting, is less than a certain number; in other words, whether certain bits are zero, which is basically how Bitcoin works. Well, let's not get into that; the point is it's a very similar idea. You decide the chunk boundaries this way, and then, as long as those last 64 bytes aren't affected, things behave nicely: changes within a chunk are as normal, and for insertions, the boundary moves along with the insertion. If the last bit of a chunk does get modified, then you'd invalidate two chunks, but that isn't too bad; even in the worst case you're not invalidating the entire file.

Sorry, say that again? Yes, if the boundary region is modified, then that chunk won't hash the same, but the next boundary will still be found in the same place, so you've invalidated this chunk and the next one, but not the rest, unless the insertion happens to add an extra boundary. There are lots of cases where this can break, and it's all a bit statistical, but in most cases it works out.

This idea, actually the entire design, I stole from a backup tool called restic that I use. They have basically the exact same design for all of this, except I think they don't have a linear list; they have a tree of blobs. That was actually my first attempt too: the last time I gave this talk, my idea was to do it with a tree of blobs, where a directory has children that it points to. It turns out that if you run this snapshotting on Fedora, there are more metadata blobs (in other words, blobs that just describe directories, and directories have no contents other than their children, which you can figure out from the path names anyway) than there are blobs of actual data. Given the other problems with having lots of blobs, that's a bit wasteful. But it's basically the same idea anyway.
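For reference, the cut-point trick described above boils down to something like the sketch below. Real implementations use Rabin fingerprints or buzhash with a proper rolling hash and minimum/maximum chunk sizes; this toy version just re-hashes the trailing window at every position and cuts wherever the low bits of the hash are all zero, which is the same "is the fingerprint below a threshold" idea.

```go
package sketch

// cdcChunks is a toy content-defined chunker: it cuts wherever the hash
// of the trailing `window` bytes has its low `maskBits` bits equal to
// zero. Because cut points depend only on local content, an insertion
// only disturbs the chunk it lands in (and occasionally its neighbour),
// instead of shifting every later boundary.
func cdcChunks(data []byte, window int, maskBits uint) [][]byte {
	mask := uint64(1)<<maskBits - 1
	var chunks [][]byte
	start := 0
	for i := window; i <= len(data); i++ {
		var h uint64
		for _, b := range data[i-window : i] { // naive; a real rolling hash avoids re-hashing
			h = h*131 + uint64(b)
		}
		if h&mask == 0 && i > start {
			chunks = append(chunks, data[start:i])
			start = i
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:])
	}
	return chunks
}
```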
So that's basically how the chunking works, and that's basically it for this. Any questions?

Audience: Your reflink trick won't necessarily work if things are compressed on disk. Can you talk a little bit about your vision for compression in the storage format, once you've downloaded the image?

Yes, so there are actually several ways of handling this. Right now, because we do the chunking inside the OCI store, you have to have this second inode cache; and for compression on transfers, if we were to use the regular OCI transfer mechanism, we would have to do compression on the blobs, because otherwise we just immediately lose to tar archives, which is not a good place to be; it doesn't feel very nice to lose to a format that has so many problems. But the solution might actually be to abandon the idea of using the blobs we have in OCI as the transfer medium. In other words, you'd store the entire file: you could merge the inode cache with the store, so that rather than pointing at a bunch of chunks, an inode just points at one blob, and then on the transfer side we use something like casync, which can help us with all the chunking there. Because we can't chunk on the filesystem side anyway (we don't have fixed-size chunks), we don't really need the chunking inside the image store. Especially because, and this is the dirty little secret, the inode cache has to be a full copy, so it actually doubles the storage size: you have one copy for OCI and one copy for the inode cache. The nice thing with the current model is that you can reuse the image store across different storage drivers; if you merge in the inode cache, you'd have to have a copy of the image in each storage driver. But yeah, my idea is that the best way to do this might be to merge the two, reflink directly from the OCI store into the root filesystem, and do the chunking on the transfer side. Does that make sense? Well, okay, you're not happy, but I'm sure you'll let me know about that later.

Audience: Could you talk about mounting the filesystem for starting a container; do you think it's a good idea to use FUSE or something like that?

Sorry, say that again? Do I think it's a good idea to use FUSE; as in, use FUSE for the root filesystem and open the OCI blobs directly that way, without this reflinked store? Yeah, this is something I have considered. I've been told that FUSE performance is getting better, and it would be doable; if this is something people are interested in doing, it's definitely something you could do. And I think you can design something that works for this reflink mode and also works fine for the FUSE mode. Actually, Tycho, the guy who just asked the question, has this incredibly crazy idea of having a driver that mounts this thing directly, without FUSE at all, which is extra crazy; but the point is that the design, as far as I can tell, would still work either way, whether you use FUSE or this reflinking model. And the nice thing is, well, these days you can do FUSE unprivileged, but if we were speaking a year ago I would be telling you that this approach lets you do it unprivileged as well.

Audience: This looks a lot like git. Have you thought about that?
Yes. Yeah, it does, and I don't actually have a counter-argument to that; it looks very much like it. Basically this is a consequence of OCI taking the same idea git has, the content-addressable store and all the rest of it. I actually looked into git packfiles to see whether or not we could reuse them for distribution. As far as I can tell (and obviously I've only looked at it a bit) the optimisations git makes don't really apply to what we need. For instance, with git, I would generally hope nobody checks a copy of Perl into their git tree; I'm sure someone does, but assuming you don't do that, and more importantly, I hope you don't git-commit an entire distribution and all of its files. Yeah, okay, well, all right, maybe git would be fine. You know what, yes, it is like git, and I have looked at what git does for various things to see whether we can take some improvements from it.

Audience: Did you play with different chunk sizes?

Different chunk sizes; yes, I did try quite different chunk sizes, and I actually tried different chunking algorithms too. For instance, casync uses buzhash, which is a different fingerprinting or hashing algorithm that lets you do a very similar trick. As far as I can tell, you don't get much out of it: I have tried a couple of different chunk sizes, and each time the inter-distribution deduplication is not very good. It might be that if all the files were packed together it would be better, but I don't know; this is something I'm not entirely sure about. My first impression was, oh, maybe the chunk size is too big or too small, and if I make it smaller it will deduplicate better between distributions. That sort of works, but it doesn't give you the benefits you would think it does. But yeah, that's it. Okay, you can ask me outside. Okay, thank you; I know there were more questions.