And this talk is pulled from a past talk that I did quite a while ago, so I'm going to blaze through a lot of this. You can go find the slides online later; there are links and demos that you can try for yourself or use for more information. This is only a 15-minute talk, and there's more content here than 15 minutes, so I'm going to blaze through it.

So, yeah: file system validation. This is something that came out of an assumption for a lot of the container images. There are things that you can do to make file systems read-only, and it was always a rat hole of a conversation, because eventually somebody would say, well, you can mount it read-only, or you could do this other thing, or you might have a copy-on-write file system. Or what if it corrupts? What if somebody has CAP_DAC_OVERRIDE and starts going around you? OSTree tries to solve certain things in read-only ways, and at a certain point you just have to make an assumption that it's either good enough for your use case, or that there's a break-glass scenario where you can just wipe the file system and let the image re-pull itself, or something like that. How do you verify that what you have on the file system is what you expect to have on the file system? Some folks would end up having the conversation about dm-verity and these other things, but those are pretty tailored solutions. At the end of the day, you had to have some kind of way to package, some way to have content addressability when you pushed and pulled those packages around, while dealing with compression, which is a pain in the ass. All these different pieces, but still done in the most generic, lowest-common-denominator approach.

It's crazy when you deal with these packages, and different packages try to solve this in different ways. Everybody in the world has seen a SUMS file: you download some kind of content, and there's a SUMS file with MD5 back in the day, then SHA-1, SHA-256, SHA-512, and you have all these different IDs of something, depending on what's most efficient for your machine to compute, or what's since become irrelevant. And different formats: we tried CPIO for a little while, and it was cool, it's under a similar POSIX standard as tar archives are, but neither one of them accommodates xattrs (and file capabilities are part of xattrs), or long file names. Just terrible stuff; they keep recirculating terrible stuff. And even with container images, we're pushing around 500-megabyte, one-gigabyte images as tar archives, several of them together. Sometimes it's so mind-bogglingly stupid: you change the owner of a file and it copies up a one-gig file or something like that. It's terrible.

So now you have these bundles getting pushed around, and they have to have some kind of content addressability, an ID that you push and pull them around by. For folks in here: who has used Docker before Docker 1.8, or before Docker 1.10, really? Docker 1.8 folks, about half of you.
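(As an aside, the SUMS-file pattern is just this; the file names here are hypothetical:)

```
# A detached checksums file shipped next to the content you downloaded
sha256sum release.tar.gz kernel.img > SHA256SUMS
# ...and later, verify that what's on disk matches what was published
sha256sum -c SHA256SUMS
```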
Who knows that the Docker ID you had (it hasn't changed for a lot of people, and now they call it a digest, which I'm glad kind of disambiguated that they're two different things), that Docker ID from back in the day, which was a SHA-256 number, was not content addressable before Docker 1.8? There was absolutely no part of pushing, pulling, or building that gave you any hope that the ID you got matched what was on disk. Before Docker 1.8 it was garbage: every single time you pushed and pulled the image, it mutated. It was actually a different tar archive every single time you pushed it. So having any kind of infrastructure for a signing and assurance model was out the window.

And going even further into this content addressability: once we started getting reproducible tar archives in the workflow, so that we could have content addressability and build something like a digest, what some systems chose was the content-addressable ID of the compressed archive, which is a terrible decision. If you're ever in this space (I have ideas, but they're terrible ideas): always take the content-addressable ID of the uncompressed piece. Do yourself that service. In that situation it was relying on Go at a particular compression level, and the way Go does the deflate and inflate (not Merkle trees; the Huffman trees, the cascading trees of code words) was consistent with itself but inconsistent with GNU gzip and some of the others. There were two or three zlibs, and all three were inconsistent with each other. And then often people would make certain optimizations, especially if you were using zlib: if you passed any of the flag parameters to do optimizations, it might make your gzipped archive inconsistent with itself from one compression run to the next. And that's not even getting into XZ, bzip2, LZMA, whatever. It's all garbage.

There are some examples here that you can actually see. Even for gzip from the command line, you have to pass -n, because the gzip header embeds a timestamp, so one run to the next would be different; with -n it doesn't include the timestamp, so you can at least have some consistency within itself. Real, real exciting stuff here. zlib (this example is using Ruby, but still): Huffman-only. You have to add that flag, otherwise it's inconsistent with itself every single time. And Go? Terrible. At some point we were like, cool, it's consistent with itself. Then they made architecture-dependent optimizations to Go, and it broke all the compression. So every Docker registry out there that had these one-gig images had all of that cache invalidated: you'd go to pull an image, it would fetch the image, checksum it, and get a different checksum. All these registries immediately had to rebuild all the images and re-flush everything. Oh my god, that was fantastic.

So, verifying at rest: regardless of trying to make it efficient for whatever your back end is, whether you're rsyncing files around, or, gosh, some folks were using iSCSI mounted over NBD to try to have these arrangements, you still have to have this content-addressable piece, as well as something like rpm -qV. How many folks have ever touched that? Dan.
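The gzip example looks roughly like this; a sketch you can run yourself, not the exact slide:

```
printf 'hello\n' > file
gzip -c file | sha256sum     # the gzip header embeds the file's mtime...
touch file                   # ...so the same bytes with a new mtime...
gzip -c file | sha256sum     # ...produce a different digest
gzip -nc file | sha256sum    # -n omits the name and timestamp: stable across runs
gzip -nc file | gunzip | sha256sum   # digesting the uncompressed bytes sidesteps it entirely
```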
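For anyone who hasn't: a quick sketch of the kind of thing rpm -V reports when an installed file drifts from its package. The package and file are chosen just for illustration:

```
rpm -V coreutils                            # silent when everything on disk still matches
echo tampered | sudo tee -a /usr/bin/true   # don't do this on a machine you care about
rpm -V coreutils
# S.5....T.    /usr/bin/true    <- size, digest, and mtime no longer match the rpm database
```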
What rpm -V tries to show is whether what's on disk is different from the RPM that you originally installed, like whether somebody has tampered with some of the files, so you can actually get some kind of an assurance model. Obviously, if they're at that level anyway, they could have just modified the RPM database, whatever. dm-verity ends up mostly being an Android use case, but there are hopes that it could mount things read-only with absolute certainty that the file system is exactly in the state you expected: no corruption, no rootkit, whatever. And there's been IBM research happening forever in the IMA/EVM world, where you can have keys loaded via keyctl into the kernel's internal keyrings, and potentially have namespaces with their own keys, so that within a namespace you can only see the things your unshared space has access to with the keys that are loaded for it. Even if you escaped into another namespace, you couldn't see those files, because the kernel wouldn't let you read them; they're not in your keyring. Cool. We're not there yet. It's still happening.

So, the most passive, most generic approach, when I was trying to figure out how we could do this given all the file system approaches and all the different package format styles, was something I remembered from back in the BSD days: mtree. They used it largely just for the file system layout piece of their source trees and ports, so it's in FreeBSD, OpenBSD, stuff like that. There is an mtree port that was done for Linux, and it's kind of kept up with the FreeBSD approach. But neither one of those accounts for extended attributes, or some of the newer checksums (I think they've gotten up to SHA-512 now), and xattrs were the biggest thing. Even though FreeBSD has since added extended attributes to its file systems, they're not used heavily, so they haven't added support for them in their mtree. Those are some of the things that I wanted to tackle.

So at some point I set out, and I write a lot in Go, so I wrote a clone, a drop-in equivalent, in Go, but it added support for a few different pieces, like extended attributes. And there are things available in the Linux space today. If you dnf install bsdtar, you can say bsdtar --format=mtree -cf, like you would make a file from a tree, and the output file won't be a tar archive; it will be mtree-looking output. Lennart has casync, which has an mtree output: you can either do an mtree of a directory, which is similar to how go-mtree can do an mtree output of a directory, or, if you've made an index, a caidx index, you can say give me the mtree of this backup that I did. It's kind of useful. And umoci, which I mentioned a second ago, which can manage container file systems: when you do an umoci unpack, it actually creates an mtree of the expected output.

So, I've got like two minutes left. This is the go-mtree-looking output. I don't think I included the xattrs in this example, but the new mtree output has this notion of sets: you can say, for the following entries, until I unset it or until an individual file overrides it, expect these to be the defaults. Then a file only carries a keyword if it differs from the prior set. And it walks through and actually steps down into directories; these are children of the bin directory.
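For reference, generating those manifests looks something like this (these invocations are from memory, so double-check each tool's man page):

```
# bsdtar / libarchive: write an mtree manifest instead of a tar archive
bsdtar --format=mtree -cf manifest.mtree some-directory/

# casync: an mtree view of a directory, or of an existing .caidx index
casync mtree some-directory/
casync mtree backup.caidx

# go-mtree: walk a directory and emit a manifest with the keywords you want
gomtree -c -p some-directory/ -K sha256digest > manifest.mtree
```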
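The shape of that output is roughly like this; an invented snippet, not the actual slide, with the digest truncated. Note the zeroed-out nanoseconds on the timestamp, which comes up in a second:

```
# illustrative go-mtree-style output (values invented)
/set type=file uid=0 gid=0 mode=0644 nlink=1
. type=dir mode=0755
    bin type=dir mode=0755
        busybox mode=0755 size=1037528 time=1493821800.000000000 sha256digest=5ae03...
..
..
```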
And it's got an escaping format, too. This here is actually test, the command that's the square bracket, and you can see the angle-bracket escaping at the beginning of it; it escapes that in its own way. And you can see here that, even though this is on disk, it's indicative that the tree has passed through a tar archive at some point in its life, because the nanoseconds have been trimmed off; tar can't support nanoseconds. But you can get a timestamp of those things, and then the SHA-256 digest or whatever.

casync does more of the old-style mtree approach, where every line has a fully qualified path, rather than stepping down into the children. Some things are not quite as compliant with the old approach. It's not as much of an issue anymore; people know how to handle some of these characters. But the BSD world considered use cases where a manifest might pass through a system that can't handle all the character sets, for God knows what reasons. Anyhow, you can see the whole approach here; this file system has also passed through tar archives at some point. One thing Lennart introduced jammed me up when I first tried it out: sha512digest was a keyword, but SHA-512 trimmed to the 256-bit length was not a keyword. So I added support for that in go-mtree. But that's neat enough.

This is just an example of doing a skopeo copy, like I talked about in my prior talk, and what Antonio talked about as well; then doing an umoci unpack of the image it just fetched into a directory called busybox-bundle; and then seeing that there is now a file named after the digest of that image, with a .mtree extension. And then you get that same output. So now you have something you can check: even if the user mutated the tree, they can check it against that later. I also made go-mtree able to produce and check against tar archives, not just at-rest file systems, so you can know about what you're going to validate while it's still in flight. You might have a workflow where you get an archive, check the signature, lay it on disk, and then have something to check later for an assurance step.

There are several more examples here; you can go through them all. libarchive, or bsdtar: you can do that on most Linux systems, and it does the old-style approach, but it doesn't escape everything. So, if you find yourself in any kind of an assurance model where this might be important to you, these are the kinds of artifacts that, in some of the work I'm doing, I'm looking at including, so that you could have a sniff check of the contents of an image without having to download it and run an analyzer first. You could say, here is the SHA-512 (or SHA-256, or SHA-512/256) of the files that I know are bad, or whatever, and actually sniff-check an image: do a remote inspect of that image, get the file system layout, and interpret the file system tree before you ever unpack a tar archive, or before whatever it might become one day, like peer-to-peer. The work that I'm doing is on this repo; obviously there are other things, like casync's mtree and otherwise. I do encourage you to stay in a communication loop, so that if you're either outputting or interpreting these manifests, they stay compatible. Like I said, I've added support (I didn't have a picture up there) for extended attributes, so that you can have SELinux contexts or whatever preserved on the files. Because obviously, if you set a file capability, that makes a difference.
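Pulled together, that demo flow looks roughly like this (the image name is from the example; the gomtree flags are from memory):

```
skopeo copy docker://busybox:latest oci:busybox:latest   # fetch into a local OCI layout
umoci unpack --image busybox:latest busybox-bundle       # unpack; this also emits an mtree
ls busybox-bundle/           # config.json  rootfs/  sha256_<digest>.mtree  ...
gomtree -p busybox-bundle/rootfs -f busybox-bundle/sha256_*.mtree   # re-check any time later
```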
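And on the file-capability point, since capabilities live in the security.capability xattr, a hypothetical sketch (assuming go-mtree's xattr keyword is what records them) of how a change should surface:

```
gomtree -c -p rootfs/ -K sha256digest,xattr > before.mtree   # record digests and xattrs
sudo setcap cap_sys_admin+ep rootfs/usr/bin/time             # grant a capability (demo only!)
gomtree -p rootfs/ -f before.mtree                           # should now flag the mismatch
```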
If you've added CAP_SYS_ADMIN or whatever to, say, /usr/bin/time, that's a problem. So that needs to fail a check. Yeah, and the past vbatts talks, and these slides, are up there as well, so you can see those examples and try them out later. Do we have time for any questions?

[Host] Any questions from the audience?

[Audience] Have you... yeah, it's on. Have you ever looked at fs-verity?

fs-verity?

[Audience] Yeah, it's like the verity thing, but on the file level. It's pretty cool. It's a work in progress from, what's his name, Ted. Is it Ted doing it?

No, I have not. I had only looked at dm-verity and ruled it out as a narrow use case.

[Audience] Basically, you write down the content of a file, then you do a Merkle tree hash on it and add that at the end of the file, and then you call a file control, an ioctl, on the thing that seals the file, makes it read-only, and hides the Merkle tree from user space.

Is it in, like, an xattr?

[Audience] No, it's in the file. You write the entire file plus the Merkle tree, and then you seal it, and then it looks from user space as if the Merkle tree isn't there, but it verifies it every time you access the thing.

Really?

[Audience] It's pretty cool. It's a work in progress. I haven't tried it, but...

Yeah, I'll take a look at that. Sure, I'd even be interested in whether it could help do some kind of attestation, or if there'd be a callback cycle there. That's cool. Yeah, Liz.

[Audience] Does that have any relationship to the contents of the manifest in the OCI image spec?

This is completely independent, but I very much think it would be something like an additional manifest or MIME-type descriptor that would just be shoved in. You'd build the image, and before you push it, at a time when you could sign it, you could append some kind of artifact like this, so that when you go to pull an image later, you'd see it in the list of MIME types and could fetch the first piece, like what the skopeo remote inspect does. You just get the first bit of JSON, and if your tool knows that MIME type, like, oh, I know how to read that, you could then fetch it and get a better idea of the contents of the image, and kind of yea-or-nay yourself on what you do next. That's a hope, an active hope.

All right, I think we have a break next. So thank you, everybody.