So let's try to catch up, since we started late. So Amir is here. Thank you for coming. Amir, we had a bunch of questions about overlayfs. Do you want to summarize here?

Sure. Just in summary, the reason we are having this discussion is that we previously had an idea to overlay the BPF file system on top of other pseudo file systems, such as cgroupfs. This could be used for using BPF to extend the interfaces of cgroupfs. You can imagine that the overlayfs is a combination: the lower layer is the underlying cgroupfs file system, and the top layer is the BPF file system. So you can pin BPF objects in the top layer, and the underlying layer is cgroupfs. That builds an association between the BPF objects and a particular cgroup. This is a way of extending the pseudo file system interfaces. Similar to that, we could do this not only for cgroupfs but also for procfs, if we can do that. So we can associate a pinned BPF object with, maybe, a task in the system.

So basically, I was thinking, if possible, we can do this. I tried this and have some RFC patches, basically allowing the BPF file system as the top layer of an overlay file system. But the problem I have is that in my use case the underlying file system is cgroupfs. It's mutable. So the users of this infrastructure may change the directories created in the underlying layer. From the documentation I read, this is not an intended use case of overlayfs, because it assumes that the underlying file system is immutable. It is read-only, and every update should come from the top layer. So I tried that, but I did notice that there are some problematic sequences of operations that could lead to undefined behavior or create problems.

It's going to weird places when you try that. But we need to ask you more about the specific requirements of the use case.
Overlayfs can merge an upper directory and a lower directory, and files that exist in both. Is that your use case? Or are you just trying to attach subtrees? You understand my question? Are you trying to overlay specific files, or populate a directory with files from lower and upper? Can you give a bit more specifics of the requirement?

So pretty much what was said: to be able to extend procfs and cgroupfs with other files that are not there today in cgroupfs. In bpffs, we can pin an iterator program so it will look like a file, but it's powered by BPF on the other side. So you can cat it and it looks like a file. There is output from it when people cat it.

So it's specific directories where you add new files?

Multiple directories, multiple files, but they will not overlap. They won't replace a file that is already in cgroupfs. It will be a new file.

OK, but inside a specific directory, there will be normal files and virtual files?

Yeah, so in the same directory, you will do an ls. And in the same ls, some of the files are powered by cgroupfs and its underlying mechanism, and some of them were added by bpffs. So we want to create the directory in bpffs, but then create an overlay so that to the user it appears like a single unified layer.

OK, and what are the semantics? I understand that you require the merge functionality of overlayfs.

Yeah, it can be a new file system, I don't care.

The reason, I think... well, you know that there was a previous implementation, AUFS. And AUFS does support mutable lower layers. The perception is that AUFS is not fixable. Not fixable in the sense that it could crash, it could oops, and there is no way around it. So overlayfs avoids several of those races by not allowing a mutable lower. But the other functionality of overlayfs is copy up, and I don't know if you guys need that.

What is copy up?

Copy up is: you start with a lower layer, like an entire image of an OS.
And then you modify specific files, specific directories. And those are copied up to the upper layer, which starts empty in a container. So that's a major concept of overlayfs. And I don't think you need this.

Yeah, we don't need any of it. We just need two different file systems to merge. Maybe we need a new file system, mergefs.

My intuition, without having looked at it, is that you would just, for example, create a mount option that avoids copy up completely. I mean, a file can only be in either the upper or the lower layer and never transitions from being lower to upper. Well, first of all, you can throw away 90% of the overlayfs code. But if you still want to use overlayfs, I don't know that there are going to be many complications. Because the functionality of merging two directories is in one file. It's in the file readdir.c. And it's just merging two lists. I don't think there should be any issues there. It's just a simple merge. So there is readdir, for merging the directory listings. And there's lookup. And lookup is complicated if the state of the dentry can change from lower to upper. But if it cannot change, then you either find the object in the lower file system or find the object in the upper file system. And then you construct the dentry, which is either lower or upper.

Yeah, this should be mutually exclusive. The files that are in cgroupfs, bpffs will not touch.

My intuition, without looking at it, is that it works if you disable the copy up, all of the copy up.

Is there an option like this already? Or would we need to hack it a little bit?

No, there isn't one.

Yeah, so we need to introduce something like a mount option.

Yeah, I don't know if it will be accepted, but I'm just saying that from an architectural point of view, overlayfs has two distinct functionalities. One is merging of the upper and lower. And the other, which is a little bit more complicated, is this transition from lower to upper.
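The merge-only model described above (readdir as a union of two listings, lookup finding an object in exactly one layer, no copy up) can be illustrated with a small user-space sketch. This is a toy model in Python and not overlayfs code: the function names, the dictionary representation of a directory, and the file name `bpf_stats` are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple


def merged_readdir(lower: List[str], upper: List[str]) -> List[str]:
    """Return one listing containing every name from both layers.

    Upper names come first; duplicates are dropped, although in the
    use case discussed the two layers are expected not to overlap.
    """
    seen = set()
    merged = []
    for name in upper + lower:
        if name not in seen:
            seen.add(name)
            merged.append(name)
    return merged


def lookup(name: str,
           lower: Dict[str, str],
           upper: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Find a name in exactly one layer and report which layer it is in.

    Without copy up an entry never moves from lower to upper, so the
    result is simply "upper", "lower", or not found.
    """
    if name in upper:
        return ("upper", upper[name])
    if name in lower:
        return ("lower", lower[name])
    return None


# Hypothetical directory contents: two cgroupfs files plus one pinned BPF file.
lower_dir = {"cgroup.procs": "from cgroupfs", "memory.max": "from cgroupfs"}
upper_dir = {"bpf_stats": "from bpffs"}

listing = merged_readdir(list(lower_dir), list(upper_dir))
found = lookup("bpf_stats", lower_dir, upper_dir)
```

The point of the sketch is that once an entry can never change layers, lookup has no state transition to handle and reduces to two independent probes.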
And then there are all the things like whiteouts: making an object in the lower disappear from the merge by placing a whiteout in the upper.

Yeah, we don't need any of this.

And I mean, at least 90% of the code is irrelevant, and there's a little bit of code that is relevant. So either you make a good case to the overlayfs maintainer to just add a mount option that allows you to use mutable layers and disables copy up, or you factor out this code into a library and do a new file system. It's not a lot of code.

Well, yeah, look: overlayfs is what is referred to as a stacked file system. It's part of a family of stacked file systems. In the kernel there is eCryptfs, which is a stacked file system. And just one more in... OK, so there are just two in-tree stacked file systems. It's supported in the VFS in several places. For example, the superblock has a member, s_stack_depth, which introduces the concept of a stacked file system. And there's a maximum stacking depth of two. So you can do overlayfs over eCryptfs or something like that.

And at the time, I wrote another stacked file system. I wrote it and it was used internally at my employer. And I was considering doing some sort of abstraction, a library, libstackfs, which overlayfs and eCryptfs could use, and also shiftfs. How could I forget? There was an attempt to upstream shiftfs, and it didn't happen. But one of the things that I said in the review was: OK, there's a lot of code duplication here, and I've been working for a long time on fixing the bugs that you guys now have, so it makes sense to share.

So either Miklos accepts the patch to allow this configuration of mutable layers with no copy up. I don't know if there's any complication. I didn't look at it. Or we do something like that. We say, OK, we have a concept that is inherited: stackfs, and overlayfs inherits from stackfs. And we leave the implementation of merge...
We take the implementation of merge from overlayfs and put it somewhere common. That's the second option.

But try the first option, right?

Miklos, as I know him, would be open to anything that is useful. That's his criterion for accepting patches: something that is useful for users and doesn't break anything. He is not picky otherwise.

Sounds like a plan. So at least, if I got you right: once we disable copy up and introduce this restriction that a file is either in one file system or the other, then it sounds like everything should be all right. And then it's mainly a technical question whether it's a mount option for overlayfs, or a new file system where the common pieces are factored out into a library.

Yes, and I will give a 50% chance that I'm forgetting something crucial that won't work. But let's...

I have another one. So I have a stupid question. How would it be possible? I mean, for bpffs, it would have to merge. Or it would have to track the directory structure of, I don't know, procfs or cgroupfs, right?

Right.

So is there a way of not having to do that, so you don't have those races when...

I didn't get it.

So for cgroup, you have a specific directory structure, right? And if you want to merge those, then BPF would also have to have the same structure, so that you can put the file from bpffs into it and overlay it, right?

The intention is not to attach a BPF program to a directory, but to the file system as a whole. That's the intention.

I get your point, Daniel. So basically, what is being asked here is that in bpffs you sort of have to mirror the directory structure of cgroupfs to overlay it on top, right? Is there a way that you can avoid that and say, look, this directory should go there in the overlay, without having to mirror it? This is the directory that I'm creating in bpffs, and I don't have to mirror all of the structure of cgroupfs.

You want to load the program?

Yeah.
That's triggered on cgroup creation, right? Creating a cgroup creates a directory. Some files appear in that directory, right? This directory currently appears in bpffs, but at a root. In bpffs there is some structure that begins forming from here onwards, where these directories start appearing, but we don't want to mirror cgroupfs. So we don't want to do this sys/fs/bpf/cgroup and then the whole thing from there on.

Yeah, I understand the issue. I need to think about it. Yeah, that's another limiting factor of overlayfs right now: if you make a modification inside a deep subtree, it needs to create the entire skeleton in order to write just this one file, and that's what's happening today.

It's not that big a deal. I think this is more doable, but it's just from a user-friendliness perspective.

There's a concept that I added for NFS export in overlayfs, the concept of an index. So if you enable the index feature in overlayfs, which is a prerequisite for NFS export, then what you get is that for every directory that has been copied up, meaning that it's part of the skeleton, there's an index entry, indexed by the NFS file handle of the lower. It was initially created to fix a problem in overlayfs of how to deal with hard links, how to associate hard links on different paths when the upper layer is some sort of skeleton. So the index can be used like this: say you're doing a lookup of a directory and you don't have the skeleton. If the BPF program populates the index with something that tells the lookup, oh, I'm now looking up in this directory, then you can somehow use that. I mean, it's not a complete answer, but in ovl_lookup, besides doing a lookup by name in the upper and then in the lower, it also consults the index. So it's something that you can play with.

So I guess the BPF program will need some functionality to modify that index.

Well, it's the index.
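The idea floated above, attaching extra files to a lower directory through an index keyed by that directory's file handle instead of mirroring the whole path skeleton in the upper layer, can be modeled in a few lines. This is a speculative toy in Python: the class, its methods, and the example paths are hypothetical, and the real overlayfs index is an internal structure that does not work this way today.

```python
from typing import Dict, Optional, Tuple


class ToyIndexedOverlay:
    """Toy model: extra files hang off an opaque handle of a lower directory."""

    def __init__(self) -> None:
        # lower directory path -> {file name: content}
        self.lower: Dict[str, Dict[str, str]] = {}
        # lower directory path -> opaque handle (stand-in for a file handle)
        self.handles: Dict[str, bytes] = {}
        # handle -> {file name: content} for files attached from the upper side
        self.index: Dict[bytes, Dict[str, str]] = {}

    def add_lower_dir(self, path: str, handle: bytes, files: Dict[str, str]) -> None:
        self.lower[path] = dict(files)
        self.handles[path] = handle

    def attach_upper_file(self, handle: bytes, name: str, content: str) -> None:
        # No skeleton of parent directories is created; only the index changes.
        self.index.setdefault(handle, {})[name] = content

    def lookup(self, path: str, name: str) -> Optional[Tuple[str, str]]:
        if path not in self.lower:
            return None
        # Consult the index by the lower directory's handle, then the lower itself.
        extra = self.index.get(self.handles[path], {})
        if name in extra:
            return ("upper", extra[name])
        if name in self.lower[path]:
            return ("lower", self.lower[path][name])
        return None


overlay = ToyIndexedOverlay()
overlay.add_lower_dir("/a/b/c", b"handle-1", {"cgroup.procs": "1234"})
overlay.attach_upper_file(b"handle-1", "bpf_stats", "hits=7")
result = overlay.lookup("/a/b/c", "bpf_stats")
```

In this model only the handle of `/a/b/c` is needed to attach `bpf_stats`; the intermediate directories `/a` and `/a/b` never appear on the upper side.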
Look, the upper file system... when you create an overlay, you need to provide a lower, an upper, and a workdir. And the index is in the workdir. And the workdir has to be on the same file system as the upper. So one of the issues, I don't know if you have tackled this, is that the upper file system needs to have some methods that not all file systems have. It used to be restricted to very few file systems. In order to operate optimally, it needs to have support for the rename whiteout operation, extended attributes, all sorts of things. But you don't need that if you don't need to support rename. So I'm just saying that when you try to use a given upper fs, you will see that maybe there are some restrictions that you will need to relax. But one of the restrictions is that the workdir has to reside on the same file system as the upperdir. And the workdir is where the index dir is created.

You could create the indexes on the fly.

Same mount. It's not an inherent restriction, but it's there in the code.

But we could create the index dir on bpffs, then, if it's the same mount.

Yeah, I think so. I mean, that's what the code does. It just goes to a backing file system and writes the index there and looks up the index there. So I guess you could.

If I understand correctly, the index is in the working directory, right? I think I already have an RFC patch. It is very simple to extend the current bpffs to make this working directory work, so that bpffs can be used as an upper layer. I can send out this patch maybe today or tomorrow.

Everybody's the same.

Look, what I'm proposing is not... it's going to be frowned upon, right? Because the index is something that is internal to overlayfs; overlayfs writes it and consults it. But I'm just saying that there's a concept that maybe you can make work and make it not a hack.
Because I also consider the fact that overlayfs needs to have the entire skeleton in order to represent a change deep inside the tree to be a limiting factor. So just as an example, for containers, if I want to create an image and the change is one file, it could be useful if there was a standard way to say: I'm creating an image, that's the file ID of the directory, and that's the file I'm changing, and you don't need to create the entire skeleton. If you use a similar concept, you may be able to work around this problem. I'm not saying it's going to be easy. And also, I don't know which file ID you can use in cgroupfs.

And does the whole thing need to be persistent at all?

No, so the inode number is fine. So basically, overlayfs uses NFS file handles, which means it uses the standard operation encode_fh. You know that? You guys know that?

No.

So there's a standard operation in the VFS. It's called encode_fh. NFS has a protocol where, when you open a file, you get back a file handle, which is a binary blob. And you can use that to write to the file, even if the server has rebooted since. So the VFS has this concept, and a file system needs to support exportfs in order to be exported to NFS, that is, to implement the operations to encode a file handle and to decode a file handle. So overlayfs sort of latches on to this capability to implement the index. However, the complicated part of implementing exportfs is the decode part. The encode part is trivial, and it actually has a generic implementation. It just takes the inode number and the generation number, which are there, and that's the blob. So essentially, on every file system, I think, you can even call the system call name_to_handle_at, and you will get this blob. I'm not sure it works on cgroupfs, but I don't see why not. It may just be that the system call will not work, but the underlying function, encode_fh, does work.
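As a rough illustration of the encode side described above, a blob built from the inode number and the generation number, here is a minimal Python sketch. The 32-bit little-endian layout and the function names are assumptions made for the example; the kernel's real handle formats vary by file system, and the generation number is not available through `os.stat`, so it is passed in by hand here.

```python
import struct


def toy_encode_handle(ino: int, generation: int) -> bytes:
    """Pack an inode number and a generation number into an opaque 8-byte blob.

    Assumed layout: two unsigned 32-bit little-endian integers.
    """
    return struct.pack("<II", ino & 0xFFFFFFFF, generation & 0xFFFFFFFF)


def toy_decode_handle(blob: bytes) -> tuple:
    """Unpack the blob back into (inode number, generation).

    In a real file system the hard part of decoding is finding the
    object again from these numbers; that step is not modeled here.
    """
    return struct.unpack("<II", blob)


handle = toy_encode_handle(42, 7)
ino, generation = toy_decode_handle(handle)
```

The generation number exists so that a handle for a deleted file is not mistaken for a new file that happens to reuse the same inode number.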
So you can still use the generic code, as overlayfs uses it.

For example, imagine that in the procfs structure we want to add custom statistics.

Proc is going to be very problematic, because it basically creates and deletes dentries on demand. And you will need to pin the dentries of the lower. I'm not sure.

What is the problem? What is the challenge with procfs? I was wondering about it.

If you look something up in procfs, the inode and the dentry don't persist. It basically creates and deletes files on demand. Otherwise you would have a problem, because you would keep millions of dentries and inodes around and proc would be megabytes, gigabytes large.

So there are two things regarding that. With a network file system as a lower, like NFS, dentries can also go away on the server. For that, there's the operation revalidate, dentry revalidate, that the VFS does. I mean, when you look up some file and it's in the cache, if the file system may invalidate the dentry, then the VFS calls revalidate to see if it can still use it. So it's not a full solution to the problem, but overlayfs does call the underlying file system's revalidate to check if the dentry is still valid. That's one thing. The other thing is the readdir cache. Overlayfs does have a readdir cache. So if things change underneath, the change may not appear in readdir unless you invalidate the readdir cache, which overlayfs doesn't do, because it doesn't expect lower directories to change.

Yeah, I think that's exactly what I saw when I was doing experiments. So basically, when a directory is created in the lower and then you pin an object from the upper, it works fine. But then if you delete the directory from the lower, from the upper layer you still see that the directory exists, but you can't access it. It says that it's invalid. I think there is probably some caching that invalidates the dentry.
That's probably what you are saying here.

I think that's in the 50% that I mentioned. The issues I didn't look at, didn't think about, but...

Can we disable this cache? This is a simple use case, not big files. Can we disable the cache?

You can, if you don't care about caching readdir, I suppose.

Sure, that's the solution here.

Apart from that, I have patches that I am going to try to upstream; I'm not sure about the prospect of them being accepted. Patches to use fsnotify to detect changes in the layers and inform overlayfs of those changes, right? AUFS supports that. That's how AUFS deals with a changing lower. So I have patches. I'm not sure how far this will go, but again, for this issue, for the readdir issue, it's not relevant.

We'll support you on those patches.

For the revalidate issue, I also had patches. I need to find them. Because overlayfs already does revalidate, but it's not working so well if you're changing a local file system. So I need to look at that.

So I think the next step is for us to hack around a little bit.

Yeah, try something. Try the concept of disabling all the copy up and the readdir cache, and start with something. It will have problems. Post it, and I will help you if you need. I'll point you at patches if I have them.

Awesome. Thank you. That's great. Are you going to be at Plumbers?

I think I'll attend virtually.

Thank you. Thank you for coming. Thank you so much. Thank you.
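The stale-listing symptom discussed in the session (a lower directory entry is removed, but the merged view still lists it) and the proposed remedy of disabling the readdir cache can be reproduced with a toy model. This Python sketch is an illustration under assumed names only; `CachedMergedDir` and its `use_cache` flag are hypothetical and do not correspond to an existing overlayfs option.

```python
from typing import List, Optional


class CachedMergedDir:
    """Merged view of two listings with an optional, never-invalidated cache."""

    def __init__(self, lower: List[str], upper: List[str], use_cache: bool = True) -> None:
        self.lower = lower            # may be mutated from outside at any time
        self.upper = upper
        self.use_cache = use_cache
        self._cache: Optional[List[str]] = None

    def readdir(self) -> List[str]:
        if self.use_cache and self._cache is not None:
            # Cached result is returned without checking the layers again.
            return list(self._cache)
        merged = sorted(set(self.lower) | set(self.upper))
        if self.use_cache:
            self._cache = merged
        return list(merged)


lower_names = ["child_a", "child_b"]
cached = CachedMergedDir(lower_names, ["bpf_stats"], use_cache=True)
uncached = CachedMergedDir(lower_names, ["bpf_stats"], use_cache=False)

cached.readdir()               # fills the cache
lower_names.remove("child_b")  # the lower layer changes underneath

stale_view = cached.readdir()
fresh_view = uncached.readdir()
```

After the removal, `stale_view` still contains `child_b` while `fresh_view` does not, which is the trade-off raised in the discussion: dropping the cache costs a re-merge on every listing but keeps the view consistent with a mutable lower layer.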