We're all ready. I don't know if we can head to our seats. All right, I'm going to get started here, and hopefully everyone can hear me. I don't have any particular slides, so we're just going to go through things. All right, yeah. We've got till 10:30, so I just want to make the most of the time. So sorry, everyone. We advertised Matthew and then pulled a bait and switch on everyone. You get me for negative dentries now. So I'm sorry about that, but I'm Steven. I think a lot of us are familiar with this problem. I'm going to go over it very briefly; I just want to be able to lead a discussion. So imagine you've got a big server, lots of memory, and a workload that's not using all of that memory, say 500 gigabytes with only 100 of that used for whatever your workload is. And your workload is also looking up, say, unique identifiers a few times a second in some directory, some file name. And it's doing that for months on end, a year on end, something like that. The issue we get is that each of these lookups creates a negative dentry. Frequently, those don't get used again. And what you end up with is a dentry cache full of negative dentries that may have been used only once, or not at all, since they were created. Normally those would get freed up by memory reclaim when we have memory pressure, but that doesn't happen in this situation where you've got tons of memory and you don't know what to do with it. Linux will never say we need to clean these up. So to convince you that this is an issue, there are a couple of motivating examples we've had. Soft lockups when you're iterating over the list of children of a dentry, but there are others. Page fragmentation can happen as you keep using more slab pages, and then even more concerningly, I think, is the idea of slab fragmentation. Say that you have 500 million dentries and then you end up deleting some directories, freeing these up.
Now you have so many half-full slab pages. There are a lot of issues that come with negative dentries. And so what I'd like to do here is follow up on the discussions we've had on the mailing lists and talk about a couple of goals that I think we'd want a solution for this to have, and we'll go from there. So one of the goals we've all talked about is a generic system for managing these caches. We have a lot of LRUs in the page cache and in the file systems area, and they are interesting. In the modern idea of an LRU, I think of it as moving something to the end of the list once you've used it, but instead we mark it as referenced and leave it up to a shrinker to decide to move an item to the back of the list. So there's a problem that I think is generic to all of these LRUs, the inode LRU, the dentry LRU: we don't actually do anything about rotating items to the back of the list until the shrinker runs, and at that point we're in memory pressure. There are two problems with that. Number one, a referenced dentry might have been marked referenced a year ago, if we've had no memory pressure since; it's not referenced anymore, it's not useful, but the shrinker is going to have to go through it anyway just to move it to the end of the list before it can even free it up. So those are the two issues in a nutshell: really old dentries, and a lot of extra work for the shrinker. So one of the things I wanted to talk about, and I think Dave Chinner was especially interested in this, is that what we want is an API that is generic enough for multiple different caches to use, and I'm happy to take comments here at any time. What I have, at least, is some work on the list LRU side of things, and this is something I've been thinking of as the best way to go forward.
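To make those two issues concrete, here's a minimal, hypothetical model in plain C of how this style of LRU behaves today. This is not kernel code and all the names are invented: lookups only set a referenced bit, nothing on the list moves until a shrinker walk runs under memory pressure, and every referenced entry costs the walk a rotation before anything can be freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a referenced-bit LRU; not kernel code. */
struct item {
    bool referenced;
    struct item *next;
};

struct lru {
    struct item *head;   /* oldest entry */
    struct item *tail;
    int count;
};

static void lru_add_tail(struct lru *l, struct item *it)
{
    it->next = NULL;
    if (l->tail)
        l->tail->next = it;
    else
        l->head = it;
    l->tail = it;
    l->count++;
}

/* "Using" an item only marks it; it is not rotated on the list. */
static void lru_touch(struct item *it)
{
    it->referenced = true;
}

/* Shrinker walk: pop from the head; referenced items get a second
 * chance (bit cleared, rotated to the tail), unreferenced items are
 * freed.  Returns how many items were freed after scanning nr. */
static int lru_shrink(struct lru *l, int nr)
{
    int freed = 0;

    while (nr-- > 0 && l->head) {
        struct item *it = l->head;

        l->head = it->next;
        if (!l->head)
            l->tail = NULL;
        l->count--;

        if (it->referenced) {
            it->referenced = false;
            lru_add_tail(l, it);   /* rotate instead of free */
        } else {
            free(it);
            freed++;
        }
    }
    return freed;
}
```

The weakness being described falls out directly: the referenced bit carries no age information, so an item touched a year ago looks the same as one touched a second ago, and the shrinker has to spend a rotation on each of them before it can free anything.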
Say every time, and I think Matthew had also discussed this, every time we add an item to an LRU cache, we should consider rotating through that LRU, doing something the shrinker would have been doing: go through one item in the LRU, and if it's referenced, move it to the back of the list. In the case of, say, a negative dentry, we can apply some sort of cache aging rules at that point as well. So this would be done on the list LRU add side of things, but there are some other options as well. We've got some information on the kmalloc side of things: when a dentry is allocated, we now know what list LRU cache it belongs in, and we can do things like track the number of dentries that are currently allocated and the number of dentries that are currently in the LRU, and we can use that information. So, mm-hmm, yeah, yes, okay, good question. To restate that: what is the problem here? We have bound dentries that are positive in the LRU, and we have unbound negative dentries. Is this a problem of redoing the shrinker infrastructure, or is it a problem of looking at negative dentries in particular? So, yes, negative dentries: on the systems I've looked at that have this problem, if you dump dentries on those systems, you see 99% of them are negative dentries. So in that sense, yeah, this is a negative dentry issue, and we don't necessarily need to do this in the shrinkers, but what we do need is something that addresses the issue, specifically with an aim towards negative dentries. However, I think that using the existing LRU framework and the existing shrinker framework can benefit that quite a bit. Matthew, yes, yeah. And there's something to be said for, so what Matthew's saying is that negative dentries are one of them, but inodes also have a used-once problem. And I think there's something to be said for the fact that this is what LRUs are built for, but the ones that we're using don't do this well.
When we decide to use a dentry, or when we use an inode, all we're doing is setting a bit on that data structure. We're not moving it to the end of the LRU. And so that means this set of used-once dentries is pretty evenly scattered throughout the LRU, despite this being what you would call a least recently used cache. That's not what we have right now. And so in the case where there's no memory pressure, you have unreferenced dentries sprinkled throughout; you have used-once dentries, maybe used-once inodes, just throughout this list. And that makes two things hard. First, if you're under memory pressure, I'd like to clean up as much as possible, and those are the best items to clean up: things that were used once and never again. So that's an important question. But separate from that, you have the issue of workloads that are creating stupid amounts of negative dentries that don't need to be there. They've been used once and won't be again. And when things are spread throughout the LRU, it makes it much harder to find them. If we had a way to keep the LRU better organized, to keep the least recently used things at the beginning of the list even without memory pressure, then that might go a long way to solving our issues with negative dentries. Does that make sense? Mm-hmm, mm, no. Yeah, and I don't think it's excellent for a lot of workloads, because you don't want all that contention on those spinlocks. I think that's why that optimization came into place. But, and I love this idea that was just brought up, of not marking items referenced the first time but waiting for the second time. That's great, and then the next problem that comes up from there is: say you do have a great list of unreferenced, ready-to-free items. Then we have a different question. How much is too much? With memory pressure, we have a great signal to say we have too many dentries and we need to clean these up.
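The suggestion from the room, not marking an entry referenced on its first use but only from the second use on, is small to express. A hypothetical sketch with invented names:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical second-chance marking: the first use leaves an entry
 * "cold", so used-once entries stay free candidates; only a repeat
 * use sets the referenced bit. */
struct entry {
    int accesses;
    bool referenced;
};

static void entry_access(struct entry *e)
{
    e->accesses++;
    if (e->accesses >= 2)
        e->referenced = true;   /* hot only after a second use */
}
```

Under this rule a lookup-once negative dentry never gets its referenced bit set at all, so it is never rotated and sits at the front of the LRU as an immediate reclaim candidate.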
But with negative dentries, the problems that they can cause, those soft lockups, I'd estimate occur around 100 million dentries in a single directory, give or take. I imagine you could use some magic with a BogoMIPS calculation to figure out where your issue would come up. But I'd like not to see us using some sort of magic threshold; we need a mechanism to start shrinking at least these used-once items. You can have a debate over how frequently you rotate the referenced items to the beginning of the cache, this cache management policy, but for these single-use items, maybe there's a better way to start the process of just getting rid of them before that point. Yes. I've heard this, and this is an interesting thought, the bound cache: essentially managing positive dentries, dentries that are useful or bound to inodes, separately from negative dentries. And that's a neat idea that I actually hadn't considered until this morning; Amir brought it up to me. Yeah, one thing you can consider about, yes, easy, special, yeah, yeah. And the memory management folks can correct me, but I don't know that we have that sort of feedback from SLAB right now. We don't have a way to say I would really like to relocate one object in particular, or a few objects in particular, because they would free up this page. Matthew, I've seen these patches, yeah. That's a really good point. So you're saying tracking this by PID, by task struct and by PID, and hopefully trying to identify a bit somewhere that says, hey, I'm a negative dentry creator. Something to point out about this is that this isn't necessarily a fast problem. We're not necessarily talking about a task that's allocating negative dentries at a rate of hundreds or thousands per second.
This could be a slow trickle. We're talking about systemd services starting somewhat regularly, not doing thousands and thousands per second, but over the course of a year's uptime, if you're doing one or two per second, you're still adding up to hundreds of millions. Yeah, and we create dentries in a couple of ways, right? We have them for lookup: we create a negative dentry and turn it into a positive one, or we just leave it negative during lookup. But say I delete a file; then we have two options. Either, if there are other users for it, we keep the inode linked and take the dentry out of the hash table, or, if the file is not being used by anyone else, we just convert that dentry to a negative one right there. We set the inode to null, we leave it in the hash table, and we use it for negative lookups in the future. So we've got a couple of places where workloads are different: one workload is creating a lot of dentries just by looking them up, another might be creating a lot via a temp file that gets deleted quickly. And so I see tracking these things, tracking these things, oh gosh, probably not a lot. I actually don't have that number, but I know that's our favorite benchmark in this group. What I have are customer systems where I see the problems start happening around 250 million, and that's a count of all dentries, but at the end of the day, like I said, 95 percent or more of these are negative. And yeah, in the back here. Yeah, that makes a lot of sense, because with dentries, just to restate your point for the microphone, I know we're a little crazy on that right now, but to restate your point, this is one I wanted to bring up: Linux believes all memory should be used if possible, but there's a cost to reclaiming this memory when we do need it, and we're not always great at doing that.
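The unlink decision described here can be sketched as follows. This is a simplified, hypothetical model loosely inspired by the kernel's d_delete() behavior; the fields and names are invented, and real dentries involve locking and refcounting not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Drastically simplified model of a dentry, for illustration only. */
struct sketch_dentry {
    void *inode;     /* NULL means this is a negative dentry */
    int refcount;    /* how many users hold this dentry */
    bool hashed;     /* still reachable through the hash table */
};

/* On unlink: if we are the only user, convert the dentry to a
 * negative one in place and keep it hashed, so future lookups of the
 * deleted name fail fast; if others still hold it, unhash it so new
 * lookups miss it entirely. */
static void sketch_unlink(struct sketch_dentry *d)
{
    if (d->refcount == 1)
        d->inode = NULL;       /* becomes a hashed negative dentry */
    else
        d->hashed = false;     /* stays positive, but unreachable */
}
```

This is why a tempfile-heavy workload manufactures negative dentries without ever doing a failed lookup: each short-lived file leaves a hashed negative dentry behind on the in-place conversion path.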
And while it's pretty easy for a negative dentry, this is actually a really happy place to be, to have a problem with negative dentries, because we can just get rid of them, and any improvement along that line will help, yes. True. All right, yeah. What I'm looking for from you all is really this discussion that we've had. We've had some really good ideas, between tracking negative dentries separately, trying to improve these caches and the way we maintain them, and improving the way that we migrate, or making it possible to migrate, some of these data structures. What I'm looking for is more eyeballs on patches that I send out, and hopefully some people who care about solving this problem in a generic way that can start helping these other caches, these one-off caches, while still getting rid of the negative dentry problem. So, certainly the dentry cache has some constraints with RCU and with trying to be as fast as possible for the common cases. I will say, having worked in it, I haven't found that to be the biggest constraint. There's lock ordering that's very difficult, and RCU is an interesting world to live in, but we can manage with some of those. At least that's my opinion. I see that it is a pain and it does add constraints compared to, say, inodes or something else, so yeah. I'm quite familiar with that code, I have looked at it and reviewed it, and we're using it in some stable kernels. What I think is that we don't want more sysctl knobs unless the user really understands and sees a reason for them. This is something that the kernel should know how to do. It should know that some negative dentry that I looked up on the server once, a year ago, isn't important, and that the 100 million others I did the same thing with aren't important either, and it shouldn't take a sysctl knob for the user to say, don't keep those.
So, yeah, on one hand it's a great fix and it's something that sees use, but I don't know that many of us want to see something like that upstream. I think this is the last one, and let's try to wrap up here. And we can, yes, yes. That's a good point, and it's probably my favorite point that's come out here: that we can consider some negative-dentry-specific caching and handling. There are a lot of options outside the box of the way we do things now, and we shouldn't let that be our constraint. I think it's best to take anything else out to the hallway track; it is break time. Yeah. I'll stay here.