I get asked: what is bcachefs targeted at? And the answer is everything. My longstanding goal is for it to be reliable and robust enough to be the XFS replacement. That's something I've been taking seriously for a long time. It's been a few years since I've gotten to present, so I'm going to cover what's been added, what's new.

We got reflink, and then after I did reflink, Dave Chinner messaged me and asked, what's going on with snapshots? Oh dear, I guess I do have to do that. But then what I realized was that the reason I hadn't tackled snapshots for the longest time is that extents and snapshots are a difficult combination. But some other things that finally got reworked in the way the btree code handles extents meant that snapshots were now a possibility. So snapshots are now done, and they scale beautifully. I've gotten up to over a million snapshots in a test virtual machine; there are no scalability issues in fsck, it all just works. Snapshots are basically the same model as btrfs: there are subvolumes, the same external interface, but a completely different internal implementation.

More recently, there's been an allocator rewrite. The background is that bcachefs descends from bcache, which was kind of the prototype: get everything working, but in a minimal amount of code. And bcache had a number of algorithmic scalability issues. It was targeted at just being a cache, back when SSDs were more like 100 gigabytes, and now I've got users running bcachefs on 50-terabyte arrays. Things you can do with 100 gigabytes of data no longer scale at 50 terabytes. So I've been methodically, one by one, rewriting all the algorithmic stuff in the allocator, and that's now completely done. We've got persistent data structures to replace things where we used to have to periodically walk the world: a persistent LRU for caching, and back pointers.
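The persistent-LRU idea can be sketched outside the filesystem: rather than scanning every bucket to find the least recently used one, keep an index ordered by last-use time. A minimal in-memory Python sketch, purely illustrative; the names `LRUIndex`, `touch`, and `evict_candidate` are made up, and bcachefs's real version is an on-disk structure:

```python
import heapq

class LRUIndex:
    """Toy LRU index: find the least-recently-used bucket without
    scanning every bucket (the 'walk the world' this replaces)."""

    def __init__(self):
        self.last_used = {}   # bucket -> timestamp of last use
        self.heap = []        # (timestamp, bucket); may hold stale entries

    def touch(self, bucket, now):
        # Record a use; stale heap entries are skipped lazily on pop.
        self.last_used[bucket] = now
        heapq.heappush(self.heap, (now, bucket))

    def evict_candidate(self):
        # Pop until we find an entry that still matches the live table.
        while self.heap:
            ts, bucket = heapq.heappop(self.heap)
            if self.last_used.get(bucket) == ts:
                del self.last_used[bucket]
                return bucket
        return None

idx = LRUIndex()
idx.touch("bucket0", 1)
idx.touch("bucket1", 2)
idx.touch("bucket0", 3)           # bucket0 used again, now newer than bucket1
print(idx.evict_candidate())      # -> bucket1, the least recently used
```

The point of making such an index persistent is that finding an eviction candidate becomes a cheap lookup instead of a periodic full scan.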
That was the biggest, the last big thing on my to-do list, to accelerate copygc. So copygc now no longer has to walk the world. This is huge for zoned device support, which is going to be a little ways off, but that's coming.

And then, when is upstreaming happening? For upstreaming, I want to be able to not go insane and still be able to write code when that happens. It's not so much anymore about working through the to-do list, because the to-do list is always expanding, but all the really big pain points are pretty much out of the way now. And my big exciting thing has been this: I've now got a public to-do list. This is what makes me think that upstreaming is now close and I won't go insane when it happens. I've now got a thing I can point people at, with nice, clean to-do list items that are organized. It's long, but it's not gaping missing functionality; it's enhancements and stuff that is not a massive pain point.

Hey Kent, can you say something about where and how it's being used in production? I know that it is being used in production. I don't know how many sites, because I never find out until I get a call: oh, at this customer site it's actually been in use for two years, which I find out when I go to look at the version they're running. But it's been in use in production, mainly in video production houses, for several years now, where they need to be dealing with multiple uncompressed 4K streams for editing multi-camera workflows. The reason they found me was that they needed something higher performance than btrfs, and at the time I was the only game in town with the feature set they were looking for.

So what do you have left? I'm looking at this, right, and there's a lot of just internal bcachefs stuff.
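The copygc change can be illustrated with a toy model: without back pointers, evacuating a bucket means scanning every extent in the filesystem to find the ones that point into it; with back pointers, the bucket itself tells you. A hedged Python sketch, with invented names and structures that do not reflect bcachefs's on-disk format:

```python
# Toy model: extents map (file, offset) -> (bucket, length).
extents = {
    ("file_a", 0):    ("bucket1", 4096),
    ("file_a", 4096): ("bucket2", 4096),
    ("file_b", 0):    ("bucket1", 4096),
}

def evacuate_walk_the_world(bucket):
    # Old approach: scan every extent in the filesystem (O(all extents)).
    return [k for k, (b, _) in extents.items() if b == bucket]

# New approach: maintain back pointers, bucket -> extent keys, so
# evacuating a bucket is a direct lookup (O(extents in that bucket)).
backpointers = {}
for key, (bucket, _) in extents.items():
    backpointers.setdefault(bucket, []).append(key)

def evacuate_with_backpointers(bucket):
    return backpointers.get(bucket, [])

print(sorted(evacuate_walk_the_world("bucket1")))
print(sorted(evacuate_with_backpointers("bucket1")))  # same answer, no scan
```

Keeping the reverse index persistent is also what makes this attractive for zoned devices, where whole zones have to be evacuated before they can be rewritten.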
I recall, when we talked about this at Salt Lake City, which I know has been a while now, it was basically: get the interface stuff right. Because we're not going to review bcachefs code; we're not bcachefs developers. But the filesystem guys are going to look at the interfaces, where you...

Yeah, we were talking about the ioctl interface back then. The ioctl interface hasn't changed in a while. Now I'm thinking about getting the on-disk format settled down; I've still been pushing out on-disk format changes pretty regularly. Back pointers are going to be another on-disk format change, but after that, I don't think there are any more on the horizon.

The big thing in my head has been, I don't know about the rest of you, but I can't multitask. If I've got bugs coming in, or contributors trying to get up to speed on the code, it's hard for me to do good, focused development work while juggling that, having my brain state wiped every time a bug report comes in. So I've been trying to get all the deep rewrite stuff done, and with back pointers getting done, that's feeling really good.

So what do you think, end of the year? How much work do you think you want to do internally before you talk about merging upstream? And what do you want from us? Because, like we said before, for merging a new filesystem it's really just about how you interface with the rest of the VFS and MM stuff, and from what I can tell, at least in Salt Lake City, that was pretty clean. So what's holding you back right now?

I want to be able to not go insane. Any time the number of users jumps up, I'll be getting more bug reports. That's the big thing that's holding me back: I want to be able to actually respond to all the bug reports.
And the other thing that I know from painful experience is that it's about ten times quicker to deal with and fix a bug when I encounter it in the course of my own development, on my test virtual machine setup, versus someone else tripping over it and then trying to work up a way to reproduce it. So if there are still bugs that people are going to find, I want to wait and work through that stuff.

But the other thing I should talk about is debugging. Overall debuggability is something I've been focusing on for the past several years, and in the past six months that's been paying off in a big way. The allocator rewrite went extremely smoothly. One tool that I should probably show off: there's a list_journal tool, and I'm actually going to be talking about this more tomorrow in the OOM debugging session. The pretty printers that I want to use for OOM debugging came out of working on improving log messages in bcachefs and converting all that to common infrastructure. Stuff that I used to have to debug with a debugger, I can now just debug with grep.

Well, so I think, maybe along the lines Josef was talking about, and I'm going to sound a lot more selfish than he does: btrfs has a bunch of warts in how we interface with the VFS, partially because our inode numbers are effectively huge, partially the reflink interfaces, the compression ioctl interfaces, stuff like that. And so I'm really excited whenever someone comes in and shares my warts, because it makes us less special, and it's more pressure on finding better ways to solve these problems.

Yeah, yeah. So I think you've got pretty good context on a lot of the interface things that are awkward with btrfs, and I was curious where we might be able to team up on that. We do have some of the same issues; the inode number thing that comes up with NFS, that's also going to be an issue with bcachefs.
So I was following that discussion. I didn't have the bandwidth at the time to really think hard about it or come up with anything useful to say, but whatever solution btrfs ends up with, I'd like to make use of too.

I think what happens with the inode number problem is that every eight months or so someone pops up and tells Josef it's stupid and easy to fix. Yeah. And then we have to prove all over again that it is stupid, but hard to fix. Yep.

And so things like that are, well, all of this stuff is really cool, and I don't want to diminish all the work you've done, but the part I'm most excited about is the shared problems. So, what Chris was saying: it's a lot easier for me to argue, okay, btrfs is not the only one with this problem, let's change interfaces. Things like what I wanted to do for the inode number problem, which is add something to statx to give us a better identifier. When it's just btrfs, there seems to be an allergic reaction to making any sort of interface change just to make it easier for btrfs to better describe what it does. But if there are two of us, it's a lot easier. Yeah.

Two things. I was just doing some quick web searches on the bcachefs status, and I don't know how up to date some of the pages I found were, but two things came to mind when I was looking at that. One is, before you have an order of magnitude more users, I would strongly recommend that you take a look at what your fsck repair capabilities are, because users will have hardware problems, and they'll start reaching out to you to manually fix things. I noticed there's a page that talks about all the things fsck checks, and the things it delegates to the kernel's btree GC to check, but I didn't see anything about repair.

Repair is all there. Okay, that's good. This is something that, yeah, has been a focus for the past two years. Out of sheer self-defense. That's a good thing.
And the other thing is, I noted you were saying that you were using a combination of xfstests and ktest for your testing, and that, at least on the page I was looking at, you didn't have an automated test runner to cover all the combinatorics of the various filesystem configs. There are multiple filesystem testing rigs out there; I have one, Luis has another, and I think both of us would be happy to work with you to get your automated testing story improved. So you may want to track us down in the hallway track afterwards. Okay.

You or Josef mentioned that you wanted to add something to statx to allow you to give a more useful inode number. No, this is something network filesystems may be interested in as well. This is a whole issue; we don't need to take up Kent's slot with it. But yeah, it's mostly for network filesystems, and also for user space, but in a less important way, right? Because NFS gets ugly when you see the same inode number and that confuses things, whereas at least with local filesystems you get different device IDs, so you know it's a different filesystem.

But yeah, it's really going to screw up my no-swearing thing, but debugging tools, actually, and this is the thing that excites me the most. This is how I do a lot of my debugging these days.

Is this your pretty-printer thing? Yeah, yeah. So I've gotten in the habit of making sure that every single type has a generic to_text method, and this code is shared by both user space and kernel space. The user-space list_journal and show-superblock tools use these pretty printers, the same pretty printers that I want to use for OOM debugging. And because every type has this, it makes it really easy, in a log message, to just dump the full keys. And then you've got something you can grep for.
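The pattern described, every type carrying a to_text method that log messages, dump tools, and fsck all share, can be sketched generically. This Python sketch is illustrative only; bcachefs implements it in C, and the names below (`register_to_text`, `BtreeKey`) are invented:

```python
# Registry of per-type pretty printers, so any log message can emit
# one canonical, greppable representation of any object.
to_text_fns = {}

def register_to_text(typ):
    def wrap(fn):
        to_text_fns[typ] = fn
        return fn
    return wrap

def to_text(obj):
    return to_text_fns[type(obj)](obj)

class BtreeKey:
    def __init__(self, inode, offset, size):
        self.inode, self.offset, self.size = inode, offset, size

@register_to_text(BtreeKey)
def key_to_text(k):
    # One format used everywhere: kernel logs, user-space dump tools.
    return f"inode={k.inode} offset={k.offset} size={k.size}"

print(to_text(BtreeKey(4096, 0, 8)))
```

Because every type prints the same way everywhere, a key seen in one log line can be grepped for across every other log and dump.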
So, debugging allocator inconsistencies, where we find some inconsistency in the free-space btree: you just grep for that key through the journal. And I've got annotations in the journal for transaction commits, including the start of the transaction commit, so you can see what code path did that commit. I've got a bunch more stuff in sysfs. I haven't had to drop to a debugger for debugging deadlocks in a while, because all my state has pretty printers that make it really easy to dump it in sysfs and debugfs. So something gets wedged, and you first find out where it got wedged by looking at /proc/PID/stack. Then you know what state you're looking for, and then I can just look at what the journal pins are and backtrack that to, say, the btree node write that got wedged. This is all information on cached btree nodes; this is everything. This is what I'm excited about.

You might want to look at integrating some of this into drgn. What's drgn? So, drgn, oh, Omar's here. drgn is Omar's Python-based live and post-crash debugger, and we use it all the time; pretty much every investigation that we do in production involves poking around the running system with drgn. So you could teach drgn your printing stuff, and then it can easily walk every superblock and every inode and all of the things you might want to look at, a little more easily than sysfs, I would say, but still using the same infrastructure, which is obviously really good. That'd be cool. Yeah, you can teach drgn the format, like where to find the logs, and then you can even have drgn search the logs themselves.
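The debug-with-grep workflow can be mimicked in a few lines: if every journal entry prints keys in one canonical text form, finding every commit that touched a key is just a substring filter. An illustrative Python sketch; the log format here is invented, not bcachefs's actual list_journal output:

```python
# Invented journal dump: one line per entry, keys in a canonical format,
# each commit annotated with the code path that performed it.
journal = """\
seq=101 commit path=write     key=alloc:bucket=17:gen=3
seq=102 commit path=btree_gc  key=alloc:bucket=42:gen=1
seq=103 commit path=write     key=alloc:bucket=17:gen=4
"""

def grep_journal(text, needle):
    # The moral equivalent of `list_journal | grep <key>`.
    return [line for line in text.splitlines() if needle in line]

for line in grep_journal(journal, "bucket=17"):
    print(line)   # every commit that touched bucket 17, with its code path
```

With the commit-path annotation in each entry, the grep output immediately shows which code path last modified the inconsistent key.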
I did this with one of the last btrfs deadlocks I had: I just have a bunch of helpers, like, okay, go find all the locked pages, walk back, figure out where the bio was, did the bio complete? Okay, the bio didn't complete; why didn't the bio complete? And it can walk back and find the process that submitted it and dump the stack. Oh, wow. And you can do all this because drgn is just giving you the interface; you're writing Python.

You said that works on crash dumps? You can also do it on live systems. I was using a live system; it just uses /proc/kcore to pull the memory contents of everything. Interesting.

So where does the code live for parsing those data structures and dumping them textually? That's what drgn does: drgn reads the DWARF debug info and can load and process all of that, which is really nice. He can talk about this better than I can. You'll see sometimes I write random code to change #defines into enums, and that's so I can just refer to an enum by name, because drgn can resolve that for me, and I don't have to go look up the code to find the magic number; drgn can just find what the value of that enum is. And so it's been really useful for debugging across multiple kernel versions. I can find page deadlocks really quickly, because I have a helper that just finds the bit I'm looking for on that kernel and spits out page table entries. Okay.

It can also symbolically show you variables in the stack trace. So if you're deadlocked, and you know that three functions in is submit_bio, and you want to find the inode that was submitting the bio, you can just say: show me the inode variable, and it'll work. Okay. Yeah. Tell me more later.

So the other thing that I want to talk about, oh, look, the test passed.
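The enum trick described, letting the debugger resolve symbolic names instead of hand-looking-up magic numbers, can be simulated without a live kernel. drgn does this by reading DWARF debug info; the sketch below stands in for that with a plain Python enum. `PG_locked` mirrors a real kernel page-flag name, but the bit values and helper here are illustrative:

```python
from enum import IntEnum

# Stand-in for debug info: drgn reads DWARF to learn enum values, so a
# script can say "PG_locked" instead of hard-coding a magic number that
# may differ between kernel versions.
class PageFlags(IntEnum):
    PG_locked = 0
    PG_writeback = 2
    PG_dirty = 4

def page_has_flag(flags_word, name):
    # Resolve the name symbolically, then test the corresponding bit.
    bit = PageFlags[name]
    return bool(flags_word & (1 << bit))

print(page_has_flag(0b00101, "PG_locked"))     # bit 0 set -> True
print(page_has_flag(0b00101, "PG_writeback"))  # bit 2 set -> True
print(page_has_flag(0b00101, "PG_dirty"))      # bit 4 clear -> False
```

This is why the same helper can find, say, locked pages across different kernel builds: the name stays stable even when the underlying value moves.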
It's always good to know. We've got a user manual; it's up to 25 pages, organized by feature, and it's going to be getting fleshed out a lot more. So there's getting to be user documentation. There's more it could say about the option list. It's not often I get up here and present. Anyway.

Cool. So, I understand a lot of the hesitance, and that you want to take a little longer to upstream; I think that's reasonable. But, you know, clearly we have our own selfish reasons for wanting you upstream as soon as possible. Well, you only get one chance to make a good first impression. Yeah, that's true. So I'm not trying to rush you, but I'm still kind of getting back to: what is your timeline? I'd like it to be within the next six months; based on the bug reports I'm seeing coming in, I think that's probably realistic.

I guess I should mention now, it would take roughly two months to create a proper baseline with high confidence, so just keep that in mind. I'd love to help with creating a baseline for your filesystem with high confidence, so if that helps in terms of your timeline, I just wanted to throw that out there. This is for testing? Yes. Okay, let's talk more after.

When is the Rust rewrite? There's already Rust code committed: the mount tool, for mounting multi-device filesystems by UUID, is written in Rust, and one of the contributors has actually been working on integrating that into the same binary. I haven't tried this out yet, but in theory we should now be able to call back and forth between C and Rust in both directions. That's only in user space, but as soon as Rust lands in the kernel, I want to make use of it. There are so many little quality-of-life improvements in Rust. Just iterators alone: being able to write a real, proper iterator instead of writing crazy for-loop macros all over the place. Let's all be a little noisier about that; make sure people know the demand is there. I think that might be my half hour.