So, BPF on block devices. I have always wanted some mechanism for efficient error injection in block devices, or in block I/O in general, because the block layer has so many error paths that are really, really hard to trigger. They typically rely on things going gravely wrong, and that's the one thing you try to avoid in the block layer. The error paths are typically triggered by a hardware issue, which again is really hard to reproduce if you don't have any hardware error injection facility. So the idea is to investigate whether we can use BPF on block devices. The challenge here is that we always have to cover both directions, because block I/O is essentially a command/reply structure: every command we send has to be completed. So if we inject an error, that injection needs to actually terminate the command. We can't just modify something and hope for the best; we might also need to complete the command from within the BPF program. So the first question is: do we want to do something like this? Is it just me who sees the value, or is it of general interest? I definitely looked into this, and I had mentioned scaling error injection for blktests as part of a topic as well, so I'm really interested in this too. I did a bit of homework on the BPF methodology for error injection. It was later generalized from the kprobes-based mechanism, and all one really needs is to sprinkle ALLOW_ERROR_INJECTION annotations, right? Have you used that? That's where I was going with this. It's much easier than writing a BPF program and loading it into the kernel: you basically just echo things into files as root instead of compiling some program and loading it.
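The command/reply constraint described above can be illustrated with a small userspace model. This is a sketch under stated assumptions: `struct sim_request`, `sim_complete()`, `sim_driver_submit()`, and `sim_submit()` are hypothetical names, not kernel APIs, and the status is an errno-style `int` rather than the kernel's `blk_status_t`. What it shows is that an injection hook which decides to fail a request has to complete that request itself; simply not forwarding it would leave the submitter waiting for a reply that never arrives.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device size for the model, in 512-byte sectors. */
#define SIM_CAPACITY_SECTORS 1024ULL

/* Userspace model of a block request; not the kernel's struct request. */
struct sim_request {
	unsigned long long sector;
	int status;      /* 0 on success, negative errno on failure */
	bool completed;  /* every request must be completed exactly once */
};

/* The "reply" half: the submitter is waiting for this to happen. */
void sim_complete(struct sim_request *rq, int status)
{
	assert(!rq->completed);  /* a double completion would be a bug */
	rq->status = status;
	rq->completed = true;
}

/* Stand-in for the driver: out-of-range sectors fail with -EIO. */
void sim_driver_submit(struct sim_request *rq)
{
	sim_complete(rq, rq->sector < SIM_CAPACITY_SECTORS ? 0 : -EIO);
}

/*
 * Submission path with an injection hook. When the hook fires it must
 * complete the request itself; returning without completing it would
 * leave the submitter waiting forever.
 */
void sim_submit(struct sim_request *rq, bool inject, int inject_status)
{
	if (inject) {
		sim_complete(rq, inject_status);  /* driver never sees it */
		return;
	}
	sim_driver_submit(rq);
}
```

The `assert()` in `sim_complete()` encodes the "completed exactly once" rule, so an injection path that both fails a request and forwards it would trip immediately in this model.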
Now, there are some caveats. One of them is that you are limited to the places where you've actually added that macro, right? A while ago I did this during the whole add_disk() shuffle, when we were adding error handling there. The big question I had was: why shouldn't we be adding error injection there, now that those functions actually return errors? The first pass, of course, was just sprinkling error handling the old way, right? But that doesn't scale, because we would have to add bits of code everywhere we want to check. If we use the BPF method, we are basically compiling programs, right? The alternative I'm pointing to is this macro, and its limitation is that you have to add it to every single function for which you want to enable dynamic error injection. But I think that might be easier, because it's one or the other: either you do this with a program, or you sprinkle the macros. I didn't get any feedback on that, but I'm personally a big advocate of it. I think right now the only clean thing we have is that macro. But that is something I really don't like, having to sprinkle code at every position where we want to do error injection. Oh no, no, no, that's not it, right? I definitely agree with you. We reviewed this a long time ago, I looked into it, and I agree with you. There is an alternative, which is a macro called ALLOW_ERROR_INJECTION. Have you looked at that by any chance? Unfortunately not, no. This was added after BPF got support for doing error injection. Instead of having to load a BPF program, you just put this macro after the function, kind of like EXPORT_SYMBOL.
So when you want to allow dynamic error injection, all you have to do is add a macro after the function, kind of like EXPORT_SYMBOL. Once you do that, you're allowing the kernel to do dynamic error injection on that function through debugfs: you go to debugfs, echo a few things, and enable it. Yes, but that again would mean we have to annotate each and every function. Yes, precisely, that is a lot of churn. And what I found during long and bitter experience is that errors occur where you do not expect them. Can you say that again? Errors occur at the positions where you do not expect them. Handling expected errors is trivial. Things go wrong when you get an error you do not expect, because, well, that's when things go wrong. And hardware has a habit of doing precisely that: suddenly a register isn't available, vanishes into thin air, returns weird values, or the HBA dies altogether, something like that. By their nature you can't protect against these things, because you can't anticipate them; you don't know what's going to happen. Sprinkling annotations in the code amounts to saying: we already know something will happen at this position, so we'd better add a statement there so we can trace it. But the point is that we don't know where it will happen, which is why I vastly prefer the eBPF approach, which would let us potentially attach to each and every function without having to change much in the code. I agree completely, but I'm not sure we have the ability to do that right now, do we? That is why we're here, right? Of course we don't have the ability; if we had, we wouldn't have this talk. Great, great, yes. And so doing error injection is one thing, which, okay, yes, you can do.
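For reference, the in-kernel opt-in is written as `ALLOW_ERROR_INJECTION(func, ERRNO)` after the function definition, and the `fail_function` fault-injection capability is driven through files under `/sys/kernel/debug/fail_function/` (the kernel's fault-injection documentation lists the exact files). The userspace model below is only a sketch of the idea, not the kernel implementation: an explicit allowlist stands in for the macro, `inject_arm()` stands in for the debugfs writes, and `call_injectable()` stands in for the kprobe that skips an armed function and returns the chosen value. All `sim_`/`inject_` names are hypothetical. It also shows the limitation raised above: a function that never opted in cannot be armed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One opted-in function; the kernel uses kprobes + debugfs instead. */
struct inject_point {
	const char *name;  /* function that opted in */
	bool armed;        /* set through the control interface */
	int retval;        /* returned instead of running the function */
};

/* Plays the role of ALLOW_ERROR_INJECTION(): an explicit opt-in list. */
static struct inject_point allowlist[] = {
	{ "sim_add_disk", false, 0 },
	{ "sim_alloc_queue", false, 0 },
};

static struct inject_point *find_point(const char *name)
{
	for (size_t i = 0; i < sizeof(allowlist) / sizeof(allowlist[0]); i++)
		if (strcmp(allowlist[i].name, name) == 0)
			return &allowlist[i];
	return NULL;
}

/* Stand-in for the debugfs writes; refuses functions that did not opt in. */
bool inject_arm(const char *name, int retval)
{
	struct inject_point *p = find_point(name);

	if (!p)
		return false;
	p->armed = true;
	p->retval = retval;
	return true;
}

/* An armed function is skipped and the caller sees retval instead. */
int call_injectable(const char *name, int (*fn)(void))
{
	struct inject_point *p = find_point(name);

	if (p && p->armed)
		return p->retval;
	return fn();
}

/* Hypothetical functions: one opted in, one not. */
int sim_add_disk(void) { return 0; }
int sim_setup_other(void) { return 0; }
```

In this model, every new injection site costs one allowlist entry, which mirrors the churn concern raised in the discussion.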
But the other possibility here is that you might be able to actually steer I/O with eBPF. No. All right, well, okay, tell me why not; "no" is not an argument. Get the microphone, Christoph, please. So we can't hear Christoph well because he's not using a microphone, but he loves the idea of using BPF to steer I/O. Yeah. And we'll go ahead and implement it. But in all seriousness, could you describe what you actually hope to accomplish with BPF steering? Yes. There was an attempt from a company called Veeam to do precisely that, meaning steering I/O somewhere else. This ended up as a block-layer mechanism, STOF, which essentially allows you to redirect I/O. That's still very vague. No, no, of course it's very vague. They used it for backup, because they wanted dynamic backup, redirecting I/O to whatever backup target they had; Veeam is a backup company. The other use case was security auditing, where you could redirect unwanted I/O somewhere else. To me, the mechanism to redirect I/O seemed like device mapper being reinvented. Exactly. It is precisely the device mapper approach, but using device mapper means your block device has to be a device mapper device in the first place. If it isn't, you have zero chance of doing this online; you have to remount and redo everything. It's not my use case. It is what they came up with. Really? Okay. Good. Okay. So that was basically my idea, because I never particularly liked that patch set. But if you say it's okay, it's okay, because the alternative would be re-implementing that patch set with eBPF. You're saying you didn't like it, so why did you not like it? Because it feels so wrong. You essentially have to program a redirection map for individual I/Os into the kernel. That feels so, so wrong.
I'd rather write a BPF program that does exactly that, load it into the kernel, and that's it. I don't need to come up with a ... Yeah. Yes. I mean, compared to the block-layer approach, is that really traceable for individual I/Os? Okay. Good. That was an idea because, as I said, I didn't particularly like it. If you don't like the BPF one either, then okay, I guess we won't be doing it. But that doesn't mean we couldn't use BPF for error injection. If we do it just for normal error injection, it can't be that hard, can it? I've dabbled in it somewhat, but I can't claim to be an expert. So, a question to the experts here: would that be a difficult task? What is your stance? Should we go for the BPF approach, or should we just go for the existing error injection infrastructure? The ALLOW_ERROR_INJECTION mechanism that Luis was referring to already lets you attach an arbitrary BPF program, which can basically only modify the return value, if I'm correct? Yes. And that is precisely why it's not going to fly with us. With the return value, you only get to report that the function call itself failed. You can't modify the bio status to simulate that the hardware returned something odd, and that's what we need to do. Well, if you pick the functions you're injecting into cleverly enough, you can factor out the function that computes bi_status, mark it with ALLOW_ERROR_INJECTION, and then do whatever you want with that return value. So you're saying it would be better to use ALLOW_ERROR_INJECTION and sprinkle those annotations across the entire stack? If you just want it for error injection, I think the mechanisms are mostly there.
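The factoring trick suggested above can be sketched in userspace C. The assumption is that the completion path derives the bio status from a single helper, here the hypothetical `sim_decode_hw_status()`; once the status is that helper's return value, a return-value-only override is enough to simulate odd hardware status without touching bio memory. `struct sim_bio` and the `override_*` variables are illustrative stand-ins, and the status is an errno-style `int` rather than a `blk_status_t`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace stand-in for a bio; only the status field matters here. */
struct sim_bio {
	int bi_status;  /* 0 = success, negative errno = failure (model only) */
};

/* Models a return-value override armed on sim_decode_hw_status(). */
static bool override_armed;
static int override_retval;

/*
 * Factored-out helper that turns what the "device" reported into a status.
 * Because the status is now a return value, overriding this one function
 * is enough to fake a hardware error.
 */
int sim_decode_hw_status(unsigned int hw_reg)
{
	if (override_armed)
		return override_retval;     /* injected: real decoding skipped */
	return hw_reg == 0 ? 0 : -EIO;  /* nonzero register means an error */
}

/* Completion path: bi_status is taken from the helper's return value. */
void sim_end_io(struct sim_bio *bio, unsigned int hw_reg)
{
	bio->bi_status = sim_decode_hw_status(hw_reg);
}
```

The override check lives inside the helper only to keep the model small; with a real kprobe-based override the function body would stay unchanged.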
Now, I think the fancier stuff you're talking about was too vague, and since you're describing someone else's use case through your own interpretation, it's hard to have a conversation specifically about that. But if you have specific use cases, or someone else here can talk about them... My use case here is indeed error handling, and validating that the error handling we put in place actually does something sensible. For instance, I want to be able to easily write a blktests test. Yeah. And use shell to ensure that when I make the block layer do something stupid, it actually behaves the way we expect it to. I want to do that in shell, in as few lines as possible. I don't want to be compiling some BPF program. Okay. So, jumping in: what BPF can do that isn't possible with the methods we have right now is give us the ability to insert errors at very specific points, like in the transaction commit path or in discard handling, places where we want to make sure this exact thing happens, over and over and over again. I think BPF is the easiest and best way to do that, and just because we can't do it from shell yet doesn't mean we shouldn't be plumbing it into the kernel. Yeah, it just means you're not going to have as many blktests written, that's all. It's just picking your battles, right? I just think it's a lot easier to write a few lines of shell for testing errors than to require people to write BPF programs. It doesn't preclude doing all the things you're talking about, too, right? Well, do we have consensus that it might be acceptable to, for instance, sprinkle those macros on some block layer functions? I'm happy to do so if that's the consensus. All right, so what is it then?
Well, Bart, hey, yes, here: I remember that when error handling was being added to add_disk(), you specifically asked for it. So the question here is what methodology we want going forward in the block layer. It doesn't seem like we have an answer. If we really don't want it, then fine, let's admit that and never look back. I'm totally in favor of testing error paths. Yeah, so is everyone. The consensus is that Christoph is just wrong. I'm sorry. This is a dramatically useful thing to add, and we can make it less ugly, to the point where it's acceptable for the kernel infrastructure, but it's a huge feature. Yeah, and we don't need to sprinkle it gratuitously; you can pick the right places to put it. I'm really unclear on what BPF actually brings us for error injection. I did something a lot simpler a while back; maybe I should resurrect the patch. I thought upstream had done the same thing with their error injection, but maybe not. Dynamic debug is this cool trick that makes all your pr_debug() calls dynamically controllable via debugfs. I just took that idea and used it for injecting faults. There's a dynamic fault macro that you pass a string, and anywhere you call it, it returns true if user space has told it to. That's all you need for injection. You can control it from the shell, and you can list all your injection points via debugfs. So maybe there's some really big, clear advantage to using BPF here, but I'm not seeing the connection. So here is one example; let me know if it's something you can do with your approach. We wanted to test a failure case where, for whatever reason, the disk failed to write the btrfs super block, and we wanted to make sure we didn't lose anything.
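The dynamic-debug-style approach described above can be sketched in a few dozen lines. This is not the patch being referred to; it is a hypothetical userspace reconstruction of the idea under stated assumptions: each call site gets a static descriptor, a named point can be switched on from a control function standing in for debugfs, and all known points can be listed. One simplification: real dynamic debug collects its descriptors in a dedicated linker section so every site can be listed up front, whereas this sketch registers a site the first time it runs. The macro uses a GNU C statement expression, as kernel code commonly does; `dynamic_fault()`, `fault_enable()`, `fault_list()`, and `sim_alloc_tag_set()` are illustrative names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One descriptor per call site, in the spirit of a dynamic-debug entry. */
struct fault_site {
	const char *name;
	const char *file;
	int line;
	bool enabled;       /* toggled from the control interface */
	bool registered;
	struct fault_site *next;
};

static struct fault_site *fault_sites;  /* every site that has run so far */

static bool fault_site_hit(struct fault_site *site)
{
	if (!site->registered) {  /* first execution: make the site listable */
		site->registered = true;
		site->next = fault_sites;
		fault_sites = site;
	}
	return site->enabled;
}

/* Hypothetical macro: true if the named fault point has been enabled. */
#define dynamic_fault(n) ({						\
	static struct fault_site site_ = {				\
		.name = (n), .file = __FILE__, .line = __LINE__ };	\
	fault_site_hit(&site_);						\
})

/* Control interface standing in for a debugfs file; returns matches. */
int fault_enable(const char *name, bool on)
{
	int matched = 0;

	for (struct fault_site *s = fault_sites; s; s = s->next)
		if (strcmp(s->name, name) == 0) {
			s->enabled = on;
			matched++;
		}
	return matched;
}

/* Listing interface: one line per known injection point. */
void fault_list(FILE *out)
{
	for (struct fault_site *s = fault_sites; s; s = s->next)
		fprintf(out, "%s:%d %s %s\n", s->file, s->line, s->name,
			s->enabled ? "on" : "off");
}

/* Hypothetical caller with an injection point on its failure path. */
int sim_alloc_tag_set(void)
{
	if (dynamic_fault("blk_alloc_tag_set"))
		return -ENOMEM;
	return 0;
}
```

Because of the first-run registration, `fault_enable()` matches nothing for a site that has never executed; a linker-section table, as dynamic debug uses, avoids that limitation.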
So we were able to write a BPF filter that says: if the bio's sector is the sector where we write the super block, fail the I/O; otherwise, pass it through. Okay. So you're matching on the sector. You can do very, very specific things with BPF error injection. Well, then you need to be able to modify the return value. And I think that's what BPF hasn't allowed, because the point of the ALLOW_ERROR_INJECTION macro is that you pick places where you're not going to cause a catastrophic failure by arbitrarily overwriting return values. You pick a function that we know can fail in a well-defined set of ways, and that's where you're allowed to probe. I don't think we can overwrite bi_status from a kprobe right now. We have to define these error injection points manually anyway. Why do we need any crazy overriding of kernel memory from a kprobe? You just do it from kernel code: if error injection is on, then you... Well, why can't you make that dynamic? Because you're doing this based on code coverage analysis. And it doesn't take that many error injection sites; if you're doing this in a reasonable way, you're looking at what you actually need test coverage for. I think we just have to try it and see how bad it is before we talk about either finding ways to overwrite bi_status or deciding which error injection points we need to add. In case you're not aware, the block layer already has at least one error injection point, should_fail_bio(). That covers a lot of cases like the one I just mentioned. So find the other ones you care about and add them. If it becomes too messy to have them everywhere, then we can figure out some dynamic way to do it. But I agree, I don't think it's going to take that many to make it useful.
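The sector-matching filter from the btrfs example can be sketched as a plain predicate. Assumptions, all labeled: the primary btrfs super block sits at byte offset 64 KiB, which is sector 128 in 512-byte units; only writes are failed, since the scenario was a failed super block write; and a bio is failed if its sector range covers that sector. In the kernel such logic would live in a BPF program, for example one attached to `should_fail_bio()` (which is marked for error injection) that calls `bpf_override_return()`; the userspace model below only shows the decision, and `struct sim_bio` and `sim_should_fail()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT 9                   /* 512-byte sectors */
#define BTRFS_SB_OFFSET (64u * 1024u)    /* primary super block, in bytes */
#define SB_SECTOR ((uint64_t)BTRFS_SB_OFFSET >> SECTOR_SHIFT)  /* 128 */

enum sim_op { SIM_READ, SIM_WRITE };

/* Userspace stand-in for the fields a filter would look at. */
struct sim_bio {
	uint64_t sector;      /* first sector, like bi_iter.bi_sector */
	uint32_t nr_sectors;  /* length in 512-byte sectors */
	enum sim_op op;
};

/* Fail writes whose range [sector, sector + nr_sectors) covers the super block. */
bool sim_should_fail(const struct sim_bio *bio)
{
	if (bio->op != SIM_WRITE)
		return false;  /* pass everything that is not a write */
	return bio->sector <= SB_SECTOR &&
	       SB_SECTOR < bio->sector + bio->nr_sectors;
}
```

Matching on the range rather than on an exact starting sector also catches a larger write that happens to include the super block.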
Yeah, because I wasn't quite sure whether it's a good idea to sprinkle stuff everywhere, but I think we should be able to do something. But yeah, it's only a couple of lines of code to add an error injection point; I don't think that's a problem. Okay, that's it.