Can I start? Okay, thank you. Hello, everyone. So the topic is NVMe passthrough and making it more useful, and we will be talking a lot about uring command as well. At a high level, this whole thing is a combination of uring command plus NVMe passthrough. For the people who haven't had a look at the patches, this is a bit of an overview.

So uring command is the facility for enabling io_uring capability for any command exposed by any underlying component. Could be a driver, could be a file system, could be maybe network. And with NVMe passthrough, we anyway allow any arbitrary command. And with the generic char interface in NVMe, which is /dev/ngX, we make sure that this interface is always available. But currently it is driven by the sync ioctl. The idea is to change that and have it done with the uring command. That's going to be the first use case, but I have been hearing people talk about other possibilities. I heard one from the user-space block driver folks, and I think the big CQE may unlock a couple of other possibilities as well, because we would be able to return additional results.

So if you look at this picture, the elements are: we have a regular SQE, which is 64 bytes. The new thing here is the big SQE that Jens already added. This one is a combination of two regular SQEs, so overall 128 bytes. What this gives us is 80 bytes of space, which you can see over here: 16 coming from the first one and 64 coming from the second. And we can use all of it to place the command inline in this big SQE. The application is then going to submit it by using a new opcode, which is IORING_OP_URING_CMD. And the communication between io_uring and the underlying layer is going to happen through this callback, uring_cmd, the file op.
And it would be taking struct io_uring_cmd; that's another structure, but it is internal to the kernel. And then the driver does what it does. The thing here is that we wouldn't be using the existing opcode for the sync one; rather we would be using a new one, NVME_URING_CMD_IO at this moment. The submission path is done just after the submission, and whenever this particular command completes, we are supposed to call io_uring_cmd_done. When it does that, we return the normal result in the CQE, the first CQE, but we also need to return another result. And for that, we are talking about having the big CQE. The secondary CQE has 16 bytes of space. That means we have two eight-byte fields; currently we would be using only one, but probably the other can be used for some other use cases as well.

And if you look at the code structures, this is the same thing, but in the form of code. So the io_uring SQE is going to take a cmd_op, which is the operation that we want to turn async. In this case, it's going to be NVME_URING_CMD_IO. The command length, I think I will probably be deleting, but if we want to be a little more generic, we would say that this is the command length, and it can be anything that fits in the 80 bytes. And the last one is the part where we say that we are going to place the command over here. So this is the starting offset, and we can place the 80 bytes of command into this. And io_uring is going to prepare the right-hand side structure, which is io_uring_cmd. Once it builds it, it's going to call the file op, uring_cmd. And in NVMe, for this particular handler, we would be supporting NVME_URING_CMD_IO.
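The size arithmetic described above (64 + 64 = 128 bytes, of which 16 + 64 = 80 are usable for an inline command, and a completion entry that grows by 16 bytes) can be checked with a small C sketch. Every name here (`sketch_sqe`, `sketch_big_sqe`, `sketch_big_cqe`, `sketch_place_cmd`) is hypothetical, and the field order is an assumption made only so the sizes add up; this is not the kernel's uapi definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a regular 64-byte SQE: 48 bytes of header
 * followed by the 16 bytes an inline command may borrow. */
struct sketch_sqe {
    uint8_t  opcode;     /* would carry the uring-command opcode */
    uint8_t  flags;
    uint16_t ioprio;
    int32_t  fd;         /* e.g. an open /dev/ngX char device */
    uint32_t cmd_op;     /* driver-defined operation to run async */
    uint32_t cmd_len;    /* inline command length (the talk may drop this) */
    uint8_t  other[32];  /* remaining regular SQE fields, elided here */
    uint8_t  cmd[16];    /* start of the inline command area */
};

/* Big SQE: two regular SQE slots back to back. */
struct sketch_big_sqe {
    struct sketch_sqe first;
    uint8_t           second[64];  /* the whole second slot is payload */
};

/* Big CQE: the regular 16-byte CQE plus two extra 8-byte fields. */
struct sketch_big_cqe {
    uint64_t user_data;
    int32_t  res;        /* primary result */
    uint32_t flags;
    uint64_t extra1;     /* secondary result */
    uint64_t extra2;     /* currently unused */
};

#define SKETCH_CMD_OFFSET offsetof(struct sketch_big_sqe, first.cmd)
#define SKETCH_CMD_SPACE  (sizeof(struct sketch_big_sqe) - SKETCH_CMD_OFFSET)

_Static_assert(sizeof(struct sketch_sqe) == 64, "regular SQE is 64 bytes");
_Static_assert(sizeof(struct sketch_big_sqe) == 128, "big SQE is 128 bytes");
_Static_assert(SKETCH_CMD_SPACE == 80, "16 + 64 bytes of command space");
_Static_assert(sizeof(struct sketch_big_cqe) == 32, "big CQE is 16 + 16 bytes");

/* Copy a command into the inline area. Returns 0, or -1 if it does not fit. */
static inline int sketch_place_cmd(struct sketch_big_sqe *sqe,
                                   const void *cmd, uint32_t len)
{
    if (len > SKETCH_CMD_SPACE)
        return -1;
    memset(sqe, 0, sizeof(*sqe));
    sqe->first.cmd_len = len;
    memcpy((uint8_t *)sqe + SKETCH_CMD_OFFSET, cmd, len);
    return 0;
}
```

The compile-time asserts are the point of the sketch: if the header grew past 48 bytes, the 80-byte figure quoted in the talk would no longer hold.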
And currently, I think we are taking the passthrough command 64 (nvme_passthru_cmd64), but we can discuss whether it makes sense to change that, reduce it by at least eight bytes, and call it something else, probably struct nvme_uring_cmd or something like that.

Kanchan, John has a question on Zoom. What's the difference between a big SQE and a fused command? Sorry? What is the difference between a big SQE and a fused command? A big SQE and fuse bit? And a fused command. Ah, no connection, I mean no connection. The big SQE is at the io_uring level and the fused command is at the NVMe level, so currently no connection at all. So the big SQE construct is at the io_uring level.

And yeah, coming to the upstream plans: I think as far as 5.19, we can probably take big SQE, big CQE, uring command, and then the passthrough, the non-vectored and vectored versions. Once that is merged, I think we can plan the bunch of other things, some of which came up during reviews. We can line them up later: polling, admin passthrough, multipath, maybe bio cache or pre-mapped buffers. And that is all I have. Questions?

Yeah. For multipath: today, for example, for SCSI passthrough through DM, passthrough commands don't get retried on other paths due to a path failure. So what were you planning to do here? So the VICs? There's no implementation for that. All that it does is pick, essentially, the last known good path and send the command down, but it won't retry it on failure. So I was curious what you were thinking of doing.

Yeah, so at this point, if you look at the passthrough interface, even with the sync one, we do not really have any retry mechanism. So if you send the ioctl, it basically does whatever it can, and then if it fails, it fails. It basically goes back to user space and user space can retry.
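Since the passthrough path itself never retries and a failure simply comes back to user space, any retry policy has to live in the application. A minimal sketch of such a policy in C follows, assuming a caller-supplied submit function and a caller-supplied predicate that decides whether a given command is safe to resend. All names (`submit_fn`, `retryable_fn`, `submit_with_retry`, `fake_cmd`) are hypothetical and are not part of any existing API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback types, invented for this sketch. */
typedef int  (*submit_fn)(void *cmd);                    /* 0 on success, negative on failure */
typedef bool (*retryable_fn)(const void *cmd, int err);  /* may this command be resent? */

/* Resubmit a failed command only while the application says it is safe. */
static inline int submit_with_retry(submit_fn submit, retryable_fn can_retry,
                                    void *cmd, int max_attempts)
{
    int err = -1;

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        err = submit(cmd);
        if (err == 0)
            return 0;
        if (!can_retry(cmd, err))
            break;  /* the kernel will not retry, and neither do we */
    }
    return err;
}

/* Toy command that fails a fixed number of times before succeeding. */
struct fake_cmd {
    int  failures_left;
    int  calls;
    bool idempotent;
};

static inline int fake_submit(void *p)
{
    struct fake_cmd *c = p;

    c->calls++;
    if (c->failures_left > 0) {
        c->failures_left--;
        return -5;  /* stand-in for an I/O error */
    }
    return 0;
}

static inline bool fake_can_retry(const void *p, int err)
{
    const struct fake_cmd *c = p;

    (void)err;
    return c->idempotent;
}
```

Keeping the "is it safe to resend" decision in a predicate reflects the fact that only the application knows what the command does; the driver has no defined retry semantics for passthrough.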
And I think in the first round, we'll probably go ahead with the same model, and maybe subsequently we can see whether we can do something about retry or requeuing and stuff.

So what I would suggest is, if you don't intend to do any retrying, put a comment or something in the code to that effect, saying that this is a deliberate design choice until you decide to implement something different. Because it turns out a lot of people thought that commands were getting retried on SCSI, but they're not.

I don't think you ever want to do a retry on passthrough stuff, right? The thing is that you occasionally don't have a choice. Because depending on where you submit your I/O, it might or might not be retried. And we only have the notion that commands will not be retried and upper layers will have to handle whatever the fallout is. Turns out upper layers do not handle it. Well, there's no upper layer here, right? That's the application doing the I/O, so.

This is the definition of the passthrough interface, in SCSI, in NVMe, and we're not going to add any new semantics here. This is going to maintain the semantics of the existing synchronous passthrough ioctl. And because we really need to match the semantics, we need to do multipathing for 5.19 too. But it should be trivial. It really should be trivial, because you basically just, honestly, I don't know how you get away without even doing the path selection. And that's all you need.

Yeah, no, I think that's fair. I think my main point was just, I would hate for the driver to try and attempt it. There's no defined semantics for retry of passthrough; it depends on what the command is. But yeah, for the multipath selection, if it's trivial to do, then might as well just get it done.

For the admin passthrough, what is the reason for not including it in the first pass going into the kernel?
I mean, one of the main use cases I see for this is attaching SPDK and then using it as a backend. But I would imagine that SPDK would love to have the admin passthrough. Just use the sync. Nobody here cares about SPDK, so we don't want to enable that use case. I forgot that. Boo, this man. Yes. Just use the sync ioctl. It's usually just provisioning stuff; for most other things, nobody really cares.

So, Kanchan, do you happen to have any performance numbers? Against what? Against the regular I/O? Yes, yes. I do not have them right now, but I posted some in the cover letter. And what would you like to see? I mean, what do you expect? The way I see it, the numbers are good. If you compare it with the sync ioctl, it's definitely going to be better. And when we compare it with the regular direct I/O io_uring path, it is comparable at this point. Of course, it doesn't have all the features of the regular path. But the numbers are good at this point. So, yeah, I think the point is that it scales. I think it scales linearly, yes, like normal read and write, so I think that's what you want to say.

So if I'm not doing SPDK, doing some other application, then in that case, if I don't have admin passthrough, I would have to open another one, like the char device, and then do a passthrough. I know, but if I didn't want to, for which part? You're saying we would always need to open both.

So this one, the one that you see over here, that's for the namespace, and this is for I/O passthrough. And for admin, we would have a separate handler. It's basically simple. It's just a conscious attempt to make it lightweight so that it's easy to review. And, to borrow a phrase, set the basics right first.
And then, yeah, the attempt is not to make it super heavyweight and go nowhere, but it's feasible.

I mean, it's actually very easy to wire up admin passthrough. If we assume that the stuff you have listed there gets staged for 5.19, nothing stops you from just posting feature patches for the other stuff, right? That gets looked at and reviewed independently, and maybe some of it will make 5.19 too, right? Or maybe it waits for 5.20. It's just, I think, important. I like how you split it into the core bits, right? This is core functionality. This is the bare minimum we need to do. Take a look at that, get a review, get that staged for upstream, and then take a look at some of the more advanced or different features.

I mean, even for 5.19 we're kind of running out of time, so please get a new patch set out to be reviewed as soon as possible. I'm pretty sure we will do at least another cycle on it based on review feedback. So get that out quickly. Do the admin commands right after. They're kind of not critical, but at the same time useful and trivial. The rest we can slowly work on, I think.

Yeah, so the bits you had on, if you go to the next slide, I think the polling support might be nifty, but clearly that can totally wait. And the same with the bio cache and pre-mapped buffers, right? Those are like, we'll look at those later. The other two are more important, but I agree. Just get the patch set, the new version, posted. We can do another round of review and hopefully that'll be it. If we can queue it up next week, right, then we can still make it. If it misses next week, right, then it's all going to be 5.20, I think, at this point. Shoot.

I have a question about security. How will it be prevented that someone modifies the firmware of the drive using passthrough? Or someone who shouldn't do that? I think, could you elaborate the question? Did you say security? I, I think I...
We've passed the law, you insist. Yeah, there's the same issue with SCSI. I mean, you can do weird things there, too.

So short answer: the security model is exactly the same as for the existing passthrough ioctls. The security model there right now is, it's all CAP_SYS_ADMIN, unlike SCSI, where we have a whitelist of a bunch of commands where we don't require it. So assuming we're implementing the admin passthrough, which is going to happen, you need to be privileged to send any admin commands. And firmware updates in NVMe, totally unrelated to what we do on the passthrough side, are usually required by the vendors to be signed, unless someone really fucks up. So you need to have a privileged process, and even that privileged process, through any of the passthrough interfaces, could at best update to another signed firmware from the vendor. And I know a lot of vendors have some kind of downgrade protection to deal with the case where you want to reintroduce a security bug.

The important thing for this patch set, right, is that it's the same security model we already have. There's absolutely no difference. It's just an async model of the same.

Can you do bit buckets with this? Bit bucket? The partial sector reads. I don't think it would actually work, but it's... I never came here, I never saw this line. I don't know what subsystem.

Thank you, thank you, thank you again. All right, at four o'clock, the file system people are going to be joining us to talk about unique file system identifiers and be identifiers.