So, welcome to this talk, which is about a facility that entered the kernel last year in 5.19. It's called io_uring command, or uring_cmd. The title says it's about revamping ioctl, but it's a bit more generic than that: if you are looking to build a new user interface for whatever reason, this is something you might want to consider. I'm Kanchan, and Anuj is the co-speaker, but unfortunately he could not make it. So let's start by roughly dividing user-kernel communication into two categories. The first is what we do with well-defined system calls, which I label structured communication; whatever does not fit into those well-defined system calls goes into the bucket of unstructured communication. We have that split because building a syscall usually requires creating generic abstractions. You may be writing some kernel component, which could be a driver, a file system, or something else, and at times you want to communicate with a user-space application. If what you are doing is well known, or has been done before, you can use a system call and things are okay. But if you are doing a specialized operation, you may find it hard to make it generic, and that's why it is hard to create new syscalls. ioctl has been used as one way to express communication that is otherwise hard to shape into a generic API. People like it for some reasons and dislike it for others, but that's not what we are going to talk about today. If you look at the communication at the kernel level, you start by encoding your interface into a command. You define an opcode for it, which is just a number, and your kernel component implements a callback called unlocked_ioctl; you can see it on the right-hand side.
Then whatever you do inside it becomes your command-specific operation. As far as the application is concerned, it can take this opcode and supply it via the ioctl system call, shown here. Once the whole execution is over, your results are also available. Generally this is seen as a blocking, synchronous operation. Now, if you look at how widely it is used at this point: if you search for the keyword unlocked_ioctl, you will see there are around 303 providers at this moment, and that includes all kinds of drivers, file systems, and so on. But of course, that is only the number of providers; in terms of the number of ioctls, if you want to count that, it's roughly 6384 at this point. So it seems that while we have system calls, there has always been a need to communicate in ways that weren't imagined before. And if that's true, and remains true down the line too, you may feel the need to use ioctl. But then you cannot expect efficiency, because semantically it has always been seen as a blocking call. One more thing that happens is that along the way you end up doing copy_from_user and copy_to_user. And these two things — blocking and copies — are exactly the kind of thing that io_uring loves to get rid of. So before we get to the ioctl alternative in io_uring, let's look at some io_uring fundamentals. If you look at the figure on the left-hand side, io_uring operates at the boundary of user and kernel space. Similar to VFS, you can think of it as providing scalable asynchronous infrastructure at that boundary; it works for storage, and it works for network I/O too. The communication backbone is that the kernel creates ring buffers which are shared with the user-space application, and that's how you communicate.
And efficiency is the core of it. It goes to great lengths to reduce the number of syscalls and those copies. There are few syscalls — io_uring_setup, io_uring_enter, io_uring_register — but there are many knobs you might want to know about. If you look at the communication protocol: the application starts by setting up a ring. When it sets up a ring, it gets a submission queue, which is a ring buffer, and it also gets a completion queue. Now you want to submit a command within this ring. So you fetch an SQE and populate it with your command-specific operation; it could be a read, a write, or anything else for that matter. You can repeat this process and pick one more SQE — and if you are doing that, you are batching. Then at some point you want to tell the kernel, "I have prepared this many commands," or one command. When you do that, you call io_uring_submit; that corresponds to step number three if you look at the code here. And then, yes, you have submitted. At this point io_uring will do its job: it is going to call whoever implements the read, the write, or that particular operation, and as an application you don't care about it. At some point you do care — you want to look at the completion. So you fetch a CQE from the ring. Step number four, while it looks serial here, doesn't have to be; you choose when you want to do it. And when you get a CQE — here, what we have done is wait for a completion.
The moment we get a CQE, we look at the result, and then we know what happened. Here, the CQE result tells whether the read failed or passed. Once that is done, you need to call io_uring_cqe_seen; by this you are saying, "I'm done with this particular CQE, I no longer need it." So this is the existing communication protocol, and you can think of it as generic — this is how any io_uring operation works. By the way, if you have questions at this moment, I'm happy to take them. Continuing with the basics: let's talk about two ways to turn a synchronous operation into an asynchronous one. I have labeled the first pseudo async and the second true async. What I mean is this: in pseudo async, you have a synchronous operation, and to turn it async you decide to offload it to a separate thread. When you do that, you start giving the impression that the whole thing has become async — and from the caller's point of view, that's how it is. The advantage of this scheme is that most operations can be turned async this way; any existing syscall that is synchronous can be converted. The problem is that this is not going to scale, because for every call you basically require a thread; it is just shifting the responsibility. True async, on the other hand, is about not using any worker thread at all. It is about decoupling submission from completion: you have to submit in a way that the submitter does not have to wait. This is fast and scalable, but yes, it requires a bit more work.
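Before moving on: the four-step ring protocol described a moment ago — set up a ring, fetch and fill an SQE, submit, wait for and retire the CQE — can be sketched with liburing roughly as follows (liburing assumed available; error handling trimmed):

```c
/* Sketch of the four-step io_uring protocol using liburing. */
#include <liburing.h>
#include <fcntl.h>
#include <unistd.h>

int read_via_ring(const char *path, char *buf, unsigned len)
{
	struct io_uring ring;
	struct io_uring_cqe *cqe;
	int fd = open(path, O_RDONLY);

	io_uring_queue_init(8, &ring, 0);         /* step 1: set up the ring */

	struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
	io_uring_prep_read(sqe, fd, buf, len, 0); /* step 2: fetch + fill an SQE */

	io_uring_submit(&ring);                   /* step 3: tell the kernel */

	io_uring_wait_cqe(&ring, &cqe);           /* step 4: wait for a CQE */
	int res = cqe->res;                       /* bytes read, or -errno */
	io_uring_cqe_seen(&ring, cqe);            /* done with this CQE */

	io_uring_queue_exit(&ring);
	close(fd);
	return res;
}
```

As noted in the talk, step four does not have to follow immediately — the application chooses when to reap completions, which is what makes batching and true async possible.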
Now, which of these two models does io_uring implement? The answer is both, but the first one is not the default. io_uring tries true async first, and if that doesn't work in some situation, it can fall back to pseudo async. And for known blocking operations — imagine you are doing fsync, or mkdir, and that's what you want to turn async — it is going to use a worker thread right from the beginning to execute that operation. That is the existing picture. Now, when we talk about making ioctl async, we can choose one of these models. If you go with pseudo async, yes, it would be easy, but those scalability problems would be there. So the attempt here is to aim for efficiency and pick the true async model, and that's what uring_cmd tries to do. This is also known as io_uring passthrough. It is a generic facility to attach any io_uring capability to any underlying command. "Any io_uring capability" is important: while I'm talking a lot about asynchrony, that is not the only capability io_uring has. There are others — I listed some: submission batching, submission offload, completion polling, registered files, registered buffers. All of that is possible, and this is about attaching all of it to any command. The command providers, of course, remain the same; we saw how many providers of ioctl there are, and they are all very much eligible here. Whoever the command provider is, it needs to collaborate with io_uring to implement non-blocking submission and completion. If you look at the user interface, you start by using a new opcode called IORING_OP_URING_CMD, which goes into the SQE, just like the way you use IORING_OP_READ or IORING_OP_WRITE.
This is the new thing to use. The interesting piece here is that the command you need to supply does not have to be allocated externally; you can get it from the SQE itself. If you are using a regular SQE, you get 16 bytes of free space that you can use — you can think of it as having memory without doing a malloc. If you want more than that, you say you want a big SQE, and then you get 80 bytes of space. The big SQE is one more facility that got added along with this; both got developed together. So as an application, you place your command inside the SQE, and io_uring doesn't care much about what is placed inside it — similar to ioctl in that way. Your application supplies the provider-specific opcode in cmd_op, and that is all. As far as the result is concerned: as you saw in the example, the result normally arrives in the CQE, and that's still the case, but you saw that the CQE had space for one result. There are cases when the kernel wants to return more than one result, and for that you say you want a big CQE; that whole infrastructure also got added along with this. For the big SQE, you supply a flag called IORING_SETUP_SQE128 while setting up the ring, and that's what it is about — it helps with zero-copy submission; you don't have to do a copy_from_user. The big CQE you ask for by setting up the ring with IORING_SETUP_CQE32, and it helps with zero-copy completion. The right-hand side picture shows what we talked about here; it is a kind of loose mapping.
So if we translate the ioctl into this scheme: the fd goes into the SQE — the SQE is shown in green — the opcode is IORING_OP_URING_CMD, the provider-specific opcode, opX for ioctl, goes into cmd_op, and the actual command goes into the SQE, as you can see over here. If you want more space, you ask for the big SQE. The right-hand side is about the CQE: the first result goes into the regular CQE, and if you want more, you ask for the big CQE. So that was the application, or user, interface. But if you are writing a kernel-space component, you might want to know how you would communicate with io_uring. This is the second part: it is relevant if you are a kernel-space provider and you need to talk to io_uring to implement your operation this way. The command provider is expected to implement a new callback called uring_cmd: in struct file_operations, where we had unlocked_ioctl, we now also have uring_cmd. When we are talking about submission, io_uring is of course the first one to receive the SQE. What io_uring does is prepare something called struct io_uring_cmd — an internal structure it builds out of the SQE — and this structure is used for all the communication between io_uring and the kernel provider. If you look at this flow chart: io_uring invokes the provider through the uring_cmd callback. Your provider does its job — it tries to submit the command, or do whatever it does — and once the submission is done, it says "I'm done" by returning the code -EIOCBQUEUED.
And that's all about submission. At some point you are done with the completion, and you need to inform io_uring by calling the API io_uring_cmd_done. Here, again, you use the struct io_uring_cmd, and you are able to return two results. So this is the simplified communication model. Sometimes, as a provider, you want your completion to happen in task context. For that there is another API called io_uring_cmd_complete_in_task: you provide a callback function, and that callback is guaranteed to be invoked in task context. This may be useful at times. Now let's look at who is using this in the kernel at this point. As I said, this was added to the kernel last year, and the NVMe driver was changed — I would say extended — to use uring_cmd along with the whole interface. The NVMe driver already had ioctl-based passthrough operations. On the right-hand side I have shown nvme_ns_chr_fops: if you look at .unlocked_ioctl, that is how the passthrough operation used to happen. Here you see that it now has the uring_cmd callback too — this is the NVMe uring_cmd passthrough. What this enables is that any NVMe command, which may be beyond read and write, can be used efficiently. And particularly in NVMe, if you follow it, you know there are newer ways to interact with storage, so this problem of coining new system calls is very much present over there. The second collaborator, the second user, is the ublk driver.
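The provider interface just described — the uring_cmd callback in file_operations, returning -EIOCBQUEUED on submission, and io_uring_cmd_done on completion — might look roughly like this on the kernel side. The "foo_*" names and FOO_OP_DOIT are hypothetical, and the exact helper names and signatures (io_uring_sqe_cmd(), the arity of io_uring_cmd_done()) have shifted between kernel releases, so treat this as a shape, not copy-paste code:

```c
/* Kernel-side sketch of a hypothetical uring_cmd provider "foo". */
#include <linux/fs.h>
#include <linux/io_uring.h>

static int foo_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags)
{
	/* The provider-specific command sits in the SQE's free space. */
	const struct foo_cmd *cmd = io_uring_sqe_cmd(ioucmd->sqe);

	switch (ioucmd->cmd_op) {
	case FOO_OP_DOIT:
		foo_queue_hw_cmd(cmd, ioucmd);	/* hand off without blocking */
		return -EIOCBQUEUED;		/* "queued, will complete later" */
	default:
		return -ENOTTY;
	}
}

/* Called later from the completion path (e.g. interrupt context). */
static void foo_complete(struct io_uring_cmd *ioucmd, int status, u64 extra)
{
	/* Two results can be returned; the second lands in the big CQE. */
	io_uring_cmd_done(ioucmd, status, extra, 0);
}

static const struct file_operations foo_fops = {
	.owner		= THIS_MODULE,
	.unlocked_ioctl	= foo_ioctl,	 /* legacy synchronous path */
	.uring_cmd	= foo_uring_cmd, /* io_uring passthrough path */
};
```

The in-tree NVMe and ublk implementations mentioned in the talk are the authoritative examples of this pattern.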
This was also developed last year, and it used uring_cmd from the ground up. If you look at the right-hand side picture, you do not see any unlocked_ioctl here; it never used ioctl — all the communication it does is with uring_cmd. At this point, FUSE is looking at using this, and on the networking side, sockets are looking at using this too; there are discussions happening, and we might see something down the line. Now, here is a quick example — this is NVMe. On the left-hand side you see a quick example of ioctl passthrough: we open the NVMe device handle, and then we call ioctl, using the ioctl code NVME_IOCTL_IO64_CMD. And that's it — simple, right? With the uring passthrough, yes, the code is a bit more. Again you open the fd, you set up a ring, and this time you ask for the big SQE and big CQE by specifying those flags. Then you get an SQE. The important point — I wrote a comment here — is that you extract the command from the SQE itself; you don't have to allocate it. This command, for NVMe passthrough, requires 72 bytes of space, and in this case that space is, you know, freely available. You prepare it, you submit it, and once you receive the completion after doing io_uring_wait_cqe, you can examine both the results. So NVMe really required the big SQE and big CQE in order to do this efficiently. Now, as I said, there are other capabilities in io_uring which are available to the other operations, and uring_cmd is also able to leverage them. One such facility is called fixed buffers in io_uring.
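The NVMe uring passthrough example walked through above might look roughly like this in code. This is a hedged sketch, not the slide's exact code: it assumes liburing and kernel headers that provide struct nvme_uring_cmd and NVME_URING_CMD_IO, it trims error handling, and it issues a plain NVMe read (opcode 0x02) through the char device:

```c
/* Sketch of NVMe io_uring passthrough: big SQE for the 72-byte command,
 * big CQE for two results, command built in the SQE's free space. */
#include <liburing.h>
#include <linux/nvme_ioctl.h>
#include <fcntl.h>
#include <string.h>
#include <stdint.h>
#include <unistd.h>

int nvme_uring_read(const char *dev, void *buf, __u32 nsid,
		    __u64 slba, __u32 nlb, __u32 blk_sz)
{
	struct io_uring ring;
	struct io_uring_cqe *cqe;
	int fd = open(dev, O_RDONLY);	/* e.g. the /dev/ngXnY char device */

	io_uring_queue_init(8, &ring,
			    IORING_SETUP_SQE128 | IORING_SETUP_CQE32);

	struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
	sqe->fd = fd;
	sqe->opcode = IORING_OP_URING_CMD;
	sqe->cmd_op = NVME_URING_CMD_IO;	/* provider-specific opcode */

	/* Extract the command from the SQE itself: no separate allocation. */
	struct nvme_uring_cmd *cmd = (struct nvme_uring_cmd *)sqe->cmd;
	memset(cmd, 0, sizeof(*cmd));
	cmd->opcode = 0x02;			/* NVMe read */
	cmd->nsid = nsid;
	cmd->addr = (__u64)(uintptr_t)buf;
	cmd->data_len = nlb * blk_sz;
	cmd->cdw10 = slba & 0xffffffff;
	cmd->cdw11 = slba >> 32;
	cmd->cdw12 = nlb - 1;			/* zero-based block count */

	io_uring_submit(&ring);
	io_uring_wait_cqe(&ring, &cqe);
	int res = cqe->res;	/* first result; second lands in the big CQE */
	io_uring_cqe_seen(&ring, cqe);
	io_uring_queue_exit(&ring);
	close(fd);
	return res;
}
```

Compare this with the one-line ioctl version: more code, but the submission is non-blocking, batchable, and copy-free.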
This facility is about reducing the per-IO cost of mapping and unmapping buffers. There is something called io_uring_register_buffers — see the right-hand side. You can register your buffers up front: if you have N buffers to do the IO with, you register them in advance, and during the IO you can actually use those buffers by specifying the index of any of them. The result is slightly better, or a lot better, efficiency — depending on your workload — and reduced CPU. This has been available for regular read and write operations; for uring_cmd, it has now been added too. The interface: the first part remains the same — the application registers the buffers in the same way; that part does not change. The new part is that in the SQE, in uring_cmd_flags, you can specify something called IORING_URING_CMD_FIXED. If you do that, you start using this capability, and then you specify the buffer via the buffer index — some number — and that's all. As far as your provider is concerned, it can use this pre-mapped buffer by calling a new kernel-space API called io_uring_cmd_import_fixed. That is its name at this point, but you know it can change down the line. If you want to look at examples, I added an fio user-space example, and for the kernel space, there is the NVMe example here. The other capability is IOPOLL. IOPOLL is about not waiting for completion but polling for it: if your operations are going to complete very fast, you probably don't want to wait for them; you may get a bit better efficiency by polling.
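The fixed-buffer flow just described — register buffers once up front, then reference one by index per command — can be sketched like this (liburing assumed; the command payload itself is elided, as it is provider-specific):

```c
/* Sketch: registering buffers up front and using one with uring_cmd. */
#include <liburing.h>
#include <sys/uio.h>

/* One-time setup: map N buffers into the kernel in advance,
 * paying the mapping cost once instead of per IO. */
void setup_fixed(struct io_uring *ring, struct iovec *iovs, unsigned n)
{
	io_uring_register_buffers(ring, iovs, n);
}

/* Per IO: mark the command as using a registered buffer by index. */
void prep_fixed_cmd(struct io_uring_sqe *sqe, int fd, unsigned buf_index)
{
	sqe->fd = fd;
	sqe->opcode = IORING_OP_URING_CMD;
	sqe->uring_cmd_flags = IORING_URING_CMD_FIXED;	/* use registered buffer */
	sqe->buf_index = buf_index;			/* which one */
	/* ...the provider-specific command goes into the SQE's free
	 * space as before; the provider resolves the buffer with
	 * io_uring_cmd_import_fixed() on the kernel side... */
}
```

This mirrors how read/write use IORING_OP_READ_FIXED; the uring_cmd flag plus buf_index is the passthrough equivalent.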
You decide not to sleep and undergo the context switches and all. This has been available with the regular operations, read and write, but now it is available for uring_cmd too. The application only needs to set up the ring with IORING_SETUP_IOPOLL, and that's all — this is what applications already do. As far as the kernel is concerned, if you are writing a provider, submission is the same, no change; for completion, yes, you may have to implement a new callback called uring_cmd_iopoll. And again, examples are available over here. Now, this is an example of efficiency. What I have shown here is how NVMe passthrough performed after it was turned into uring_cmd-based passthrough; these are those numbers. We are not really comparing with the ioctl-based passthrough — in that case we know it does not scale, so that would not be the best comparison. Instead, this shows that io_uring-based passthrough is slightly better than io_uring-based block IO, that is, direct IO. So we are comparing with the most difficult competitor here, not with the sync passthrough. What you see — maybe look at the last graph — is that the peak performance is 11% better. About these labels: "base" means the performance without any knobs; "base + FB" is base plus fixed buffers — the fixed-buffer capability lifts the numbers for both; and "base + poll" is the IOPOLL I talked about — again, the IOPS improve with it. The last one combines all the knobs to see the peak performance.
Eventually it gives 5 million IOPS, and that's the device limit in this particular case — this device is capable of 5 million IOPS for 512-byte random read operations. And that is possible at this point. If you look at the upstream status: initial uring_cmd support was added in 5.19 last year, and along with it the NVMe passthrough was developed as the first user and collaborator; the big SQE and big CQE facilities were also added at that point. The second user, ublk, got added in 6.0. The efficiency knobs I talked about — I only mentioned two, but there are some more; I think I only mentioned the ones that require a bit of a user-space change — those were added in 6.1. And as I said, there are other users being developed. So that's all I have; happy to take any questions.

Actually, I'm quite new to io_uring, but my impression is that io_uring, in a general sense, is trying to execute system calls without the overhead of those calls, by using ring buffers: you put a request in the ring buffer, and the kernel executes it and posts the completion. So I wonder — not every system call can be converted to io_uring. What kinds of system calls are replaceable by io_uring?

At this point there are many operations that io_uring supports, and I think the number is only growing. It's just that some operations may be more efficient, because they have been implemented in a true async manner, while some operations may not be that efficient, because they are hard to convert to the true async model. So yes, you have a good number of them.
And if you have something new, something that does not really fit into an existing system call, this is the way I talked about — something that may be useful to build on. So either you can think of changing your existing system call into async using the regular model, or you can think of using this, and see what fits.

So in the traditional ioctl model, it's always synchronous, right? But io_uring is asynchronous. Let's say some subsystem adopted io_uring and made its ioctls work over io_uring. Then the user, when they switch to io_uring, is basically switching from a synchronous model to an asynchronous model, so the application layer that calls into it would have to adapt. Is that right?

Yes — I think, if I got your question right, if you are looking at io_uring, you are looking at moving from a synchronous model to an asynchronous model. It's just that in io_uring we didn't really have anything that mapped to ioctl before. If you were an existing kernel provider, and you had things you were doing using ioctl, and now you care about efficiency, you can think of using this. Otherwise, yes, overall this is just about moving from a synchronous model to an asynchronous model.

Yeah, I was just thinking that if you switch and just do the simple thing — you queue the ioctl on the ring and then have the same thread wait for that completion — then you don't really save anything; you have to do a higher-level refactoring of the application.
Are you talking about the first method I mentioned — using a worker thread to turn your sync operation into async? Can you say that again, please? So, I think what you said was about one of the easy ways to convert a synchronous operation into async: just use a worker thread. You can do it as an application, or someone in the kernel can do it for you — it can create an internal kernel thread and turn your sync operation into async. You can do that, but it has scalability problems; you only have a finite number of threads, right?

Just to maybe answer the other part of what I thought your question was: if you use io_uring in an essentially synchronous way, like you were saying — queue an operation and wait for it — then yes, you will get essentially synchronous behavior, because you've still written a synchronous program. You have to write applications in a different way to use io_uring. And you mentioned switching ioctls over to io_uring: of course, nothing in the kernel will switch in a way that would break user space, because that would not be allowed into the kernel — you could only add something alongside. I would also take issue with the idea that ioctl is inherently synchronous, because there are an awful lot of asynchronous ioctl calls in the kernel.

I agree, it doesn't have to be that way. Sometimes it is seen as a hack, but yes, at one point that's how things were. I think with uring_cmd the intention is now very clear: we really want the async model. But you're right — there are cases where ioctl is being used in a way that is trying to do an async operation.
If you look at something like the Video4Linux interface, that's all asynchronous, and it's all done with ioctl.

I saw the io_uring interface to the user. You have a wait — I don't remember the function call — basically you wait for the data when it's available, and that will block, right? You submit a job, then you can do whatever you want to do, and eventually you have to wait for the notification from the kernel that says, "I have new data for you." Let me see — okay, go to the API slide. Maybe step number four. Yeah, step number four. So, what if my application needs multiple rings? In this model, I have to figure out which queue, which ring, I should wait on first. Is there anything like poll, so that if one ring is ready...

So in this case you only have one ring, and generally you are not bothered about that. I think — if I got your question right — your point was that you have to call io_uring_wait_cqe at some point. What if you have N rings and you do not really want to... sorry?

I know this interface is for one ring. I'm thinking that with the current poll system call, you can wait on multiple fds, right? So I'm thinking about a scenario where my application needs to communicate with the kernel through multiple rings. In that case, how should I do the waiting?

If you have a different ring, you will be using io_uring_wait_cqe and you will specify that particular ring as an input to it. So if, for some reason, you decide to use multiple rings, yes, you will be doing that. But it's like this:
I suppose that in this particular example, yes, it looks like sequential steps, but you can actually do multiple submissions. And what I have not shown here is a field called user_data in the SQE; it tells you, for a completion, which submission it corresponds to. When you do io_uring_wait_cqe, it means you are okay to wait. If you don't want to wait, io_uring provides an API which basically allows you to just peek — have a look at whether any CQE is available. If a CQE is available, you get it without any wait; you will not be blocked. And if a CQE is not available, you get control back anyway. So the waiting problem is sorted there. But at some point, yes, you would want to know whether you got the completion for whatever you submitted — otherwise there was no need to submit, right? And then you look at the user_data of the CQE to know what the corresponding submission was. You map that, and you know: okay, now I have received the completion of this particular IO. And this applies whether you have a single ring or multiple rings — that part does not matter much. So, does that answer your question?

Oh, I think maybe the part that's being missed here is that having multiple rings would be rare — you wouldn't normally do that, because you can communicate with multiple devices, multiple file descriptors, through a single ring. You can submit a whole lot of operations to many files or many devices, all on the same ring. Right, they'll go in, and they'll come back in a different order.
You have to keep track of them, and that's what the user_data he was mentioning is for — so you can keep them straight. But it would be kind of strange to use a second ring; there's not really a reason to do that. You don't need to.

Right. The same thing can be seen in the code as well. If you look at the SQE's fd: we are populating only one SQE in the whole ring here, but you can get one more SQE and give a different fd to it. This is how multiple devices, or multiple files, can be operated with the same ring — you just populate different SQEs with different file handles. In the interface here, we started with setting up the ring; that's the first thing we did. After that, we picked one SQE and prepared it, specifying one fd. You can get one more SQE and give a different fd to it. I think that's what was explained.

Sure, definitely. I mean, this is an example meant to show the usage of this one specific device; it's not meant to be a general usage example for io_uring. There are more complete examples and documentation out there if you look for it. Indeed.

So, do you expect that the ioctl interface will be deprecated at some point and we will completely switch to io_uring?

No — I do not know that. It's just that I feel, if you need to develop an ioctl and you also care about efficiency, maybe this is something you want to look at.

So it's safe to add new ioctls to new drivers, for example, instead of using io_uring, in some simple cases. Thank you.

Anybody else? Then thanks for attending and listening, and if you have questions down the line — once you look at the examples or whatever — feel free.

So this is more a question about io_uring in general.
I was wondering: you have a ring, and user space and the kernel share the same memory. Is there any security risk, or maybe there are already ways to handle it that I don't know about? What if the user supplied a payload, put it into the shared memory, and the kernel starts to pick it up and work on it — and then the user changes that payload at the same time. Could that happen?

As far as the kernel code is concerned, all this code is reviewed — all these providers who look at SQEs, all of this needs to be reviewed, and people are reviewing it; that's how things have been added. So it's not the case that the kernel is going to pick up something that will bring it down.

Yeah — what he's saying is absolutely true, but it's really just like any other system-call interface: you always have to pay attention to what user space is passing to you. And the kernel will always copy the entire SQE out of the ring, at least, and then operate on that copy, so it's out of reach. The SQE will be copied, but that's 64 bytes. If you're doing an IO operation, whether the data is copied depends entirely on how that IO operation is implemented — it may or may not be. If you're doing buffered IO, it's going to be copied into the page cache or something like that. But the command itself will be copied out, if it is implemented properly.

I can quickly add that for ublk, there is separate shared memory between the kernel and user space that's used for the IO, so only the descriptors go into the ring, and the IO buffers live in separate shared memory.
Yes, if user space is changing the memory while the kernel, or ublk, is operating on it, that's not good — so don't do that. But it's IO data; the kernel is not going to make any control decisions based on what's in the data, it's just going to move it somewhere else. So leave it alone until the IO completes. It's the same as writing to a file: if you change the buffer that you're writing to the file from another thread at the same time, it's a bad idea.