Hello, everyone. This presentation is about exporting virtual memory as a DMA-BUF. Two or three years back the DMA-BUF concept was presented at this same ELC; this is just a different angle on it, trying to export virtual memory as a DMA-BUF. I will start with a short introduction from my side. Hi, my name is Nikhil Devshatwar, I work at Texas Instruments, Bangalore, India, and I mostly work on the kernel side. The key areas I work on are the video subsystem and camera drivers, and recently I have started working on Linux and RTOS integration. For automotive use cases there is a trend of running Linux and an RTOS on the same chip, so I am working on that. I have not made that many contributions in the open source community, but a few contributions from my side have gone into V4L2 drivers and the device tree compiler.

This is the outline of the presentation. We at TI faced a problem when trying to integrate Linux and RTOS applications together. We had some problems, we fixed them in a certain way, and I think the solution we have taken is generic enough that it can be applied to community use cases. So we will go through the specific problem we dealt with, then try to jump to how this can be applied at a generic level, then go through the implementation details, and at the end I can take any security concerns or questions. I will point out up front that the solution I am proposing is not meant for all use cases; it is only meant for embedded use cases. There are a lot of assumptions behind it, which I will clarify as and when we encounter them.

This is the typical use case when integrating Linux and RTOS applications. Typically the memory is divided into two parts: the bigger chunk of the memory is used by Linux, and there is a dedicated memory region carved out for the RTOS. You may have different types of applications accessing these memories: typical infotainment applications running on Linux, and ADAS applications running on the RTOS. I am talking from the automotive point of view, which is why I have mentioned ADAS applications. But in the middle you see "info-ADAS", by which I mean informational ADAS. These are the applications where you use the HLOS features from Linux, but at the same time you want to utilize the hardware acceleration and the algorithms running on the RTOS. If you look at that application, it accesses memory on the Linux side as well as on the RTOS side; you can see that the buffers it uses are drawn in red and blue, indicating that it uses both Linux and RTOS memory. The problem when building these kinds of applications is that the typical architecture does not give you access to the other memory, the one carved out from Linux.
So, what we want is access to the RTOS memory, and not just access to it: we want to give that memory to other drivers. A typical use case for this kind of application would be, say, camera capture running on the RTOS application, with some algorithms running on the captured frames; once all of that is done the video data comes in and you want to display it on Linux. Because you want to use the HLOS features, you will have the fancy graphics GUI and the whole stack running on Linux, and the standard Linux drivers like DRM and standard application frameworks like GStreamer and Wayland will be used to consume the buffers coming from the RTOS. The requirement here is to be able to access the RTOS memory, which is not really part of Linux, and to share that memory with the standard Linux drivers we have in place.

I will start with a short note on what DMA-BUF is. DMA-BUF is a generic mechanism to share buffers across drivers. The Linux kernel framework is designed in such a way that one driver allocates the buffers, and that buffer can be exported as a simple anonymous file descriptor. Once the anonymous file descriptor is available at the application level, you can pass it to different drivers, and each driver is able to figure out what that descriptor means: from the descriptor the driver can figure out the physical addresses, and then the corresponding driver operations can be done. If anyone has any questions on DMA-BUF I would like to take them now.

Is there only one OS in this case? Let me show the previous diagram again. These two OSes are running on the same hardware, but there is no hypervisor; they actually run on two different processors, and the RTOS runs on a dedicated core. This is a heterogeneous architecture, where you have two processors: one, say an ARM A15-class core, running Linux, and a typical low-latency processor designed for the RTOS kind of application. There is no hypervisor involved; these are just two separate CPUs running two different OSes for different use cases. Yes, yes. This is a typical embedded use case where you have a common DDR, there is no separate GPU memory or anything of that sort, and all memory accesses go through a common EMIF interface.

As I explained, DMA-BUF allows you to share buffers between drivers, but all of this is specific to Linux, because there has to be a Linux driver which exports the DMA-BUF, and of course there are a lot of DMA-BUF importers which can take those DMA-BUF fds, figure out the physical addresses and access the corresponding memory. So, to solve the problem I described, where we want to access the RTOS memory as well as share it with other drivers: what you see on the left side is the RTOS stack. The RTOS application has some memory, but this memory is mapped into the Linux address space by whatever means; currently you can simply use /dev/mem, or you can have a driver, more typically the driver handling the RTOS firmware loading, say the remoteproc driver, map the memory into the application, as sketched below.
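To make this concrete, here is a minimal user-space sketch of that mapping step. The carveout base address and size below are made-up values for illustration; a real system would learn them from the device tree reserved-memory node or from the remoteproc/IPC driver.

```c
/*
 * Minimal user-space sketch: map a hypothetical RTOS carveout into the
 * application through /dev/mem.  The base address and size are made-up
 * values; a real system would take them from the device tree
 * reserved-memory node or from the remoteproc/IPC driver.
 */
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define RTOS_CARVEOUT_BASE 0xA0000000UL          /* hypothetical physical base */
#define RTOS_CARVEOUT_SIZE (16UL * 1024 * 1024)  /* hypothetical size: 16 MB   */

int main(void)
{
	int fd = open("/dev/mem", O_RDWR | O_SYNC);
	if (fd < 0) {
		perror("open /dev/mem");
		return 1;
	}

	/*
	 * The pointer returned here is the user-space virtual address that
	 * will later be handed to the exporter driver.
	 */
	void *va = mmap(NULL, RTOS_CARVEOUT_SIZE, PROT_READ | PROT_WRITE,
			MAP_SHARED, fd, RTOS_CARVEOUT_BASE);
	if (va == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return 1;
	}

	printf("RTOS carveout mapped at %p\n", va);

	munmap(va, RTOS_CARVEOUT_SIZE);
	close(fd);
	return 0;
}
```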
So, once you have mapped the memory you get a virtual address. Now, this is what I mean by exporting virtual memory as a DMA-BUF. In the typical Linux use cases you would have seen, the drivers that support DMA-BUF export are most of the time the allocators of the buffers: if you take V4L2 or DRM for example, only the drivers which allocate the buffers are capable of exporting them as DMA-BUFs. What I am trying to demonstrate here is that we have virtual memory which is mapped into the application, and then a completely different driver, which has no connection with the memory we are talking about, is given a pointer to this memory; it exports that memory chunk as a DMA-BUF, and you get it back as a file descriptor.

Yes, right, yes. There is a common bus, and multiple processors and peripherals access the memory through that bus. Linux and the RTOS can access the memory at the same time, but of course the memory is divided, logically divided: out of, say, 4 GB, 3 GB will be given to Linux and 1 GB, or maybe half a GB, to the RTOS applications, something like that. My point is that the memory is there, say 4 GB, and it is up to the software configuration how much you carve out for the RTOS; depending on what the application needs you may decide anything from 128 MB all the way to 1 GB, depending on the application and use case running on the RTOS. This is a specific solution, but it does not change anything about the idea of exporting virtual memory. You had a question?

In other words, unlike in virtualization, where it is common to allocate large pages to avoid page faults, the host here is a Linux OS running with standard 4K pages, and you are trying to understand the underlying assumptions that make this work. To answer that: the concept of paging applies mostly on the Linux side. The RTOS does not have multiple levels of translation; an RTOS application is as simple as mapping an address and starting to write into it, because there is not a lot of paging or translation going on. The processor running the RTOS is a very simple processor, so you do not have a paging facility there; it is direct access to physical addresses. Are we pre-allocating and pinning buffers, so there is no paging or other consideration? Yes, this is a dedicated, always-resident, contiguous range of memory, of more than 4K or however many pages. In the device tree we typically carve out a memory region, which of course will be a contiguous region; you specify those region values at boot-up of the RTOS processor, and then all the RTOS applications can allocate buffers from that region. I know I have started with a lot of complicated things, but if you have any questions I would like to clarify them, because I am going to build on this. No, the virtual address of course belongs to the application: the mapping of that buffer is done in the application, so the virtual address is part of the application, meaning user space.
So, the point is that you can map the memory and get a virtual address, and the way the solution is designed, you take that virtual address and give it to a new driver, the one shown in the green box, "vmemexp", which I call the virtual memory exporter. This vmemexp driver is the newly created driver. What it does is take the virtual address of any memory, find out the physical addresses corresponding to that memory, and once it has the physical addresses it can export them as a DMA-BUF. You need to implement a bunch of DMA-BUF ops, and once those ops are implemented you can give this DMA-BUF fd to all the other Linux drivers; from that point onwards, every standard Linux driver capable of importing a DMA-BUF will be able to access those buffers through the DMA-BUF ops implemented by the vmemexp driver.

About the fd: when the ioctl is given the virtual address, all you get back is a DMA-BUF fd, which is just an anonymous file descriptor representing the buffer. It is much the same as what happens in V4L2: if you, say, allocate a buffer and then run the VIDIOC_EXPBUF ioctl, you get a DMA-BUF fd, which again is just an anonymous file descriptor representing the memory. The concept of DMA-BUF is that user space need not know the physical address or even a virtual address; you just get one file descriptor, which you can mmap to get a virtual address, but in general, when you export, all you get is the file descriptor. The same thing happens here. The only difference is that the vmemexp driver is not the owner of the buffer; it has just been given a pointer to the application's memory, and it figures out the corresponding physical addresses from that.

That was the specific solution; now let us try to make it generic. What we have done is take virtual memory and export it as a DMA-BUF, and the advantage is that you can now remove the RTOS from the picture, because the RTOS was only introduced for the problem we saw. You can have any memory that is part of the application and that you want to share with the other Linux drivers you have. To work with drivers that are capable of DMA-BUF import, you still want that memory to be represented as a DMA-BUF, and this driver enables you to convert any chunk of virtual memory into a DMA-BUF. The use cases this enables: you may have memory mapped directly from, say, the /dev/mem device, memory mapped from a file, or memory mapped from a different driver. The possibilities are countless; you just need the virtual address pointer, you get the fd from it and give it to other drivers, and from that point the use case is seamless. The ABI is a simple character device: you have a /dev/vmemexp device, and currently only two or three ioctls are defined, as sketched below.
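As an illustration of that ABI, a user-space call might look roughly like the sketch below. The device node name, structure layout and ioctl number are hypothetical; the talk does not give the exact interface.

```c
/*
 * Hypothetical user-space view of the exporter ABI described in the talk.
 * The device node name, structure layout and ioctl number below are
 * illustrative guesses, not the actual driver interface.
 */
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

struct vmemexp_export {		/* hypothetical ioctl argument */
	uint64_t vaddr;		/* user virtual address to export */
	uint64_t size;		/* length of the region in bytes  */
	int32_t  dmabuf_fd;	/* returned dma-buf file descriptor */
};

#define VMEMEXP_IOC_EXPORT _IOWR('V', 0, struct vmemexp_export)

int export_as_dmabuf(void *vaddr, size_t size)
{
	struct vmemexp_export req = {
		.vaddr = (uintptr_t)vaddr,
		.size  = size,
	};
	int fd = open("/dev/vmemexp", O_RDWR);	/* hypothetical node name */

	if (fd < 0)
		return -1;
	if (ioctl(fd, VMEMEXP_IOC_EXPORT, &req) < 0) {
		close(fd);
		return -1;
	}
	close(fd);

	/*
	 * req.dmabuf_fd can now be handed to any dma-buf importer
	 * (DRM, V4L2, ...) or passed to another process over a socket.
	 */
	return req.dmabuf_fd;
}
```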
So, it is a character driver with custom ioctls. There is a data structure defined in which you pass the virtual address; as the earlier diagram shows, you pass the virtual address to the driver, and once the driver has done all the translations it returns a DMA-BUF fd, which the application can pass on.

As you pointed out, how is the driver able to find the physical addresses? To work with a DMA-BUF you need the physical addresses, and as I mentioned, the typical DMA-BUF exporters have been the allocators: a driver that allocates the buffer surely has the physical address, because it is the one doing the allocation. In the case of V4L2 and DRM that is very easy, but here all you have is a virtual address, and you need to convert that virtual address into physical addresses to find the different pages the memory points to. The way this is achieved is by doing a software page walk, and I will go through what exactly that is. Basically, you have a virtual address, you need to find out which pages are mapped by that virtual address, and once you have the list of all the pages pointed to by the virtual address you can export them as a DMA-BUF.

Just to highlight the features of this driver one more time: you can export any virtual address. By emphasizing "any", I mean this is not specific to the RTOS use case; you can have memory mapped by a different driver which does not support DMA-BUF export. There are a lot of CMA drivers; for example, Texas Instruments has a driver called CMEM which is used for handling CMA (contiguous memory allocator) regions, but that driver is not capable of DMA-BUF export. You can allocate the buffers, but once you have allocated a buffer you cannot give it to any other driver, simply because the driver does not support DMA-BUF export. And this is just one example; I am sure there are a lot of drivers in the community which handle CMA buffers and let you allocate and map buffers, but which have no DMA-BUF export support, so essentially you cannot use them with the full chain of other drivers that only work with DMA-BUF import. That is one use case made possible here: you just map that memory, and once it is mapped into the application you can export it as a DMA-BUF.

So, when I say page table walk — somebody had a question, all right — this is the typical page table walk diagram you would see. The example is for a 32-bit ARM processor, where a 32-bit address is divided into different parts: you go from the process-specific page directory through the PGD, the PMD and the PTE, and then you have the offset. In a typical Linux kernel the sizes of the PGD, PMD and PTE fields vary depending on the architecture; this is just the example for a 32-bit ARM processor. The logic is that the address is divided into multiple pieces and you need to figure out the right page from them: you find the process-specific page directory, which is an array whose entries point to further tables.
So, basically you index the page directory to get the PMD, and from the PMD you get the PTE. Once you have the PTE, that is again an array of page entries: you index the page table and you get the actual physical address. This is how typical paging works, but most of this is actually done by hardware; all the architectures, or at least the ARM and Intel architectures, have hardware support for it, so normally everything happens in hardware. What I am doing here, however, is a software page table walk. What I mean by that is that I am not really interested in accessing the memory pointed to by the virtual address; what I am interested in is knowing which physical address that virtual address points to. So I just go over the kernel data structures trying to find out where exactly this virtual address maps to. Typically the virtual address range is contiguous, but in DDR, in physical address space, it might not be: a range of 1 MB of virtual addresses does not mean the physical addresses are contiguous, you may have 4K pages scattered all over the memory. So you end up with a scatter-gather list of addresses for a memory chunk that was contiguous in virtual address space. That is the typical case: a contiguous virtual address range need not be contiguous in physical addresses.

Any questions on the software page table walk? Of course, yes, yes. Did the driver do the full page table walk itself? There are utility functions in the Linux kernel for the page walk, exactly, and the driver uses those functions.

Sorry, overcommit, what exactly do you mean by that? Yeah, that is a really nice question, actually. Later in the slides I will show that in the current implementation we are not triggering page faults. The question is how we handle overcommit, in the sense that an application may request, say, 1 MB of memory and the kernel may not actually allocate that 1 MB, so there are virtual addresses for which there is no physical page. The physical pages are only allocated when the application actually starts using the virtual addresses. So the question is: how will you find the physical address if there is no mapping between the virtual and physical addresses? The only way is to trigger the page fault. Normally the page fault is triggered when the application accesses that memory, and it is handled by hardware: the memory management unit raises an exception and the kernel takes care of making sure there is a page behind it. In this case you are not accessing the memory, you are just trying to find the mapping, but you do want to make sure the mapping exists before you look it up. That can be done by triggering the page faults manually; that is the theory, but it is not currently implemented.

Sorry, what do you mean by special mappings? This part is done in the kernel: once the application gives the virtual address, the whole rest of the procedure is done by the kernel, and the kernel gets the physical addresses by walking the page table, along the lines of the sketch below.
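A minimal sketch of that lookup step in the kernel is shown below, using the modern pin_user_pages family rather than an open-coded PGD/PMD/PTE walk; the exact helper names and signatures differ across kernel versions (the talk's implementation was on 4.4), and the function name here is illustrative.

```c
/*
 * Kernel-side sketch of the lookup step: resolve a user virtual address
 * range into the pages backing it and collect them into a scatter-gather
 * table.  This uses the modern pin_user_pages_fast() helper; the GUP
 * function names and signatures differ on older kernels, and
 * vmemexp_collect_pages() is an illustrative name.
 */
#include <linux/mm.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

static int vmemexp_collect_pages(unsigned long uaddr, size_t size,
				 struct sg_table *sgt)
{
	unsigned long first = uaddr >> PAGE_SHIFT;
	unsigned long last = (uaddr + size - 1) >> PAGE_SHIFT;
	unsigned int nr_pages = last - first + 1;
	struct page **pages;
	long pinned;
	int ret;

	pages = kvmalloc_array(nr_pages, sizeof(*pages), GFP_KERNEL);
	if (!pages)
		return -ENOMEM;

	/*
	 * Pin the pages so they stay resident for as long as the dma-buf
	 * is in use; the slow path of GUP also faults in pages that are
	 * not populated yet.
	 */
	pinned = pin_user_pages_fast(uaddr & PAGE_MASK, nr_pages,
				     FOLL_WRITE | FOLL_LONGTERM, pages);
	if (pinned < 0) {
		ret = pinned;
		goto err_free;
	}
	if (pinned != nr_pages) {
		unpin_user_pages(pages, pinned);
		ret = -EFAULT;
		goto err_free;
	}

	/*
	 * Physically scattered pages become multiple sg entries; a
	 * physically contiguous range collapses into fewer entries.
	 */
	ret = sg_alloc_table_from_pages(sgt, pages, nr_pages,
					offset_in_page(uaddr), size,
					GFP_KERNEL);
	if (ret) {
		unpin_user_pages(pages, nr_pages);
		goto err_free;
	}

	kvfree(pages);
	return 0;

err_free:
	kvfree(pages);
	return ret;
}
```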
I am not able to hear you clearly. Yes. So, this is just an example; depending on the architecture the PTE and PMD layout will be different, but the kernel APIs are generic enough that the same code we have will work on different architectures with their different PGD, PMD and PTE sizes. Any other questions?

All right. I have been talking a lot about converting a virtual address into physical addresses, but it is very unlikely we will use this to the full extent I have been describing. Typically we will use it for the buffers we really intend to share. For example, there is no point sharing a non-contiguous buffer with a driver that expects a contiguous buffer. Say you malloc a buffer: it will of course be contiguous in virtual address space, but it will not be contiguous in physical address space. You can export that buffer as a DMA-BUF and you will get a scatter-gather list, but if you take this DMA-BUF fd and give it to a driver that expects a contiguous buffer, it will simply fail. The driver we are talking about enables you to export virtual memory as a DMA-BUF, but that does not mean it solves the MMU problem; it cannot act as an MMU. If the device or peripheral is not smart enough to work with scatter-gather lists and expects physically contiguous memory, you have to pass a buffer that really is physically contiguous. If you pass a buffer that is only virtually contiguous, you get a scatter-gather list with, say, 100 entries, and the importing driver will simply fail because the buffer is not contiguous in physical space.

On cleanup: even if the application crashes, all the file descriptors associated with that application are closed when it is killed or exits. It is the kernel's job to make sure the DMA-BUF refcount is decreased by one every time a descriptor associated with it is closed. I talk about refcounting because the same DMA-BUF may be in use by multiple drivers. So, when the application has crashed, all the DMA-BUF fds are closed one by one, at some point the refcount of the DMA-BUF reaches zero, the kernel callback gets triggered, and at that point you can do all the cleanup necessary to handle the scenario. Does that answer your question?

All right. So, this driver supports both physically contiguous buffers and scatter-gather buffers; it is generic enough, but it is up to you how you use it. The second point I want to mention is about pages. What if you give me a virtual address which is not page aligned? Then it is a question of how I share that chunk of memory with another driver, because I will typically share a scatter-gather list saying these are all the pages that are part of this DMA-BUF. I can include an offset, but most of the DMA-BUF importing drivers out there do not respect the offset: even if you hand over, say, 10 pages and say that the offset into the first page is 2K, the importing driver typically ignores the offset and starts using the pages from the first byte.
So, it is generally recommended that you use this method only for cases where the virtual addresses are page aligned. It will still work for the non-page-aligned cases, but it is not recommended, because other drivers do not respect the offsets.

No, it gets the DMA-BUF fd, yes, correct. The translation to the physical addresses happens again in the kernel, in kernel space, yes; this diagram here explains it. Do you give a virtual address to get the DMA-BUF fd? Correct, yes, yes. Does that virtual address stay tied to the specific physical address you want access to? First of all, in embedded applications you will rarely have swapping in place. In case you do have swapping, the control is such that whenever you export a specific virtual address as a DMA-BUF, the driver makes sure it pins the buffer in memory. No, no, this is all happening from Linux: you have a virtual address, you give it to the kernel driver, and the kernel driver makes sure the pages are pinned in memory so that the kernel does not swap them out. If the kernel does not swap them out, the virtual-to-physical mapping also remains the same, for as long as the DMA-BUF in question is in use, correct. How does this exporter driver know that when an application gives it a virtual address like 0xdeadbeef, that address corresponds to a particular physical address? By doing a page table walk; and once you have done the page table walk, you find all the pages and you pin them, so that swapping does not swap them out. Sure. Yeah.

Oh, sorry, I have taken a lot of time; I will just run through quickly. Basically, to make it generic, what I am saying is that you can map plain memory, you can map a driver handle, or you can map a file, and for each of these cases you have a virtual address and corresponding physical addresses. Depending on what happens in the page fault handler, you may simply be accessing memory, you may actually be reading from a file, or you may be reading through a driver-specific callback; you just trigger the page fault handler and the memory gets populated. The way this comes up in a use case is that you somehow map the memory into the application, get the virtual address, pass that virtual address to the driver, and then start sharing the memory.

So, as I said, now we move to how this applies to generic use cases. I talked about the regular Linux frameworks like GStreamer and Wayland using this memory. In the typical setup, if you allocate from a DMA-BUF exporter, you need to export using the same driver: the first thing you do is allocate the memory, and then you export it. The advantage here is that you can allocate first from any driver, and then export. In typical embedded use cases you will have, say, 10 drivers, but only one of them supports DMA-BUF export, typically the DRM driver, so you allocate from DRM and then give the buffer to all the other drivers. With this, you can actually allocate from a different driver; as I mentioned, there might be a CMA driver you are interested in allocating buffers from.
You can allocate the buffers from there and then simply export them; this removes the dependency between allocation and export.

Another way to look at it, from the GStreamer use case perspective: you have multiple pipelines, with multiple elements in a pipeline, and some plugins or elements are written such that they allocate their own memory. For example, take videotestsrc (I am just giving an example, it might not be exact): it is a software element that just generates test video data. It does not have any special buffer requirement, so it will allocate buffers on its own using, say, malloc. You cannot share that memory with other drivers, simply because it is not a DMA-BUF. The only way it works today is that you allocate buffers using DRM, share those buffers with the source element, have it generate the content into them, and then use them for whatever operation you are supposed to do. So there is a dependency on content generation: even if an element wants to generate content on its own, it is not allowed to allocate its own buffers; it has to ask somebody else to allocate buffers for it and then write the content into them. With this, that dependency is gone, because you can simply let the element allocate its own memory and then share it with the other driver.

The second part is about memory sharing across processes. In typical compositor setups you have graphics applications acting as clients and a composition server such as Weston (Wayland) or X11. The communication between the client and the server happens over a socket, and the memory is shared using shared memory, because these are two different processes: you need shared memory to share buffers from a client with the server. Take a typical Wayland graphics example: if a texture or shader output is produced by a client application, it needs to be allocated in shared memory. You first allocate the shared memory, render all your content into it, and only then can you give that buffer to the Wayland server for it to be displayed in a zero-copy manner. If that does not happen, say you allocate in your own application space and then hand the buffer to Wayland, the zero-copy path will not happen: internally Wayland will allocate shared memory, copy your buffer from your application into that shared memory, and then start using it. By exporting as a DMA-BUF instead, you are essentially sharing one process's memory with another process.

I think this diagram is helpful here. It is a simple diagram with two processes, process one and process two; the middle level is the applications' virtual address space, and the bottom level is the physical address space. Each virtual address may be mapped to a different physical address, and you may have some shared pages which are common to the two processes. Now, if you use the virtual memory exporter driver and you want to export one memory chunk from, say, process one to process two, the way it is done is that you take that virtual address, give it to the virtual memory exporter driver, and you get a DMA-BUF fd.
With socket fd passing you can pass this fd to the other process and then map it there. Once you have mapped it, you are accessing another process's memory as if it were process two's own memory. You can see that after the export and map, the same page which was earlier dedicated to process one has now become a shared page. So in this case you are not allocating shared memory and then using it; you already have a buffer that is part of an application, and by doing this you share that existing buffer with another process. This is quite useful when you have to integrate, say, open source applications. The point here is about components allocating their own memory for buffers: you will have a lot of GStreamer plugins, custom shaders and textures, and just for prototyping you want to use them as they are. If such a component allocates buffers of its own, you normally cannot pass them to a different process, and passing them to a different process is a valid use case: with Wayland or any display compositor you need to pass the buffer to a server, which is essentially a different process. This helps you solve that problem without doing any buffer copy.

Yes, yes, DRM does, yes, yes. DRM now supports importing a DMA-BUF, so as long as the buffer constraints are met — I am not saying that you can export any memory and consume any memory — if the memory is contiguous and DRM is okay with that contiguous memory, and it satisfies all the constraints DRM expects for importing, it will be able to share that memory with zero copy, yes.

We have not done any performance analysis, but in the embedded use cases you will not be doing a lot of maps and unmaps. Only in the case where you want CPU access to that memory do you map it; in the regular use cases you want the device or peripheral to access it, and for that the DMA-BUF alone suffices. Only if you want CPU access do you do a memory mapping, and as you said, if you do map it, you need to make sure the caches for that mapping are updated so that the new mappings are reflected; if you want a CPU mapping, you have to plan for the cache overheads.

The last two points, export as DMA-BUF and share across processes, I have already talked about. Basically this is almost like shared memory, but in a different way: you are sharing an existing memory. You have memory in one process, and after some time you decide that this part should have been shared memory. Normally the only way that works is to allocate a new buffer which is shared between the two processes and then copy the contents; with this driver you do not have to do any copy, you simply export it as a DMA-BUF, pass the DMA-BUF over a socket, and once you have passed it you can map it in the other process, as sketched below. I think that is it, yeah.
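A minimal sketch of the sender side of that fd passing over a Unix domain socket is shown below; error handling is trimmed, and the receiver would use recvmsg() with CMSG_FIRSTHDR() and then mmap() the received fd.

```c
/*
 * Sketch of the sending side: hand a dma-buf fd to another process over
 * a Unix domain socket using SCM_RIGHTS.  Error handling is trimmed.
 */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

static int send_dmabuf_fd(int sock, int dmabuf_fd)
{
	char dummy = 0;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	char cbuf[CMSG_SPACE(sizeof(int))];
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = cbuf, .msg_controllen = sizeof(cbuf),
	};
	struct cmsghdr *cmsg;

	memset(cbuf, 0, sizeof(cbuf));
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &dmabuf_fd, sizeof(int));

	/* The receiver picks the fd up with recvmsg() and can mmap() it. */
	return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}
```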
So, I have already talked about the use cases. Of course, the first thing I would say is that this solution helped us integrate the RTOS applications with Linux. It also enables you to use drivers which are not capable of DMA-BUF export: you just map the memory and then use this driver, which essentially provides the DMA-BUF export capability. Then I mentioned the shared memory mapped from different drivers. There is another interesting use case; here I am talking specifically about the GPU and the display. The GPU typically has an MMU, so it is not a dumb device; it is a smart device with an MMU, and it can handle scatter-gather pages. With this driver I am able to allocate a buffer using malloc, convert it into a DMA-BUF and pass it to DRM. Before this, DRM never supported any user-pointer kind of use case, but with this you can simply allocate a buffer with malloc and pass it to DRM as a DMA-BUF. That was one of the new use cases we were able to achieve. Sharing one process's memory with another process I have already covered, and the last point is the GStreamer integration, where you have a lot of software components allocating memory on their own; once they have allocated the memory, you can simply export it as a DMA-BUF. I think that is it, okay.
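For the malloc-to-DRM use case just described, the import side in user space could look roughly like the following sketch using the libdrm PRIME helper; the card node path is an assumption, and whether the import succeeds depends on the importing DRM driver's constraints.

```c
/*
 * User-space sketch of the import side: turn the dma-buf fd into a DRM
 * GEM handle via PRIME.  The card node path is an assumption, and the
 * import only succeeds if the buffer meets the DRM driver's constraints.
 */
#include <fcntl.h>
#include <stdint.h>
#include <xf86drm.h>

int import_into_drm(int dmabuf_fd, uint32_t *handle)
{
	int drm_fd = open("/dev/dri/card0", O_RDWR);

	if (drm_fd < 0)
		return -1;

	/*
	 * PRIME fd-to-handle converts the dma-buf fd into a GEM handle,
	 * which can then be used for framebuffers, GPU buffers, etc.
	 */
	if (drmPrimeFDToHandle(drm_fd, dmabuf_fd, handle) < 0) {
		drmClose(drm_fd);	/* importer rejected the buffer */
		return -1;
	}

	return drm_fd;	/* keep open for as long as the handle is in use */
}
```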
So, when we talk about converting virtual addresses to physical addresses, there are a lot of security concerns: are we doing the correct refcounting, are we making sure the pages do not get swapped out of memory? You also need to handle the case where the application says "this is my virtual address and I want 4 MB from here" while only 2 MB has actually been allocated; the application may give you wrong sizes, and it is the kernel's job to make sure there are no access violations, which typically shows up while doing the page walk: you look up each page, and at some point you find that there is no mapping, a page fault. The first question that came up was how to handle overcommit; that can be handled by triggering the page fault, so that the mapping between the virtual and physical addresses is created before you do the page walk. Then there are some open questions, like whether we should restrict sharing of certain memory, because now we can share any memory: we can share the data segment, we can share the code segment, and there might be security concerns, because you do not want to share a particular segment of one process with another process. Should there be constraints, for example that you may only share the data segment, or that a specific segment of memory must not be shared? These questions are kept open, and I hope the community will give me the right answers. And, as one gentleman mentioned, you also need to handle the races: if the application crashes, or there are multiple users, how do you handle the race between unmap and close? I think that is it; these are all the references, if you have any questions.

This has been implemented on a 4.4 kernel, but I am planning to put out a proposal. I am not really sure how much time it will take, but at least I will start rolling the idea out on the mailing list, and depending on how many rounds the different patch sets take, I cannot commit, but I will try my best. Any other questions? All right, thank you then, I am done. Oh, sorry, sorry.

Can it go the other way? That depends on who owns the buffer. Typically the RTOS is not smart enough to reconfigure its memory address space for a different view: if you have given a specific memory chunk to the RTOS, it will only have access to that chunk. If you want to give Linux memory to the RTOS, you need a new mapping created for the RTOS processor, because it will not understand any memory that was not given to it at boot-up. There is only a static mapping done on the RTOS side, because it has limited memory access, and even the address space of the RTOS processor is limited; it will not have access to the full 4 GB of memory. So if you want to share Linux memory with the RTOS, again it has to come from a specific region; you cannot take random memory and say you now want to share it with the RTOS, it has to be from a dedicated piece which is already mapped into the RTOS region.

The DMA-BUF concept is only useful for Linux; all the Linux drivers will use the DMA-BUF. If you take a buffer from the RTOS and convert it into a DMA-BUF, you can do all the operations with all the Linux drivers, and once all the operations are finished, the physical memory of course remains the same, so you can give it back to the RTOS without doing anything, because the physical address stays the same. The DMA-BUF is only Linux's means of accessing that buffer from user space.

Once you get the list of physical addresses, you need to implement the bunch of ops called DMA-BUF operations: map, unmap and so on. Every time an importing driver wants to access the actual memory, it calls attach, map, detach and the other APIs, and those are the APIs that need to be implemented by the DMA-BUF exporting driver. In this case the driver has implemented all those APIs, so that any time an importer calls them, the corresponding actions are taken; that is what DMA-BUF is all about.
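A rough sketch of what an exporter might implement for those ops is shown below. The structure and function names are illustrative, only the core callbacks are shown, and depending on the kernel version additional callbacks (for example mmap) are mandatory as well.

```c
/*
 * Minimal sketch of the exporter side: wrap the collected sg_table in a
 * dma-buf.  Structure and function names are illustrative; only the core
 * callbacks are shown.
 */
#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/err.h>
#include <linux/fcntl.h>
#include <linux/scatterlist.h>

struct vmemexp_buffer {		/* hypothetical per-export bookkeeping */
	struct sg_table sgt;	/* pages collected from the page walk   */
	size_t size;
};

static struct sg_table *vmemexp_map(struct dma_buf_attachment *att,
				    enum dma_data_direction dir)
{
	struct vmemexp_buffer *buf = att->dmabuf->priv;
	int nents;

	/* Hand the importer a DMA mapping of the same physical pages. */
	nents = dma_map_sg(att->dev, buf->sgt.sgl, buf->sgt.orig_nents, dir);
	if (!nents)
		return ERR_PTR(-ENOMEM);
	buf->sgt.nents = nents;
	return &buf->sgt;
}

static void vmemexp_unmap(struct dma_buf_attachment *att,
			  struct sg_table *sgt,
			  enum dma_data_direction dir)
{
	dma_unmap_sg(att->dev, sgt->sgl, sgt->orig_nents, dir);
}

static void vmemexp_release(struct dma_buf *dmabuf)
{
	/*
	 * Called once the last reference to the dma-buf is dropped: unpin
	 * the pages, free the sg_table and the bookkeeping structure here.
	 */
}

static const struct dma_buf_ops vmemexp_dmabuf_ops = {
	.map_dma_buf	= vmemexp_map,
	.unmap_dma_buf	= vmemexp_unmap,
	.release	= vmemexp_release,
};

static int vmemexp_export(struct vmemexp_buffer *buf)
{
	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
	struct dma_buf *dmabuf;

	exp_info.ops = &vmemexp_dmabuf_ops;
	exp_info.size = buf->size;
	exp_info.flags = O_RDWR;
	exp_info.priv = buf;

	dmabuf = dma_buf_export(&exp_info);
	if (IS_ERR(dmabuf))
		return PTR_ERR(dmabuf);

	/* This fd is what the export ioctl hands back to user space. */
	return dma_buf_fd(dmabuf, O_CLOEXEC);
}
```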
Yes, currently it is handled in such a way that it maps bidirectionally: at any point, whether the application wants to read or write, the DMA mapping is always bidirectional. And yes, the intent behind this use case is not to do any CPU access, at least in this specific application: you do not see the application working with that virtual address, it is just mapped and then converted into a DMA-BUF, and there is no CPU access happening, because all of the operations are hardware accelerated. There is a GPU driver, there is an encoder driver, and all of these are peripherals which are not part of the CPU; you just give them the DMA-BUF and that is it, the CPU does not have to access it. You are creating a virtual mapping, but you are not really accessing it; if you do access it, you will have to pay the cache penalty. Exactly.

So, I mentioned the ABI. There is an ioctl for when an application willingly wants to, say, flush the cache or invalidate the cache. Once you open /dev/vmemexp, you can either export the buffer or use the second type of ioctl to say "I want to sync the cache" or "I want to invalidate the cache". That is an API provided to the application in case it wants to handle the caches on its own; in general the mapping is considered bidirectional. All right, any other questions? All right, thank you then.