Thank you very much, all, for coming here at this, for me, quite early time. My name is Wolfram, I'm the current maintainer of the I2C subsystem in Linux, and I want to talk today about something I learned the hard way when I was preparing my subsystem for proper DMA usage. I found out a few things that I want to make known, and I hope you can spread them so that together we can raise awareness and hopefully settle this topic somewhere, because if DMA goes wrong in unexpected situations, you can imagine that this can have quite drastic consequences.

So, about I2C and DMA. If some of you say, "Why would you ever use DMA with I2C?", then I'm kind of with you. The usual use case is that you send lots of small messages, like a register and a value you want to poke into that register. So when I2C entered Linux long, long ago, DMA wasn't really considered, and so there were also no rules about whether DMA should be taken into account when you set up the message buffers. Because, of course, inside struct i2c_msg we have a pointer to a buffer where the actual data is.

And, well, you all know Murphy's law: if you don't have clear rules and you give things a chance to go wrong, they will go wrong in all the ways you can imagine. When people finally noticed that they had issues with sending buffers via DMA, I looked at where the buffers come from, and they really come from everywhere: heap, stack, read-only data, kmapped memory, and whatnot. If there are no rules, people get creative and do all this kind of stuff.

So, after some discussions on the mailing list, I thought it would still be best to have a clear rule, and that rule says DMA is optional. Other subsystems require their message buffers to be DMA safe, but I didn't want to go down that road for I2C in
retrospect, for the reasons I've given there: I was afraid of regressions, because there were these zillions of drivers getting buffers from everywhere, including drivers for hardware we don't have access to, and I didn't want to mass-convert that. So it is optional. The default case is that we don't care too much about DMA, but we want to have some guidance if you want to use DMA. And of course that's why I'm here: I want to speak about this being the case, so that people know and can handle their buffers properly.

So, great, I decided that DMA is optional for I2C, but that means we have two potential code paths. Either we have a buffer which is DMA safe or we don't, and in the bus driver sending out the data we need to go in this or that direction. But how do we decide which direction to go? In a perfect world we would do this at runtime: we just look at the buffer and say, "Oh yeah, that's good, we use DMA," or, "No, it's not good, so let's just use PIO." That's when I wondered why there isn't a function for this, something like "is DMA capable" for a buffer. The fact that such a function does not already exist should of course be a hint that this is not a very easy topic, and that it probably does not exist for a reason.

But I wanted to be brave and find out on my own what the function should look like if I want to find out at runtime whether a buffer is DMA safe. So I started looking at what kind of functions we have to check that. You find, maybe, this address check and this function, and then you find out, well, that's not enough, there might be a better one, or you should take this into consideration, or objects might be on the stack and you need to check for that as well. And to point this out, this is from Greg Kroah-Hartman. The USB subsystem also needs to deal with that problem, DMA safe or not, and I think they mainly used to check with that function, and then somebody
said, "Oh well, maybe we need to add this check as well." I don't want to bash Greg. I just want to point out that it's really hard to know which functions there are inside the kernel; I didn't know we had that macro to check it. So it is complicated. And if you then think about cache line alignment... this was the point where I said, okay, no, this is not going to fly. I understand why there is no such function yet, and I won't go the way of checking at runtime.

There is this DMA... oh, it has been renamed these days. I'm sorry, this file name is not correct; it has been moved, but it's basically the same. There's a debug Kconfig option for DMA, and if you do development, I really, really recommend having it on all the time, because it will find a lot of things. It found a bug in a driver for Renesas hardware I'm looking after, which we could find before it appeared in the field. So it's really good, but of course it's not good for production kernels, so you should really do this during development.

I should also say that I talked about this topic in the summer at a conference in Japan, so if you see some strikethrough, this is information which has changed since that talk. And I think it's worth noting that the DMA debug code has grown by one kilobyte since this summer, so it's really a delicate topic. So we have that, but we can't use it for production kernels. It gives another idea of how subtle and complicated this issue is.

So what I did for I2C is an opt-in approach. Within I2C, the messages can have certain flags, so I just added a new flag which says, "Yes, I know this buffer can be handled with DMA." It's called DMA safe. Opt-in means that a lot of drivers need to be manually audited and updated, but when you look at regressions, I think this is the sane thing to do. And to make it easier for drivers, I have a small API which helps you: if you want to do DMA, you just use this helper function and
the message you want to process, and it will check for you, based on the flag, whether the buffer is DMA safe. If it's not DMA safe, you get a bounce buffer. So whenever this function returns a non-NULL pointer, you have a buffer you can use with DMA, and you don't need to care where it comes from. There's a cleanup function for that as well, which, as you see, got renamed to make the API a little more proper.

This is, still thinking about I2C, super simple. Most of the messages will not require DMA, so this is a one-to-one bounce buffer mapping. If you're not happy with that, you can do your own pool-based approach based on this DMA flag, but I think for the generic approach this one-to-one mapping is okay.

On the I2C client side we have two new function calls. The standard function calls to just send and receive data are called i2c_master_recv and i2c_master_send, and now we have those extended with "dmasafe" variants, so that you can use this new flag from the client side and have all of that working properly.

It's worth noting that if you send messages from user space, and we have a user-space /dev interface for that, those messages are all copied from user space to kernel space anyway, and then it's guaranteed that those buffers are DMA safe. So, happily, we don't need to fix user space. That's good news. Also used quite a lot in the kernel are not direct I2C transfers but SMBus transfers, as a kind of fallback for more limited hardware. I2C controllers emulate these SMBus calls, and everything which is emulated will also be DMA safe, because there's copying going on. So actually most drivers should be okay with this change. There are some which use I2C messages directly and need to be adapted. As I said, it's opt-in: every driver needs to be checked, and the bus master drivers need to handle that flag.

Also, the problem
with I2C, again, is that it's not only used directly. For example, regmap is a huge user of I2C, especially for accessing codecs and stuff like this, so I needed to sync with Mark Brown on how he could access these new DMA-safe functions. And, by the way, I asked him how regmap handles DMA at all, because it uses not only I2C but also SPI and SLIMbus and whatever buses, sometimes buses I've never heard of. What's the common ground there? And this is what he said: "We pretty much assume everything is DMA safe." So it's an assumption; there's no clear rule there either. He went on to say that he can't really think of a particularly good way to handle all this, and that what I proposed to him, the DMA-safe approach, is not particularly appealing but might be the best we can have.

I'm well aware that what I came up with is a solution only for I2C. It's not very generic, but I really did want to get this part of the kernel more robust, because people were seeing crashes because of that, and this is what I could come up with in that short amount of time. And he really nailed it when I asked, "So should we do it like this?" I really like this phrase: "It's hard to summon enthusiasm, but yes, without changes to the DMA stuff it's probably as good as we can do." Which is exactly my opinion. So I implemented that, and I'm happy that we improved the situation, but I'm not super happy about it, because, as he also said later, we need something like annotated buffers. It would be super, super great if we had a buffer object or whatever which could be annotated, so that we can just check whether this one is DMA safe. I put this on the I2C level, but it should really be on a lower level. We don't have that yet, and as you can imagine, this is a big task.

And because regmap is a user of other subsystems, I checked some other subsystems, and there are for sure SLIMbus, SPMI, and 1-Wire, which are in the same situation that I2C used to be in. They have some structs to pass around messages, these structs have a pointer to a buffer, and there are no rules set: nothing is mentioned about the buffers being DMA safe or not. So it could easily happen that somebody passes in a buffer which then gets used with a DMA-capable controller, and things can go really badly wrong from that point. The nasty thing is that most of the time it will work. But when you look at it from a safety perspective, "most of the time" is not good enough, clearly not good enough. It needs to work every time.

SPI is another subsystem I looked at. It's a lot better on the documentation side of things, because it has clearly documented: follow standard kernel rules and provide DMA-safe buffers in your messages. So this is a clear rule. It even has helpers to do the mapping and unmapping. So you could think, great, this is a subsystem with all cases solved, DMA is not an issue for them. But I found a thread where there was an issue with running a UBIFS file system on an SPI NOR flash, well, obviously connected to the system via SPI. UBIFS, for reasons which I haven't really dug into, but as was said in that message, uses vmalloc'ed buffers at some times, and there's a whole thread on how to fix this issue. I haven't followed up, so I'm not sure if it's fixed by now, but there was a lengthy discussion about how to do things properly. So even with these clear rules there can still be issues.

Okay, Arnd said that SPI NOR is completely separate from SPI in the kernel, so it might be that the rule is not affected. But it's still a point in the kernel where the problem I want to point out here is maybe not really well handled. And, well, it could be a root file system which is on it, right?

A similar problem also came up with SPI and when to do cache flushing, which also started as a simple patch and turned into a lengthy discussion about how to
do this properly. That also showed that DMA handling is sometimes based on assumptions which work most of the time, but not always. It's worth being aware of that, and then auditing your systems to see whether you're affected or not.

The subtlety of this issue can also hit people in some unexpected ways. For example, if you get a buffer inside a struct, chances are pretty good that you don't meet your cache line alignment. That is where some people go wrong. If you allocate just the buffer, the kernel might handle the cache line alignment for you. If you do this inside a struct, you get the memory for the whole structure, the buffer is somewhere in it, and you might not have the cache line alignment. This is just the introduction: some people get this wrong, but some people know this.

It gets more troublesome if you use devm_kmalloc instead of kmalloc. If you get just the buffer with kmalloc, things will most likely go right. But if you use devm_kmalloc, you still get a buffer you can use, but internally something gets prepended: it is put into a struct as well, and the cache line alignment is mostly broken again. So as a rule of thumb, if you want DMA-capable buffers, it's pretty good to not use devm_kmalloc. I think this is largely not known. I didn't know that before. It's very subtle, but I think it's a rule of thumb which needs to be told, or it needs to be fixed.

And then we have other issues, like people working with buffers from the stack, which is, I wouldn't say common, but it's not treated like a no-go, because, for example, on ARM it mostly works like a charm. But even if it works on your architecture, it's definitely architecture dependent. And even on architectures which could make it work, we have this CONFIG_VMAP_STACK for a virtually mapped stack, so that will break for sure.

Yes, that's what I said. So, okay, Arnd's view on this: it doesn't work, it just works accidentally. But that's what people see: they just do
it, and it works, and so on. So, okay, well, I like rules of thumb: don't do that. Even on ARM you might get data corruption on the stack.

And then one other issue, I put it in brackets, but still: rules can be overlooked or ignored by people just hacking around and not reading. So if you have rules like this, you should really make sure they pop up everywhere and people are aware of them.

So, I'm not too bad. My conclusions from trying to get my subsystem and its buffers proper with DMA: I was kind of shocked to see how many DMA transfers in the kernel are based on assumptions which are true, accidentally true, or whatnot, but are not really rocket proof. Yeah, "rocket proof", I don't know, is that an English word? But you know what I mean.

Over the last months I have dealt a bit more with safety. I was at this SIL2Linux workshop, which deals a lot with safety, and with that interest I think this situation is not only bad but becomes especially bad. When I see all the devices Linux is supposed to go into, I really want DMA to be proper, because things go horribly wrong if that's not the case.

I learned there's no easy way to detect this at runtime yet. The debug option I mentioned, everyone should have that on when developing to find bugs, but it's not good for production use. So the current state is that it's not really detectable at runtime yet what kind of buffer you have. So we need to audit and be very aware of what's happening there.

The good solution is a big task. As I said already, annotated buffers would be awesome, with as much handling as possible done by the core kernel memory subsystems, so users don't have much chance to get it wrong. At the last talk somebody asked whether dma-buf from Video4Linux could be an option. I hadn't checked it at that time. I have checked it a little bit since, but I think it's too much designed for Video4Linux. I don't
think it would be efficient. I think efficient would be a solution really targeted at this, which dma-buf might then convert to or hop onto or so, but I don't think dma-buf itself is a good solution. We need something else, which is a big task, of course, if you want to do that.

I hope I could demonstrate that the problems can be very subtle. Think about the kmalloc versus devm_kmalloc problem, for example, or the UBIFS thing accessing the SPI NOR memory, where the UBIFS layer can bring in problems even though the SPI NOR layer itself might be okay. Stuff like this. And some of the bugs are pretty long-standing. As I said, I don't know if this UBIFS problem is actually solved by now.

For subsystem maintainers, I can really recommend: just write out rules. Whether you require your buffers to be DMA safe or not, whatever, just make a rule. Otherwise, as I found out with I2C, things go in all directions. And keep an eye on this during review. There's a new subsystem coming, the I3C subsystem. Sadly, I did not have too much time to look at these patches, but what I said is, "Hey, you didn't say anything about your buffers being DMA safe. Please, please do that." I think they decided that their buffers should be DMA safe, and it's written out now, so this is a good thing.

For developers, and I hope I have said it at least three times by now: there's CONFIG_DMA_DEBUG. When you're developing, please use it. Okay, thank you, it's CONFIG_DMA_API_DEBUG. Like I said, it caught a problem for me before it turned up in the field, where the length of the buffer had an off-by-one bug. And if you're touching code anyhow, for whatever reason, and it uses DMA, just take these extra minutes to double-check that all the DMA-related things, especially when it comes to buffers, are really proper and not based on assumptions.

So this is for developers. And for everyone, this is more, if you think about managers or organizations
like, let's say, the Linux Foundation: if you care about safety, pay attention to DMA. In my opinion, given the quality we want to have in the Linux kernel, we're not in a good state there. To get there: spread the word, document what's wrong, and collaborate on how we can fix things. This is a big task. It needs a lot of eyes and a lot of thought, but I am quite sure we want to have it.

So yeah, this is basically what I wanted to talk about. I hope you're halfway as shocked as I am, if you care about well-working devices with Linux. Good, we have like ten more minutes for questions. Do we have a mic for that, or should I... I shall repeat the questions, probably, right? Good. Then, do we have questions? There.

So you were suggesting... so, there are DMA pooling functions, and your suggestion is to use those functions which access the pool, so that we can have a guarantee that DMA will work correctly?

Oh, can we have that microphone?

There are actually two DMA APIs. What you are referring to is the streaming API, where we do explicit cache flushes when we have to. What you asked about is the coherent DMA API, with dma_alloc_coherent and so on. These are not compatible. If you use dma_alloc_coherent, you get a buffer that is not acceptable to be passed into a driver that expects a buffer to be used with the streaming API. So this is not only not sufficient, it's actually wrong. You cannot use this.

And for things like I2C, all solutions dealing with pooling, even if you have a custom one, I think have a tendency to be over-engineered, because DMA is rare. Other questions? Wait, you get a microphone.

So, why do you use DMA? Because some hardware allows only EasyDMA, DMA transfers. This is one of the reasons.

Yeah, yeah. In the hardware world everything is possible. There are indeed I2C controllers which can only
do DMA. Awesome. Mark, you had a question? Meeting lots of people, great, it's good for your health.

Yeah, so just on the SPI NOR stuff: it's supposed to be fixed. I don't claim to understand it sufficiently solidly that I'm 100% certain, but it's supposed to be fixed. The issue is that you can allocate memory with vmalloc as well as with other memory allocation functions, and if memory is allocated with vmalloc, you need to map it differently, because this is helpful and useful. But fortunately there's a function you can call to check if a given buffer is vmalloc'ed, which makes that tractable. But yeah, it's a pain.

One more trap that you can fall into: a buffer can be DMA safe for one device but not for another device, because of its location in physical memory. Some devices have a limited view of the physical address space, so you can have a buffer that is required to be at a lower address, like within the first four gigabytes, while another device can access data beyond that, or only within the first 16 megabytes or something like that. So kmalloc alone... there's no generic way to check whether a pointer points to something that is accessible using DMA unless you also say which device should access it.

I might be confused because I'm on stage, but that would mean we don't have a way in the kernel to get a buffer which is surely DMA safe?

You have a way to do that, but you have to pass a pointer to the device structure for this. For every device, we can get a buffer that is accessible by that device, if it can do DMA.

Which function is that?

It's complicated. That would go back to dma_alloc: if you use the dma_alloc functions, then you can get a buffer that is accessible by that device. It is also slow to access from the kernel if you don't have cache-coherent memory.

Yeah, cache-coherent DMA.

Well, yeah, just to say that I've worked on a couple of SPI devices where I tried to
make them DMA safe, because UBIFS was passing vmalloc'ed buffers. It turns out that this doesn't work well with VIVT caches, which are there on old ARM platforms, and also on platforms that have LPAE enabled, where DMA can access only within 32 bits but with LPAE you can access physical memory even at higher addresses. Those types of buffers still fail with whatever checks are there, and I don't see any way other than maybe if dma_map_single were to somehow provide a bounce buffer, find out whether this is...

Yeah, it does. So dma_map_single should: if the memory is DMA capable, it will map it, possibly using an IOMMU. If there's no IOMMU and the device cannot reach the memory with DMA, then it should allocate a bounce buffer. This is currently not fully implemented on arm32, because nobody has done it. It's not hard to do. I've talked to a number of people who volunteered at some point to do it, but nobody has ever completed it. I don't think it's more than a couple of days of work, actually. But until that is done, you cannot pass a highmem buffer, like a buffer beyond the first four gigabytes, on a 32-bit ARM system.

So this is there for arm64 but not for arm32, is that what you're saying? I mean dma_map_single providing the bounce buffers: it's there for arm64 but not for arm32?

That would be correct, yes. On most architectures it just works, because the architectures either use the software IOTLB or they are guaranteed to have an IOMMU. On arm32 we do not guarantee this at the moment, because most people don't have that much memory and it's very rare to actually run into this.

I'm so happy that you guys all basically confirm my basic feeling, which is: "on most architectures it should work". I think this pretty much describes the situation we're in.

But maybe some good news, since there are still a few minutes: it's nice to see in the I2C subsystem that bus master drivers are catching up. They really say, oh, we had this oops because the message buffer came from, I think it came
directly from module read-only data, because it was firmware. And then they said, "Oh yeah, cool, you have this flag," and then they could correct things to work. So, a bit of success, yay. More questions? Two, three, four... sold.
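
The opt-in flag and the bounce-buffer helper described in the talk can be modelled roughly as below. This is only a userspace sketch in C, not the kernel's I2C API: the names model_msg, MSG_RD, MSG_DMA_SAFE, model_get_dma_safe_buf and model_put_dma_safe_buf, the flag values, and the threshold parameter are all assumptions made up for this illustration, and malloc merely stands in for a kernel allocation.

```c
/* Userspace model only: none of these names are the kernel's API. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MSG_RD       0x0001u /* hypothetical: message reads from the device */
#define MSG_DMA_SAFE 0x0200u /* hypothetical: caller vouches the buffer is DMA safe */

struct model_msg {
    uint16_t flags;
    uint16_t len;
    uint8_t *buf;
};

/*
 * Return a buffer that may be handed to DMA, or NULL if the caller should
 * fall back to PIO (message shorter than threshold, or allocation failed).
 */
static uint8_t *model_get_dma_safe_buf(struct model_msg *msg, unsigned int threshold)
{
    assert(msg != NULL && msg->buf != NULL);

    if (msg->len < threshold)
        return NULL;               /* too small to be worth DMA */

    if (msg->flags & MSG_DMA_SAFE)
        return msg->buf;           /* opt-in: use the caller's buffer directly */

    /* Unknown origin: bounce through a freshly allocated buffer. */
    uint8_t *bounce = malloc(msg->len);
    if (bounce == NULL)
        return NULL;
    if (!(msg->flags & MSG_RD))
        memcpy(bounce, msg->buf, msg->len); /* write: stage outgoing data */
    return bounce;
}

/* Counterpart: copy read data back if needed, then free the bounce buffer. */
static void model_put_dma_safe_buf(uint8_t *buf, struct model_msg *msg, int xferred)
{
    if (buf == NULL || buf == msg->buf)
        return;                    /* nothing was bounced */
    if (xferred && (msg->flags & MSG_RD))
        memcpy(msg->buf, buf, msg->len);
    free(buf);
}
```

A bus driver in this model would call model_get_dma_safe_buf() per message, use DMA when the result is non-NULL and PIO otherwise, and always pair a non-NULL result with model_put_dma_safe_buf(). The one-to-one bounce mirrors the simple approach described in the talk; a pool-based scheme would be an alternative design.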