So, let's get started. I'm here to tell you about a stable device tree ABI and that it's actually possible. Last year, or was it two years ago, at ELCE there were some talks claiming that a stable ABI for the device tree is impossible, and that was kind of a trigger for me. So I thought I need to stand up and prove it's actually possible.

This is the guy telling you all those nice things. I'm Lucas Stach, a kernel and graphics developer at Pengutronix. We're a consulting company, and we help customers build Linux systems based on the mainline kernel: constantly updating systems using recent mainline versions of the Linux kernel. We help them reduce the maintenance cost by bringing things mainline, so we don't have to maintain them for one customer but have them maintained in a wider community.

At my day job I'm mostly dealing with the i.MX SoCs from Freescale, now NXP, which puts me in kind of a good position to claim stable device trees are possible, because those are mostly processors that are used for a really long time in the industry. They have really long lifetimes, and that gives us a bit of time to actually get things right. If you have to deal with mobile chips, you're probably in much more of a rush to get things out and working, and you don't take the time to actually do those things.

What I want to do today is get you motivated about why we want stable device tree ABIs at all, and then give you some examples of how you can do it, or at least reduce the risk that you need to break the device tree ABI. So, why should we care about a stable device tree ABI?
For now, most device trees still live inside the Linux kernel. If they are upstream, they are in the Linux kernel tree, so most people just consider them part of the Linux kernel, and as long as you change the device tree and the driver in lockstep, everything is fine and you're not going to break anything.

But now there are other projects using device trees: other OSes like FreeBSD, or secure world firmware running on the same system, which is probably using the same device tree because they don't want to reinvent the wheel. There are even bootloaders now, with U-Boot starting to use the device tree to probe devices inside the bootloader itself, and barebox has had that ability a bit longer, I think. So we use the same device trees we use in the Linux kernel in the bootloader or other parts of the system, and it would be really nice to keep them stable so other people don't experience breakage just because Linux people decided to change something.

One big thing we see a lot is interaction between the bootloader and the kernel. It might not be that you're calling into the bootloader from the running system, but even when just loading the device tree, the bootloader probably knows a bit more about the system than a fixed device tree you have in the Linux kernel. Something we regularly see is systems with different memory sizes: we don't want the size in a fixed device tree, but have the bootloader detect the memory size and write it into the device tree. That's only possible if the device tree is relatively stable. For memory it's not a problem: there is a defined binding, everyone uses it, and it just works. But there are other cases, like detecting touchscreens or displays in the bootloader or firmware and then providing that information to the Linux system you're about to start. So you're changing the device tree, and that's only possible if
it's stable; otherwise you would have to update the bootloader or the firmware.

Another big point, not so much in embedded but with all those single-board computers out there, is user experience. People just try to update the Linux kernel on their devices like they're used to from running a PC. Updating the kernel on your PC is something you do regularly because it's low risk; it mostly just works. People start to expect the same thing on embedded boards, so it's actually a nice thing to still have a booting system after you update your kernel.

So let's get into the details of how we get a stable device tree ABI, or freeze it in some way. I think most people in the room will agree that a completely frozen ABI is infeasible. Did you ever try to design an API or an ABI and never change it again? I don't think that works. If we aimed for a completely frozen ABI, everyone would just fail at some point and not bother with it again.

So let's try to get a definition of a stable ABI that's actually possible to achieve, and that's one-way compatibility. Look at the ABI between the Linux kernel and user space. We claim it is stable because we don't break old programs running on a new Linux kernel. But we don't claim that you can take a program that uses new or changed ABI and run it on your 2.4 kernel. We don't do that.

If we transfer this model of stability to the device tree, the thing we care about is the firmware and the kernel. Most people are running newer kernels on old firmware, at least in embedded. The one notable exception is enterprise Linux, where you're probably running a pretty old kernel while new systems with new firmware are coming out. But the thing most people care about, if we push the device tree down into the firmware, is: I want to be able to run a new Linux kernel on old firmware. And I think if we
define that as a stable ABI, that's a goal we can actually achieve with some work.

For those of you not familiar with the DT process, or bindings, just a quick example. A binding defines how you describe your hardware in the device tree and make it available to the drivers that actually talk to the hardware, so it has to include the necessary information. For buses that are non-discoverable, like most buses on ARM platforms, you need a way to let the drivers know what hardware can be found in the system. The way we do this in the device tree is with a compatible: the hardware block is compatible with something, and the driver checks for the compatible string. Then there are required properties in a binding, which must be present in the device tree, and there are optional properties that you may have in your device tree to better describe the system or to carry some kind of configuration. If you describe your hardware and leave out any of the required properties, your device tree is obviously not compliant with the binding.

So how do you actually define stable bindings, or bindings that you are able to support as stable? A really important part is that you go look at the hardware and provide an exhaustive list of all the things you find in the hardware block. Whether it's a separate chip or some IP block on the SoC, you really need to look at the documentation and provide everything that's needed, like voltage inputs and clock inputs, as required properties. If you're describing a whole system with lots of new IP blocks or new hardware where you need to define new bindings, you're probably tempted to cheap out and only push things into the binding that your system uses. So if you have some chip attached to a voltage rail that's always on in your system, you might be inclined not to describe the voltage rail in the binding. Then another user might come around who actually
needs that description in the device tree for the Linux kernel to be able to support their system. Once a binding is defined, accepted, and used upstream, you can't change it by adding further required properties, because you would make existing users non-compliant with the binding. What you can do is add optional properties, which may or may not be present. But this again has a maintenance cost in the long run, because now the driver has to deal with it: a regulator for the voltage rail may be there or it may be absent, and you have to deal with all those complications of 'maybe I need to switch that on now, or maybe not'. There are some abstractions in the Linux kernel that make those things a bit easier, but in the long run it's always a maintenance burden to support additional optional properties. If you can just say in the probe function of the driver 'I need this, this, this, and this, and if it's absent I'm not going to work with the device', that's a lot easier than handling all those cases at runtime. It works, but it's a maintenance cost in the driver, so better avoid it. And the way to avoid it is to actually go look at the hardware and be as precise as possible in the description of the device.

Now I'm going over some things we've learned with the i.MX platform over the past years. If you look at the i.MX platform, you might not find everything I'm talking about here applied consistently. It was a learning process for us, and I'm trying to give some guidelines that we found valuable for keeping things stable.

One big thing: if you're defining bindings for something, be precise and exhaustive in the things you need, but infer as much as possible from either the compatible or the actual hardware. If you have hardware registers describing your address space or whatever, don't push that into the binding. The more properties you need or have on
the device tree, the higher the chance that someone actually gets it wrong, and if it's wrong, you're breaking someone. So if you can infer something from a compatible, do it; don't push it into the binding.

Okay, I'm calling names here, but I found it valuable to present this to you. A good example is a Tegra controller driver. It's not the i.MX platform, but an example I found in the upstream kernel. There you have a compatible for each of the different generations of the device, and in the driver there's a list of what each compatible implies: the device has these quirks, or this feature can't be used. So the device tree isn't describing all the quirks of the device; they are inferred from the compatible. If you later find some flaw in the device where you need another quirk, you can look at the compatible and add it in the code, so you're not breaking the device tree ABI to fix things.

One of the bad examples is the network controller in TI chips, where a lot of things that are just properties of the hardware, like the number of DMA channels or the size of the RAM integrated in the controller, are in the device tree. Everyone using this device needs to describe those things in the device tree. If the person describing the hardware gets it wrong, the only way to fix it is to fix the device tree, and if you need to fix the device tree, you're breaking the stable ABI. So that's one example of how you shouldn't do it.

What we've learned is to always use new compatibles when you're dealing with new chips, hardware blocks, or integrations. Even if the driver doesn't match on that compatible yet, write it down in the device tree for new chips. For example, we have the SPI controller on the i.MX6 UltraLite, and we have a compatible for that in the device tree. The driver doesn't actually match on that new compatible; because it's basically the same IP block, we're using the compatible of the first chip
that used that IP block. But we have the new compatible in the device tree in case we later find out that something is broken with the controller and we need workarounds in the driver. If we ever find ourselves in such a situation, we can just check for the new compatible in the driver and infer the needed quirks from that.

So what if you actually forgot to add a new compatible for the thing you're integrating? You might fix the device tree if there's something obviously broken in it and you don't want it to spread further. But before you go and fix the device tree, you fix the code in the kernel.

One example, same SPI controller, and this is where we got it wrong. We have an SPI controller that is found on the Quad and the DualLite versions of the same chip, and we thought: that's the same IP block, so we can use the same compatible, and we only used the compatible of the first chip in the device tree. Later on we found out that something had gone wrong in the system integration of that IP block, so DMA with the SPI controller is actually not usable on that chip. Now we don't have a new compatible in the device tree to check for that, so we have to work around it in the driver code by checking the compatible of the machine, that is, of the SoC. We just disabled the DMA functionality inside the kernel code. So even if you're running an old and broken device tree that still claims DMA is usable on that device, the driver won't try to use it. Always fix things in the kernel first.

So what do you do if you actually need to break a binding for some reason? You've got it totally wrong and you need to break it to support all the functionality of your hardware. There are some questions you have to ask yourself. Is your platform stable?
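The earlier SPI example, inferring quirks from the compatible and fixing things in the kernel first, can be modelled in a few lines. This is a minimal Python sketch of the matching logic, not kernel code: the compatible strings, the quirk name `no-dma`, and the function `quirks_for` are all made up for illustration and do not correspond to a real binding or driver.

```python
# Model of a driver match table: every quirk is keyed by compatible string,
# so nothing about the quirks has to be written into the device tree.
# All strings below are hypothetical, not taken from a real binding.
DRIVER_MATCH_TABLE = {
    "vendor,soc-a-spi": frozenset(),            # first chip with this IP block
    "vendor,soc-b-spi": frozenset({"no-dma"}),  # quirk discovered later
}

# SoC-level workaround for device trees that were shipped without a
# chip-specific compatible on the device node.
BROKEN_DMA_SOCS = {"vendor,soc-c"}


def quirks_for(node_compatibles, machine_compatibles):
    """Return the quirk set for a device node, or None if the driver does not match.

    node_compatibles: the node's compatible list, most specific entry first.
    machine_compatibles: the compatible list of the root node (board, then SoC).
    """
    quirks = None
    for compatible in node_compatibles:
        if compatible in DRIVER_MATCH_TABLE:
            quirks = set(DRIVER_MATCH_TABLE[compatible])
            break  # the first (most specific) match wins
    if quirks is None:
        return None
    # Kernel-side fix: an old device tree only names the first chip's
    # compatible, so fall back to checking which SoC we are running on.
    if BROKEN_DMA_SOCS & set(machine_compatibles):
        quirks.add("no-dma")
    return quirks
```

In this model, a device tree that already carries a chip-specific compatible lets a later fix be a one-line addition to the table. Where that compatible was forgotten, the SoC check is what keeps old device trees working without touching them.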
So the first question is whether something or someone is depending on things being stable. There's a trend in the kernel community to cheap out here and just put something like 'my platform isn't stable' in the documentation. That's a really cheap way to get out of all this, so don't try to do it. You might have new platforms where a lot of things are still in the works, and your users may still expect things to break because you're at the early enablement stage of a platform. You may break things at that stage, but it's not nice to your users.

The second question you have to ask: is the number of users depending on that binding non-zero? If you can't confidently say there are none, you probably have users who are using the binding and depend on it being stable.

We've had one situation on the i.MX platform where we got the binding wrong for the power domain controller. It later turned out that with the binding we had, we would only ever be able to support a single power domain, but the chip actually has multiple power domains. We wanted to support that, and we needed to break the binding for it. What we did at the time was add a compatibility layer inside the driver, so you can still run a new kernel on an old DT with reduced functionality, with only one power domain being available. So there's code for this in the kernel, and obviously code in the kernel is a maintenance burden. So really think twice whether you need to break a binding. If you can make it work any other way, don't do it.

What we learned and did in that example: you really want to split out the parsing of the old binding in the driver into a separate function or something like that. Once you've changed the binding, you'll convert all your device trees to use the new binding, and testing of the old binding will certainly decline. There are fewer users on the old binding, so you won't have as good test coverage for it. So you probably want
to push the parsing of the old binding out into some part of the code that isn't changed regularly. That way you avoid regressions from someone adding another optional property to the new binding and changing things, and then your old parsing breaks, and you only realize it a year later because someone is still using the old binding. So just split it out.

That's how it looks in the Linux kernel today. We have the probe function, and we look whether there is a child node with a certain name, which is only present in the new binding. If so, we can say: that's a device tree using the new binding. If we don't find it, we call a function that does the parsing of the old device tree and then go on. In this case it was easy to infer from the presence of the child node, because we didn't have child nodes before. In the general case it's probably easiest for the driver to have a new compatible to check, and then parse either the new or the old device tree binding.

So that's it for recommendations. I think those are relatively simple things you can follow to not make your life and the lives of your users harder.

One thing you might have noticed: only the first best practice I've given, having an exhaustive list of properties in the device tree binding, is something that is actually enforced by the device tree maintainers reviewing your bindings. All the other things are in driver or platform code, so it's probably not possible for the device tree maintainers to enforce stability for your platform or your devices. Don't rely on them to do that job for you. If you're a platform or driver maintainer, it is you who should take responsibility for keeping things stable. All the things I've mentioned can be enforced, or at least encouraged, by the platform and driver maintainers, so you should
probably do that if you're a maintainer. On the other hand, it's often not that hard to do the right thing and not break the stability of the device tree ABI. Maybe it takes you 10 or 15 minutes more of thinking about a better solution than breaking the device tree ABI. That's 15 minutes well spent, because it makes everyone's life easier in the end, and it makes systems upgradeable without breaking stuff in every corner.

So I'm a bit early, but thanks for your attention. Are there any questions?

Not yet. I've taken the opportunity of ELCE to actually get the time to put it into slides, so now it's just a matter of turning it into a document for the kernel.

Yeah, that's something we've encountered with i.MX, when changing the way interrupts are mapped on the platform. It was a really incompatible change, but you have to be upfront if you're really breaking something. We decided that it's not possible to have backwards compatibility at that point, because it would just spread all over the kernel. So we at least made sure that the kernel boots on an older device tree and spits out a big warning to the users that they might have to update their device tree to regain all the functionality that worked before. So be upfront about breaking things.

I'm not sure how often it's really possible to even realize that a change is incompatible. One thing, obviously, is that adding a new required property breaks compatibility. Are there others?
I don't know; it really depends on the driver, I think. I've seen a lot of changes where I, as a driver maintainer, see something that might be incompatible with the old bindings. That means I need to know my driver, how it works, and how it relies on things in the device tree, and then push back on that change. So I don't think there's a general rule for what breaks compatibility; it's really just keeping your eyes open for something you might have seen before. I think most of the time when you're breaking a binding, you're fully aware of it.

Hi, I'm with TI, and I want to disagree with you on the good and the bad. The CPSW, which you used in your illustration, is a synthesizable module, and each of those things, well, I won't defend each of them, maybe there could have been some defaults there, but they could be different on every chip, like Arnd pointed out. So what you're really suggesting is that we encode knowledge of each SoC into the kernel, which is the point the device tree was trying to get away from. Also, your solution doesn't work for the enterprise case that you pointed out at the beginning, because there what we're trying to do is be able to write a generic driver that will work even if the kernel existed before the device existed. I understand what you're trying to do, but I just want to give the counterpoint.

I've got that argument before from the enterprise people, but they're in a much different situation, with enterprise kernels that are probably really, really old. You might even be able to change your firmware to accommodate a certain kernel, because you're able to detect the kernel version and then change the device tree in compatible ways, if you really want to do that.

But it does apply in more situations than enterprise, because you're teaching the kernel specific information, right? So the same information would have to be taught to the bootloader, to the trusted firmware, to NetBSD, you know, all those things.

But it's
possible with PCI devices. A PCI device generally just has a compatible, which is the vendor and device ID, and everything else known about the device is inside the kernel. So yes, it's possible to push that knowledge into the kernel.

Right, but then old kernels can't run that PCI device, right?

So I'm going to make the counterargument: if you're putting synthesizable IP into your system, you should do what DesignWare does and actually encode the synthesis options for things like memory sizes into a readable register. Then when your driver comes along, it can read that and say 'I've got this much RAM, I've got this much of this', and make an adjustment. A lot of people do this, and we've had to do this with DesignWare too.

So I think I agree with both of you. I think the most important takeaway here is: when you add something that is configurable and different between SoCs, think about whether it should be a property or not. Don't just assume it should be a property or that it should not be.

Any other questions?

Yeah, so hardware engineers pushing things into hardware and making them discoverable makes this job a lot easier. We've seen a lot of IP blocks where you just need to know that it's at that position in your address space, and everything else is discoverable from the hardware, and that's a really nice change.

Okay, thanks for being here.
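The recommendation from the talk to keep parsing of an old binding in a separate, rarely touched function can be sketched as follows. This is a Python model under stated assumptions, not the actual i.MX power domain driver: device tree nodes are plain dicts, and the child node name `domains`, the property name `supply`, and all function names are hypothetical.

```python
# Device tree nodes are modelled as plain dicts; child nodes live under "children".
# The child node name "domains" and the property "supply" are hypothetical.

def parse_legacy_binding(node):
    """Old binding: one implicit power domain described on the controller node.

    Kept separate and frozen so that changes to the new binding cannot
    accidentally break it.
    """
    return [{"name": "legacy", "supply": node.get("supply")}]


def parse_current_binding(domains_node):
    """New binding: one child node per power domain."""
    children = domains_node.get("children", {})
    return [
        {"name": name, "supply": child.get("supply")}
        for name, child in sorted(children.items())
    ]


def probe(node):
    """Pick the parser based on a child node that only the new binding has."""
    domains_node = node.get("children", {}).get("domains")
    if domains_node is None:
        # Old device tree: reduced functionality, but it still works.
        return parse_legacy_binding(node)
    return parse_current_binding(domains_node)
```

Detecting the binding by the presence of a child node only works because the old binding had no child nodes at all. In the general case, as the talk notes, a new compatible is probably the easier thing for the driver to check.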