 So today we'll talk about updating and modernizing our clock drivers. My name is Chen Yu. I'm currently a kernel developer for Google working on the Chrome OS platforms. I started out as a hobbyist working on all-winner ARM SOCs back in 2013 and I've spoken a bit about bring up and maintainer experiences at ELC. So disclaimer I've worked a lot on the clock common clock frameworks but I do not consider myself an expert. So unfortunately the maintainer for the clock framework had to leave early. He's not around. So some background. So back in the day every platform would implement their own resource management and everyone would have their own APIs and so some of the developers came up with a common API for drivers to use and this is the clock API. You use it by including linux slash clock.h and this only specifies an API. So the actual code backing this is implemented by each platform under their own machine drivers and then later on there was this clock depth lookup code which gave you some sort of API to look up clock resources and then in 2011 there was this beginning of the clock common clock framework which basically implements basic clock types like dividers, muxes, gates, multipliers and initially this was done with one clock per device node. So you'd have one node for a multiplier, one node for a multiplexer, one node for a divider, one node for a gate and this unfortunately loaded the device trees a lot and you'd spend a whole lot of CPU time parsing your device trees and normally clock hardware would have like multiple clocks fitted into one register and so you'd also have like node address conflicts within your device tree and so later on the clock maintainers have asked everyone to move they've evolved a common clock framework notably they've asked many platforms to slowly move to the common clock framework from their own custom stuff they've moved everyone from having like one device node per clock to one device node per clock controller IP block there has been work to separate the consumer side from the provider side so you'll see struck clock for all consumer APIs you'll see struck clock hardware for all the provider APIs and then as more and more platforms move to device tree the global clock lookup stuff has been less commonly used and this has more or less been deprecated so now we have like a more efficient clock lookup through device trees and also recently well maybe three or four years ago there was a new way to describe your clock tree it used to be all purely based on strings you'd match your parents based on strings and now there's a way to specify your clock parents as local clock parents either through the device tree or through internal pointers so what's wrong or what can like what's what the existing drivers so a lot of times drivers are upstreamed and they are forgotten about so most of the time developers will upstream a driver and then when it's working they'll just move on to the next platform either the newer newer stocks or the newer newer architectures or anything other times if they're hobbyists or they're not full-time or something like that they might have burnt out they might have quit their job and no one has replaced them so you'll have drivers that aren't really maintained and then there are situations where there aren't any real-world users of these drivers within the community within the upstream Linux community because these drivers are actually used downstream with an LTS kernel or used with Android used with BSPs and so even if they do run into problems they might not really report them to us and there's like maybe a disconnect between the actual users and us as a community also early on a lot of drivers have assumed that they will always succeed which kind of makes sense if your clock driver does not exist your system will most likely either crash or it won't be functional um it kind of works poorly with deferred probe because they don't clean up resources and the second try you will always have something that will fail and most clock drivers if not all are not written to be modular to be loaded as a module so trying to unload a clock module a clock driver module will likely cause the system to just blow up again that kind of makes sense because like the clock driver is the most essential part of the system so unloading the clock driver really doesn't get tested also as i mentioned the there was this move to separate the consumer side from the provider side but the provider side still has these old struck clock APIs and a lot of drivers are still stuck with that and it's been deprecated for like five years so we have around a thousand invocations in the kernel tree for the old APIs and around 600 for the new APIs so clearly there's still a lot of stragglers but documentation and guidance around how to migrate your drivers why you should migrate your drivers is quite lacking which is unfortunate also previously there was only one way to describe your clock tree you would specify what your parents what your clock parents are using clock names string-based clock names and that look up would basically look up through the whole system so if you had hundreds of clocks in your system that would take some time last year someone proposed hashing the strings for faster matching which would be nice but at the time i thought that the that wasn't the real bottleneck so we'll talk about this later so in 2019 steam board added pointer and device tree based look up for parents if you have an internal clock parent you can describe them directly using pointers to their data structures if it's external you can describe them through the device tree you can list your the input clocks for your certain hardware block using the clocks property and um in your driver you can reference each different entry within that property to describe your parent your clock parents and that is a much more efficient way than trying to match against the whole table of clock names within the system so what should i do or what should we do to improve our clock drivers um so first of all we can clean up our drivers as i mentioned a lot of drivers assume that they are they don't have to do cleanup they always succeed they are probably initialized early on instead of using the proper driver model so a few things that we can do is like move the clock drivers to the proper driver model using platform devices platform drivers add proper error checking and error cleanup in the error path and also add remove callbacks they may not be exercised but it's uh it's a good thing to actually write them and then you can actually check if it makes sense and also we'd like clock drivers to migrate to the new uh clock provider apis which start with clock underscore hardware underscore register so there are a lot of instances where you have to migrate automated tools will help said or coconut those will help out and um by migrating you are really helping the clock maintainers to essentially deprecate the old api and get rid of it also you can switch your parent references from strings to internal to local clock references again either using pointers or using device tree lookups and if you're using device tree lookups try to use indexes instead of strings because strings also uh string matching even though it's just for the dt device tree entries that will still be slower than having indexes and it used to be that indexes weren't stable but now with the newer yaml based device tree schema the ordering is fixed so i don't think there's any issues with using indexes for this purpose so what have we done for for our work um so i've been working on the media tech clock driver for maybe the last six months we've added removal code and cleanup code error checking for the whole media tech clock driver library they have one library essentially a collection of functions that they use throughout their uh all their so c drivers um and we've tried to clean up at least one driver so that it uh it it's now clean and it sets an example for um the media tech engineers to follow and so as we clean these up there are newer drivers also are much more cleaner and they have fewer issues so unfortunately they still have like 10 to 15 older drivers that we have to clean up um that's going to be a lot of work we've also migrated their driver to the newer api um which involved a massive patch set uh rewriting basically all the function calls um i think this was originally like 12 patches with um bisect uh bisectability issues between them because we had uh the way we structure structured it was we had one patch for doing all the automation automated rewrites and then that would miss them some things and then we do have another patch to basically cover all the missed cases and that would result uh help with the reviews because you had all the many parts and then you had the handwritten parts but that wasn't uh really bisectable so afterwards after everything passed review then we sort of squashed everything and then sent that to the mailing list and then unfortunately that resulted in a patch that was over the size limit of the clock mailing lists so yeah splitting the the patches also helps with like sending patches out and having them actually readable so the next part we were doing um it's basically switching to local parent references were possible um i've gotten this to sort of work right now but it's not really cleaned up i have like uh unstage changes for like 10 plus files right now um and i was expecting like improvement to the probe and boot time but unfortunately that didn't really realize itself um a bit of analyzing uh i found that actually the real bottleneck for the clock drivers probing was actually um checking each clock name to make sure that this clock name when registered is unique in the system um so that's done for each clock that you register and unfortunately the media tech systems they have like 600 to 800 clocks within the system and that that really eats into your cpu time um so basically that's what we've done so far um in the future we plan to do a bit more um first of all that duplicate clock check thing we'll try to move to a hash table for for the whole system so that should massively improve the lookup time but it might blow the system a bit more also steven mentioned that maybe we could do something like have drivers opt out of this duplicate name check because the names are really only used if you're using global lookups now if you're not using global lookups then maybe you could opt out but it's more of a platform decision and also we could consider like dropping all the clock names and have something auto-generated but that doesn't help with the uh the custom names help with debugging so when you look at the clock table you will know like which clock you're actually trying to debug instead of some random um id that's generated by the clock core and um also the all the following um points are basically wish list items from steven so we'd like to improve the documentation around the clock framework uh the clock core currently doesn't have much documentation it has some kernel doc attached but there's no overall design documentation that explains like what you should implement in your driver or how each function how callbacks are used how the overall code flow is so we'd like to improve the kernel docs and maybe write an introductory document that explains basically everything also steven asked that people um add more k-unit tests to add coverage i know max seems already doing a lot of this so if people would add more test cases to cover their use cases that would be nice and there's also an issue with the get parent callback that um it currently returns unsigned u8 and there's no way to signal errors and also there's this limit that you can only have 256 parents which seems like enough but who knows um so yeah maybe rewrite this to return a pointer directly that would be nice that would be enough a massive undertaking for the whole clock tree yeah and also i think this one is the uh more realistic one and that actually um helps with things there currently are like multiple ridge map clock implementations there's one for qualcomm there's one for media tech i'm guessing there are more and they all they're they're basically basic clock types like gates dividers multipliers muxes that are backed by a ridge map however um because of like subtle platform differences you'll have multiple implementations so we'd like to look at all these implementations try to merge them and promote them to the core so not everyone has to like duplicate all the code and steven also mentioned that there's this um prepare lock issue that is basically one big block around the clock core and right now um it will block changes to unrelated clocks say you're changing the speed of the clock rate of your gpu that would likely block any other changes to maybe your sd card or your display pipeline and so the idea is to try to make the locks more granular um to or to make it just affect the subtree that's currently being modified and also lock tap lock that uh complains about deadlocks when you're using the prepare lock and then uh with clocks on some slow buses um there's this issue on if the prepare lock is uh used first versus like if a ridge map is used first and you can get into situations where one part of the driver grabs the prepare lock and then grabs the lock for the ridge map and another another part of the driver grabs the ridge map first and then grabs the prepare lock then you have a deadlock um so that's about it for for today any questions i know it's this has it's sort of more like a plumber's talk rather than a elc talk so um so the question was that uh the response time for clock patches right now is uh abysmal and um is there any plan to have a co-maintenorship of the clock tree or something else um i've slightly talked to steven about this um personally i don't think i have the bandwidth to co-maintain everything um and i think uh what steven is now doing is mostly uh having platform maintainers do the review work for the platform drivers um i think merrick has uh has an issue with he's trying to get something merged into a clock core and is basically being ignored yeah yeah so um unfortunately uh there are two clock maintainers on the list uh mike is basically too busy to do anything um steven is sort of overworked so i think we do have to come up with some sort of co-maintenorship either informal or formal like reviewing each other's patches some something like that yeah any other questions um right now it oh so the question was can we get off get rid of all the clock uh the name-based lookups for clocks um i think right now there's a fallback path for uh clock get to fall back to global names and there's also a fallback path for the clock parenting lookups in the clock core and so to get rid of all the clock names uh name lookups i think we would have to like get everyone to update their clock drivers to at least have local references so you could get of get rid of all the names in the drivers yeah and then ideally you'd have uh device tree lookups in all the peripheral drivers that use the clocks um any other questions so the the statement is that um dropping all the clock names from the the driver core would likely cause uh the newer kernels to not work with the older uh device trees yes i believe that would be a problem because um even when i'm rewriting this uh the the clock drivers the media tech clock drivers they they don't have proper device tree references to each other so they have three or four clock controllers within their system and between them they pass like 10 to 15 clocks and right now they're all based on name lookups and so if you have an older device tree that doesn't describe all the clock relations and you run it on a newer kernel that does not do name lookups um it will basically break because the the downstream um clock drivers don't know uh what their parents are they can't do reparenting they can't set calculate the rates that sort yeah so i think we can eventually get there but it's going to take a lot of time minutes or hours later doing good okay so the right so the statement was that uh rockchip has these uh clock references that are done by name only because the the clocks may appear much later um and we are depending on the clock core to um essentially know when the new uh that that parent appears and then tie it into the existing clocks and yes that is a problem that we've also seen on media tech um essentially they rely on a certain um ordering of their drivers and um the clocks appearing the the clock the clocks can appear out of order and you want them to be able to be tied together essentially and um sometimes it may it might not work when you are using device tree to describe them because then the device tree uh forces either a certain order or you get circular dependencies which right now I don't think the the clock core can actually resolve properly yeah any other questions I think we're running way early is it any me right oh so the statement was that uh christoph is it yes so uh statement from christoph is that he worked on uh getting trying to get rid of the paralog uh big big lock uh six or seven years ago and it was just an rfc series and surprisingly it hasn't moved forward yet uh I guess it's a lack of reviewers and contributors yeah um are there any questions from the virtual attendant attendees no okay any other questions or we can wrap this up are we no okay well thank thank you for your time