Okay, welcome everybody. Hi, I'm Daniel. I'm the co-maintainer of the Linux kernel graphics subsystem. Before that I was the maintainer of the Intel driver for quite a while, and I work for Intel's open source graphics center. I'm going to talk about everything great about upstream graphics. For context first: this title is only relevant because 10 years ago things were most definitely not great.

A bit more than 10 years ago we merged the Graphics Execution Manager (GEM), which was the first attempt Linux had ever seen at something resembling a remotely modern memory manager for graphics, plus kernel mode setting (KMS). KMS is a bit of a label swindle, because there was already kernel mode setting in the form of fbdev; this was kernel mode setting for big desktop GPUs with OpenGL integrated. So we started this entire fbdev-versus-DRM struggle, and total confusion about which mode setting you actually want. We were just about celebrating OpenGL 2, something like five years behind the industry. It was pretty depressing.

Fast forward to today: the graphics subsystem is about 10% of the kernel, plus the user space on top, so it's one of the biggest subsystems there is. From nothing to one of the biggest. We have more than 50 full atomic mode setting drivers, atomic being the latest and greatest display user space API that we support. Which is kind of fun, because I gave a similar talk two months ago at Plumbers, and back then the number was 50; we merge these drivers at a pretty brisk pace. On the user space side we support all the latest and greatest nowadays, with OpenGL, GLES and Vulkan.

I think the most impressive part is that we've taken these desktop mode setting and memory management ideas and cut them down until they were tiny enough for the tiniest embedded stuff. The smallest full atomic display driver we have is less than 250 lines, including whitespace, curly braces and the license comment. The biggest one is around 2.2 million lines of code, probably bigger by now, I didn't recheck that number. So we have a factor of 10,000 between the smallest and the biggest driver.

And that smallest driver is not a joke. It has hot-unplug support. It supports the full atomic user space API, with compatibility layers on top for the legacy APIs we merged 10 years ago. It supports fbdev emulation with all bells and whistles. It uses device-managed (devm) resources, so that everything gets cleaned up and shut down correctly. It has DMA-buf import/export support, so you can render on a GPU and then display the result on this tiny little panel behind an SPI bus, and the kernel will do all the synchronization: you don't start copying the buffer to the panel before the rendering has actually completed, and user space just queues it all up and can forget about it. The only thing missing is suspend/resume, which would be a grand total of two more functions and maybe five to ten lines of code. I guess the people who wrote it just didn't need it. So I do think the upstream graphics subsystem has fully arrived in the embedded world, for tiny displays in tiny drivers.
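To illustrate how little is missing there, here's a sketch of what that suspend/resume support could look like, built on the generic drm_mode_config_helper_suspend()/resume() helpers; the mypanel_* names are hypothetical, this is not the actual driver's code:

    #include <linux/device.h>
    #include <linux/pm.h>
    #include <drm/drm_drv.h>
    #include <drm/drm_modeset_helper.h>

    static int __maybe_unused mypanel_pm_suspend(struct device *dev)
    {
            struct drm_device *drm = dev_get_drvdata(dev);

            /* Disables all outputs and stashes the atomic state. */
            return drm_mode_config_helper_suspend(drm);
    }

    static int __maybe_unused mypanel_pm_resume(struct device *dev)
    {
            struct drm_device *drm = dev_get_drvdata(dev);

            /* Restores the state saved at suspend time. */
            return drm_mode_config_helper_resume(drm);
    }

    static SIMPLE_DEV_PM_OPS(mypanel_pm_ops, mypanel_pm_suspend,
                             mypanel_pm_resume);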
So let's look at all the things it took to get there. I think one of the most important things is the new atomic mode setting API. Instead of an entire forest of ioctls to set up your display, you have this one atomic ioctl and you say: here are all the changes I want for the next frame, all the composition changes, all the color management changes, all the new buffers; do it. The cool thing about this atomic API, and also the motivation for it, is that it really spans from embedded, where you have lots and lots of scanout planes for low power use cases (like displaying a video embedded in a browser window, where you want a dedicated plane for the video and leave everything else alone), up to the big desktops, where you have lots and lots of outputs and the problem is more: how can I configure the three or four screens and outputs that I have without running past the hardware limits?

We added blending. We added writeback, so you can write your composition back into memory and use your display as a composition engine, which is a lot more power efficient than rendering with GLES. Color conversion. Gracefully handling link failures: nowadays, if you plug in a DisplayPort cable, there's an entire computer in that thing, and it can fail, and we can now tell user space "sorry, something broke, please try again", or that it needs to reduce the resolution or change the desktop. We even have content protection, so Netflix works on Chrome OS and pretty much everything else. So this user space API has made a tremendous improvement, and it has helped unify the fbdev embedded world and the big desktop world where I personally started; the communities have unified as well.

But if you have such a massive user space API, you don't implement it in 250 lines of code. So what we also have is lots and lots of kernel-internal helpers that break down this complexity and allow you to write a driver implementing this full-featured API in very few lines. There are the mode setting helpers for the atomic framework, which are designed to be very modular. For example, if you're a big desktop GPU, maybe the suggested commit flow, the order in which you program the hardware, doesn't fit for all your different outputs, so you rewrite that part for your driver; but maybe your plane hardware is very simple, so you just reuse the plane helpers and you're good. On the flip side, maybe you need a special sequence before you enable a plane, so you add just that. It's very flexible, and most drivers use something between a few functions from that library and the complete thing.

Then on top of that, and this is what allows the cool party trick with the tiny driver, there's the simple display pipe helper. Because if all you have is a panel where you just upload the next frame, you don't need all these features: no writeback, no full color management and adjustment pipeline. You copy the buffer and you're done, and there's one GPIO to enable the panel and shut it down, or maybe reset it if it's very fancy. So the simple display pipe helper takes the super flexible atomic user space API and breaks it down to: you have one pipeline, and here's your update function.
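A sketch of what that looks like in practice. The drm_simple_display_pipe_* types and calls are the real helper entry points; everything named mypanel_* or mp is made up for illustration:

    #include <drm/drm_simple_kms_helper.h>

    static void mypanel_enable(struct drm_simple_display_pipe *pipe,
                               struct drm_crtc_state *crtc_state,
                               struct drm_plane_state *plane_state)
    {
            mypanel_power_on(to_mypanel(pipe));     /* e.g. flip a GPIO */
    }

    static void mypanel_disable(struct drm_simple_display_pipe *pipe)
    {
            mypanel_power_off(to_mypanel(pipe));
    }

    static void mypanel_update(struct drm_simple_display_pipe *pipe,
                               struct drm_plane_state *old_state)
    {
            /* Push the new framebuffer out to the panel, e.g. over SPI. */
            mypanel_flush(to_mypanel(pipe), pipe->plane.state->fb);
    }

    static const struct drm_simple_display_pipe_funcs mypanel_pipe_funcs = {
            .enable  = mypanel_enable,
            .disable = mypanel_disable,
            .update  = mypanel_update,
    };

    /* At probe time, a single call wires up CRTC, plane and encoder: */
    static int mypanel_init_pipe(struct mypanel *mp, struct drm_device *drm)
    {
            return drm_simple_display_pipe_init(drm, &mp->pipe,
                                                &mypanel_pipe_funcs,
                                                mypanel_formats,
                                                ARRAY_SIZE(mypanel_formats),
                                                NULL, &mp->connector);
    }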
This has allowed us, when you take an old driver from the fbdev subsystem and convert it over to atomic, with all the helpers for outputs, for EDID parsing, and this simple display pipe, to actually shrink those drivers, usually by a factor of two to four. So nowadays the subsystem that started with big desktops is better at embedded than the subsystem that started at embedded. There are the self-refresh helpers, which I'll talk about later, and you get full-featured fbdev emulation with like one line, including vblank support, so your proprietary, non-open-source Mali user space driver can render directly into your framebuffer. It all works.

Lessons learned from atomic. Atomic is kind of like a database transaction: you commit, and either everything goes through or nothing does. The traditional way to do that is you start committing and then roll back on failure, and someone is going to forget that one register write or something; making sure that always works looked like a very fragile design, and we tried it in some prototyping. Instead, in atomic, the entire state update is a complete new copy of the entire driver state. For every scanout engine, every hardware plane, every output, we have state structures that are completely freestanding. So rollback is just: you free a bit of memory and you're done, which makes it very safe.

It also makes it a lot easier to catch driver bugs in review, because in atomic we also allow user space to just ask "would this work?". That's needed because generally you first have to think about your composition: what are you going to composite with the display hardware, and what with the GL engine? As much as possible with the display, obviously, because that's more power efficient. Once you've made that decision, you start rendering with the GL engine, and then you do the actual commit. And if that third step fails, you're screwed, because you've already done all the rendering assuming it would work out. So we have atomic check, a check-only mode which just tells you whether it would work, yes or no. Obviously this is not allowed to change any driver state or any hardware state, and having these completely freestanding state structures makes it a lot easier to catch bugs in this area. As I said already, making the helpers as modular as possible, so you can pick and choose, has been really good.

Another thing we did: the model in atomic is that you have objects, which represent things in your hardware, then properties, which are the things you can change, and then values. So it's triples. Semantically a lot of these are strings, like enumerations, but on the wire we just have triples of unsigned integers, and the encoding and decoding is all done in the core, which I think helps a lot in forcing standardization. The driver never sees these raw values; it only ever sees C structures, where you can use real enum types, or where instead of an object ID you take a reference on the object and fill in a pointer. All that bookkeeping stays out of the driver code.
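From the user space side, the triples and the check-only mode look roughly like this. A libdrm sketch; the plane and property IDs would come from prior discovery and are placeholders here:

    #include <xf86drm.h>
    #include <xf86drmMode.h>

    static int try_then_commit(int fd, uint32_t plane_id, uint32_t prop_fb_id,
                               uint32_t prop_crtc_id, uint32_t fb_id,
                               uint32_t crtc_id)
    {
            drmModeAtomicReq *req = drmModeAtomicAlloc();
            int ret;

            /* User space speaks (object, property, value) triples of
             * integers; decoding into real types happens once, in the
             * DRM core. */
            drmModeAtomicAddProperty(req, plane_id, prop_fb_id, fb_id);
            drmModeAtomicAddProperty(req, plane_id, prop_crtc_id, crtc_id);

            /* TEST_ONLY asks "would this work?": the driver's atomic
             * check runs, but no driver or hardware state may change. */
            ret = drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_TEST_ONLY,
                                      NULL);
            if (ret == 0)
                    ret = drmModeAtomicCommit(fd, req, 0, NULL);

            drmModeAtomicFree(req);
            return ret;
    }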
We also solve the locking entirely in the core. Say the driver has two outputs and you can share planes between them, four planes on one and four on the other, and now user space wants to use eight on the second one: the driver needs to grab the state of the second pipeline so it can reallocate these planes. And obviously this needs a bit of locking, because user space in a second thread might also want to do something over there. For correctness, the way the locking works is that the driver just asks for the state objects in its atomic check function and tries to compute whether the request is possible. Internally, every time we grab a new state, the atomic core takes care of all the locking and the deadlock avoidance, which is a complicated thing, because we essentially do graph locking, as the only subsystem in the kernel. So grabbing state might fail because we can't get the lock, and you need to retry.

The not-so-great part: we definitely need more tests. Starting at the beginning of this year this is now mandatory for new work, and we have a user space test suite. We should also have proper user space API specs for all these properties, which we need to do sooner rather than later, but that's not yet mandatory; it's not yet quite clear what the best way to do it is.

A case study for helpers: self-refresh and manual upload. You have a panel that has its own framebuffer and otherwise does nothing, and every time you want to change something you have to manually upload the changed area of your framebuffer to the panel. A somewhat similar concept is self-refresh, where instead of continuously sending frames you switch the mode so that no frames are sent while nothing changes. The trouble in both cases is that any time something does change, you need to make sure the update actually gets to the panel, and you're not just showing the same frozen screen because you stopped updating. And there are lots of entry points: there's the fbdev emulation you might need, there's the old legacy KMS interface, there's the new atomic interface. With the damage tracking helpers we provide the driver a unified entry point: from the driver's point of view, you wire up the damage tracking and all these user space APIs look the same. You can override things if there are some legacy compatibility quirks, and user space can even tell you "I just changed this tiny part of the screen, please only upload that if you can".

The same with self-refresh. Self-refresh is essentially shutting down the entire display, except you don't shut down the panel, because the user is still looking at it. So we have a helper which keeps track of that and switches everything back on again as soon as activity kicks in. Essentially the only thing, besides plugging the helper implementation into your driver, is that in your panel's enable and disable functions you need to check whether you're going into self-refresh mode, and not shut down the panel if it's just self-refresh. So all this complexity lives in the helper code, and the actual driver implementation is a handful of lines of code.
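Tying back to the damage tracking just described, here's a sketch of a manual-upload panel's update hook that only flushes the changed rectangles. The iterator helpers are the real kernel API; mypanel_upload_rect() is hypothetical:

    #include <drm/drm_damage_helper.h>

    static void mypanel_update(struct drm_simple_display_pipe *pipe,
                               struct drm_plane_state *old_state)
    {
            struct drm_plane_state *state = pipe->plane.state;
            struct drm_atomic_helper_damage_iter iter;
            struct drm_rect clip;

            /* The helpers funnel damage from every entry point (fbdev
             * emulation, legacy KMS, atomic) into one stream of clips. */
            drm_atomic_helper_damage_iter_init(&iter, old_state, state);
            drm_atomic_for_each_plane_damage(&iter, &clip)
                    mypanel_upload_rect(pipe, state->fb, &clip);
    }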
We have more awesome stuff, again all motivated by SoCs and embedded systems, making the drivers more modular. We have bridges, which are transcoder-style chips, because everyone uses more or less the same HDMI transcoder. We have panel drivers, because panels all have slightly different quirks, so you can have a generic driver and just plug in your panel driver via the device tree. And we have quite a bit of support for the component framework, so you can stitch together your driver in the standard Linux ARM SoC way, with device tree bindings.

There's also a lot of ongoing work in this area: managing the state for these bridge chips better, integrating them better with atomic, maybe exposing them, allowing you to chain bridges, generally more flexibility. We've had quite a bit of work on making hot-unplug work. It's mostly useful for development: you unplug your panel and plug the next one into your SPI bus, and it's good if the kernel doesn't freeze. At least for display-only drivers this works now; on the render side we still have a lot more data structures that are shared with other drivers, for zero-copy and all that, where the reference counting isn't quite correct yet. Related to that, everyone loves to use the device-managed (devm) allocation APIs for everything. The problem is that the user-space-visible data structures have different lifetime rules than your physical device, and most drivers get this wrong, so we're slowly in the process of providing equally simple support, but with the correct lifetime rules.

So that's all the display stuff. There's obviously also rendering, GL and friends, and in the kernel we have an entire bouquet of APIs, created over the last 10 years, to make not just zero-copy work, which is what DMA-buf is for, but zero-copy plus synchronization, where user space just submits an entire queue of work. So you can do things like decode a new frame on your video decoder with Video4Linux (at least I think it's not quite wired up there yet), pass it to your OpenGL renderer, do something with it, and then pass it to the display. The kernel, using DMA reservations and DMA fences behind the scenes, behind user space's back, makes sure all these operations are ordered correctly, so nothing starts before the previous step has completed. In extreme cases this creates a graph locking problem: someone starts over here locking buffers and driver state, someone else starts over there, and they meet in the middle and it doesn't work. So we have the wait/wound mutex (ww_mutex) stuff, in the core kernel actually, which allows you to solve generic graph locking problems: you can have an arbitrary set of buffers or whatever you want, lock them in any arbitrary order, and it reliably detects deadlocks and gets you out of a bind.
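The core of that ww_mutex dance, sketched for just two locks. This follows the general pattern from the kernel's ww_mutex documentation; demo_ww_class and lock_both() are made-up names:

    #include <linux/kernel.h>
    #include <linux/ww_mutex.h>

    static DEFINE_WW_CLASS(demo_ww_class);

    static void lock_both(struct ww_mutex *a, struct ww_mutex *b)
    {
            struct ww_acquire_ctx ctx;

            ww_acquire_init(&ctx, &demo_ww_class);

            ww_mutex_lock(a, &ctx); /* holding nothing yet, cannot deadlock */
            while (ww_mutex_lock(b, &ctx) == -EDEADLK) {
                    /* We lost against an older transaction: drop what we
                     * hold, sleep on the contended lock, then retry. */
                    ww_mutex_unlock(a);
                    ww_mutex_lock_slow(b, &ctx);
                    swap(a, b);
            }
            ww_acquire_done(&ctx);

            /* ... touch both objects, then ww_mutex_unlock() both and
             * ww_acquire_fini(&ctx). */
    }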
Motivated by Vulkan, we also have the DRM sync objects, which are DMA fences for ordering concurrent work in a more modern way that fits the spirit of Vulkan. And if you do all that sharing, the drivers also need to agree on what the data actually looks like, and there's no reasonable or useful industry standard for that. So in the graphics subsystem we created our own fourcc format standard, which is officially used by OpenGL, EGL and Vulkan, and we have modifiers for things like tiling formats or framebuffer compression. The ARM framebuffer compression is now supported by quite a lot of drivers, so you can do not just zero-copy, but zero-copy of compressed data, saving even more memory bandwidth.

So those are all the user-space-relevant pieces. We also have a lot of helpers for implementing these drivers: a scheduler; TTM, which is the memory manager, or one of the memory managers, a bit monolithic and seeing a lot of refactoring; VRAM helpers for the old chips that still seem to survive and have a little bit of onboard video memory (although I think most of that is moving into the panels, but not always), so you can manage that; and then SHMEM, the shared memory helpers, for all the SoCs that just use normal system memory. As usual, like on the display side: batteries included.

Now obviously a graphics stack is not just the kernel; there's also lots and lots of stuff going on in user space. The big thing there is the Gallium layer we have for writing GL drivers. GL, especially desktop GL, a bit less so GLES, is a very old, very quirky API, and Gallium essentially takes that and breaks it down to something that looks a lot more modern, almost like Vulkan, where you just have constant state objects and nothing changes anymore, and the driver can simply write these commands to the hardware; it becomes a lot simpler. We have a huge compiler framework called NIR, which very creatively stands for "new IR", new intermediate representation, and this is used by all the GL drivers. We have reverse engineering tools which, at least for some companies, seem to be better and more powerful at documenting hardware than what the actual hardware companies have. And Khronos, the industry standards group that defines GL, Vulkan, GLES and all these standards, is also opening up: they have an open source test suite for conformance testing, and you can file bug reports.

So on the user space side, I would also say: lots of great infrastructure, and good movement towards open drivers. Specifically for SoCs, since this year, across the kernel and the Mesa3D user space stack, we have mostly-reverse-engineered drivers for pretty much any SoC you can buy, from the tiniest Vivante cores, which still don't have an IOMMU or anything else really, up to the big chips that AMD and Intel are shipping. All these drivers use the Gallium framework; even Intel has now switched over to that. And I think one of the most interesting stories happening right now is that there's a Vulkan driver for AMD hardware, the RADV driver, which is not developed by AMD, because the AMD one is kind of closed-source, throw-it-over-the-wall open source; this one is developed by customers, community and OSVs. Plus ACO, again a very creative name, it's just "AMD compiler": a new compiler based on this NIR thing. It's a very small team, about a handful of people, all customers who use it; Valve is using it on their Steam machines, Steam boxes with Linux, and they're beating AMD. Just to show how good these helpers and all this code in user space are: it's competitive against an entire company.
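Going back to the fourcc-plus-modifiers point for a moment, here's roughly how user space creates a framebuffer from a compressed buffer. A sketch: the handles, stride and the AFBC block size are placeholder choices, while the libdrm call and the drm_fourcc.h macros are real:

    #include <stdint.h>
    #include <drm_fourcc.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    static int add_compressed_fb(int fd, uint32_t width, uint32_t height,
                                 uint32_t bo_handle, uint32_t stride,
                                 uint32_t *fb_id)
    {
            uint32_t handles[4] = { bo_handle }, pitches[4] = { stride };
            uint32_t offsets[4] = { 0 };
            /* One 64-bit modifier per plane describes tiling/compression: */
            uint64_t modifiers[4] = {
                    DRM_FORMAT_MOD_ARM_AFBC(AFBC_FORMAT_MOD_BLOCK_SIZE_16x16),
            };

            return drmModeAddFB2WithModifiers(fd, width, height,
                                              DRM_FORMAT_XRGB8888,
                                              handles, pitches, offsets,
                                              modifiers, fb_id,
                                              DRM_MODE_FB_MODIFIERS);
    }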
Maybe a short interlude, because it's a frequently asked question: why do the kernel people insist on open source user space? One reason is that it's just a technical necessity, and other subsystems that create a lot of user space API, like RDMA, and I think media is also moving that way, agree: you can't review the kernel side if you don't see the user space. The other thing is that it's a bit of a have-your-cake-and-eat-it-too situation. From the upstream side, the customer value is standardization; if you then want to put lots of value-add into your closed source user space, you're trying to have it both ways, which just doesn't work. So the recommendation, and this is officially okay with everyone, is: if you want to do special-sauce vendor lock-in in your closed source driver, you can just do that, but we still require an open source implementation for the user space API. And so for pretty much all these SoC chips there's now a dual-stack implementation: if you ask the right people in those companies the right questions, they can give you both a fully open source user space and the closed source, sometimes more standard, driver running on top of upstream.

Which leads us to the next thing: how do you ship this? Because I promised it's not just awesome upstream in the abstract, you can actually ship this. The recommendation, I would say, is definitely to do a dual stack, with closed source and open source user space both running on the one single upstream kernel driver. Then there's still the problem of backporting, because upstream is just not used often enough; everyone's hanging around on LTS kernels that are at least two years old. The recommendation there is: don't backport just the driver. That leads to madness, because upstream we first spend all this effort refactoring the driver, making it really small and making sure it uses all the helpers, and then you'd need an equally big backport team that re-adds all the removed code; it's just pointless. So backport the entire subsystem; that's pretty much what everyone does. And from what I understand, the Android stable kernel interface thing, GKI I think it's called, will also switch to that model: every time Android adds new kernels, they'll backport the latest upstream DRM subsystem to all their supported kernels, so that you don't have to.

There's also nice stuff going on on the testing side. We have lots of in-kernel unit tests, so we're very much looking forward to KUnit having finally landed, so we can convert them over to something more standardized. We're using the IGT GPU test suite, which is growing into a cross-driver test suite in user space, for validation, and as I mentioned, for any new user space API, having validation tests in there is now required, to help make sure that all the drivers implement these interfaces the same way. For validation we use CRCs from the hardware: you can render something in software, then put, say, the YUV plane somewhere and render the same thing using hardware planes, and compare the two CRCs to make sure everyone implements it the same way. Right now, unfortunately, not all hardware has CRC support; some have writeback instead, which is more powerful, and there are patches under review for IGT to add validation using writeback.
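What IGT does for the CRC comparison is, in essence, read per-frame CRCs from the kernel's debugfs interface. A bare-bones sketch; the paths follow the debugfs CRC ABI, and all error handling is elided:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            char buf[64];
            ssize_t n;

            /* Select the default CRC source for CRTC 0 ... */
            int ctl = open("/sys/kernel/debug/dri/0/crtc-0/crc/control",
                           O_WRONLY);
            write(ctl, "auto", 4);
            close(ctl);

            /* ... then each read yields one frame's CRC. Compare the
             * value for a software-composited reference frame against
             * the hardware-plane version of the same content. */
            int data = open("/sys/kernel/debug/dri/0/crtc-0/crc/data",
                            O_RDONLY);
            n = read(data, buf, sizeof(buf) - 1);
            if (n > 0) {
                    buf[n] = '\0';
                    printf("frame crc: %s\n", buf);
            }
            close(data);
            return 0;
    }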
I think we also have a pretty great community. We're switching over to GitLab, away from mailing lists, at least for everything except the kernel. For the kernel it's a bit stalled on infrastructure work, but the idea is very much that at least pull requests from subsystems will go through GitLab long-term, with nicely integrated CI, and I do think we'll at least run experiments with opening up GitLab merge requests, a nice, modern, integrated review, code change and CI solution for contributors. Our conference, which is called XDC, is also nicely growing and going fully professional now; we've had sponsors since last year.

A slight outlook. DMA-buf heaps, the successor of what was formerly (or still is) called ION and sits in staging, is getting destaged; I'm hearing it's happening real soon now, I think the patches are at version seven or so, so hopefully in one of the next kernel releases this will have happened. There's a lot of work going on around user space allocators: it's nice if you can in theory share compressed buffers with zero-copy between drivers, but if no one can figure out that this is possible, it's a bit of a tough sell. The current solution on Android, I think, is that you just hard-code it in gralloc, which doesn't work so well on the desktop and isn't a really great design for SoCs either, so there's quite a bit going on there. Another thing that's not really in patch form yet, but where lots of discussion is going on, is integration with the media side, which is solving a lot of the same problems around buffer formats, sharing buffers and integrating it all into an overall pipeline. Sometimes they have the nicer solution, sometimes the display side, the DRM subsystem, has the better solution or more experience, and it would be nice to figure out how to get these two subsystems to work together a bit more closely; but the details of what that looks like are, I think, entirely up in the air.

So, summary. DRM, the graphics subsystem in upstream, scales by a factor of 10,000 from the tiniest to the biggest. I really think that nowadays we have batteries included for everything. For shipping, I would say the standard is a dual stack: you have your one single upstream kernel driver, and in user space a dual stack with a, most often reverse engineered, open driver for GL and GLES to validate the user space API, plus the closed source stack with all the value-add. The second point for shipping: you just backport the entire subsystem, that's what everyone does. And that's it. I think we have maybe no time for questions, but perhaps there's a very short one; otherwise, thanks a lot for listening.