I am Gustavo Padovan and I'm here today to talk about the unification of the kernel graphics stacks between Android and the mainline Linux kernel. It's work we've been doing for the past few years, and we finally managed to get the first bits together.

I work for Collabora. Collabora is an open-source software consulting company that's highly focused on graphics, kernel work, and multimedia. We do some web browser work as well, LibreOffice, build integration. We basically help our clients get closer to the upstream community and deliver better products using open source. As for myself, I've been involved in the Linux kernel for about 8 years now. In the past I used to do a lot of work on the Bluetooth subsystem, and then I shifted my focus to graphics in the past few years. One of my main works has been on the explicit synchronization framework for the mainline kernel, which is one of the pieces I'm going to be talking about today. I'm also one of the committers on the experimental drm-misc tree in the kernel. We've been trying to play with a different model when pushing patches mainline, so we don't have only the maintainers there; we have a lot of people with commit access. I'm one of those people. We have 10 or 12 people there these days.

Here's what I'm planning to talk about today. My idea is for you to walk away from this talk with two main things. First, I want to talk a little bit about the reasons Android decided to create their own graphics stack in the first place. Then we are going to move on to what happened on the mainline side that made it possible to have an Android graphics stack that can run on top of the mainline Linux kernel.

So let's start with the Android side. Here's a very minimal scheme of how the Android stack works. I didn't include all the components; I removed some of them to make it simpler and more focused for this talk. On the top we have the Android apps. They provide the buffers with the content that they want to put on the screen and hand those buffers to SurfaceFlinger, the Android compositor. The compositor takes that buffer and, along with the buffers for maybe your navigation bar and your status bar, puts those buffers together and sends a request to the lower layers to get them on the screen. That request goes to the hardware composer layer, which is part of the hardware abstraction layer on Android, and in that layer you can do a lot of vendor-specific configuration on the request. The request then goes to the kernel to be displayed on the screen. And in the kernel there is a component, the Atomic Display Framework (ADF), that Android built from scratch. After analyzing the options available upstream, KMS and fbdev, they decided to go and write their own display framework. The display framework, after doing a lot of core handling, hands the request to the vendor driver in the kernel that can actually put it on the screen for you.

So let's talk a bit more about ADF. After analyzing what was happening in the mainline kernel, I think that was about five years ago, Google realized that KMS wasn't a good fit for their needs, and they decided to go with a new implementation from scratch. So they created ADF. And one of the main features of the Atomic Display Framework is, of course, the atomic update of planes. In graphics we have this concept of planes.
It's basically this: you have your screen, and you can divide it into different regions, and those regions we call planes. Those planes can have different buffers associated with them. Usually, when you want to update your screen, you want to make sure you're updating all the planes at the same time. That was something KMS couldn't guarantee back then, and ADF brought this feature of atomic plane updates. If you don't have this kind of atomic update, like in KMS before, the user could start seeing a lot of tearing on the screen, and the user experience would be really, really bad.

Also, Android needed some custom pixel formats, because hardware started appearing that required custom pixel formats to communicate data between hardware blocks, and KMS couldn't support that at the time. So when they designed ADF, they included support for those custom pixel formats.

In terms of implementation, ADF was designed around the idea of a driver-specific blob that you could pass around. Because they had the hardware composer layer, which is vendor-specific and part of the abstraction layer, this type of driver-specific blob was workable: you would build your request from user space in the hardware composer layer, hand it to ADF, and ADF would hand the blob to your driver for scanout on the screen. That made ADF a monolithic mid-layer: most of the logic in the framework lived in the big mid-layer, and drivers and vendors used the driver-specific blobs to do any extra vendor-specific configuration.

One other feature is explicit synchronization of buffer sharing. When we talk about explicit synchronization, we are talking about synchronizing the buffer sharing between two drivers through user space. The opposite of that is implicit synchronization, which is what existed before; there was no explicit synchronization implementation on either Android or mainline. And Android started to have a lot of problems with implicit synchronization, because basically every single vendor started implementing their own way to synchronize the buffers being shared between the hardware blocks inside the vendor driver, a lot of bugs popped up, and things started getting really, really bad. Moreover, there could be situations where your whole stack would freeze, because you didn't know what was happening inside the kernel; this implicit type of synchronization doesn't provide any information to user space. So, as they were writing the whole display framework, they decided to add support for explicit synchronization to it. And for that, they created something called the sync framework.

Here is an example of how this framework does explicit synchronization. First, let's suppose you are going to send something for rendering on your GPU, and you send that call to your GPU. Your GPU will then schedule your request and send back to user space a fence. User space can then take that fence and pass it to the other side, to ADF, the display driver side. That fence would be attached to the commit that uses the same buffer that's still being handled by your GPU. And ADF would do the same thing: it would schedule your request and then give you back a fence that goes to user space.
And that fence can then later be sent to the GPU again in another rendering job that uses the same buffer. So you create this kind of producer/consumer queue, with each side waiting for the other side's fences to signal. When the fence signals, you are allowed to use the buffer. So if you have a fence from the GPU side, and you are waiting for that fence to signal before putting your buffer on the screen, ADF will just sit there and wait for the fence to signal before going ahead with any scanout.

So, as I said, Google created the sync framework to help ADF with the work of getting explicit synchronization on Android. The sync framework was basically a way to pass fences between user space and the kernel, using file descriptors for that. We create files, associate file descriptors with them, and send those to user space. One nice thing about having file descriptors is that you can then pass them between user space processes, and Android does that a lot. So it was really good.

There are three main objects in there: the sync timeline, the sync point, and the sync fence. The sync timeline is basically the counter that guarantees the order between the fences we create in a specific driver context. So every screen in your system could have its own timeline associated with it, or every GPU ring could have its own counter. Your fences, which are actually named sync points on Android, represent values on that timeline, and they can have three different states. You create a sync point on a given timeline, and that sync point is associated with a job; not exactly with a buffer, but with a job. When it's created, it is of course in the active state. When the job is finished, your GPU finished handling that buffer, or something on the display side went to the screen, it goes to the signaled state. And if something goes wrong, we have the error state as well.

The final object is the sync fence. It's basically a file where we wrap all the sync points we need to wait on. It has the same kind of signaling, and it's used for passing the fence around: we create the file, associate a file descriptor with it, and hand that to user space. One nice thing about sync fences is that we can merge them together. If for some reason you need to wait for operations on the same buffer that happen in two different places in your system, and you have timelines in those two places, you can merge the fences together to make your life easier when waiting for the jobs to finish. In this case, we are merging two sync points from two different sync timelines into the same sync fence. That's quite useful for a number of use cases.

In terms of APIs, it is really simple. From user space there is an ioctl to wait on the sync fence to signal; you can use poll or select on the file descriptor as well. You can merge fences, as I said, and there is a third ioctl to get information about the sync fence and the sync points inside it.
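To make that a bit more concrete, here's a minimal sketch of what this fence-fd API looks like from user space. It's just my illustration, written against the mainline sync_file uAPI names from <linux/sync_file.h> (the descendant of Android's sync framework, which we'll get to later); the original Android version had equivalent ioctls, plus a dedicated wait ioctl.

```c
/*
 * Minimal sketch of waiting on and merging fence fds, using the
 * mainline <linux/sync_file.h> uAPI. fd1/fd2 are assumed to be fence
 * file descriptors obtained from a driver.
 */
#include <linux/sync_file.h>
#include <poll.h>
#include <string.h>
#include <sys/ioctl.h>

/* Block until the fence behind 'fd' signals (timeout in ms, -1 = forever). */
static int fence_wait(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };

	return poll(&pfd, 1, timeout_ms);
}

/* Merge two fence fds into a new fence that signals once both have. */
static int fence_merge(const char *name, int fd1, int fd2)
{
	struct sync_merge_data data;

	memset(&data, 0, sizeof(data));
	data.fd2 = fd2;
	strncpy(data.name, name, sizeof(data.name) - 1);

	if (ioctl(fd1, SYNC_IOC_MERGE, &data) < 0)
		return -1;

	return data.fence;	/* the kernel fills in the merged fd */
}
```

That's what I wanted to say about how ADF was created and how it was designed. Now I want to move on to what we did on the mainline side in order to get Android to run on top of the mainline KMS interfaces.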
And the first thing we need to talk about is why ADF wasn't added to mainline, why it was rejected. While ADF solved many of the problems Android had back then, it wasn't suitable for mainline, for a couple of reasons, and I'll try to list some of them here.

The first one is that ADF had a single flip queue, and for Android that was okay, because they had only one screen in the system. If you have only one screen, a single queue is fine, but when you come to Linux there's the desktop, where you can have two, three, or many screens in your system. If you have a single queue and screens with different frame rates, things start going a little bad, because you start either dropping frames on one screen or speeding up another, since everything has to run at the same frequency because of the single queue. So in mainline you need something that can have separate queues for different displays.

Another thing is that ADF didn't have any atomic operations for mode sets. A mode set is basically changing the screen resolution, the frame rate, and the output routing. In Android there was no need to change output routing, because there was only one screen in the system, but in mainline you may change how your screens are set up: you can mirror them, or make one primary and one secondary; that's the output routing. There was no support in ADF to do those operations in an atomic manner and update the planes on that screen at the same time, so that was something that was missing.

Moreover, the fact that ADF was built as a mid-layer made it quite inflexible for mainline, while in Android, for most drivers and use cases, it worked well. That was mostly because of the driver-specific blobs: in ADF on Android it was possible to have one layer on top of ADF and another layer below, the drivers, that could create and read those driver-specific blobs. In mainline that's not possible, because we've been trying to make the KMS API as generic as possible on the display side, so you can have a lot of generic compositors that just know how to talk the KMS APIs and work on whatever hardware. So that was another problem: the inflexible mid-layer with the driver-specific blobs. The proposal effectively created a new, non-generic user space API, and that would hurt the Linux desktop and the many other users that had been using KMS for a long time. For those reasons, and maybe a couple more, ADF wasn't suitable for upstream, and it ended up rejected by the community.

But at the same time there was already some discussion about getting some sort of atomic mode setting infrastructure into the mainline kernel, and, I think about a year and a half ago, we finally managed to integrate the DRM atomic infrastructure into the mainline kernel. The two main features of the atomic mode setting infrastructure are the atomic update of planes, which ADF also had, and, in mainline, atomic operations for mode sets, which is something we really wanted and ADF didn't provide. In terms of user space API, that's achieved on mainline by a single ioctl, the DRM atomic ioctl, which uses the already-in-place property infrastructure.
So when you want to send a new request to the kernel to update your screens, you build this request: which screens you want to use, with which resolutions and frame rates, the planes that are going to be used on those screens, the framebuffers and their sizes; all the configuration you need to set is added to this property array you create, and then you send it to the kernel in a single ioctl, in an atomic manner.

When that request gets to the kernel, there are a couple of phases it has to go through; the two main ones are the check and the commit phases. Those are the two most important operations when we talk about atomic mode setting in the kernel. The first one is the check phase, and it's one of the most important parts: as we're talking about atomic mode sets and plane updates, we want to make sure that the request we send actually makes it to the screen. Sometimes you configure something wrong, or the driver doesn't accept the configuration you asked for, and your request fails. You want to guarantee that the request either goes to the screen entirely, every single bit of it, the whole configuration you asked for, or nothing goes to the screen at all. It's an all-or-nothing approach; that's one of the reasons we call this atomic. So we have the check phase to make sure that everything is going to succeed when we go and try for real to program the hardware and update the screens.

The separation between the check and commit phases is one of the core features of atomic mode setting, and it allows some interesting things like the test-only flag, which lets user space, the compositor, send a request to the kernel that executes only the check phase. That's useful for finding the best configuration options for your system: the compositor might try a few different configurations before actually committing to the hardware. By doing that, you can sometimes hide from the user a bad screen or a configuration that doesn't work, because you're only testing it; you don't need to commit it to the hardware. You can keep the user on a configuration that works and then, in the background, just keep testing different configurations. I'll show a small sketch of what this looks like from user space in a moment.

And differently from ADF, mainline atomic mode setting is highly extensible, thanks to a very interesting set of helpers. The helpers are very minimal, for very specific operations: updating a plane, setting a mode, there's a helper for each. One nice thing is that during the time we were implementing atomic mode setting and porting drivers to it, we managed to drop a lot of code from drivers, because atomic mode setting made drivers' lives so easy that we were able to drop lots and lots of lines of code. The helpers are so small these days that you don't have to implement many different operations yourself. So today, getting a new driver upstream that uses atomic mode setting is mostly a matter of filling in some helpers and adding the basic support for the new hardware, and you're mostly done. You don't need to write lots of code anymore.
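Here's that check-then-commit flow as a minimal sketch through libdrm. This is my illustration, assuming the DRM fd and the object and property IDs were already discovered with drmModeGetResources() and drmModeObjectGetProperties().

```c
/*
 * Minimal sketch of the atomic ioctl's check-then-commit flow via
 * libdrm. plane_id, prop_fb_id and fb_id are assumed to have been
 * looked up beforehand.
 */
#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

static int try_then_commit(int fd, uint32_t plane_id,
			   uint32_t prop_fb_id, uint32_t fb_id)
{
	drmModeAtomicReq *req = drmModeAtomicAlloc();
	int ret;

	/* Build the property array; a real request would also set
	 * CRTC_ID and the SRC_* and CRTC_* coordinates for the plane. */
	drmModeAtomicAddProperty(req, plane_id, prop_fb_id, fb_id);

	/* Check phase only: the hardware is never touched. */
	ret = drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_TEST_ONLY, NULL);
	if (ret == 0)
		/* The configuration is valid, so commit it for real. */
		ret = drmModeAtomicCommit(fd, req, 0, NULL);

	drmModeAtomicFree(req);
	return ret;
}
```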
And after we got atomic mode setting upstream, there were a few other things still needed to get Android running on top of the mainline kernel, and one of the main ones was getting the sync framework into the mainline kernel. By that time, the sync framework was already in staging, because there was a lot of interest in getting it upstream. But it stayed there for two or three years with no one touching it, to the point that we decided we would work on it and try to get it working upstream with the DRM atomic mode setting interfaces.

By the time we decided to do that, some of the features the sync framework had added to the Android kernel were not needed anymore, because in the intervening years we had added the fence synchronization mechanism to the mainline kernel. That's a way to synchronize buffers between drivers inside the kernel, and that fence mechanism replaced what Android was doing with sync timelines and sync points, so we decided to just remove those. In the end, we realized that the sync framework was only needed for passing file descriptors between the kernel and user space, so we tried to keep it really minimal and removed everything that wasn't related to passing file descriptors around: we removed the sync timeline and the sync points, and kept of the sync fence only the parts related to passing file descriptors around.

We reworked this part quite a bit to fit upstream needs. The first thing we needed to do was rename it, because the name had become somewhat confusing: we already had the fence infrastructure inside the kernel, so what we were adding became the sync file, because it is a file. We had to break the API in the process, but the only user at the moment was Android; there was no upstream user for this infrastructure, so we broke the API to make it more future-proof. We also provided some patches for the Android Open Source Project to work around this API difference, and those patches are already upstream in the Android project. We removed a lot of API, leaving only the file-descriptor-specific functions, only the infrastructure necessary to create sync files and communicate them with user space; there are basically two functions exported, which I'll sketch below. When you want to create a new sync file, you have a fence in the kernel that you previously created, associated with some buffer; you get that fence from the driver, you create a sync file, and later you can ask for a new file descriptor, associate that file descriptor with your sync file, and pass it to user space. And the other way around: when you receive a file descriptor from user space that's a sync file file descriptor, you can extract the fence inside that sync file. So basically that whole framework that existed in Android before is now these two functions exported in the mainline kernel.

Then, after we had the sync framework out of staging, the next step was to actually use it, because we now had the interfaces needed to add support for explicit synchronization to KMS. So we started working on getting the atomic mode setting interfaces to work in an explicit synchronization fashion. The way we did it was basically extending the DRM properties that were already there, that we were already using for atomic mode setting.
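Here's a sketch of those two exported functions in use. I'm writing it with today's dma_fence naming, sync_file_create() and sync_file_get_fence() (at the time of this work the type was still called struct fence), and the driver fence is assumed to exist already.

```c
/*
 * Sketch of the two exported sync_file functions, using today's
 * dma_fence names. 'fence' is assumed to come from the driver.
 */
#include <linux/dma-fence.h>
#include <linux/errno.h>
#include <linux/fcntl.h>
#include <linux/file.h>
#include <linux/sync_file.h>

/* Kernel -> user space: wrap a fence in a sync_file, return an fd. */
static int fence_to_fd(struct dma_fence *fence)
{
	struct sync_file *sync_file = sync_file_create(fence);
	int fd;

	if (!sync_file)
		return -ENOMEM;

	fd = get_unused_fd_flags(O_CLOEXEC);
	if (fd < 0) {
		fput(sync_file->file);	/* drops the sync_file too */
		return fd;
	}
	fd_install(fd, sync_file->file);
	return fd;
}

/* User space -> kernel: extract the fence behind a sync_file fd. */
static struct dma_fence *fd_to_fence(int fd)
{
	return sync_file_get_fence(fd);	/* NULL if not a sync_file fd */
}
```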
And we managed to do the explicit synchronization work entirely in the DRM core, so every single driver in KMS already supports explicit synchronization by default. You don't need to do anything else in the driver; by just adding your new driver to the KMS infrastructure, you get out-of-the-box support for explicit synchronization.

There are two types of fences. There are the fences that, as I showed in the diagram before, we send to the kernel for the kernel to wait on: the in-fences sent to KMS. We may be getting those fences from the GPU driver, because the GPU driver may be using that buffer for some rendering job, and we wait for the fence to signal before proceeding with the scanout of that buffer on the screen. On the other side, we have the out-fences, which are created by KMS and signaled by KMS. This work is present in the 4.10 kernel that was just released, last Sunday I think.

A bit more detail on the two types of fences we have in KMS. As I said, the in-fences are the fences we have to wait on before putting the contents of the buffer on the screen, because the buffer is still being used by someone else. You send that fence to the kernel using the IN_FENCE_FD property on every DRM plane in your interface. The DRM core code is then responsible for taking that file descriptor, figuring out which fences are in there, and calling one of the helpers from the atomic helper infrastructure to wait for those fences to signal before proceeding with the scanout.

On the other side, we create the out-fences in KMS and signal them there. To communicate with user space, we created a new property, the OUT_FENCE_PTR property. User space passes a pointer that the kernel fills with the file descriptor number. For out-fences, we are doing one fence per CRTC. A CRTC is basically a description of your pipeline, which planes are connected to which screens; you represent that connection with a CRTC. We have one fence per CRTC because we thought it wasn't a good idea to have a fence per plane: they would always signal at the same time, because we decided to signal the fences at the moment the buffers go to the screen, and for a given CRTC those buffers go to the screen at the same time. I'll show a small sketch of how user space wires both fence properties into a commit at the end of this part. The way you signal fences in KMS is a bit different from what Android was doing: here we signal fences when we put the buffer on the screen, while on Android they signaled those fences when the buffer went out of the screen, so there's a bit of a difference. But I'll talk later about how we can work around that on Android.

The next piece we needed to solve was the rendering side. The idea is the same, but it's not as self-contained as KMS: because of the way GPU drivers are made, each with its own submission interface, we had to go and extend every single driver's interfaces, the execbuffer ioctls, to add support for sync files and fences inside each driver. That's already done in the freedreno driver, and we've been working on i915, virgl, and some other drivers. There are patches, you can find them on the mailing list, but we haven't pushed anything upstream yet for those drivers.
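Coming back to the KMS properties for a moment, here's a minimal sketch of wiring both fence properties into an atomic commit. Again my illustration: the IN_FENCE_FD and OUT_FENCE_PTR property IDs and the object IDs are assumed to have been looked up already, and 'in_fence_fd' to have come from the GPU driver.

```c
/*
 * Sketch of an atomic commit with explicit fencing. The property and
 * object IDs are assumed to be discovered beforehand.
 */
#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

static int commit_with_fences(int fd, drmModeAtomicReq *req,
			      uint32_t plane_id, uint32_t prop_in_fence_fd,
			      uint32_t crtc_id, uint32_t prop_out_fence_ptr,
			      int in_fence_fd, int32_t *out_fence_fd)
{
	/* In-fence: the DRM core waits on this before scanning out
	 * the plane's buffer. */
	drmModeAtomicAddProperty(req, plane_id, prop_in_fence_fd,
				 in_fence_fd);

	/* Out-fence: one per CRTC; the kernel writes a fence fd through
	 * this pointer, and it signals when the buffers hit the screen. */
	drmModeAtomicAddProperty(req, crtc_id, prop_out_fence_ptr,
				 (uint64_t)(uintptr_t)out_fence_fd);

	return drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_NONBLOCK, NULL);
}
```

That's what we did on the kernel side for upstream. Now I'm going to discuss a little bit what happened on the user space side to get Android to work with those interfaces, with DRM atomic mode setting and explicit fencing.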
The first thing I'm going to talk about is the hardware composer. The hardware composer is actually only an API specification. It sits in the abstraction layer on Android, and it basically specifies the APIs that the Android compositor, SurfaceFlinger, needs to use to put things on the screen. As I said, the way we signal the out-fences on mainline is different from the way it was done with ADF. With hardware composer 1, which was the one working with ADF, we had what Google was calling speculative fences. In that case the fence signals when the buffer is out of the screen: you scan something out, then the next buffer arrives, you remove the buffer that's on the screen, and the moment that buffer is out of the screen it is free to be reused, so you signal the fence. What we did in mainline is the opposite: at the moment the buffers go to the screen, we signal. So Google had to change the way they signal fences because of this change in the mainline kernel. In hardware composer 2, they are going for non-speculative fences: the fence signals right after the buffer goes to the screen; it doesn't wait for the next request to arrive. Because we signal fences when the buffers are on the screen, that means the previous buffer is free and can be reused.

But that's only the API specification; it doesn't solve any problem by itself. We still need an implementation of a hardware composer that can actually talk to the DRM APIs, and that's the job of drm_hwcomposer. It's a project that's part of the Android Open Source Project. It already supports atomic mode setting, and I think Google used drm_hwcomposer on the Pixel C device. We've been trying to get support for explicit fences in there. It's still a work in progress; it's mostly working, but we've been hitting some issues with GL and threading. That's something Robert Foss is working on at the moment at Collabora. Google is quite interested in having this work upstream in the Android Open Source Project. We will probably solve those issues in the next month or so and get everything upstream.

One final piece of this work is the two EGL extensions that Android created to have support on the GPU side. There are just two: one to get an out-fence from the GPU driver, and another to make the GPU driver wait on a fence to signal before proceeding with any job on that rendering request. Those two extensions are already upstream in Mesa; we managed to add support for them. I think there is upstream support in the Mesa driver for freedreno, and i915 and virgl are a work in progress; there are patches out there. I did the virgl part, but we didn't manage to get that upstream yet, though we have the patches for it.
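As an illustration of those GPU-side extensions, here's a sketch of the EGL_ANDROID_native_fence_sync flow: exporting an out-fence fd after submitting GL work, and importing a fence fd for the GPU to wait on. This is my own sketch; error handling and eglDestroySyncKHR() calls are omitted for brevity.

```c
/*
 * Sketch of native fence export/import with
 * EGL_ANDROID_native_fence_sync. Assumes an EGL display and context
 * are already set up.
 */
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>

static PFNEGLCREATESYNCKHRPROC create_sync;
static PFNEGLDUPNATIVEFENCEFDANDROIDPROC dup_fence_fd;
static PFNEGLWAITSYNCKHRPROC wait_sync;

static void load_funcs(void)
{
	create_sync = (PFNEGLCREATESYNCKHRPROC)
		eglGetProcAddress("eglCreateSyncKHR");
	dup_fence_fd = (PFNEGLDUPNATIVEFENCEFDANDROIDPROC)
		eglGetProcAddress("eglDupNativeFenceFDANDROID");
	wait_sync = (PFNEGLWAITSYNCKHRPROC)
		eglGetProcAddress("eglWaitSyncKHR");
}

/* Out-fence: an fd that signals when the GL work submitted so far is done. */
static int export_out_fence(EGLDisplay dpy)
{
	EGLSyncKHR sync = create_sync(dpy, EGL_SYNC_NATIVE_FENCE_ANDROID,
				      NULL);

	glFlush();	/* the native fence fd only exists after a flush */
	return dup_fence_fd(dpy, sync);
}

/* In-fence: make the GPU wait for 'fd' before running later commands.
 * EGL takes ownership of the fd passed in the attribute list. */
static void import_in_fence(EGLDisplay dpy, int fd)
{
	EGLint attribs[] = {
		EGL_SYNC_NATIVE_FENCE_FD_ANDROID, fd,
		EGL_NONE,
	};
	EGLSyncKHR sync = create_sync(dpy, EGL_SYNC_NATIVE_FENCE_ANDROID,
				      attribs);

	wait_sync(dpy, sync, 0);	/* server-side wait; the CPU doesn't block */
}
```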
That's all I wanted to talk about, and it's definitely a good start. I think there are some other things we still need to work on on the graphics side. One of those is kind of happening at the moment: there is a lot of discussion going on about the common allocator. But at least we already have a way to run Android on the KMS interfaces, so hopefully we will start seeing devices in the stores that use the KMS interfaces anytime soon, because the whole infrastructure is already there. I hope we will start seeing those devices out there.

In terms of support for the Linux desktop, we've been working on getting support for explicit synchronization in Wayland. Maybe we'll get support for X11 as well; we've been discussing those features, but we don't have anything done there yet. It's still on our to-do list. And I would like to thank everyone who was involved in this work, all those people and maybe more, reviewing code, proposing ideas, and all sorts of things. So that's it from me for today. Thank you, everyone, for listening. Questions?

[Audience question, inaudible]

Yeah, the sync timeline is kind of independent of the vblanks. It's just a way to keep the order between the fences we create there. But the way we signal the fences is tied to the vblank synchronization signal.

[Audience question, partially inaudible, about ARC++ on Chrome OS and Android compatibility]

Yeah, the change we made to the sync framework was basically some APIs, and that's upstream in the Android Open Source Project. They have libsync as an abstraction inside Android, so you just use libsync; you don't need to know which kernel you're using, whether the sync framework comes from mainline or from Android. It's backward compatible. We did the change in a way that it doesn't matter which kernel, which interface, you're using; it's going to work.

Okay. Thank you, everyone.