Okay, so hi, I'm Simon Ser, also known as emersion, and I'm going to talk about libliftoff. libliftoff's goal is to take advantage of KMS planes. I'll first explain what a KMS hardware plane is, then what libliftoff is and the current status of the library, and then we'll see what the next steps are. This talk is designed for mere mortals to understand what libliftoff is, so if there are experts in the room, it may be a little boring at the beginning.

First, what is a hardware plane? It has nothing to do with actual planes: it's a hardware feature in GPUs. Before getting into hardware planes, let's see how we get a frame on screen. How do we show something on screen? Basically, there's a user-space program that wants to show something on screen, so it has a frame; here's a typical screen with a terminal and a few windows. It talks to a kernel interface called KMS and submits a frame to it. KMS then talks to the driver; there's a different driver depending on the GPU vendor: for Intel it's i915, for AMD it's amdgpu. Each driver programs the hardware to display the frame on screen. One important thing is that for a few years now we've had a new API to do this, called the atomic API. Submitting a frame is now atomic, so you never get a partial frame on screen.
You don't get tearing and things like that; it's much better than before, because with the legacy API you could get corrupted frames that way. The client is typically called a compositor, because it takes a few windows, say a terminal window, a calculator window and a document window, and draws all of them into one large buffer. So it actually performs a copy, and then submits the final frame buffer to KMS. Nowadays this is done using OpenGL.

Planes basically allow the compositor to skip the copy and not use OpenGL at all, and instead submit the client buffers directly to KMS. Here the compositor would submit three buffers in one step, in one atomic commit, along with some metadata: for instance, the terminal window is on the top left, the calculator window is on the bottom right. You just submit this whole state to KMS, and the hardware performs the composition directly in the scan-out engine. You don't copy anymore, and you have an API to set properties on all of these windows, like the position. So why do we want to use planes?
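To make the idea concrete, here is a minimal Python sketch of what an atomic commit with planes conceptually carries: per-plane state (a buffer plus position metadata) submitted all-or-nothing. All names and checks here are illustrative stand-ins; the real interface is the C KMS atomic API, not this.

```python
# Conceptual sketch of an atomic commit carrying several plane states at once.
# All names are illustrative stand-ins for the real C KMS atomic API.

def atomic_commit(plane_states, test_only=False):
    """All-or-nothing: either the whole configuration is accepted, or none
    of it is applied (so no partial frames, no tearing)."""
    for st in plane_states:
        if st["w"] <= 0 or st["h"] <= 0:  # stand-in for real hardware checks
            return False                  # reject the entire commit
    if not test_only:
        for st in plane_states:
            print(f"scan out fb {st['fb']} at ({st['x']},{st['y']}), zpos {st['zpos']}")
    return True

# One frame: three client buffers submitted directly, with metadata, no copy.
frame = [
    {"fb": 1, "x": 0,   "y": 0,   "w": 800, "h": 600, "zpos": 0},  # terminal, top left
    {"fb": 2, "x": 900, "y": 500, "w": 300, "h": 200, "zpos": 1},  # calculator, bottom right
    {"fb": 3, "x": 400, "y": 100, "w": 500, "h": 400, "zpos": 2},  # document
]
assert atomic_commit(frame, test_only=True)  # ask first: "would this work?"
atomic_commit(frame)                         # then actually display it
```

The `test_only` flag mirrors the way the kernel lets you check a configuration without applying it, which becomes important later.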
First, it's zero-copy, as I've said. That sometimes matters a lot, because some hardware needs a lot of time to perform copies, depending on the location of the buffers and so on; on Arm GPUs, for instance, it makes a big difference. You also get lower latency if you don't do a copy, because you don't need to wait until the copy is over to submit the frame to KMS. It also improves power consumption: when you use planes, the render engine, the part of the GPU used for OpenGL, isn't used anymore and can go to sleep, so it isn't draining the battery. Only the scan-out engine, which sends frames to the screen, stays awake.

But planes come with some downsides too. They are pretty hard to set up and use, especially when you don't control the buffers, which is the case when buffers come from clients. So right now compositors don't really support planes. There's one exception: Weston supports planes pretty well, but apart from that, everybody just always composites the whole image. One small exception is the cursor plane: when I move my cursor, most compositors are able to put it in a special plane called the cursor plane, but that's the only plane compositors other than Weston are able to use.

One of the issues with planes, the reason they're hard to use, is that they come with constraints. For instance, here are the planes on my Intel machine.
I have three planes. The first plane can only display buffers with a few formats: for instance, it can do C8, RGB565, and the others in the list. The second plane can do a different set of formats, including YUV formats, and the last plane can only do ARGB. So I only have a limited number of planes, with a limited set of formats supported on each plane.

Let's take an example with three windows and these three planes. If this window uses ARGB, this window XRGB and this window XRGB, then I can put everything into a plane and everything's fine. But if the first window uses XRGB instead, then I can't do that, and I can't use planes. So planes come with a large number of constraints. We've seen the number of planes and the formats, but there can also be constraints on the buffer size, and on the Z position, that is, which plane is over which other plane. For instance, on my machine the first plane is always under the second plane, which is under the last plane, and I can't change that; some hardware is able to change the Z position, some isn't. There are also bandwidth constraints.
If I use windows that are too large, I can't put them into planes. And there are a lot of really odd constraints: for instance, on my second plane, the YUV-capable one, if I want to display a YUV buffer, the position must be even, I think; if the position coordinates are odd, it doesn't work. That's Intel-specific, of course, and every vendor has vendor-specific constraints like this. So yeah, it's a mess. And the only way to know whether some combination will work or not is to perform what we call an atomic test-only commit: we basically say, I want to display this window on this plane and this other window on this other plane, and we ask KMS, will this work? KMS answers yes or no, but we don't know why. That's how we use planes, basically.

Now we can dive into what libliftoff is. The goal of libliftoff is to make it easier to use hardware planes. One goal is also to not abstract too much: to be as thin as possible, a thin layer of abstraction that doesn't get in the way of the compositor. If the compositor wants to use special hardware features, we want to let it customize a lot of how it uses planes. We had a workshop at XDC last year, and this presentation is basically what we've done so far and what the next steps are. The basic idea behind libliftoff is to expose layers. Layers are virtual planes: they are just like planes, you can set the position, the buffer and a bunch of properties on them, but they don't have any constraints. You can have as many layers as you want, and you can set any property on them.
It's fine. So basically, compositors can use layers just like they would use planes, and then libliftoff looks at which layers the compositor has set up and tries to map them onto planes. In other words, libliftoff performs a layer-to-plane mapping.

Let's see a very basic example; it's a pretty simple API. For each GPU you create a libliftoff device, then for each device you have a bunch of outputs, and you create a libliftoff output for each. Then you can create as many layers as you want. Here I create a layer, I set the frame buffer ID, the position, the scaling method and a bunch of other things; these are just standard plane properties. In a real-world example the compositor would set up more layers like this. Then the compositor can just call the function liftoff_output_apply, which fills in an atomic commit with the plane state, and the compositor can perform the atomic commit, sending all the properties to the kernel to display a new frame. That's all: all the compositor has to do is set up a bunch of layers, and the rest doesn't change from what compositors were already doing before.
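A rough sketch of the workflow just described. The real library is a C API (functions like liftoff_output_apply); these class names, the greedy mapping and the format-only "test commit" are simplified hypothetical stand-ins, not the real implementation.

```python
# Hypothetical Python mock of the layers-to-planes workflow. Not the real
# libliftoff API; names and logic are simplified stand-ins.

class Layer:
    def __init__(self, output):
        self.props = {}
        self.plane = None           # filled in by Output.apply()
        output.layers.append(self)

    def set_property(self, name, value):
        # Layers accept any property, with no constraints.
        self.props[name] = value

class Output:
    def __init__(self, planes):
        self.planes = planes        # [{"id": ..., "formats": {...}}, ...]
        self.layers = []

    def apply(self):
        """Map layers onto planes; a layer left with plane=None must be
        composited by the compositor with OpenGL, as before."""
        used = set()
        for layer in self.layers:
            layer.plane = None
            for plane in self.planes:
                if plane["id"] in used:
                    continue
                # Mock of the kernel's yes/no test-only commit.
                if layer.props.get("format") in plane["formats"]:
                    layer.plane = plane["id"]
                    used.add(plane["id"])
                    break

planes = [{"id": 1, "formats": {"XRGB8888"}},
          {"id": 2, "formats": {"ARGB8888"}}]
out = Output(planes)

video = Layer(out)
video.set_property("FB_ID", 42)
video.set_property("format", "NV12")      # no plane supports NV12 here

term = Layer(out)
term.set_property("FB_ID", 43)
term.set_property("format", "XRGB8888")

out.apply()
for name, layer in [("video", video), ("terminal", term)]:
    if layer.plane is None:
        print(name, "-> composite with OpenGL")   # fallback path
    else:
        print(name, "-> plane", layer.plane)
```

The final loop previews the fallback story: after apply, the compositor checks each layer and composites the ones that didn't get a plane.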
One small problem with this is that sometimes a window cannot make it into a plane. We've seen before that if a window uses XRGB, we may need to fall back to composition, but we don't need to do that for the others. In this example, if these two windows can't be put into a plane, we must fall back to composition and copy them into a large frame buffer with OpenGL, like before; but if this one makes it into a plane, we don't need to copy it. This is a mixed mode where we use planes for some windows, but not for all of them.

One important thing before we look at how this situation is managed: libliftoff doesn't perform any composition. The compositor is still responsible for using OpenGL to copy window buffers into the larger buffer. So the compositor has this larger frame buffer, and it needs to mark it, to tell libliftoff: this is the composition layer; if you need to fall back to composition for some windows, use this layer. There's a function for this. Then, after calling the function that performs the mapping, liftoff_output_apply, the compositor needs to check each layer, and if a layer couldn't be put into a plane, the compositor needs to copy it with OpenGL.

The current status is that the layer-to-plane mapping works. We have some support for collision detection: for instance, if you have two windows that don't overlap, you don't care whether one plane is on top of the other or not, so you can just put the two windows into planes and ignore their relative position. We have support for basic incremental updates: if you only update the buffer property of a layer and don't touch any other property, we can reuse the previous mapping we had, and we don't need to recompute the whole thing. One important item is that we have some unit tests. I think these unit tests are pretty important, because it's very easy to change a bit of the algorithm and get it wrong, and it's very hard to debug. So we have a mock libdrm library which fakes hardware planes and checks that libliftoff does the right thing.

I've also focused on doing some field testing, making sure that in the real world it makes sense to do what libliftoff does. The first thing I've done is start glider, an experimental compositor using libliftoff. This is just to prove that the API is fine and that it works with real clients, with real windows. Next, I'm working on wlroots, which is a Wayland compositor library, making it ready for libliftoff: we need to add a bunch of new APIs and refactor some things. Valve is also working on a compositor called gamescope for the SteamOS distribution, and it will use libliftoff as well, so it's pretty cool to have other people trying it out.

With glider I've performed some very early benchmarks. The compositor is the first item here. This test was done using a very simple client that just draws a gradient in a 250x250 pixel buffer, so a pretty small buffer. The power estimate looks pretty good: we use less power than before, so that's a good sign. But we need to take a step back when reading these numbers, because when falling back to composition there was no optimization like damage tracking or anything like that, so when compositing, a whole-screen buffer was copied each time, while when using planes there is no copy whatsoever. So yeah, that's pretty good, but real-world scenarios probably won't be as optimistic as this, and real clients will use a lot more power, of course.

So, what's next? There are a bunch of short-term goals. The first one is to perform more benchmarks. One issue is that when libliftoff has a bunch of layers and tries to map as many layers as possible to planes, the algorithm takes quite a long time, because we need to perform a lot of atomic commits: take a layer, put it into a plane, ask the kernel, will this work? If it does, take the second layer, try to put it into a plane, ask the kernel again, is this fine? This takes quite a long time, depending on the hardware configuration. I have some ideas to find the best layer-to-plane mapping a little faster. We need to test more hardware and see how it behaves. One important thing is that incremental updates save a lot of time: most of the time you only change the buffer, you don't really change the position of the windows a lot, you just update the contents of a window, so incremental updates are very important to mitigate this. The goal here is to not miss a frame.
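The incremental-update optimization just mentioned can be sketched as a cache, under my assumption that the cache key is "every layer property except the buffer"; the names here are hypothetical, not libliftoff's actual internals.

```python
# Sketch of the incremental-update optimization: if only FB_ID changed since
# the last frame, reuse the previous layer->plane mapping and skip the
# expensive probing loop. Hypothetical names, not libliftoff's internals.

class MappingCache:
    def __init__(self):
        self.key = None
        self.mapping = None
        self.probes = 0   # counts simulated test-only commits

    def apply(self, layers):
        # Key on all properties except the frame buffer.
        key = tuple(tuple(sorted((k, v) for k, v in layer.items() if k != "FB_ID"))
                    for layer in layers)
        if key == self.key:
            return self.mapping            # zero probes: reuse previous mapping
        self.probes += len(layers)         # stand-in for the probing loop
        self.key = key
        self.mapping = {i: i for i in range(len(layers))}  # trivial placeholder
        return self.mapping

cache = MappingCache()
cache.apply([{"FB_ID": 1, "x": 0}, {"FB_ID": 2, "x": 100}])  # first frame: probes
cache.apply([{"FB_ID": 3, "x": 0}, {"FB_ID": 4, "x": 100}])  # only new buffers: free
print("probes issued:", cache.probes)
```

Moving a window (changing `x`) would invalidate the key and trigger a full re-mapping, which is why the common "only the buffer changed" case matters so much for the frame budget.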
We basically have a budget: if the screen refresh rate is 60 Hz, we have 16 milliseconds to do everything for a new frame. So yeah, we need to be fast.

We also need to better support layer priority, which is an important thing. If a window updates a lot, for instance a video player where each frame is a new buffer and you basically never reuse the previous frame, then you really want to put it into a plane, to avoid having to copy it every time. So you want to see which layers are updating more often than the others and put those into planes in priority.

I'd also like to have better multi-output support, because right now the first output that comes takes all the planes it can, and if you have another output, it won't have as many planes. We need to be fair when splitting planes across outputs, and we also need a way to migrate planes: for instance, if I use ten planes on one output and a new output comes in, I need to migrate planes from the first one to the other, and that's a little bit tricky because sometimes the outputs don't refresh at the same time; if both outputs have different refresh rates, there are some synchronization issues there.

I also have a bunch of long-term goals. The first one is to have a feedback loop. The idea is that right now clients allocate buffers and send them to the compositor, and the compositor needs to do the copy, so the client decides what the buffer format is. If the client decided on XRGB, the compositor may not be able to put it into a plane. The idea of the feedback loop is to not just be sad and say, okay, I can't put this window into a plane, life is terrible. Instead, we add a way for the compositor to tell the client: I can't put this format into a plane, but if you allocate with this other format, ARGB, life would be better, and then the client can do it. So we need a little protocol for this: I'm working on a Wayland protocol to implement this kind of feedback loop, where the compositor says, use ARGB, and the client says, okay, I'm using ARGB now. This is a work in progress; it's called the dma-buf hints.

Another long-term goal is to have driver-specific plugins. As I said, right now we only have the atomic test-only API to know whether a plane combination will work. The idea would be to have some driver-specific plugins inside libliftoff itself, so we could have logic that says: okay, on Intel, I know this won't work, so I won't even try it; and I know the bandwidth limitation is this limit, so I won't try to go over it. That would allow us to be more clever when doing the plane mapping.

The last item, but that's for the future, is exotic configurations. Sometimes you have planes which are under the composition layer, so you need to draw a hole into the composition plane to be able to show the planes under it. It's very tricky; we'll see if it's worth it.

There are a bunch of references here if you're interested. Thank you for your attention, and feel free to ask questions.

[Audience question] Can you repeat the question? Okay, I'll repeat the question for people on the stream. The question is from the compositor's point of view: as a compositor, do I take every window and create a layer for it, or do I do something else?
So yes, basically you take every client window, you create a layer, and then if it can be put into a plane, libliftoff tells you, okay, you can put it into a plane; if it can't, then you need to fall back to composition as you were doing before.

[Audience question] The question is basically how many planes we have in general. My Sandy Bridge laptop is pretty old and only has three planes, so that's not a lot. I know that newer Intel hardware has a lot more planes, I think seven planes or something, and some Arm GPUs basically don't have a maximum number of planes; you can have as many planes as you want, but there are some bandwidth limits, so they expose ten planes and you can try to use them, but at some point it won't work anymore. The trend is basically that we'll get more and more planes as we go, so that's pretty good news.

[Audience question] Yeah, so there's also been some talk about a libliftoff ioctl: basically you'd set up your layers, send all of them to the kernel, and ask the kernel, will this work, and do I need to composite anything? There's been some talk about this. I think it's important to start with the current approach, and then if it works we can discuss maybe putting it into the kernel, or maybe having device-specific plugins. We'll discuss with the kernel developers what makes the most sense, but user space could just keep using this API and it wouldn't change: it's basically an implementation detail, so we can change how it works behind the scenes and it won't matter a lot for user space. The idea here is just to use the current API as it was designed, and then depending on the results of the experiment, we'll see whether we continue using this API or not.

[Audience question] About Xorg: I don't think Xorg is using hardware planes. [Follow-up: right now, with an Xorg session, will it use planes?] Okay, so that's some legacy code: it was able to use planes at some point, but then the newer stuff is not able to.

[Audience comment, partially inaudible] Allwinner devices: 74 ... for each pipeline. Okay, all right. And Ice Lake: eight planes, eight planes per pipe, so per output, so 42 planes total distributed across outputs. Okay. I think for most desktop applications eight planes is already pretty good; you don't need a lot more.

[Audience question about scaling] The way you do scaling is basically by setting some properties on the planes, and you can set any property on a layer, so you can just set the scaling properties and it will just work: libliftoff will try to use them on each plane, and if it doesn't work, it will ask you to fall back to composition. You can set any property, so all plane features should just work. You can also set a property that doesn't exist, and then it will simply always fail.

[Audience question about transforms] Basically, the idea is that when you want to do a transform, if you can't do the transform on a plane, you'll fall back to composition in any case, right? [Follow-up] If you take out the rotation, then your final image won't be the same, right? So yeah, that's what happens: this client can't be put into a plane, so libliftoff will ask you to composite it, but the other clients can still be put into planes, and that's fine. Yeah, that's more complicated; it's planned, but we'll see in the future. Just doing this will already be a big win.
So we'll see. Multiple composition layers, yeah, that's for later too. More questions?

[Audience question about the even-position constraint: what if an even position becomes odd?] Yeah. [Audience question about power] Because you need to boot up the render engine, yeah; I don't know how much time it takes, we need to do more experiments. I'm not sure; we'd need to ask the driver developers to see if that would be an issue. You basically don't want to use a discrete GPU for composition anyway, because if you unplug it, you don't have anything. There are a lot of complicated issues with the layer-to-plane mapping algorithm; it's pretty annoying, yeah. In any case, if something too hairy is going on, it will fall back to composition anyway. I'm not a fan of letting the compositor say, okay, the user is moving the mouse, so I'm going to apply some heuristic; we need to do some experiments to see if it's worth it or not.

[Audience question about partial updates and windows occluded behind other windows] So, partial updates: for planes, there is actually a property for this. You can set up a layer and say, I only updated this region of the layer, and some hardware will be able to make use of that. The compositor, even without libliftoff, already does damage tracking: it keeps track of which parts of the screen have been updated since the last frame, and then repaints only the parts that need to be repainted. So it could just set that property on the layer and it would be fine.

About occlusion: right now I don't have any logic to skip layers that are completely occluded behind other layers, and it's a little difficult, because if you have transparency, for instance, you still want to display a layer even if it's completely covered. The compositor has more hints: some clients can say, my window is completely opaque, so it's fine if you don't draw anything under it. Compositors have more knowledge to do this, and should already do this for OpenGL. I don't know if it's worth adding this to libliftoff; maybe in the future, we'll see. Depending on the buffer format, if it's XRGB for instance, we know it won't have an alpha channel, so we can say it's opaque and not draw anything under it. Yeah, maybe, we'll see. Okay.