My name is Keith Packard, my Debian ID is keithp. I've been hacking on X forever, yeah, approximately 25 years, something like that, maybe a little longer. I want to talk today about some recent work that I've been doing in X to improve application presentation, make it look prettier. I'm going to describe what the problems are, I'll go over the solution that I have, and of course there will be some demos. There's always demos, right? So the big question is, the talk is about DRI 3000. So what is DRI 3000? DRI 3000 is a way of hooking direct rendering clients to X. That's what DRI 2 did, that's what the original DRI did: you have a direct rendering application drawing, talking directly to the GPU, not through X, and then you want to get the results of that rendering onto the screen. Well, that's part of DRI 3000. Another part of DRI 3000 is synchronization. One of the things that application developers do now that our computers are fast enough is they draw faster than the screen, and they would desperately like the screen updates to be a whole screen and not two pieces of a screen with a big horizontal line. How many of you have watched movies or played video games and had a big old horizontal line across your screen? Yeah, that's called tearing. We don't like that. I'm going to try to get rid of that. The other thing that DRI 3000 is trying to address is how to be more aware of the fact that X environments are typically composited these days. A compositing environment means you have all of your applications drawing off screen, and you have a super application called the compositing manager that puts together pretty frames for you and adds all of your drop shadows. This is not running compositing right now, because the only way I can develop DRI 3000 and test it is to turn it off for now. So I'm running metacity without compositing. So what are the goals of DRI 3000?
Well, the chief, the primary goal for DRI 3000, and the reason we started this about a year ago in Germany, was to fix a bunch of known deficiencies in DRI 2. DRI 2 is the current direct rendering interface. It's got a bunch of problems; I have two slides full. I'll show you what those are and we'll talk about how we're going to fix those. The other thing that DRI 3000 wants to do is improve the support for a bunch of presentation-related GL extensions. There are a bunch of GL extensions for media and vblank synchronization and making games happier, and DRI 2 can't support those. And so we're going to talk about how we're going to fix them. The other thing that I really wanted to do was to make it possible to get vblank synchronization, which is to say getting rid of the tearing, for not just GL applications using DRI, but also regular old 2D applications. There's no particular reason why DRI 2 restricts the tearing fixes to GL applications, but that's all it exposes. You can't get at that functionality through the core X protocol, which really sucks, because most of my applications are not 3D applications. I run word processors and text editors and all kinds of stuff. And I would really like to get the same pretty screen in my text editor that I get with my PC board layout tools. And of course, one of the key things we always want, since we're talking about desktop and laptop computing these days, is improving performance, mostly so that we save a bunch of power. DRI 2 right now, typically for a GL application with a compositing manager, involves three copies of every frame on its way to the screen. Three is probably more than you want. We'd like to get to one or zero. Okay, so what are the components of DRI 3000? Well, in DRI 2, it was all one big monolithic extension, which is why you couldn't get the vblank synchronization for non-DRI applications, because that was all part of the DRI 2 extension.
Well, in DRI 3, we're going to split things up a little bit. The first thing we're going to create is a DRI 3 extension. The DRI 3 extension is the absolute bare minimum necessary to connect a direct rendering application to X. I have a slide showing the entire extension on one slide. The second thing we're going to do is we're going to create a presentation extension called Present. If anybody has a better name for this extension, I would love to have a better name, one which is not both a verb and a noun and an adjective or whatever this horrible word is. Yeah, my English language understanding is pretty minimal. But Present is clearly a terrible name for the extension, and I haven't found a better one. So if you have ideas, please make suggestions. The other pieces are some new kernel APIs, which are optional, and we'll describe what these do. One is a kernel API for cut-through flips. Well, I have two undefined terms in this slide, I apologize for that. A flip is where you have one frame buffer and another frame buffer, and you just want to tell the video hardware: hey, I know you're happy over here scanning out from this frame buffer; please let me change a single 32-bit register value and scan out from this other area instead. You can imagine that's frighteningly efficient in terms of memory bandwidth and performance, because all you have to do is update one 32-bit register value, and now you're scanning out a new frame. So you draw a new frame in a new frame buffer, bam. The problem is that in most of the current implementations, especially the Intel driver, when you ask for a flip, the flip happens at the next vertical blank interval, not within the current one. And the problem there is that games are desperately trying to barely hit 60 frames per second. And sometimes they go over a little bit, by half a millisecond or so. That's well within the vertical retrace interval. But if you ask for a flip half a millisecond too late, sorry, you get to wait.
And that's the other kind of video damage that we don't want, called judder, where your refresh rate is stuttering. I don't know if you, does anybody know what judder is and see it on the screen all the time? Yeah, not so many, huh? That's where things seem to jerk across the screen. They're complete frames, but things are jerky. And that, in my mind, is just as bad as tearing. And so we want to give applications the option to update which frame they're scanning out from in the middle of the frame, usually within the top half dozen scan lines or so, so that they can get a little bit of tearing at the top of the screen instead of a jerky presentation. It's a choice. We want to implement this. The other kernel API we're going to do is something I'm calling page-level switching or exchanging, and I'll describe that later on. Both of those kernel APIs have been implemented and the code's available. The final thing we need for DRI 3000 is a bunch of new XCB APIs. We need to be able to pass FDs, and I'll talk more about why when we get to DRI 3. And then we're creating these new special XGE event queues so that the GL library can capture events and not have them presented to the application. Because that actually turned out to be a big problem with DRI 2, as we'll see in a minute. I wanted to go over a short history of DRI. One of the important things about DRI, the original DRI, was that it was implemented back a long time ago. I would love to make this work here for me. Can I just do that? No, I can't. There we go. OK, I couldn't read it on the screen here, I apologize. DRI 1 was the original direct rendering infrastructure. That was back when our video cards had just enough memory to show you what was being scanned out now and to have one more buffer for the next frame. And that was it. So each application had to share this one big back buffer.
So DRI 1 was a horrible kluge that shared this single back buffer among all applications. And it was awesome at the time that it worked at all. It was the dancing bear of direct rendering infrastructure, which is to say it was amazing that the bear danced at all. The awesome part about DRI 1 is that when you ran a DRI 1 application, and you put a breakpoint in your application with GDB while you were debugging under the X server, ha, you hit a breakpoint. And your DRI application has locked the X server, because it's rendering right now. I'm sorry, you can't type in GDB. It was kind of exciting. Another great part about DRI 1 is that it only allowed for temporary allocations in graphics memory. So you would lock the GPU, do a bunch of rendering, including uploading textures, blah, blah, blah, putting stuff in temporary memory. And then you'd unlock the GPU. And all of your memory would go poof. It'd all be gone. So you'd have to upload your textures again the next frame, yay. Yeah, there were performance issues with that. DRI 2 fixed a ton of stuff. The fact that we need to do this again is no slight at DRI 2. It made a huge amount of stuff possible. It got us from GL 1.4 to GL 2.0, a huge advance. It gave us per-window back buffers instead of a global back buffer, stencil buffers, depth buffers. The problem with DRI 2 was that we decided that we wanted to remain compatible with GLX. And GLX has this horrible semantic that if multiple applications do GL to the same window, they share all of their off-screen buffers. So the back buffer is shared, and the depth buffer is shared, and the stencil buffer is shared. But the GLX protocol doesn't define any semantics for how you're going to synchronize between these buffers. And we did this in DRI 2 by having the X server allocate all these awesome buffers. The problem with that is the X server didn't care about these buffers. A stencil buffer? The X server is not using this.
So we invented this horrible naming scheme for these objects, because we didn't want to create some weird X naming scheme for them when X couldn't use them. And the other thing in DRI 2 was that objects were passed between the server and the client with these globally unique integers called GEM handles. And these were actually global to the entire system. So anybody on the computer who could open up your graphics device could find these fine little handles, connect to these objects, and suck your pictures off. It was awesome. Kind of lacking a little security. Oh, there were also lifetime issues, because these were just handles, these weren't references. So if you took this number and gave it to your neighbor, the only way that your neighbor could actually make sure that that object existed was to try to open it. If that object had been freed and another one allocated in the meantime, which would happen a lot, your neighbor would get a different object, maybe somebody else's object. It was kind of exciting. Okay, so in DRI 3, we're giving up on our GLX compatibility. We had actually given up on it incrementally during the lifetime of DRI 2 as graphics cards changed. It turned out, oh, I'm sorry, we really can't do shared stencil buffers and shared depth buffers, because we have things like hierarchical Z. With hierarchical Z, you have to do depth resolves: if you've rendered to your depth buffer and you wanna suck the bits out, you have to do a depth resolve. I'm sorry, we have no way of communicating across multiple applications whether the depth buffer has been resolved or not. So we gave up, and the Intel driver had been allocating private buffers for everything except the back buffer for a long time. So we didn't have the reality of GLX compatibility anyhow. So with DRI 3, we said, fine, we give up. No more shared buffers for you. Everything's allocated and managed by the client now, so the server's not involved.
It means that the clients own their own destiny and they get to tell when buffers get reused, which is kind of nice. And now instead of passing around these global handles, we use DMA-BUF. DMA-BUF can wrap a GEM object in a file descriptor, and then we pass the file descriptor through the local socket and the other side picks up the file descriptor and opens the object. It's awesome, because that file descriptor is actually a reference to a real object, and when it pops out at the other side, you know you have actually the same object. You can also nicely close the object, and the socket carries the reference along, and so you don't have any lifetime issues. It's kind of nice. Okay, so what are we trying to fix from DRI 2? One of the big problems we had with DRI 2 is what happens when the window is resized? Well, when the window is resized, these objects allocated by the X server also get resized, of course, right? Your back buffer has to be resized when your window is resized. Well, how does the application find out that this has happened? Well, there's a lovely event, and that event is delivered immediately, but it arrives some time afterwards, and the application has done some rendering after the window was resized but before it received the resize event. So there's some subset of the rendering the application has done that's gone off to a buffer that will never be used, and the application has no way of knowing what that rendering was. The other thing, of course, I talked about already was that the stencil buffers and depth buffers were a disaster for X, and so we just didn't want to have them at all.
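The fd-passing mechanism described here is plain SCM_RIGHTS ancillary data on a Unix-domain socket. As a rough sketch of what that plumbing looks like (this is generic socket code, not the actual DRI3 or xcb implementation; any fd works, so a pipe can stand in for a DMA-BUF):

```c
/* Sketch of SCM_RIGHTS file-descriptor passing, the mechanism DRI3
 * relies on to hand DMA-BUF fds across the X server's local socket.
 * The received fd is a fresh reference to the same kernel object. */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one fd over a Unix-domain socket.  Returns 0 on success. */
static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    char cbuf[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; the kernel installs a new descriptor in this
 * process that references the sender's object.  Returns fd or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    char cbuf[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
    };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

Because the kernel carries the reference inside the message, the sender can close its copy immediately after sending, which is exactly the lifetime property the talk describes.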
The other exciting thing was, until we decided to stop sharing stencil and depth buffers, we actually had to update the X server every time our GL driver needed a new buffer type, which was awesome, because you have to deploy a new 2D driver, a new DRI2 extension, a new X server, a new GL library and a new GL driver just to get some new GL feature. Probably not optimal. More issues with DRI2. There was no integration with the Composite extension at all. When your application asked to have stuff swapped to the screen, if your window was redirected and that window was off screen, then the swap would happen sometime. The compositing manager would get told about that, oh, sometime, and the compositing manager would put it on the real screen sometime later, and the application would have no idea when the actual presentation occurred on the screen, and there was no synchronization between the compositing manager and application updates. It was a disaster, and a lot of extra copies. We couldn't create an XCB binding for DRI2 because of, remember those events for resizing the buffer? Well, XCB had no method for handing those events from the bowels of the XCB library to the bowels of the Mesa library without having them pass through the application, which really couldn't understand them at all. That's kind of a technical thing, but it turned out to be a problem, because a lot of people wanted to switch to XCB, but you couldn't with Mesa, so our Mesa implementation was stuck. Of course I talked about the fact that I wanted to add vblank-synchronized 2D application support; GL apps are the only things that got that with DRI2. The other thing is we were missing a bunch of important extensions for buffer management and synchronization that DRI2 couldn't support. I don't think I'm gonna describe those here. Okay, so the DRI3000 architecture, as I said, consists of two extensions. The DRI3 extension does three things.
It provides access to the DRM device. Right now the only process in the system that can open the DRM device and prepare it for rendering is the X server. We're looking at fixing that in the kernel with the implementation of render nodes, but right now the X server is the only one who knows which DRM device to open and how to get it ready for rendering. So the DRI3 extension includes that. Then there's also the sharing of these DRM pixel buffers, these video driver allocated pixel buffers, between your direct rendering client and the server. And the other thing that I added, and I'm not sure I need this, but it was fun to implement and maybe it will be useful, and I'm certainly using it right now. I don't know if it's the right semantics, but it was to be able to share a futex between the X server and the application, so that when the X server is done using a buffer, it can signal this futex semaphore and the client can wait for that. It turns out that I'm not sure I wanna use that mechanism, because with futexes you get to wait for a single futex. You can't wait for any of n futexes to fire. So if I have three buffers that I might use any of, I really don't care which one is idle. I just need one of them to be idle. So right now the library is assuming that the oldest one will become idle first. And so it says, if there aren't any idle buffers, let me wait on the oldest one. It doesn't always work. So I don't know what I'm gonna do about that yet. The Present extension is the other half. It's gonna copy a chunk of memory from a pixmap to the screen, or maybe take a pixmap and make it the new scan-out buffer if it's the right size. It'll synchronize that activity to vblank, and then it delivers events when the presentation occurs. Here's the entire DRI3 extension. Yeah, it's probably a little bit too small to read, I apologize for that. There's four requests.
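For readers who haven't used futexes directly: glibc exposes no wrapper, so both sides go through the raw syscall on a 32-bit word in the shared page. A minimal sketch (these wrapper names are mine, not anything from the DRI3 code):

```c
/* Sketch of the futex-based fence idea: server and client share a
 * page, and a 32-bit word in it acts as the semaphore.  There is no
 * glibc futex wrapper, so we call syscall(2) directly. */
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Block until *addr changes away from 'expected'.  If the value has
 * already changed, returns immediately with errno == EAGAIN. */
static long futex_wait(uint32_t *addr, uint32_t expected)
{
    return syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

/* Wake up to 'count' waiters blocked on this word; returns the
 * number of waiters actually woken. */
static long futex_wake(uint32_t *addr, uint32_t count)
{
    return syscall(SYS_futex, addr, FUTEX_WAKE, count, NULL, NULL, 0);
}
```

Note that a waiter blocks on exactly one word, which is the "can't wait for any of n futexes" limitation the talk complains about; there is no select()-style multiplexing across futexes.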
Open: the X server opens the DRM device, prepares it for rendering, and passes a file descriptor back to the client for that device. This is in contrast to the DRI2 method, where the application opened the DRM device, got a magic cookie from the kernel, passed that magic cookie to the X server, and the X server blessed that client so it could do stuff with the device. That was before we had file descriptor passing; it was the only way to do it. PixmapFromBuffer: that's where I have a GEM buffer, or any other kind of kernel buffer that I can wrap with a DMA-BUF handle, and I pass the file descriptor over the X connection, a pixmap is created in the X server, and now I can use the pixmap however I like. And then the converse operation, where I have a pixmap and I want a buffer handle: the file descriptor comes back and I can talk to it through the kernel. And then the other thing I have is this ability to create a fence object, a futex, from a file descriptor. I allocate a page in the shared memory file system with a file descriptor, I pass that file descriptor to the server, and now I have a shared page, and I use a futex in the first word of that page. I don't think that's what I want, but that was what I implemented. The Present extension actually has fewer requests; it only has three. The top three of these things are requests. PresentRegion, which takes an area of a pixmap and puts it into a window. PresentNotifyMSC, the media stream counter; this is required for a GL extension that asks what the current time is. PresentSelectInput: Present has a bunch of events, and you need to be able to select which events you want. And then it has a couple of events: PresentConfigureNotify and PresentCompleteNotify. So the configure event happens anytime your window is resized.
So you get this configure event, and you've got an old buffer. Well, what you can do when you get this configure event is allocate a new buffer and copy the contents from the old buffer to the new buffer. So the client is responsible for managing his own buffers. He can do it synchronized with his own GL operations, and he can do it at an appropriate time. That way you don't lose any rendering. Now of course the application is almost certainly also looking for regular X ConfigureNotify events to find out when his window's been resized. The requirement here is that the Present extension learn about the configure before the application, so that if the application does drawing at the new size, the GL code already knows about the new size and can allocate a new buffer. The other event that we have is PresentCompleteNotify, which is sent when the present operation has completed, which is to say it's sent when the first pixel of the screen on which you are presenting is being scanned out by the monitor. Nominally; of course it's all approximate these days. The PresentRegion operation takes a bunch of parameters sufficient to put stuff up onto the screen. It carries a little serial number that shows up in the complete notify, so the client can tell which complete notify corresponds to which present region operation. The other thing is it has a couple of regions. It has a valid region that says: this area of my pixmap, yeah, you could put any of that on the screen, that'd be fine with me. And it also has an update area that says: this part of my pixmap, you have to get that on the screen, at least that much. In between the update area and the valid region, the server is free to put as much as it wants or as little as it wants. So if the application just has a tiny little pixmap, it only paints a tiny little area.
You can set the valid region and the update area to be the same, and then the X server will only copy that little region. And yet if you're doing real double buffering and you have the full contents of your window available, you can set the valid region to the full window, but if you only draw a single character, you set your update area to the single character. And then the X server is free to do a flip operation to that new buffer if it wants, or it can blit just the tiny little area, and that's how I implemented this. There's an X and Y offset within the window if you want to take a pixmap and just put a little piece of it on the screen. Then there's an idle fence, one of those sync fences I talked about: when the pixmap is no longer in use, the idle fence gets signaled. Then there's this three-parameter disaster for the GL extension that lets you tell what frame to present stuff at. And then I'm going to have a set of boolean options that say whether you want to have the tearing option, or whether you want to force it to copy in case the application isn't ready to deal with flips. It'll support vblank-synchronized sub-window updates, which is kind of cool. You can paint a single character and say: present just this character, and that character will get blitted at vblank time, so you won't get any tearing. Of course I talked about flips for small updates. Yeah. Here's an optimization that Present offers. Multiple operations can be queued for the same frame. So if you have a bunch of updates to a single pixmap, or updates to a disparate set of pixmaps, you can queue all of those for the same frame. If they come from the same pixmap and they have the same valid region, you can just throw the old one away and merge the two updates together. If they have different pixmaps, then you can see if the new one completely replaces the old one, in which case you can throw the old one away.
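The coalescing rule just described can be sketched in a few lines. This is an invented illustration, not the actual server code; the structs, the single-rectangle region representation, and the function names are all hypothetical simplifications (real X regions are rectangle lists):

```c
/* Hypothetical sketch of Present's queue coalescing: two operations
 * queued for the same target frame can collapse into one when they
 * use the same pixmap and valid region, or when the newer update
 * fully covers the older one.  Types are invented for illustration. */
#include <stdbool.h>
#include <stdint.h>

struct rect { int x, y, w, h; };

struct queued_present {
    uint32_t    pixmap;   /* XID of the source pixmap */
    uint32_t    serial;   /* client-visible serial number */
    struct rect valid;    /* area the server may copy from */
    struct rect update;   /* area the server must put on screen */
};

static bool rect_eq(struct rect a, struct rect b)
{
    return a.x == b.x && a.y == b.y && a.w == b.w && a.h == b.h;
}

static bool rect_contains(struct rect outer, struct rect inner)
{
    return outer.x <= inner.x && outer.y <= inner.y &&
           outer.x + outer.w >= inner.x + inner.w &&
           outer.y + outer.h >= inner.y + inner.h;
}

/* True when 'newer' can replace 'older' in the queue.  The server
 * still keeps older->serial around so that BOTH complete-notify
 * events get delivered, as the talk describes. */
static bool can_coalesce(const struct queued_present *older,
                         const struct queued_present *newer)
{
    if (older->pixmap == newer->pixmap &&
        rect_eq(older->valid, newer->valid))
        return true;                 /* same source: newer contents win */
    return rect_contains(newer->update, older->update); /* covered */
}
```

The point of the sketch is just that the test is cheap, which is why it can pay off even at GLX gears frame rates.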
So here I can queue multiple operations, and I automatically get an optimization in the server where I can throw away old operations and just use the new one. Of course I don't throw it away entirely. I save enough of it around so that I can deliver both complete notify events, so the application can't tell that I did any optimizations. This optimization is good for a significant performance boost in GLX gears. It takes my GLX gears in a tiny little window from 3,400 FPS to 5,400 FPS, and we all know what a great benchmark GLX gears is. Thank you. It does show that it's effective, though, so that's kind of nice. Okay, so Present and Composite. The Composite extension provides an off-screen pixmap for your window contents. That's how you get all the whizzy effects. Then you've got a compositing application that takes all this off-screen stuff and does fun stuff with it, like putting fire on the window when it goes away. Thank you, Compiz. With DRI2 or core X, all of the window contents end up in this off-screen window pixmap, and the compositing manager deals only with that pixmap. That means when you're doing double buffering, the application has to copy his data from his off-screen image into this pixmap, and then the compositing manager has to take it from there and put it on the screen. That's a lot of copies, because the compositing manager typically takes it from there and puts it in the back buffer and then puts it in the front buffer. So oftentimes you get as many as three copies. The other problem with that is that the copy happens at some random time, and the application has no idea when the presentation has actually occurred. So what we wanna do is we wanna associate that present region operation with the correct frame count when it actually occurs on the screen. And the other thing we wanna do is we wanna reduce copies. So what I'm gonna do here is I'm going to redirect the present operation to the compositor.
So the present region operation comes into the server and the server says, oh, that window's been redirected. Let me go tell the guy who's managing that window's presentation that that present occurred. So it hands all the data from the present operation off to the compositing manager, and now the compositing manager can do whatever it wants with that data. It doesn't have to put it in the window pixmap; it can put it right on the screen. That takes out one copy. The other thing it does is, I extend the present region operation with a window and sequence list, so that when that present operation is done by the compositing manager to put the contents on the screen, it tacks on the original window ID and sequence number, so that when that present completes, it sends a complete notify to the compositing manager and it sends a complete notify off to the application. So both of them learn that the contents are on the screen. So the application learns not when his image gets to his window buffer, which he doesn't care about, but when it gets to the screen. So now the application can be synchronized to the screen and the application can get frame rate information. For composited full-screen applications, we desperately don't want to do any copies. So in this case, we're gonna redirect that pixmap off to the compositing manager. The compositing manager is gonna say, oh, this covers the whole screen, let me present that to the screen. And the X server is gonna say, oh, it covers the whole screen, let me do a page flip for that. It's gonna do a page flip and you're not gonna do any copies at all. So for full-screen applications, you're gonna take another round trip, another couple of context switches through the compositing manager, but you will not copy the data. And of course this happens per frame. Context switches are bad, but not that bad. And so I'm hoping that at least it will be much better than the current situation.
I have a couple of redirection plans, and this is still what I'm working out. The redirection stuff is not done, as you'll see. The easy thing to do is to just take the present region request and send it to the compositor and say, that's all I'm gonna do. The compositor is then responsible for keeping the window pixmap contents up to date. So for instance, if it wants to do a zoom of your screen, so you can show applications in little tiny versions on the screen, it needs to make sure that the window contents and the window border are in the same object. So it's gonna have to take the present region operation that had occurred before, take that pixmap, copy it into the window pixmap, and then use the window pixmap to paint on the screen. The compositor would have a bunch of extra work. That might not be ideal. The other option is to get the applications to draw into a larger buffer. Normally a GL application is gonna allocate a buffer that's just the size of its window. Well, if you had the application allocate a buffer that was a little bit bigger, you could paint the window manager frame around that. And the way that you'd paint around that is you'd have the server actually track damage to the real window pixmap and copy the border, the window manager contents, into whatever pixmap was being presented to you. So you'd always have that window border in that pixmap. This is pretty easy to do in the server, a little bit of damage tracking, a little blitting. The hard part here is actually getting GL to deal with drawing to a larger buffer, and I haven't even started this work. And I'm gonna do the other stuff first just to see what the performance is like. So the tricky part here, remember in DRI1 we talked about the fact that all the applications shared a single giant off-screen buffer? Well, applications knew how to draw at arbitrary offsets in that buffer.
So Mesa used to have this fine ability to draw a smaller window within a larger buffer. They deleted all that code. Thanks, guys. I'm gonna have to go recreate it all if I wanna do this. And I think I probably do. The other thing I wanna be able to do with this particular optimization is faster sub-window swaps. So if you have an application that's in a little window and it does a swap, well, right now the usual thing you're gonna do is you're gonna take the contents of that window and copy them to the scan-out buffer, right? Cause you have just a sub-part of the scan-out buffer that's being updated. Well, what if that window was drawn into a buffer that was big enough and aligned correctly, so that the pages that make up that window were exactly aligned with the pages of the scan-out buffer that cover the pixels where it wants to go? And then I have this fun kernel API that just takes these two rectangular regions of pages and swaps them. So I'm gonna update the page table entries of the scan-out buffer so that it gets the pages that the application just provided. Now the trick there, of course, is I have to fill in a bunch of data around the outside of that window, where the window was padded out to the appropriate page alignment. But the benefit is a tremendous improvement in performance. And again, what I have to have here is that application able to paint within a sub-part of its buffer, offset from the upper left-hand corner. I presented at LCA last January some performance data showing a hack that I did, and that hack was showing that I was getting about two thirds or so of the benefits of doing a regular full screen flip. So it's dramatically faster than doing the pixel copy, but slightly slower than doing a full screen swap.
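The alignment constraint behind this can be shown with a little arithmetic. This is my own illustrative sketch, assuming a linear (untiled) buffer and a 4 KB page size; the function names are invented, and real scan-out buffers on Intel are tiled, which complicates the actual calculation:

```c
/* Hypothetical arithmetic behind page-level swapping: for the kernel
 * to swap page-table entries instead of copying pixels, each page of
 * the client's buffer must match a page of the scan-out buffer
 * byte-for-byte.  So the client draws its window at an offset inside
 * a padded buffer chosen so the in-page byte offsets line up.
 * Simplified to a linear layout; real scan-out buffers are tiled. */
#include <stddef.h>

#define PAGE_SIZE 4096

/* Byte address of pixel (x, y) in a linear buffer. */
static size_t pixel_offset(int x, int y, size_t stride, int bpp)
{
    return (size_t)y * stride + (size_t)x * bpp;
}

/* Offset in bytes, from the start of the client's own buffer, at
 * which the window's top-left pixel must sit so its pages align with
 * the scan-out pages covering window position (win_x, win_y).  The
 * client buffer must also use the scan-out stride, so that every row
 * keeps the same in-page offset. */
static size_t aligned_draw_offset(int win_x, int win_y,
                                  size_t scanout_stride, int bpp)
{
    return pixel_offset(win_x, win_y, scanout_stride, bpp) % PAGE_SIZE;
}
```

The padding the talk mentions is everything between the start of a page and that computed offset (and the tail out to the next page boundary): those bytes belong to pixels outside the window, which is why they have to be filled in around the edges before the pages can be swapped in.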
It was a little surprising, because oftentimes when you end up doing PTE whacking, the PTE whacking is expensive enough that you don't get any benefit. But I'm not actually whacking CPU page tables. I'm whacking GTT page table entries, and there's no cache flushing necessary there, because it's a scan-out buffer, which is uncached. So the cost of doing the page table updates is cheap, which was kind of surprising. Okay, I wanted to talk about status. Currently, the DRI3 extension is complete and working. This is all being done with the DRI3 extension; I'll show you some demos. The kernel patches for the rectangular page swapping have been posted. The kernel patches for the cut-through flipping have been posted and people have looked at them a bunch, but Eric and I were discussing today and I'm gonna throw that out and do it again. Yeah, it was wrong. The Present extension is all working, so my GL stuff all runs with that today. It doesn't do any redirection. So obviously the first thing I'm gonna do coming up is the redirection stuff, do the obvious simple stuff, and try to get that to work and see what it looks like. And the next thing I wanna do after that is figure out what to do with video. Because video wants exactly the same stuff that RGB data wants; it's just in YUV format. So I need to probably extend the Present extension to take YUV buffers and put them on the screen. And then of course the X server, if it gets a YUV buffer, is free to use overlays. So coming up with APIs that describe how the overlays work and making that all happen, that's all future work. I wanted to do some demos now, because demos are always fun. Who wants demos? Everybody wants demos. Okay. So the first thing I wanted to run here was a really cool demo, which is just, I think the other one is right over here. Right, come on. I wanna be able to drag this. There we go.
Here is an application which is just drawing, it's an X application drawing a rectangle moving sideways across the screen. And you notice that it is not tearing; it's amazing. Well, yeah, if you move the window, now the window manager is getting involved and the window manager is blitting it, and so you get tearing. That's what the tearing looks like when the window manager is moving it. Stop moving the window, and it stops tearing. So that's a simple one. That's just doing blits, of course, and it's a regular X application. The other application I wanted to run was our favorite application. Oops, I wanted to run this application. This application, oh, I wanted to run it in the tearing mode too. Let me run it in the tearing mode first so you can see what the difference is. I actually added an option to this application to leave it in the old mode, and you can see what it does here. Anybody like that mode? Ooh. Yeah, lots of tearing. It is 1993, isn't it? Yeah, and now the new application that doesn't tear anymore. Oh, by the way, I also posted recently a hack about how to use XInput 2 to track the mouse without polling. Yeah, this uses that now, of course. This is the most awesome xeyes ever now. So the hardest part about getting this to work was figuring out how to get XInput 2 events delivered through the Xt library. Oh my god. That was disgusting. Yeah. Okay, that was fun. Okay, and of course you have to show the GL application, that it actually works. The GL application. Is there, are there any others? As you can see it's not tearing, it looks all pretty, it's lovely. Yeah. And the final thing I wanted to, what else did I want to try? Just this afternoon I was hacking up metacity. I can show you metacity. What the heck? Let me see what I've got here. What? What could possibly be? Yeah, so this is code that I started writing at four this afternoon, so it's a little fresh. Okay, here we go. 
So now I have, let me just get my xterm window over, and I have metacity. Oh, I need to get the compositing turned on. Do, do, do, do, do, do. dconf-editor. And I'll turn on the compositing manager, and my suspicion is my screen's gonna go dark, or at least just the pixmap in the back, and I'll have to restart my X server. Ha ha, indeed. Yeah, not so much. Yeah, okay, well, as you can see, it's still a work in progress. Let me, I'll have to kill metacity now. It was working on one monitor; it doesn't work on two monitors. Okay, well, yeah, not so much. What I can show you right now is what metacity looks like. This is a simple X-based compositing manager. So I wanna show you what glxgears looks like with compositing turned on in this environment. Here we have glxgears running with a compositing manager. Oh, come on. Yeah, well, that's about how compositing and DRI, compositing works before we have DRI3. So, broken demos; it's the multi-monitor situation, of course, multi-monitors always break. Okay, we have about 10 minutes, seven or eight minutes of questions, and then it'll be time for dinner. The 128 by 8 pixel slices, where do they come from, where the size comes from? Oh, where that size comes from. Why it's not complete lines. Yeah, okay, so to make graphics go fast, we don't draw linearly in memory anymore. We actually tile the screen, so that when you draw vertically through a section of the screen, you hit the same page, so you don't just completely destroy your PTE cache. The scan-out buffer is literally arranged in these tiles. I see. Yeah. I mean, otherwise you'd be replacing whole scan lines, and that would work too, just less efficiently. You've talked about compositing. So I assume you've also designed the Present extension especially with XWayland in mind. Have you done any proof-of-concept work for that? No, I'm not working on Wayland at the present time. 
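The tiling answer above can be made concrete with a small sketch. The numbers here are assumptions for illustration: 32-bit pixels and a 4096-byte page arranged as a 128x8-pixel tile (which is where a "128 by 8" slice size would come from). Walking vertically down the screen stays within one page in the tiled layout, while a linear layout touches a new page on every row:

```python
# Sketch (assumed geometry, not actual driver code) of linear vs.
# tiled framebuffer addressing, in units of whole pages.

BPP = 4                       # bytes per pixel (assumed)
TILE_W, TILE_H = 128, 8       # tile size in pixels (512 bytes x 8 rows)
PAGE = TILE_W * BPP * TILE_H  # one tile = one 4096-byte page

def linear_page(x, y, stride_px):
    """Page index of pixel (x, y) in a linear layout."""
    return (y * stride_px + x) * BPP // PAGE

def tiled_page(x, y, stride_px):
    """Page index of pixel (x, y) when each 128x8 tile is one page."""
    tiles_per_row = stride_px // TILE_W
    return (y // TILE_H) * tiles_per_row + (x // TILE_W)

# Drawing 8 pixels straight down at x=0 on a 1024-pixel-wide surface:
# the linear layout touches 8 different pages, the tiled layout one.
```

That one-page-versus-eight difference is exactly the PTE-cache behavior the answer describes, and it is also why the swappable units are rectangular slices rather than complete scan lines.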
I think it'll probably help XWayland a bit, but that's not my core interest right now. Okay. Any more questions this evening? Or is everybody, oh, Joey has a question. So, I mean, speaking of Wayland or Unity or something like this, how does this tie into that ecosystem, or whatever you want to call it? Because, Mir doesn't count, because it's not an actual window system yet. But in the Wayland world, the Wayland compositor acts as an X compositing manager. So the fact that the compositing operations are redirected to that compositing manager would presumably make them more efficient, and a Wayland-based compositing manager using the Present redirection stuff could avoid a bunch of the copies that it would otherwise have to do. So X applications running under a Wayland environment with a Wayland compositing manager would presumably be as efficient as they are under X with an X compositing manager. I don't, that wasn't really my question. My question was really, are these things necessary? Oh, absolutely. I mean, if you're gonna run an X application, you want it to run as efficiently as possible in a compositing environment. These things being Wayland. Oh, Wayland. At all. I don't know, right? So, Wayland offers some additional changes in the ecosystem. It promises a simpler protocol on a smaller code base, it promises more security, and it promises the ability to do new things with user interfaces. And so the question is, is it more efficient to do that transition, or to continue using X and try to fix those problems? And I think that's an open question. Certainly, this is an attempt to try to fix some of the big performance and usability gaps in X on the output side. We still have work to do with security in X in order to make applications not able to snoop each other's input. I don't think snooping output is all that interesting. Applications that put stuff on the screen, including passwords, are probably just broken. 
But in terms of input snooping, that's probably something we want to work on fixing, and I don't think that's very difficult. In terms of running X applications, I don't see any of us not running X applications for a long time. And so certainly any work that we do in X to make those applications more efficient is going to be good for now and the future, independent of what underlying compositing engine you use. Because we're going to be running most of our core user interface through X for the foreseeable future. So I'm trying to make it better. Other questions or comments? I think it's about time for dinner. We're going to have about three more minutes. Thank you very much for coming, and enjoy the rest of DebConf.