Okay, we're going to talk about X today, and compositing managers, and some other technical details about the new DRI3000 work I'm working on, and current status, and future plans. Of course. Brief overview of compositing: what is desktop compositing? It's what all modern window systems do, of course. The applications paint stuff into an off-screen buffer, and then you have an overarching compositing manager, or a window system, or something, that takes all these off-screen application images, your spreadsheet, and your presentation, and your 47 different clocks, and merges them all together into a pretty view on the screen, with translucency between the applications, and little drop shadows, and all kinds of other stuff. Most of the work of a compositing manager is taking the application updates, as the application changes the contents of its window, and getting those updates onto the screen. The compositing manager spends a huge amount of its time just copying data. In the year 2013, when you copy data, you're toggling a lot of transistors, and driving a lot of capacitance outside of your CPU, and that is really expensive. We want to avoid that. Why do we do desktop compositing? Well, it's the usual reason, right? We have a problem here. We want to put up pretty animations, and we want to do pretty effects on the screen, and we want the screen to look nice, so we just insert this additional level of indirection between the application and the screen, so that we can have our nice effects. Well, of course, the problem with another level of indirection is that it's another level of indirection, and it's going to have some expense. Why is it expensive? We have extra data. Every single window, this window here, has a copy on the screen, and there's a copy off the screen, and the compositing manager is managing the moving of the data from that off-screen copy onto the screen. So I have extra memory involved, and then there's all this additional data motion.
The application paints stuff off the screen, the compositing manager copies it onto the screen. A lot of additional transistors moving around. So I'm working on solving this problem, and in conjunction with solving some of the other persistent problems in X with direct rendering, implementing a new system that I call, kind of colloquially, DRI3000. DRI3000 actually consists of two separate pieces: the DRI3 extension, which hooks direct-rendering applications, like your OpenGL or your media applications that render directly with the hardware, up to X, that's the DRI3 extension; and then there's a brand new extension called Present, which takes application contents and puts them on the screen. And the reason that we need a separate extension for that is that you need to be able to not just copy the data blindly, but you want to copy the data in a way that is mindful of how the screen data are presented. In particular, the screen is scanned from top to bottom, so when you copy data onto the screen, you want to make sure that your copy doesn't happen in the middle of this beam scanning from top to bottom, otherwise you end up with half of the old image and half of the new image in a lovely tear. So we like to avoid that. The other thing we like to do is to be very efficient. What you want to do is, instead of actually copying the data, you can just say, well, I have this old frame buffer and I have this new frame buffer, and I'm just going to tell my video hardware, stop using that memory and go ahead and use this memory now. So you literally change one register in the video hardware and you get an instantaneous transition from one frame buffer to the next. So it doesn't copy the data at all, the zero-copy plan; that's always best. And so I'm putting together a new extension that can do the copies when necessary, synchronized to the vertical blank, and, when the copy isn't necessary, can actually do a full-screen swap instead.
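The flip-versus-copy distinction above can be sketched in a few lines of C. This is purely an illustrative model, not the real Present or kernel API: `present_flip` stands in for the single scan-out register write, and `present_copy` for the blit path, with a byte counter to make the cost difference visible.

```c
/* Sketch of the two present paths described above: a zero-copy "flip"
 * that just repoints the scan-out address, versus a blit that copies
 * every pixel.  All names here are illustrative, not the real API. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct buffer { uint32_t *pixels; int width, height; };
struct crtc   { struct buffer *scanout; size_t bytes_copied; };

/* Full-screen update: one "register write", no pixel traffic at all. */
static void present_flip(struct crtc *c, struct buffer *next)
{
    c->scanout = next;               /* the whole operation */
}

/* Partial update: fall back to copying the pixels into the scanout. */
static void present_copy(struct crtc *c, struct buffer *src)
{
    size_t n = (size_t)src->width * src->height * sizeof(uint32_t);
    memcpy(c->scanout->pixels, src->pixels, n);
    c->bytes_copied += n;
}
```

The point of the sketch is just the asymmetry: the flip path touches one pointer no matter how big the screen is, while the copy path scales with the number of pixels.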
Why do we need a new extension for that? Why can't we just repurpose some of the existing extensions? Well, in the 2D land, there isn't an extension for this. If you run 2D applications, like any GTK+ or Qt applications that paint stuff on the screen, they're double-buffered. They're nicely double-buffered. Your updates are atomic, but there's no synchronization between that application update and the vertical blank signal. So if you have an animated 2D application, then you have animated tearing on your screen. You get this lack of synchronization between the vertical retrace and the application contents, which is ugly. Why can't I just use the existing mechanisms, the DRI2 mechanisms, and kind of hack them up a little bit to support this? Well, the main thing is that DRI2 is totally ignorant of our compositing manager. And again, the compositing manager is the piece that mediates between the application and the screen. So you have this lovely direct rendering infrastructure that knows how to do all these fine vertical-blank-synchronized updates to the screen. And then you have this compositing manager, which may or may not be using that; if it's a simple compositing manager like Metacity, then it doesn't use it at all. And so you have all this mechanism for your lovely 3D applications that doesn't do anything, because your compositing manager doesn't use it. So we want to be able to take advantage of these mechanisms all at the same time. So the goals of the DRI3000 work are to resolve some longstanding DRI2-related issues that I'm not going to talk about today. I gave a presentation about that at DebConf about a month ago. It has a lot of details about the DRI3 extension, the Present extension, and their technologies. And that presentation is available online in video form from DebConf. You can go listen to it or read it there. The main thing, of course, was to improve the support for applications that want synchronization with what gets put on the screen.
There's a bunch of GL extensions related to this. There's GL extensions that will give you frame-precise timings of when your stuff was put on the screen. And you can ask for your stuff to be put on at a particular frame time. It even handles interlacing and all kinds of craziness. Well, DRI2 didn't support any of this. We have, I think, three or four tests in the OpenGL test suite right now which all fail on this extension. So we have existing functionality that applications would like to use. We have an existing implementation in DRI2 which doesn't work at all. So we need to fix that. I looked at trying to fix it in DRI2; it's like, yeah, not so much. And so we needed to either go and do a bunch of work on DRI2 or do something separate. The other real big goal was to provide vertical blank synchronization for non-GL applications. I know most of you spend all day long playing 3D games, right? Who here spends all day playing games on their Linux computer? Yeah, one person. Yeah, the rest of us spend at least some small portion of our day doing actual work and creating content, creating software, creating documents, using applications which are largely 2D applications. As I said, GTK+ and Qt have all this nice double-buffering support. They do all the stuff necessary to get synchronous atomic updates onto the screen. They have no way of getting that synchronized to the vertical blank interval. They have no way of making it look good. So when you run 2D applications on X, they look terrible, because if they animate at all, they always get tearing, tearing or other nasty artifacts on the screen. The other real big goal of the DRI3000 work, the principal thing I'm talking about here today, is this goal of reducing copies when doing compositing. Right now, we have a copy from the application into the window buffer and then a copy from the window buffer to the back buffer. It's a lot of copies. So as I said, DRI3000 has a bunch of different components.
It's got the new DRI3 extension, which is the absolute smallest extension; it's four requests, I think, for applications that want to do direct rendering in the X environment. It's the absolute minimum necessary. The only thing that it does is let you share an image, a pixmap in this case, between the application, the direct-rendered application, and the X server. That's the extent of the extension. Yeah, okay, the extent of the extension. It's as small as I could possibly make it. That extension is apparently complete and works fine. The second extension is called the Present extension. I've been looking for a better name; I posted something on the Xorg developers list asking for a better name, and I had a bunch of people come up with a lot of substitute names. I still haven't found anything better. So if anybody knows something better than "present", that would be great. Most of the X extensions that I've done in the past 20 years or so have been named with verbs. We need something that's not also easily interpreted as a noun, or, in the case of "present", kind of an adjective, the present tense. So something that's nominally seen as a verb in most contexts would be great. In any case, we have the Present extension. Then we have a couple of kernel changes. One of the kernel changes is for a kernel-level API to, remember I told you you could just switch that one register value when you're scanning out from one frame buffer and now you want to scan out from the next one? Well, right now the kernel API says, yeah, I'll flip it at the next frame. Which is all fine and dandy if you know well ahead of time when you want to switch the frame buffers. But many 3D applications, especially our high-performance 3D games, want to just barely hit 60 frames a second.
They want to do as much drawing as possible, to make the scene look as detailed as possible, in the 16 milliseconds they have allotted to them. Which means that they pack in as much drawing as possible until they get to just about the frame time, and they say, okay, we're done now. Let me flip. Most of the time they make their 16-millisecond budget. And occasionally they miss their 16-millisecond budget. And with the current system, if you flip only at the vertical retrace interval, that means if you miss, you miss an entire frame. So you get frame, frame, frame, frame, pause, frame, frame, frame, frame. That pause is called judder, and it's really annoying. It's another kind of visual noise on the screen. And instead of that visual noise, what the game developers really want us to do is say: if I've just barely missed, then tear a little bit up near the top of the screen, where the ceiling is dark and nobody cares anyhow. Just get it on, get most of the frame into this frame time, so I don't have to wait for another frame. Get rid of the judder, give me a little bit of tearing instead. So I have a kernel API for changing how the kernel does the flipping, so that it does these, what I call, cut-through flips, where you just get the flip command and you just smash it into the hardware and it starts scanning out immediately. Turns out the Intel hardware, which I use most of the time, like on this laptop, does that just fine. You just have to tell the hardware ahead of time, oh, by the way, I'm going to be smashing this register and I want you to update the screen as soon as you can. And the hardware actually makes a little promise; it says, yeah, probably within a couple of scan lines we'll switch, because it's got caching and buffering and all kinds of stuff. So instead of getting a whole frame delay to your flip, you're going to get a couple of scan lines of delay. A huge, huge benefit for our fine game developers.
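The judder-versus-tear tradeoff above can be made concrete with a small timing model. The numbers are illustrative only (a 60 Hz panel with 1080 scan lines), and none of these functions are the real kernel API; they just model the arithmetic of the two flip policies.

```c
/* Timing model of the argument above: with flips only at vblank, a
 * frame that misses its ~16.7 ms budget waits a whole extra frame
 * (judder); with a cut-through flip it goes out immediately and tears
 * at whatever scan line the beam has reached near the top. */
#include <assert.h>

#define FRAME_US   16667   /* ~60 Hz frame period in microseconds */
#define SCANLINES  1080    /* illustrative panel height */

/* Extra delay (us) before the new frame is visible, vblank-only flip. */
static int vblank_flip_delay(int finish_us)
{
    if (finish_us <= FRAME_US)
        return FRAME_US - finish_us;      /* made the budget */
    return 2 * FRAME_US - finish_us;      /* missed: wait a whole frame */
}

/* Cut-through flip: visible immediately; returns the scan line where
 * the tear appears (0 = top of screen), or -1 if no tear at all. */
static int cut_through_tear_line(int finish_us)
{
    if (finish_us <= FRAME_US)
        return -1;                        /* flip lands in the vblank */
    int late_us = finish_us - FRAME_US;
    return late_us * SCANLINES / FRAME_US; /* beam position at the flip */
}
```

So a render that finishes 333 microseconds late costs a full 16 ms pause under the old policy, but only a tear around scan line 21, up where the ceiling is dark, under cut-through.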
The other API that I put together is for another copying optimization that I call page-level swapping, where you can actually take two rectangular pixmaps and take a subset from one and a subset from the other and swap them, like this. Makes for very efficient sub-screen switching. And there's a huge raft of changes to the X server and XCB to support a bunch of the DRI3 stuff for FD passing, and event queues for Present, and all kinds of stuff. XCB turns out to be a very primitive library for talking X over the protocol wire, and it's so primitive that it's missing some important functionality for new extensions, and so I have to go fix that as well. I have a short history of DRI. The direct rendering infrastructure was started back in 1998, I think, with DRI1. This is back probably before most of you were wearing long pants. At least Daniel still isn't wearing long pants. We had a single shared back buffer on the screen; that was because that's what the video hardware did. It was kind of a disaster. You could only talk to the graphics card and expect it to retain stuff in its memory for as long as you held the display lock. So you'd actually grab a physical lock for the screen, render your entire scene, and then release the lock. It made for some kind of bad multi-application performance. DRI2 fixed a bunch of those problems, made it a whole bunch better. All applications drew off-screen, so it didn't have to have this global lock, and we had memory buffers allocated by the kernel so that you could actually have persistent graphics objects. It was awesome. DRI3, the new extension, changes things just a little bit: instead of having the X server allocate all of our direct-rendered objects, now the applications allocate them on their own and they get to manage them on their own, which is a heck of a lot easier. And instead of passing our graphics objects around by magic numbers, we pass them by file descriptors.
And that solves a bunch of lifetime issues and a bunch of ownership issues and a bunch of other problems that we've had with DRI2. So that's why we're rewriting DRI2: to solve some of these problems. The DRI3 architecture consists of two extensions, as I said: DRI3, which provides access to the direct rendering device and does all the direct rendering stuff, and the Present extension, whose sole purpose is to take a bag of pixels from the application and put them in the application's window. That's its entire job. Not just applications want the Present extension, though. The Present extension is designed to be used by applications to get their bits into their windows, and then it's designed to be used by the compositing manager to take this collection of windows and put them onto the screen. So we have this two-faced notion of Present, where on one half, it's an application API for getting stuff into the windows, and on the other half, it's a compositing manager API for getting window contents onto the screen. And now we have this advantage of being able to use this one extension to do both jobs. And that way we can actually talk about improving the performance and improving the interoperability and making it designed for this, instead of working by chance, which the current system does. OpenGL applications are effectively always double-buffered today. There's no way an OpenGL application can draw directly to its window. The OpenGL application draws its contents off-screen and paints them onto the screen. Even if the OpenGL application thinks it's drawing its contents to the screen, it's not drawing its contents to the screen. It's drawing to a fake front buffer, which is actually the same as a back buffer. And so even your single-buffered OpenGL application is actually double-buffered, because we don't know how to do single-buffered GL applications.
2D applications using GTK+ or Qt are always double-buffered if they use the usual drawing APIs in those libraries. The compositing manager wants to be double-buffered, but for a completely different reason. Your GL application wants to be double-buffered because 90% of the frame time, the thing that it's drawing isn't done yet. And so if it was drawing incrementally to the screen, it would look terrible. You would see little pieces of your UI slowly appearing on the screen. With 2D applications, the 2D operations are usually so short, it wouldn't matter if they were single-buffered or double-buffered. You could clear the area and repaint the entire window in less than a frame time. So you don't even typically notice it. Applications like GNOME Terminal are often not double-buffered. xterm isn't double-buffered. And nobody seems to notice very often, because the amount of time that it's changing the screen is dwarfed by the amount of time the screen is static. And so you see a little flickering occasionally. You see some annoying visual artifacts on the screen, but for the most part, 2D applications would be fine single-buffered, except that people don't like that little bit of visual noise, and so we like to get rid of it. The other reason we want to double-buffer these applications, again, is to get rid of the tearing between frames. We want our display to be seamless and smooth all the time. Compositing managers, that's the only reason they double-buffer. The copying is always gonna be plenty fast. I could draw the entire scene in a tiny fraction of the time that it takes to re-display it, but we want to get rid of the tearing, we want to make it look pretty. So when I talk about optimizing the system for reducing... I'll give you this one. It must be five o'clock because the batteries are all dead. There's a really simple optimization that the Present extension does.
If an application queues multiple present operations for the same frame time, then it throws away the earlier ones and only displays the later one. A really simple optimization gets me a huge benefit in applications which are dominated by the update time, like our favorite benchmark, the GL application that shall not be named. That particular application, which happens to draw three different objects in primary colors on the screen that look like gears, went from 5,000 frames a second to 7,000 frames a second just by this simple expediency of throwing away the frames that weren't actually ever gonna be seen by the user. That was a pretty good optimization. More than that, because of the way Present works, you're giving it pixmap IDs, so the application and the window system know about objects that are persistent. So if I update multiple regions from the same pixmap in the same frame time, I can merge those copies into the same frame update, so I get a single set of copy operations per frame. If you're using different pixmaps, you can throw away the old operations, of course. Now, in order to make the applications not see that you're doing this and not really be aware, I just leave the operations queued and I tell the application, oh yeah, that operation, yeah, it completed. Yeah, I didn't have to do a lot for it, but it's done now. The no-op that I executed for it has been completed. So that's been a fairly successful, very minor optimization that gets performance with DRI3 to be significantly faster than DRI2, 10 or 20% for common 3D applications. Pretty simple optimization. Now, what about Present and Composite? The Composite extension, of course, as we've talked about, has all these off-screen window contents. Now, naively, when an application did a present operation, the contents that it's updating in its window would appear in this off-screen window buffer. That makes sense, right?
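The shape of that same-frame optimization can be sketched as follows. This is a toy model, assuming one queued slot per target, not the real server data structure (the server keeps per-window lists and sends real completion events); the point is just that a newer present for the same target frame replaces the older one, which is then "completed" without doing any work.

```c
/* Sketch of the optimization above: if two presents are queued for the
 * same target frame (MSC), drop the earlier one and tell its owner it
 * completed as a no-op.  Illustrative names, not the Present protocol. */
#include <assert.h>
#include <stdint.h>

struct present_req { uint64_t target_msc; int pixmap; int completed; };

struct frame_slot { struct present_req *queued; };

static int noop_completions;   /* frames discarded without being shown */

static void queue_present(struct frame_slot *slot, struct present_req *req)
{
    struct present_req *old = slot->queued;
    if (old && old->target_msc == req->target_msc) {
        old->completed = 1;    /* report completion, but do no copying */
        noop_completions++;
    }
    slot->queued = req;        /* only the latest frame will be shown */
}
```

The application never notices: from its point of view, both presents "completed", but only the second one ever cost any pixel traffic.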
If you want it to appear on the screen, what the application does is put the contents in the off-screen window buffer, and then the compositing manager takes the off-screen window buffer and puts it on the screen. Fairly straightforward. Except that now you've got this extra copy. You've got the application presenting a new frame, ending up in its window buffer, and then having that window buffer copied onto the screen. So there's two copies there. It'd be much simpler to just take the application's new frame and hand it to the compositing manager and say, oh, this application would like this to appear on the screen at about this time. So that is the plan. It's not currently implemented, but the plan is to just tell the compositing manager, hey, this application has these new bits to put on the screen, you might want to do something about that, and completely bypass the off-screen window pixmap. And that way that off-screen pixmap would never see the contents of the application. The only place that those contents would appear would be on the screen. And the compositing manager would take hold of that image and keep track of it until that frame went away. Fairly straightforward, actually. It's a nice, easy optimization that's made possible because, again, the application uses Present to get the stuff into its window and the compositing manager uses Present to get the data from the window onto the screen. So I can use this same extension twice: take the application's contents, hand them to the compositing manager, and then take those application contents, by handle, and put them onto the screen. So you only copy the data once, even though there's two logical present operations that occur. Now, the one problem with this is that we have applications like VNC, or other shadowing applications, that want those application contents to appear in that off-screen pixmap. They want to see those contents.
So if you're running a window-sharing application that wants to take the contents of your window and broadcast it out over the network, or out over a wireless connection to your television, or up onto your projector here, then that application actually has to have those contents in an off-screen buffer. And the easy plan that I'm planning on trying first is to tell the compositing manager, oh, by the way, there's an application that would love to see those contents in this off-screen buffer, you might want to put them there as well, and have the compositing manager figure out how to do that. Basically, totally get rid of the X server's responsibility for managing the window contents. It's a fairly simple addition to the Present extension. Essentially, the application is going to send a present operation to the X server. It's going to get redirected to the compositing manager, and the compositing manager's going to turn around and do a present operation directly to the screen. And it's going to hand enough information to the X server in that operation that that data, the original window and sequence ID, can get passed back to the original application, and it can get notified that its original present operation has now been completed. So that lets the application know not when its data is in the window, which the application doesn't care about; it lets the application know when its data is visible to the user, which is the key part that's missing in the current DRI2 solution. So we actually get compliance with the OpenGL extension specification that says the application is told when the first pixel of the frame is being scanned out. You get, like, a system time in microseconds. That's actually a fairly nice addition, on top of the benefit that we get from reducing copies: applications that actually need frame times get the right data.
For full-screen applications, right now, what a smart compositing manager will do is recognize that an application is now covering the whole screen, like this one is here, and it will say, oh yeah, when the application is covering the whole screen, then turn off all the compositing stuff and let the application talk directly to the screen. Which is, in X, this huge pain, because you have to unredirect your full-screen window, turning off all the Composite extension stuff for your window, which almost always causes a bunch of flicker on the screen, which is really annoying. With the Present extension optimizations, there's not gonna be any need to do that Composite extension dance. The compositing manager is just gonna get this pixmap redirected to it, and it's just gonna send it right back to the X server, and the full-screen swap will just happen. It will be fairly straightforward, and we get the benefit of the flipping that we like, that just changes the pointer, and we get the benefit of no copies. I have two different plans for trying the redirection. I have a simple plan that I've been kind of talking about right now: take the raw present request the application passes, and just basically pack up all those parameters from the application's request and hand them to the compositing manager and say, oh, Mr. Compositing Manager, do this thing that the application asked for. That's a pretty straightforward plan in the X server. The question is, is it gonna be too complicated for the compositing manager to be successful at? And that's why I'm gonna do some prototyping. I've started the prototyping already; I can show you the progress on that so far. The other thing that I want to do for the Present extension, that's a little more complicated, is to actually tell the application, oh, don't draw to just your window-size buffer.
Draw to this bigger buffer, so that I can paint all of the window decorations from your window manager around your application, and now I can actually make that be the new window back buffer. We have this off-screen buffer that contains the window contents. Well, it contains the window contents and it contains all the window manager decorations around it: the frame and the resize bars and the close box. All that stuff isn't part of your application; it's owned by the window manager. So the other option here is to tell the applications, please draw your application contents in this slightly larger buffer, and then the X server can manage making sure that those window manager decorations have been painted around that buffer, and then it can send that updated buffer to the compositing manager. The benefit of that is that this whole dance of having the compositing manager manage the window's off-screen buffer goes away. It's all managed internally in the X server, and there are some additional optimizations I can take advantage of later. So how would we draw to larger buffers? I'm sorry, it's very late in the evening. All that I need to do to make sure that the window manager decorations get painted around the application is damage tracking on the original window buffer: if any window manager paints to the frame area around the application, I can just track that damage, and when the application gives me a new buffer I can copy the damaged regions into the application's buffer. And then the X server would need to remember which pixmaps it had put window manager contents into and what areas needed to be updated. It's a bunch of bookkeeping, but it's fairly straightforward, and it would make the compositing manager much simpler to implement, and so it may be a better scheme. I'm gonna try both. The other API I talked about in the kernel is this rectangular region swapping API. I talked about this at LCA in January.
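A minimal sketch of that damage bookkeeping, using a single bounding box per window rather than a real region library (the X server uses full region arithmetic, so this understates what's needed; all names here are made up for illustration):

```c
/* Sketch of the bookkeeping above: remember which parts of the frame
 * area the window manager painted, as a damage bounding box, and then
 * only that area needs copying into each new application buffer. */
#include <assert.h>

struct rect { int x1, y1, x2, y2; };   /* empty when x2 <= x1 */

/* Accumulate a window-manager paint into the damage bounding box. */
static void damage_add(struct rect *d, struct rect r)
{
    if (d->x2 <= d->x1) { *d = r; return; }       /* first damage */
    if (r.x1 < d->x1) d->x1 = r.x1;
    if (r.y1 < d->y1) d->y1 = r.y1;
    if (r.x2 > d->x2) d->x2 = r.x2;
    if (r.y2 > d->y2) d->y2 = r.y2;
}

/* Pixels that must be copied into the next buffer: just the damage. */
static long damage_area(const struct rect *d)
{
    if (d->x2 <= d->x1 || d->y2 <= d->y1)
        return 0;                                 /* nothing to copy */
    return (long)(d->x2 - d->x1) * (d->y2 - d->y1);
}
```

For a typical decorated window, the damaged decoration strip is a tiny fraction of the buffer, which is why this copy is cheap compared to re-rendering the whole window.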
That presentation's also available online if you wanna look at that one. It's effectively this cute kernel hack that goes and edits the page tables for those two off-screen buffers. You have these two images that are made up of a tiled collection of pages. If you just take the pages between those two buffers and swap them, then all of a sudden those buffers have new contents, and you can do a page-level swap, a page-table-entry swap, instead of a pixel content swap. So instead of writing four kilobytes of data for a page, the data in the page, I can simply write a 32-bit page table entry and update that, and that's a tremendous performance benefit. I measured 30 or 40% faster performance doing the page-table-entry updates compared to copying the pixels. On Intel hardware, a page actually covers a rectangular region of pixels on the screen. It's 128 by 8 pixels. Yeah, it seems like a really tiny area to be four kilobytes, but you figure it out: four kilobytes is only 1,024 pixels, and 1,024 pixels is only a 128 by 8 region. So this screen is covered by hundreds and hundreds of pages. The problem with wanting to do this page-level swapping is that the application contents would have to be aligned to a page boundary, in both the source and the target. Otherwise, I can't do this page table trick, and that means, again, that I need to get the application to draw its contents into a large enough buffer that I can align stuff, and that's kind of a pain. I have a demo for you today of Metacity using Present. I've actually hacked up Metacity. The changes were just to Metacity; it took about 90 minutes to implement. Fixing the X server so it didn't crash anymore when I ran Metacity took another couple of hours, but that was just a bug, and that was really not implementation time for Metacity, honest. Metacity, I don't know how many of you have ever used Metacity as a window manager?
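The page arithmetic above works out like this, assuming 4 KB pages and 32 bits per pixel with the 128-pixel-wide tile layout described:

```c
/* Arithmetic from the paragraph above: a 4 KB page at 32 bpp holds
 * 1024 pixels, laid out as a 128 x 8 tile, so swapping one page-table
 * entry replaces copying 4096 bytes of pixel data. */
#include <assert.h>

#define PAGE_BYTES  4096
#define BPP_BYTES   4      /* 32 bits per pixel */
#define TILE_W      128    /* pixels per tile row on this hardware */

static int pixels_per_page(void) { return PAGE_BYTES / BPP_BYTES; }
static int tile_height(void)     { return pixels_per_page() / TILE_W; }

/* Pages needed to cover a width x height screen, in whole tiles. */
static int pages_for_screen(int width, int height)
{
    int cols = (width  + TILE_W - 1) / TILE_W;
    int rows = (height + tile_height() - 1) / tile_height();
    return cols * rows;
}
```

A 1920x1080 screen comes out to a couple thousand pages, which is the "hundreds and hundreds of pages" in the talk, and each swapped page costs one page-table-entry write instead of a 4 KB copy.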
If you used GNOME 2, did you ever turn on the compositing manager feature in that? Yeah, was it pretty much what you wanted from a compositing manager? It gave you the cute little drop shadows. It gave you translucent windows. It did pretty much what you wanted from a compositing manager, not a lot of fancy UI. So it's a pretty simple and very basic compositor, but it shows the basic requirements. If you used that compositor: that compositor was written entirely with the Render extension, which is an old 2D extension, and all of its updates to the screen were just CopyArea. All it did was copy stuff around. It was not VBlank-synchronized. When you ran that and did an animated application, you always got tearing on the screen, which was annoying as all heck. So I've hacked up Metacity, so that old compositing manager code in Metacity now actually uses the Present extension. Why did I do this instead of some modern GL-based compositor? That's very simple: it only took 90 minutes. My plan, of course, is to make that a basis for prototyping all of my compositing redirect stuff, again because it's a small, simple code base and I can make sure the ideas are sound before I try to go off and do this in OpenGL. OpenGL is not really my friend. I do OpenGL drivers for a living. I don't write OpenGL applications for a living, and apparently they're very different. I understand coding in OpenGL involves, like, talking to the OpenGL API, which I've done, like, twice. So what's the status of this fine work? The DRI3 extension is complete and working. The Present extension is working. It doesn't have any of the redirection stuff. Metacity with the Present extension is working; I can give you a demo of that. The kernel page-level swapping API: I implemented all of that. I have a prototype. It works. It's very fast. It probably needs some API cleanups, but it's basically done.
The cut-through API changes: I talked about cut-through, where I change the pointer to the current scan-out buffer in the middle of the frame. The API changes for that were very simple. I did the implementation for the Intel hardware that stuck this cut-through change into the command ring, so it didn't happen right away; it happened sometime really soon, which isn't really what I wanted. The hardware actually supports cut-through right now, and so actually using that capability in the hardware, and still having it synchronize at the appropriate place in the command execution stream, will take more implementation work, and I need to actually do that. I know what I want. I know it's possible. Now it's just a matter of typing. Future work: obviously I wanna do the simple redirection and see how well that works with my Metacity prototype. I'll have that working probably in a couple of months. I wanna figure out what to do with video. Right now the Present extension doesn't have any notion of anything other than a pixmap containing RGB pixels. But there's no reason in the world that it can't also talk about YUV data, so that a media application can hand it a buffer full of YUV data and say, oh, please put this on the screen too, and then the compositing manager could do the right thing with that. And then I wanna think about, well, in this environment I have two monitors here. I have my monitor up here and the monitor over here. They can display different pixmaps, so I could do page-level swapping up here and separate page-level swapping here, and I really wanna do that, because these are not synchronized at all. They run at different frame rates, so I really wanna be able to treat them as separate objects. There's no reason, with the Present extension, I couldn't have per-output, per-monitor pixmaps. Now I have a couple minutes of demos, and then we'll have about 10 minutes for questions. Let me see if this works at all.
So this is running old-school metacity for demonstration purposes. I can run an application that will show — this is with compositing turned off — so here's an application using the Present extension, and with this projector it's actually working: there's no tearing at all. You can see this is just drawing a vertical line, sweeping it back and forth. Without the Present extension this tears miserably. Can I do that? I can't see the window manager title bar on this app at all. Ooh, I hit it. But if I come over here and get the window to be half on the other monitor, now it's synchronized to my internal monitor here, and you can see that it's no longer synchronized up there on the screen. So that's just Present picking a monitor to synchronize to and not synchronizing to both. And again, if we had per-screen or per-monitor pixmaps, then we could actually synchronize to both. So you can see the difference between synchronized and not so much. Okay. Yeah, okay, metacity just crashed. This is metacity from Debian's current release, and I turn on compositing and it crashes. Thank you. This is not my fault. I think all I need to do is restart metacity, actually. Ready for the desktop. And I can restart this Present application, and you'll notice that, even all on this one monitor, because metacity is not using the Present extension and is just copying stuff, you can see that the copy happens at kind of a random place on the screen. Why is it happening up close to the top of the monitor? Well, it's happening close to the top of the monitor because my application gets its data copied exactly at vertical blank time. And then metacity comes along sometime later and says, oh, you put stuff on the screen, I should put that on the monitor, and it takes, as you can see, about 100 scan lines for it to actually do all of that computation and get stuff onto the screen. But now I can switch out my metacity for the new shiny metacity.
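To put a number on that 100-scanline lag: each scanline takes a fixed slice of the frame period, so the compositor's copy landing 100 lines down the screen corresponds to roughly a millisecond and a half of work after vblank. A back-of-the-envelope sketch, assuming an illustrative 60 Hz mode with an 1125-line vertical total (these figures are my assumptions, not from the talk):

```c
/* Rough arithmetic for the demo above: if the compositor's copy lands
 * about 100 scanlines after vblank, how much time did it spend?
 * Assumes a 60 Hz mode with an 1125-line vertical total; the numbers
 * are illustrative, not taken from the talk. */
static double scanline_us(int vtotal_lines, double refresh_hz)
{
    double frame_us = 1e6 / refresh_hz;    /* one frame in microseconds */
    return frame_us / vtotal_lines;        /* one scanline's duration */
}

static double copy_latency_ms(int lines_late, int vtotal_lines, double refresh_hz)
{
    return lines_late * scanline_us(vtotal_lines, refresh_hz) / 1000.0;
}
```

At 60 Hz one frame is about 16.7 ms, so 100 of 1125 lines works out to roughly 1.5 ms between the application's vblank-synchronized update and the compositor's copy reaching the screen.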
And this shiny new metacity doesn't tear at all. Isn't that awesome? That's my exciting demo today. Yeah. So as I said, we have about 10 minutes for questions and then it's time to go get dinner. Yeah. Okay, the question is: how well does this work with rotated screens? Right now the way that rotation is done with the RandR extension is all handled by shadow frame buffers within the X server, and there's no current synchronization of those updates to the vertical retrace. Now, one of the easy things to do with the Present infrastructure in the X server would be to actually synchronize those updates with the vertical retrace on each monitor, because the server now knows what vertical retrace is and can actually have callbacks at vertical retrace time. So it would be easy to take the Present infrastructure in the X server and fix those paths to make them synchronized. In terms of getting rid of that additional copy with rotation, yeah, we should probably do that, and that would mean reworking the RandR extension to not do that implicit copy and expect the compositing manager to do the copy for it instead. Again, fairly straightforward, but I haven't spent any time doing that work. Good question, though. Other questions? Everybody ready to be done for the day and go get some dinner? Oh, other question? Also the obvious question: when is this coming to a distro near you? Well, we have an X developers conference next week where we're going to discuss when this is going to get integrated into X.Org. The kernel changes are not required for any of this functionality; they just improve it. So really the gating function is Mesa and Xorg, and those are both pretty much under my control.
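The fix described for rotated screens — deferring shadow framebuffer copies until the server's vertical retrace callback fires — can be sketched as a tiny pending-update queue flushed at vblank. Every name here is invented for illustration; the real server infrastructure looks nothing this simple:

```c
#include <stddef.h>

/* Hypothetical sketch of the idea above: instead of copying the shadow
 * framebuffer to the rotated scanout immediately, queue the update and
 * run it from a per-CRTC vblank callback. All names are invented. */
#define MAX_PENDING 16

typedef void (*update_fn)(void *data);

typedef struct {
    update_fn fn[MAX_PENDING];
    void     *data[MAX_PENDING];
    int       count;
} vblank_queue;

static int queue_at_vblank(vblank_queue *q, update_fn fn, void *data)
{
    if (q->count >= MAX_PENDING)
        return -1;              /* a real server would coalesce damage */
    q->fn[q->count] = fn;
    q->data[q->count] = data;
    q->count++;
    return 0;
}

/* Called from the vblank event handler: flush the pending copies while
 * the beam is safely inside the blanking interval. */
static void on_vblank(vblank_queue *q)
{
    for (int i = 0; i < q->count; i++)
        q->fn[i](q->data[i]);
    q->count = 0;
}

/* Tiny helper used to demonstrate the queue. */
static int demo_hits;
static void demo_update(void *data) { (void)data; demo_hits++; }
```

The point is only that the server already gets a per-monitor vblank callback under Present, so routing the rotation copy through it is plumbing rather than new machinery.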
So the X server changes will be integrated probably in another month or two and then pushed out for distributions to pick up in the next version of the X server, 1.15 or 1.16. That'll probably happen this winter, and then I imagine distros will pick it up; Ubuntu will probably pick it up in 14.04, I would imagine, although with Ubuntu in particular, who knows, with their Mir plans, but Debian certainly will get the new X server integrated into unstable fairly quickly, and Fedora at some point, yep. The other thing is, if you're using a window system underneath X, like Windows or Macintosh or Wayland or Mir, all of these reductions in copies help there as well, because all of those window systems desperately want to be told about whole-window updates, and the Present extension gives them precisely the information they need in order to optimize those updates and make the integration between X and any underlying window system more efficient as well. So even if you're not using X as your foundation window system, having this particular migration in OpenGL, GTK and Qt, all of which are very easy to do, will benefit even those environments. Questions and comments? Hey guys, thanks for coming. Thanks for spending the last hour of your day with us here in this fine room, and we'll see you again tomorrow.