I work on the X server in general, and lately on AIGLX, which is what I'll be talking about today. The first part of this talk is about what AIGLX is, how it works, and some of the problems we ran into when we did this. The second half of the talk is about what kind of problems are still in there, what doesn't work, and what's needed to finish this work. Then I'll step back from AIGLX and look at how it fits into the bigger picture, and at the features we already have. (An aside on the demo setup: this is not using the brand-new output modesetting work, just the existing modesetting path.) So, the normal mode of operation for the X server is that we have a client that talks X11. When the client decides it wants to, say, draw a rectangle, it calls into Xlib, and Xlib puts these rendering requests into a network packet and sends it off to the X server. The X server sits on the other side of this network connection, receives the packet containing the rendering requests, decodes the rendering requests, and performs them — it draws the rectangle the client asked for. This is the network transparency that X has always had, and it's what allows X to work over a network: you can have the remote display over here and the client over there, and they send rendering requests and replies back and forth. Now, OpenGL is the standard 3D graphics API available on pretty much all platforms, and if you want to use OpenGL under the X server, you have to use GLX. GLX is a set of OpenGL extensions that specify how OpenGL interacts with the X server and the X protocol. GLX has two basic modes of operation: there's indirect mode, which works pretty much the way X normally works, and there's direct mode.
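To make the "requests into a network packet" step concrete: in the core X11 protocol, a drawing request such as PolyRectangle (major opcode 67) is just a small fixed header plus the rectangle coordinates. Below is a minimal sketch of the encoding Xlib performs, ignoring Xlib's request buffering; the client simply writes in its own host byte order, as the protocol allows.

```c
#include <stdint.h>
#include <string.h>

/* X11 core protocol: PolyRectangle is major opcode 67.  The request is
 * a 12-byte header (opcode, unused byte, length in 4-byte units,
 * drawable, gc) followed by 8 bytes per rectangle. */
typedef struct { int16_t x, y; uint16_t width, height; } XRect;

static size_t encode_poly_rectangle(uint8_t *buf, uint32_t drawable,
                                    uint32_t gc, const XRect *rects,
                                    uint16_t nrects)
{
    uint16_t len = 3 + 2 * nrects;  /* request length in 4-byte units */
    buf[0] = 67;                    /* major opcode: PolyRectangle */
    buf[1] = 0;                     /* unused */
    memcpy(buf + 2, &len, 2);
    memcpy(buf + 4, &drawable, 4);
    memcpy(buf + 8, &gc, 4);
    memcpy(buf + 12, rects, nrects * sizeof(XRect));
    return len * 4;                 /* bytes to write to the socket */
}
```

The server-side dispatch loop does the reverse: it reads the opcode and length, then decodes and executes the request.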
So in the indirect case, an application that wants to render links against libGL, similar to how it links against Xlib when it issues regular X requests. In indirect mode, which is how X normally works, libGL performs pretty much the same task as Xlib: if you want to render a 3D scene, you send off the 3D triangles and the textures, and libGL takes these rendering requests, puts them into network packets, and sends them to the X server. And again, the X server is on the other side of the network connection; it receives the rendering requests, decodes them, and renders those commands. The way it used to work was that in this indirect case, where the X server receives the rendering requests over the network, the X server had a software implementation of the OpenGL API. So if you wanted to render a triangle, it was all done in software: a scan-line conversion of the triangle, the texture lookups, rendering the triangle step by step. All of this was done in software, and eventually the OpenGL application's output was copied onto the screen. That is, of course, way slower than hardware acceleration. So for a long time, indirect rendering was slow and in software, and direct rendering — the other case — was the fast, hardware-accelerated path. (To the question from the audience: right, part of the point of direct rendering is exactly that we don't want to talk to the X server for every command.) All right, so as I mentioned, there's another mode of operation for the OpenGL binding: direct mode. In this case, the application again links against libGL.
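The "scan-line conversion" the old software path did can be illustrated with a toy rasterizer. This is not Mesa's actual code — just a minimal coverage test of pixel centres against the triangle's three edge functions, with no textures or interpolation:

```c
/* Toy software rasterizer: walk the framebuffer and test each pixel
 * centre against the triangle's three edge functions, marking covered
 * pixels.  The real software path also interpolated colours and did
 * texture lookups per pixel, which is why it was so slow. */
typedef struct { float x, y; } Vec2;

static float edge(Vec2 a, Vec2 b, float px, float py)
{
    return (b.x - a.x) * (py - a.y) - (b.y - a.y) * (px - a.x);
}

static int fill_triangle(unsigned char *fb, int w, int h,
                         Vec2 v0, Vec2 v1, Vec2 v2)
{
    int covered = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            float px = x + 0.5f, py = y + 0.5f;
            float e0 = edge(v0, v1, px, py);
            float e1 = edge(v1, v2, px, py);
            float e2 = edge(v2, v0, px, py);
            /* inside if all edge functions agree in sign */
            if ((e0 >= 0 && e1 >= 0 && e2 >= 0) ||
                (e0 <= 0 && e1 <= 0 && e2 <= 0)) {
                fb[y * w + x] = 1;
                covered++;
            }
        }
    return covered;
}
```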
And as I just explained, the indirect case is where the application talks to the GLX protocol module, which takes the rendering requests, puts them into network packets, and sends them over to the X server, where the GLX module receives the requests, decodes them, and passes the rendering commands to the software renderer. That's the software rasterization I was talking about, which takes the triangles through scan-line conversion and eventually copies the result to the screen. The other case is the direct case, and this is where we start to deviate from how the X server normally works. In this case, what we end up with is the driver actually being linked into the client. So we have the application up here, linking against libGL, but when it starts rendering, libGL starts out by talking to the X server. It asks the X server whether it can do direct rendering, because that's usually the preferred mode of operation. And if the X server says "yes, I can do that, and you're allowed to", it passes back the name of the driver to load and the address of the framebuffer. So at this point the client actually knows where the framebuffer is and which driver to load, and it proceeds to load the driver, which libGL dlopens and initializes with the address of the framebuffer, among other things. At this point we actually have the driver for the hardware loaded into the client, and the client starts submitting rendering requests directly to the driver instead of putting them on the network. The DRI driver receives the rendering requests and, instead of doing software rendering, programs the hardware to rasterize the triangles as necessary, sets up pointers to the textures where they sit in video memory, and renders everything to the screen directly.
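The handshake just described — client asks whether direct rendering is possible, server answers with a driver name and the framebuffer location — goes through the XFree86-DRI extension in reality. The sketch below is a made-up, simplified model of the information that changes hands; the structure fields and the "r200" driver name are illustrative only:

```c
#include <string.h>

/* Toy model of the direct-rendering handshake (real clients use the
 * XFree86-DRI extension; these names are hypothetical). */
typedef struct {
    int   direct_ok;        /* is direct rendering allowed?         */
    char  driver_name[16];  /* which DRI module the client loads    */
    void *framebuffer;      /* where the client should render       */
} DriInfo;

static DriInfo server_query_dri(int client_is_local, void *fb)
{
    DriInfo info = {0, "", 0};
    if (client_is_local) {  /* only local clients can map the card  */
        info.direct_ok = 1;
        strcpy(info.driver_name, "r200");  /* example driver name   */
        info.framebuffer = fb;
    }
    return info;
}
```

A remote client gets `direct_ok == 0` and falls back to indirect GLX over the wire.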
And the X server basically doesn't know what's happening, because it just handed out the framebuffer, and the client-side driver does all the rendering. So that's the direct rendering case. And since this uses the hardware — we load the driver on the client side, program the hardware, and the hardware does the rendering — this has always been the fast rendering path, while for a long time indirect rendering was slow software rendering. The X server doesn't know what's happening, but it is part of the game, somewhat, because you still need to respect the overlapping windows you have in X. So if you have an X window running, say, Tux Racer, and you drag another window on top of your Tux Racer window, the rendering has to respect that window — you can't just render across your xterm or whatever it is you have on top of your Tux Racer window. So whenever the windows move or the clip rects change, there's synchronization going on between the direct-rendering client and the X server. The X server tells the client it has to re-read the clip list, and whenever the direct-rendering client is about to do its next rendering, it re-reads the clip list so that it doesn't overwrite the xterm or other windows. That's direct rendering. OK, so accelerated indirect rendering, which is what AIGLX is, is the case where we basically want the X server to load the hardware driver and use it to implement the rendering requests as they come in — we want to make this work the way the X server normally does for regular X requests. We could have accepted software indirect rendering and lived with it for a few more years. But the reason we have direct rendering in the first place is that we want performance. With normal X rendering, we don't really need that much performance.
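Honouring the clip list boils down to rectangle intersection: every draw the client submits has to be broken into the pieces that fall inside the visible rectangles it re-read from the server. A minimal sketch of that clipping step:

```c
#include <stddef.h>

/* Illustrative only: clip a rendering rectangle against a clip list of
 * visible boxes, the way a direct-rendering client must honour the clip
 * rects it re-reads from the X server before touching the hardware. */
typedef struct { int x1, y1, x2, y2; } Box;

/* Intersect a with b; returns 0 if the result is empty. */
static int box_intersect(Box a, Box b, Box *out)
{
    out->x1 = a.x1 > b.x1 ? a.x1 : b.x1;
    out->y1 = a.y1 > b.y1 ? a.y1 : b.y1;
    out->x2 = a.x2 < b.x2 ? a.x2 : b.x2;
    out->y2 = a.y2 < b.y2 ? a.y2 : b.y2;
    return out->x1 < out->x2 && out->y1 < out->y2;
}

/* Emit one hardware draw per non-empty intersection. */
static size_t clip_draw(Box draw, const Box *cliplist, size_t nclip,
                        Box *out)
{
    size_t n = 0;
    for (size_t i = 0; i < nclip; i++)
        if (box_intersect(draw, cliplist[i], &out[n]))
            n++;
    return n;
}
```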
We need performance, of course, but the overhead of marshalling requests and the network round trip is negligible compared to the rendering itself, especially if you have to fall back to software. So when you see your X server being slow — scrolling is slow, viewing a PDF document is slow — the usual cause is that we had to fall back to software and do the same operations on the CPU, not that we spent too much time sending network packets back and forth. It's almost always a software fallback or something like that. But for OpenGL we never want a software fallback — it should always be hardware — and since we're running Quake, we want the highest possible frame rate; everything matters. So if we can avoid the network overhead, we might get one or two extra frames per second in Quake. That's one of the reasons direct rendering exists. So here's the obvious question: we have direct rendering, which is faster than indirect rendering — why do we want accelerated indirect rendering in the X server at all? There are a few reasons, and GLX texture-from-pixmap and compositing are probably the primary reason I did this work. Compiz, the compositing manager, is an OpenGL application that renders the entire screen, the entire desktop, using OpenGL, and it does so using the Composite extension. What the Composite extension allows you to do is redirect the contents of a window: normally, when applications render to an X window, the rendering goes directly into the framebuffer, and you see the contents appear on screen as the application renders. What Composite does is capture all that rendering and, without the application knowing, direct it into an off-screen buffer. So the application will still queue up a number of rendering requests, and the X server will perform those rendering requests, but they won't actually appear on the screen as they used to.
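A toy model of what redirection means: the application's drawing lands in a per-window off-screen buffer, and the screen only changes when the compositing manager combines buffers in a separate step. (The real mechanism is the Composite extension's redirection plus naming the window's backing pixmap; everything below is illustrative.)

```c
#include <string.h>

/* Toy model of Composite redirection: rendering lands in an off-screen
 * buffer without the application knowing, and nothing reaches the
 * screen until the compositing manager's separate composite step. */
enum { NPIX = 8 };

typedef struct {
    unsigned char offscreen[NPIX];  /* redirected window contents */
    unsigned char screen[NPIX];     /* what is actually visible    */
} Desktop;

static void app_draw(Desktop *d, int x, unsigned char v)
{
    d->offscreen[x] = v;            /* the screen is NOT touched here */
}

static void composite_step(Desktop *d)
{
    /* second step: the compositor decides how to combine buffers;
     * here it just copies, but this is where transparency and
     * scaling would happen */
    memcpy(d->screen, d->offscreen, NPIX);
}
```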
This is why Compiz needs this redirection: we need these off-screen buffers. But that's a different topic. It's pretty clever, actually, because it allows us to decide, in a second step, how we're going to composite — how we're going to combine these off-screen buffers on screen. And this is where stuff like transparent windows becomes possible, because we do the compositing as a separate step: we can use an extra channel in the pixmap to determine the transparency of the pixels, and we can scale windows, as Compiz does. Composite gives us all these possibilities, and Compiz actually implements them. But to go from the off-screen pixmap to something Compiz can use for rendering, we need some way to go from the pixmap inside the X server to a texture we can use for rendering. And that's the texture-from-drawable, or rather texture-from-pixmap, extension: it allows us to take a pixmap and use it as a texture. As Keith mentioned, when we talk about porting the entire X server to OpenGL, that's probably not feasible, because there may be deviations between different OpenGL implementations. But doing it this way, once we have accelerated OpenGL available inside the server, we can do this piecewise. It's a really good way to accelerate, say, XRender using OpenGL: we can do that for one driver where we have that capability available, and if it turns out we can do some of the compositing or rendering with OpenGL in a specific driver, we can enable it there and use it. So it opens up a lot of this work inside the X server — it's an enabler for all of this. And of course, the ability to use accelerated OpenGL across a network is interesting in itself.
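The extension being described is GLX_EXT_texture_from_pixmap (glXBindTexImageEXT / glXReleaseTexImageEXT). The key property of the ideal, in-video-memory case is that binding aliases the pixmap's storage rather than copying it, so later drawing to the pixmap shows up through the texture for free. Since the real calls need a live GLX connection, here is a toy model of that aliasing with plain host pointers:

```c
/* Toy model: "binding" a pixmap as a texture just aliases its storage,
 * so drawing to the pixmap afterwards is visible through the texture
 * with no copy -- the ideal texture_from_pixmap case.  The real call
 * is glXBindTexImageEXT(dpy, glxpixmap, GLX_FRONT_LEFT_EXT, NULL). */
typedef struct { unsigned char *pixels; int w, h; } Pixmap_;
typedef struct { const unsigned char *pixels; int w, h; } Texture_;

static Texture_ bind_tex_image(const Pixmap_ *p)
{
    Texture_ t = { p->pixels, p->w, p->h };  /* alias, no memcpy */
    return t;
}

static unsigned char sample(const Texture_ *t, int x, int y)
{
    return t->pixels[y * t->w + x];
}
```

The fallback path the talk describes later replaces the alias with a full download of the pixmap into video memory on every update.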
If you have an X server that is able to load a 3D driver, you can actually have accelerated GLX on a desktop machine and run the application on an application server, and get accelerated 3D that way. Of course, the caveat in this case is textures: you have to send the textures over the network. That's the drawback. With direct rendering you can download the textures directly, because client and driver are in the same process address space. But overall, accelerated indirect GLX is interesting because we can then accelerate these compositing managers that everybody uses these days. So how do we do this? From a 20,000-feet perspective it's easy: we basically need to let the X server load the DRI driver in pretty much the same way that the client would load it. A lot of the talking the client does with the X server to set this up — for example, the client asks the X server which driver to use and where the framebuffer is — all that handshaking between the client and the X server is now just internal to the X server. We can just go to a different part of the X server and ask which DRI driver we need to load; there's no protocol going out on the wire. The X server already has a module infrastructure, so we can use the module loader to load the DRI driver and then call into it. We need to provide some functionality to the DRI driver so that it can call back into the X server for various pieces of information. In general, it isn't that big a task: it's a matter of loading the driver and then tweaking interfaces to make sure the driver can get information from the X server and vice versa. Yeah, I'm going to skip this slide — it's about how we initialize visuals; it's a detail, but it's in there.
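That "tweaking interfaces" point can be sketched as two function tables: the server hands the driver a set of callbacks for the information it needs, and calls the driver through its entry points — plain function calls in one address space, no wire protocol. All names here are hypothetical:

```c
/* Hypothetical sketch of the two-way interface between the X server
 * and an in-server DRI driver: the server passes the driver a table of
 * callbacks, and calls the driver through its own entry points. */
typedef struct {
    int (*get_screen_width)(void);  /* driver asks server for info */
} ServerFuncs;

typedef struct {
    int (*render)(const ServerFuncs *sf);  /* server calls driver */
} DriverFuncs;

static int server_get_screen_width(void) { return 1024; }

static int driver_render(const ServerFuncs *sf)
{
    /* the driver calls back into the server for what it needs */
    return sf->get_screen_width() / 2;
}
```

In the real server the driver module would be found and loaded through the existing module loader, and the tables would be much larger.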
So one of the more interesting problems we ran into in this work was the DRI lock. In the direct rendering case we have an application over here that loads the driver and programs the hardware — this is the driver telling the hardware how to render. At the same time we have the X server over here that might, say, move a window around by programming the hardware to copy one part of the screen to another. So two processes are trying to access the same hardware and program it to do different things. We need to synchronize this in some way, and the way to do it is to put a lock around the hardware: that's the DRI lock. The way it works is that the DRI driver knows it has to synchronize access to the hardware. It knows there might be other direct-rendering clients — you can have two copies of the same game running, you can have several clients at the same time — and it has to respect that. So the DRI driver sitting in a client doing direct rendering already knows about this lock: it takes it whenever it's about to submit commands and releases it once it's done. The X server, on the other hand, doesn't know about the lock. It thinks it owns the hardware, and most of the 2D drivers are written to just render and expect that whatever state they set up on the card is still going to be there the next time they come around to render. But that's not the case here, because between one rendering request and the next in the X server, things can change.
Maybe the X server gets scheduled out, the direct-rendering client runs and changes the whole hardware state, and then the X server is scheduled back in, and now the state is completely different. So we need some mechanism to make sure the X server doesn't get confused. What we do is: as soon as we receive a request from a client, we take the lock and restore the state that the X server expects to see, and then we go off and process the requests as usual; and when we've handled all the requests that were sitting in our input queue, just before we go back to sleep, we release the lock, so that the direct-rendering clients are able to acquire the lock and do their rendering requests. So that's the locking between the X server and the direct-rendering clients. It's not the most elegant way to do things, but it works. (Yes — the question is whether this causes the freezing you see when you drag a window around on top of a direct-rendering application: the application freezes and doesn't render until you let go. As far as I know, that is exactly this problem: as you grab the window and drag it around, the X server is being fed a constant stream of requests to move the window, so it doesn't release the lock while you're dragging. The client would be happy to render, but it sees that the lock is taken the whole time you're moving the window, so it just sits there and waits for the lock to be released, which doesn't happen until you drop the window.) So it's not the most efficient scheme — the client only gets access again once you stop moving the window — but it basically works. A better way to do this would be for the X server to actually know about the lock: it would take it only for the short amount of time it needs to render, so when you submit an X request — copy a window from one part of the screen to another, or whatever it is the X server needs to do — it takes the lock just for that, and we would probably see
this issue go away, if you had the fine-grained locking architected around each submitted request and the drivers written for that. But that's a different discussion. Anyway, the interesting part is that with this lock setup we get into a problem, because we have the DRI driver, which tries to grab the lock before it does its 3D rendering, and now we try to run that from within the X server. So what happens is: the X server receives a request off the network saying "please render these 3D triangles", and it goes and grabs the lock, because that's what it does for every request, and then it calls into the DRI driver — and the DRI driver thinks "I need to grab the lock before I render", but the X server already has the lock. So we have a recursive deadlock here, because we try to run a driver that grabs the lock from inside the lock. So we have to do some juggling here, and a simple way to do it is: if a request comes in, and it turns out it's a 3D request, then before calling into the driver to render that request, you just release the lock. Now the driver will say "let me grab the lock and render this", and that will succeed, because you just let go of the lock. So the driver will be able to render whatever it renders, then it releases the lock and returns to the X server, and the X server then immediately retakes the lock; eventually, when its queue is drained, the X server releases the lock again. This actually works fairly well. I mean, taking and releasing the lock is not cheap, because it's not just the lock itself, it's also restoring the state of the graphics card to what the X server or the 3D driver expects; but in the end there's not much of a performance penalty to measure here, even if it could be more efficient. OK, so, texture from pixmap. This was the big idea for doing this, the big motivation. The thing is, now that we have this 3D driver running within the X server's address space, we
would like to say: all right, the pixmap lives in the X server's address space, and now we have the 3D driver in the X server's address space too, so we can just tell the driver "we've got this pixmap over here — this is your texture". So the implementation of texture-from-pixmap should be rather simple, because we can just point the driver at that pixmap. And the optimal way to do this is to have that pixmap sitting in an off-screen buffer in video memory on the card. So that's what the X server would do: keep the pixmap in an off-screen buffer in video memory, so when you render — draw a rectangle or whatever you do to the window — that drawing accumulates in the off-screen buffer in video memory; and then when we want to texture — we want to render a polygon or a triangle using that window as a texture — we can just point the 3D stack at that video memory area for the pixmap. That's the ideal case, but it doesn't work that way yet — I'll get into why later. For now, what we do is ask the X server to not put these pixmaps in off-screen video memory. So whenever you draw a rectangle to the window, the rendering goes into the pixmap in host memory, and when we want to use it as a texture, we point the GL stack at this host memory area and the 3D stack downloads it into video memory and sets up the texture. So we have this back-and-forth from host memory to video memory all the time — we go from host memory to video memory on every update. That's what we're doing now. Where we'd like to go next, for the desktops we've seen today, is to use the same piece of video memory that the X server renders into as a texture, without copying things around. But we can't do that right now, because the way the X
server manages memory is kind of awkward. We have video memory, and what happens at startup is that we take memory for the framebuffer — we set aside video memory for the image that you see on screen — and then we split the rest of that memory into two parts: the X server manages one half of the video memory, and the 3D stack manages the other half. So we have two sections of video memory, and we can't really make the X server understand the DRI driver's memory, or the DRI driver understand the X server's memory. The problem is that if we render to a pixmap, that pixmap lives in the X server's half of the video memory, and when we want to use it as a texture, we really need it to be in the 3D driver's part of the video memory — but it's over in the other half, so we can't do that. So what we'd like to have here is just one big memory manager for the card, and this is the work that Keith mentioned people are doing: a common memory manager for the video memory. Then we can allocate an object in video memory and say: all right, this is the buffer, the X server is going to render some stuff into it, and later on the DRI driver will use that same buffer as a texture, with both sides agreeing on what's in there. So the work they're doing will allow this to work, and we can have the texture path go through video memory completely, which will get rid of the slow scrolling and the back-and-forth copying we're seeing in Compiz and in accelerated composited desktops. The other thing is a regression in functionality. The way direct rendering works, as I said at the beginning of the talk, is that the client asks the X server for the address of the framebuffer, and then it loads the driver and renders straight into the framebuffer. But if we're running a composited desktop, it shouldn't be doing that — it should be rendering into the off-screen pixmap that backs the window
contents. But the direct-rendering driver doesn't know about this and just renders to the framebuffer as it's always done, and that doesn't work that well. It looks like it's working — here glxgears is running with direct rendering — but when you try to move the window around, it's obvious that it's not working, because the gears stay put, and only when the window settles down do they catch up. The thing is, Compiz doesn't actually update the X server's idea of where the window is until the window settles down. As long as you're wiggling it around like this, the X server can't really tell that the window is being moved — Compiz doesn't even try to tell it — but the moment you let go and it settles down, Compiz says "all right, I've got a new position, this is where it should be". So the direct rendering is only correct once the window has a final position. Another thing is that it doesn't respect the clip lists here, and sometimes we get problems because the stacking is different: we have one big window sitting in front of the whole desktop — the compositing manager's output window — so as far as the X server's clip lists are concerned, none of the application windows are visible at all. Compiz doesn't actually render into the applications' windows; it renders into this big front window, and we see the windows through that. But the direct-rendering client ignores the empty clip lists and just renders into the framebuffer anyway. So that's something we need to fix, and what we need to do here is, again, tie it into the redirection machinery: we could tell the driver, when it asks for the framebuffer, to point it at a different address and say "this is where you should be rendering" — give it the address of the pixmap we want it to render into. But there are a number of problems with that. With the texture-from-pixmap scheme the way it is now, we actually need the contents in host memory, but if we did this for direct rendering, the contents would end up in video memory, so we couldn't easily use them as a texture. Another thing is that even though, when we move a window, we see one frame of data, there are actually several buffers of data behind
that one visible OpenGL frame. OpenGL can do double buffering, which in this case means rendering to the back buffer, and when you say "swap", it copies from the back buffer into the front buffer. It also has depth buffers that track the depth of the pixels: if you render a new pixel and that pixel is in front of the existing one, it covers it, but if the new pixel you render is behind it, it just doesn't update that pixel and keeps the old value in place. And there are other types of buffers — stencil buffers and so on — and all these buffers are allocated out of one big chunk of memory, sized to the screen, the same way the front buffer is. So if the application renders through these buffers, the rendering still gets clipped according to the old cliprect scheme. What we really want is dedicated per-window buffers — say, a dedicated depth buffer — and that means the 3D driver needs to know how to allocate them; it doesn't know how to do that right now, it just takes a screen-sized chunk out of one big static allocation and uses that. With the memory manager work, the driver would be able to go and request a dedicated back buffer and depth buffer, and it could then set up rendering to an off-screen pixmap the same way we do for regular pixmaps. As mentioned, we have another problem, and that's the input case — this is where the Exposé-like scale feature makes a good demo, which I can show. So I have two windows here, and this is my laptop with a 1024x768 screen, so that's not a lot of space. To see two windows at once I'd have to resize them and put them next to each other — or I can use the scale plugin, which, when I throw the mouse into the corner, scales the windows down like that, and I get a view of all the windows on my desktop. But the X server doesn't know that the windows have been resized and moved around like this; the X server still thinks the windows are full screen size and all sitting on top of each other. So if I click into this window, the X server will actually think the mouse cursor is up here, in a different part of
the window, even though it looks like it's almost in the middle of the window. And if I were to select this text here, I would be clicking up there instead of where it appears. So the point, again, is that the X server doesn't know that the windows have been moved around and scaled like this, and it maps the input to the windows according to where it thinks the windows are. In this case Compiz is clever enough about this that it doesn't try to let me interact: it would be useful if I could drag this icon into this folder — I want to copy a file, so I click here and drag it down here — but Compiz doesn't let me do that, because it wouldn't work. What it does still let me do is click on a window to select which window gets focus, and that's sort of the limit of what you can do. What happens here is that Compiz knows the windows have been scaled down, so Compiz can highlight the windows as I move over them with the mouse, but I can't actually interact with the windows in this state. So, as Keith mentioned, the input redirection work that David has been doing actually lets the application instruct the X server how to map input to windows. So for this window here, thrown away off in the corner and scaled down, Compiz would tell the X server, by sending it a mesh or a transform that describes the relation between the window as it appears on screen and how it actually is in X. With that in place, we should be able to finally get this working the way it was meant to. So there are a few to-do items left, but it's getting there. The first is support for transparent windows with OpenGL. With regular X windows, a client requests a visual, and a visual is just a data structure that describes how the pixels in a window are to be interpreted. It's now possible to ask for a visual that has an alpha channel, so that along with the red, green, and blue values
we specify the transparency of each pixel, and you can use this to make windows like this one. You can notice how this part is fully opaque — you can't see anything through it — while this part is slightly translucent: you can see the folder on the desktop through it. So you can do this for regular X applications, but if you want to write an OpenGL application and ask for an ARGB visual for it, we don't really support that yet. I don't think it would be a big problem, though: if the DRI driver can render an image that has the alpha values in it, supporting this should be pretty easy — once we have ARGB visuals plumbed through and the DRI driver can provide alpha values for each pixel, it should basically just work. I don't see that as a big problem. Xv is another regression that has been introduced with composited desktops. Xv is a different story, but the effects are similar: if you put an Xv video on the desktop and try to move the window around, you'll see that the video doesn't follow correctly. Xv uses a feature of the card that lets the card do the color space conversion in hardware, and it uses a color key to indicate where on screen the video should go — where the overlay should put the converted pixels into the framebuffer. And the thing is, as you redirect the window and move it around, you can tell the overlay where to put the color-space-converted data, but that position only gets updated when you place the window; as you move it around, the position doesn't actually get updated — and even if you could update it, the overlay still wouldn't transform along with the window the way the texture does. So to get that working, we probably need to use OpenGL and do the color space conversion with pixel shaders. I know some work has been done in this area — I think David has something in his Xgl server that does this — but that's
something I still need to put in as well. Scrolling in Firefox, and anything like that, is still slow, and that comes down to the way we do texture-from-pixmap right now: the application's window contents are kept in host memory. So when you're scrolling in Firefox, Firefox will scroll the window by asking X to copy an area — shifted by however many pixels you're scrolling — and then redraw the strip that scrolled into view. But because the X server has been instructed to keep this pixmap in host memory, the scroll is now a copy in host memory to move the lines up into place, plus software rendering for the newly exposed part. And for text rendering, which composites the glyph images for the characters into the buffer — something most drivers know how to accelerate — we're back to doing it in software, so we software-composite the individual characters, and that's slow. And once we've done all that, we have to copy the entire buffer out to the texture to render the new frame on the screen. So that's a big problem right now, but hopefully, once the memory manager work is in place, we can just render into one video memory object shared between the X server and the 3D stack, and anything we want to copy we can accelerate. So it's a known problem, but we know how to fix it. One last item: resizing. Resizing is a more involved issue. It really comes down to the fact that some applications are slow to redraw when resizing, and there are a lot more factors involved here. One of the interesting things we can do with Composite is actually synchronize the resizing between the application and the compositor. Without that synchronization, when you resize the window, the application gets a number of resize events and you see these half-finished renderings; but now we can resize the window, and once the application has rendered a complete new frame, it's
possible to apply the resize, so we get consistent resizing. But it's still going to take some work to see how much better that gets.
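To recap the lock handling discussed in the middle of the talk as one sketch (all names hypothetical): the DRI driver always takes the hardware lock itself, so the X server has to drop the lock it already holds before calling into the driver for an indirect 3D request, retake it immediately afterwards, and restore the 2D state it expects.

```c
/* Toy model of the DRI lock juggling described in the talk.  Without
 * the drop-before-call step, the driver's own lock_take() would see
 * the lock held by the server and deadlock. */
enum { NOBODY, XSERVER, DRIVER };
enum { STATE_2D, STATE_3D };

typedef struct { int holder; int hw_state; } Card;

static int lock_take(Card *c, int who)
{
    if (c->holder != NOBODY) return 0;  /* would block / deadlock */
    c->holder = who;
    return 1;
}
static void lock_drop(Card *c) { c->holder = NOBODY; }

static int dri_driver_render(Card *c)   /* the driver always locks */
{
    if (!lock_take(c, DRIVER)) return 0;
    c->hw_state = STATE_3D;              /* clobbers the 2D state */
    lock_drop(c);
    return 1;
}

static int server_dispatch_3d_request(Card *c)
{
    /* the server already holds the lock for this request */
    lock_drop(c);                        /* avoid the recursive grab */
    int ok = dri_driver_render(c);
    (void)lock_take(c, XSERVER);         /* retake immediately */
    if (c->hw_state != STATE_2D)
        c->hw_state = STATE_2D;          /* restore what X expects */
    return ok;
}
```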