Good morning! Great to be back in New Zealand. Let's see, last Thursday I stopped working on graphics. Finally, it's done. But I have a bunch of stuff from last year, and of course my minion army will continue to produce new graphics excitement over the coming years, so nothing will slow down in my absence. I wanted to talk to you about some work that I did last year on drawing 2D graphics with OpenGL, a system that Eric Anholt started a long time ago.

In 2004 I did a bunch of presentations, one of them at OLS called Getting X Off the Hardware. It's only 11 years later; we're making good progress. That talk was about doing 2D drawing with OpenGL. It talked about how we were moving from a world in which you wrote a lot of custom per-hardware, per-chip and per-input-device code for X, to a world where the functionality is externalized: using GL for rendering, using evdev for input, and X becoming much like it was back in the 90s. Everybody goes to Portland to revisit the 90s, and X is no exception. We're going back to a world where we have kernel-based drivers for graphics and fairly simple rendering infrastructure. I also wanted to talk about hardware-independent X drivers, how we're getting rid of the mode setting stuff from X drivers, and how I've accelerated rendering development using an X-on-X solution called Xephyr.

Okay. Glamor is GL-based X acceleration. It was started by Eric in 2008, it turns out. In 2008 Mesa wasn't as good as it is today: it supported a slightly older version of the GL standard, and it had some sluggish areas in vertex shading on most of the hardware we had then. As a result Glamor was really targeting a very different world, and in the ensuing six years we've made enough changes in the GL world that it was time to revisit Glamor and figure out how we want to support 2D graphics using GL today, in particular on Mesa. I don't test on anything other than free software, of course, so I really have no idea how well Glamor works on non-free drivers, and I really don't care.

A brief overview of GL — and I'm sure Ian will laugh at this. GL basically does four things. GL has vertex shaders that you write: little programs written in GLSL that are given a pile of input data, whatever input data you want, and compute triangle or rectangle coordinates for both the source and destination of your blits or whatever painting you're doing. There's hardware in the system that takes those coordinates for the primitive and interpolates values across the primitive. So if you have source coordinates and destination coordinates, it interpolates them so that for each destination pixel you know the address of the source pixel to fetch from. Then there's another piece of code you get to write called the fragment shader. The fragment shader is given all these interpolated coordinates, goes and fetches whatever data it needs from various places, and computes a destination pixel value operand, which is an RGBA value. Then you throw that RGBA value at another piece of hardware, which actually draws that pixel into the frame buffer. That's GL in four bullet items. Did I miss anything? No, my fragment shader does not do the interpolation — real hardware has apparently put the interpolation into the fragment shader, because that way you can do non-linear interpolation, I'm sure. Okay.
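To make those four pieces concrete, here is a minimal sketch of a vertex/fragment shader pair — purely illustrative, not Glamor's actual shaders; the attribute and uniform names are invented. The vertex shader produces clip-space positions, the hardware interpolates the "varying" source coordinate across the primitive, and the fragment shader fetches from a texture and emits an RGBA value:

```c
/* Minimal sketch of the two programmable GL stages described above
 * (illustrative names, not Glamor code). */
static const char *vs_src =
    "#version 130\n"
    "in vec2 position;          /* destination coordinate from the app */\n"
    "in vec2 source;            /* matching source coordinate */\n"
    "out vec2 source_coord;     /* interpolated across the primitive */\n"
    "void main() {\n"
    "    source_coord = source;\n"
    "    gl_Position = vec4(position, 0.0, 1.0);\n"
    "}\n";

static const char *fs_src =
    "#version 130\n"
    "uniform sampler2D source_tex;\n"
    "in vec2 source_coord;\n"
    "out vec4 frag_color;\n"
    "void main() {\n"
    "    /* fetch whatever data is needed, compute the RGBA operand */\n"
    "    frag_color = texture(source_tex, source_coord);\n"
    "}\n";
```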
That's what GL makes it look like, in any case. Here's a short slide on what X rendering looks like. X has a bunch of geometric primitives: rectangles, text, lines, polygons. The X server breaks some of the more complicated primitives down into little spans, which are just horizontal runs of pixels — a single row, you know, 10 or 20 pixels long. So if you have some complicated polygon, it actually cracks that polygon apart and hands your driver these little spans. That was awesome in 1986, because none of the hardware did polygon rasterization, so doing it in software and having all the drivers do something simple was awesome. There are also more complicated primitives — arcs, wide lines, filled arcs, polygons. We don't use these anymore, so we really don't care.

X drawing has a fill style: every X primitive is filled either with a solid color or with a regular pattern — either a tile, which is a multi-color pattern, or a stipple, which is a two-color pattern. Why are there two different kinds? Well, yeah. Welcome to 1980. There's a raster operation. These are bitwise boolean operations; back in the Smalltalk era this was kind of awesome, because people thought these 16 operations would cover all of the possible things you'd ever want to do with rendering on the screen. Two of them are useful — copy and clear? Copy. One of them is useful. The rest are completely irrelevant today. It's okay; GL supports them. The other thing we used to do is crazy tricks with our color maps. Hardware used to have a mapping between the pixel value and the RGB value that would appear on the screen. You'd crack the pixel bits up into a collection of modifiable and non-modifiable ones, and you'd have this plane mask thing which would say: don't draw to those two bits of the frame buffer, just leave them whatever value they were; the other six, yeah, draw those. We don't use those today. And the last thing in X core rendering is the dash pattern. GL also has dashed lines — there are like eight different dashed line patterns. If you like one of those eight patterns, it's awesome. If you want something different, yeah, you can't use that. X dash patterns are way over-specified: an X dash pattern is an arbitrarily long list of dash lengths at eight bits per length. So you could have a thousand-element dash pattern — like one, two, maybe the value of pi encoded in pixel lengths. Yeah, it's kind of silly.

Glamor last year, before we got started rewriting it for the modern world, had a couple of fixed GLSL programs to do some of the low-level rendering. It didn't actually cover the entire X spec, though. All of the coordinates in Glamor were computed on the CPU. Here I have this massively parallel VLIW machine that does floating point operations in the blink of an eye, and I compute everything with the CPU, because God forbid I should ask the GPU to add or multiply numbers. The tiles and stipples — these repeating patterns that X uses all the time — were actually repeated mechanically on the CPU, filling multiple rectangles to do tiling. And a lot of the time Glamor just threw up its hands: yeah, I don't know how to do this X operation, I'm going to let the CPU handle it. So Glamor would pull the image out of the frame buffer back to main memory, paint on it carefully with the CPU, and then push it back into the frame buffer. For some things that was fine. If you were running xterm from 1986, it worked great.
If you were running a compositing window manager, it kind of sucked.

Let's see — my Glamor plan was to move all the computation to the GPU. The CPU — yeah, we have a CPU. I think most of us would prefer to use our CPU to run Emacs instead of running my tiling computations. So the plan was to dump raw X coordinates, in integer form, right at the GPU and let the GPU deal with everything. No fallbacks at all: you would never use the CPU for drawing anything. And the way I was going to avoid — you saw the combinations of rendering operations in X — the way I was going to avoid a combinatorial explosion of GLSL programs was by templating my GLSL construction and kind of gluing bits together. I'll show you how that works.

Here's how I construct GLSL programs. I break the GLSL program into a bunch of little facets. Each facet has a collection of declarations, a little chunk of code, and a selection of variable values that it gets from the external world. Each facet declares these three little pieces. The vertex and fragment shaders each have little templates that are basically sprintf strings, with %s's for each set of declarations and each chunk of code. There's a geometry primitive facet — so if you want to draw lines, you have a little facet for drawing lines; it tells you how to take X coordinates and construct lines in GL land. There's a fill style facet. And then I build the GLSL programs I need, compiling them as X executes. So when you first start up X, if you don't draw anything, you don't compile any GLSL. If you start drawing crazy stuff, you'll get a couple of facets compiled. And if you use all the operations, like you're running some benchmark, you compile all of these programs. Which works fine.

Here it is — I know I'm putting code up on the screen, I'm really sorry. This is the facet that draws rectangles with GLSL 1.30. It has a little attribute, the primitive that's coming in from the application, and it has this little bit of code which computes the position by basically scaling the value from an integer to an appropriate floating point coordinate. Pretty simple primitive. You notice my GLSL programs are huge — this one is, what, two lines of executable code. I'll show you what it generates. Here's the solid fill facet. This one's really hard. Here's the line of code — can I get my cursor over there? Probably. Yeah, not so much. Here it is, the line of code: it stores a solid color on the screen. This actually turns out to be interesting: it turns out to be slower than fetching tile values from a texture. I think GL is like, wait a minute, that's not complicated enough for me — you get penalized because your program is too simple, we're going to slow you down. This is what I compute as a result. Here's the vertex shader, which fetches the vertices in X format and computes the coordinates. I'm actually using GL instancing for this, so that I dump the X and Y values of the two endpoints and compute the — why am I using instancing for this? This seems crazy. Less vertex data. That's right. In any case, this computes the GL position, XYZW. Oddly, I'm doing 2D graphics and GL has these other dimensions that I don't understand. You see a lot of places where I'm setting the Z value to some constant, and I pick zero. I don't know if zero is a good Z value. Maybe one is better. They're all nice Z values. It's like living in Flatland. It's like, wait a minute.
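Roughly, the facet composition works like the sketch below — the structure and names here are invented for illustration, not Glamor's actual code. Each facet contributes GLSL declarations and a chunk of code, and the final shader source is just sprintf'd together from a template with %s slots:

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch of the facet idea (illustrative names, not Glamor's structures). */
struct facet {
    const char *decls;          /* GLSL declarations this facet needs */
    const char *code;           /* GLSL statements this facet contributes */
};

static const char vs_template[] =
    "#version 130\n"
    "%s"                        /* primitive facet declarations */
    "%s"                        /* fill-style facet declarations */
    "void main() {\n"
    "%s"                        /* primitive facet code */
    "%s"                        /* fill-style facet code */
    "}\n";

static char *
build_vertex_shader(const struct facet *prim, const struct facet *fill)
{
    char *source = malloc(4096);        /* generously sized for a sketch */

    if (source)
        snprintf(source, 4096, vs_template,
                 prim->decls, fill->decls, prim->code, fill->code);
    return source;
}
```

A rectangle facet plus the solid-fill facet produces one program, rectangle plus tile another, and each combination only gets built and compiled the first time a client actually asks for it.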
Here's the fragment shader — there's the line of code. There's a foreground color coming in. One of the questions I have: this is a uniform. GL has these uniforms — a value the program sets that's the same for all executions over this primitive. Uniforms seem to be expensive in most GL drivers. I could set it as a vertex attribute instead. Would that be better? I don't know.

Here are the adventures of what I had to do to get core text running. Core X text is crazy stuff — it's all encoding-dependent. Do you remember non-Unicode encodings? Like when you're running an MS-DOS machine and your quotes and your smart quotes start putting weird punctuation on the screen? Core text lives in that world. Nobody uses this stuff anymore, but we get benchmarked on it. One of the goals was to have no fallbacks, and I tried to do this as simply as possible. What I do is dump the entire font into one enormous texture. I store it one bit per pixel, packed into 8-bit bytes. My fragment shader has to pull the appropriate byte out of the texture and shift and mask to pull out the bit. If you don't have integer operations in your GL implementation, this is not going to work. Eric, do you have integers that can do this on VC4? Awesome. Integers are good. What doesn't have integers these days, Eric? R300. Okay. Mali does. Okay. Not doing well. Divide is bad. Okay.

Line dashing — I talked about this a bit. GL supports only a few fixed dash patterns, so it's like, yeah, I just can't use that at all. X has these unlimited-length patterns. I think I want to try encoding 300 or 400 digits of pi in a dash pattern and see if it works. That'd be kind of fun. Oh, I could do the Debian tartan with dash patterns in a wide line. So what I do for this is store the dash pattern in a texture, because GL loves textures — I just paint it into a texture with X commands. Then I fetch one pixel per dash element and figure out whether the pixel needs to be on or off. The vertex shader has to compute the appropriate start coordinate within the texture, to figure out where to start the dash pattern, and then it needs to figure out whether the line is more vertical or more horizontal to decide which way to interpolate — along X or along Y — because the dash positions are actually per pixel. So there's a little bit of computation in the vertex shader. The fragment shader is really simple. It's like: here's my interpolated dash position. Fetch the pixel. Is it a 1 or a 0? Oh, look, it's a 1 — I'll paint the foreground color. That actually goes really fast as a result.
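A rough sketch of that dash fragment shader is below — this is not the exact Glamor shader, and the variable names are illustrative. It assumes the dash pattern has been painted into a one-row texture and the vertex shader has set up `dash_pos` as the interpolated distance along the line; the bitmap-text shader mentioned above does a similar fetch-and-test trick, just with an integer shift and mask instead of a texture lookup by distance:

```c
/* Sketch of a dashed-line fragment shader (illustrative, not Glamor code). */
static const char *dashed_line_fs =
    "#version 130\n"
    "uniform sampler2D dash_pattern;   /* dash pattern painted into one row */\n"
    "uniform float dash_length;        /* total pattern length in pixels */\n"
    "uniform vec4 fg_color;\n"
    "in float dash_pos;                /* interpolated distance along line */\n"
    "out vec4 frag_color;\n"
    "void main() {\n"
    "    float t = fract(dash_pos / dash_length);\n"
    "    if (texture(dash_pattern, vec2(t, 0.5)).r < 0.5)\n"
    "        discard;                   /* in an off segment of the dash */\n"
    "    frag_color = fg_color;\n"
    "}\n";
```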
So here's what I accelerate these days in core operations. I accelerate filled rectangles, oddly. Filled spans, which are used for all that grotty legacy X drawing stuff. Copy area — here's an interesting problem. Copy area is an X operation that is well-defined when the source and destination overlap: X says you can do overlapping copies and they work. GL, in its infinite wisdom, decided to say that if you do overlapping copy operations you could just light the GPU on fire — it's completely undefined, don't ever do that. And I thought, oh my god, when I was doing the initial development, I'm going to have to find some way to use the old pieces of hardware in my GPU to do copy area, because it would be horrific to make a temporary intermediate copy of the pixels while doing the copy. Right?

So I spent a bunch of time and wrote a bunch of code, and used the GL legacy path to get at the old legacy 2D blitter in my Intel chip — and it worked, and it was the same speed as my old 2D acceleration code. And I said, good to go. And then somebody said, yeah, but — your GPU has those and they work correctly, but if you actually read the GL spec, those operations are also undefined, so you can't actually use them. Fine. I went back and rewrote it to use a temporary: copy the bits from the source to a temporary, then copy the bits from the temporary back to the pixmap. And it was faster than the old hardware path. Yeah, question? The question is: do you have to do that if the two areas don't overlap? And the answer is — what? Yeah, we don't have to do that if the areas don't overlap, and we do check. But most of the time — an overlapping copy area is something you do when you scroll your terminal (you hit return, it scrolls up) or when you take a window and slide it around the screen. So if you're doing copies from one pixmap to itself, they're effectively always overlapping, because those are the only interesting copies. Is it? Oh, no, no — there's an NVIDIA extension that lets you tell it. Yes. Yeah. So if the operations are not overlapping, you can use a magic NVIDIA extension and say: hey, I promise not to do overlapping operations, can you make sure things work when I copy between an object and itself? There's an NVIDIA extension that most of our drivers support, and it does a bunch of serialization and synchronization of the hardware to make sure it's actually going to work. So yes, you can actually do that. That was six months ago. Thanks, Eric.

I also accelerate an X operation called copy plane that no application uses anymore. Well, I actually only accelerate one of the directions, not the others. You have two drawables, and you want to slice your source and pick out just one bit from every pixel, then use that to paint the destination. You can do this from deep drawables or shallow drawables — a one-bit source or an N-bit source. GL doesn't do the one-bit stuff, so right now I'm doing all the one-bit operations in software, which kind of sucks. But nobody uses one-bit drawables very much anymore, which means that copy plane, when doing a one-to-N copy, is actually kind of like a put image. So it works okay, but it's not really accelerated.
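The "bounce through a temporary" approach for overlapping copies looks roughly like the sketch below. This is illustrative, not the exact Glamor code: it assumes a GL loader such as libepoxy provides the GL 3.0 framebuffer-blit entry points, and the overlap check and scratch FBO management are simplified. The NVIDIA extension mentioned above would let the non-overlapping promise be made explicit instead.

```c
#include <stdbool.h>
#include <epoxy/gl.h>

/* Sketch of overlapping CopyArea via a temporary (illustrative only). */
static bool
boxes_overlap(int sx, int sy, int dx, int dy, int w, int h)
{
    return sx < dx + w && dx < sx + w && sy < dy + h && dy < sy + h;
}

static void
copy_area(GLuint pixmap_fbo, GLuint temp_fbo,
          int sx, int sy, int dx, int dy, int w, int h)
{
    if (!boxes_overlap(sx, sy, dx, dy, w, h)) {
        /* Non-overlapping rectangles: blit within the same framebuffer. */
        glBindFramebuffer(GL_READ_FRAMEBUFFER, pixmap_fbo);
        glBindFramebuffer(GL_DRAW_FRAMEBUFFER, pixmap_fbo);
        glBlitFramebuffer(sx, sy, sx + w, sy + h, dx, dy, dx + w, dy + h,
                          GL_COLOR_BUFFER_BIT, GL_NEAREST);
        return;
    }

    /* Overlapping: source -> temporary ... */
    glBindFramebuffer(GL_READ_FRAMEBUFFER, pixmap_fbo);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, temp_fbo);
    glBlitFramebuffer(sx, sy, sx + w, sy + h, 0, 0, w, h,
                      GL_COLOR_BUFFER_BIT, GL_NEAREST);

    /* ... then temporary -> destination. */
    glBindFramebuffer(GL_READ_FRAMEBUFFER, temp_fbo);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, pixmap_fbo);
    glBlitFramebuffer(0, 0, w, h, dx, dy, dx + w, dy + h,
                      GL_COLOR_BUFFER_BIT, GL_NEAREST);
}
```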
The other thing I accelerated — in fact, the first thing I accelerated — was dots. Now, dots are an interesting operation, not because applications use them, but because they paint just a single pixel and they take an X,Y coordinate, so there's this huge amount of data that has to flow from your application all the way to the rendering engine. And in every other acceleration architecture I've ever built — literally — dots turned out to be way faster in software than they were in hardware. Because one of the things you have to do to draw a dot is check whether it's clipped: you have the window, and you want to know whether your dot is inside it or not. That was a very expensive computation. Well, it turns out that my GL hardware does clipping, so that expense, usual for acceleration paths, went away. The other thing is that you had to compute the address within the frame buffer given the X,Y coordinate, and I used to have to do that in software before loading it into my accelerator. GL can do that coordinate transformation for me, saving that computation. So those costs from the old acceleration architectures are gone. And I was like, hmm, I wonder if this is actually going to be faster now — because I was expecting it to be a little slower, a lot slower, egregiously slow; slow to the point that I would consider using a fallback. But it turns out that because GL is actually a competent graphics API and does all the things you actually need graphics to do, I'm able to take the coordinates the application gives me, copy them to the GPU, and say go. And if you draw a lot of dots, like a thousand at a time, it's actually frighteningly fast. This operation turned out to run at like 200 or 300 billion dots — no, I don't know what it was, it was an insanely large number. Well, I can find out; we'll do some demos here in a second.

The other thing GL does is draw lines. Who knew? GL, of course, was started in an era when wireframe graphics was interesting, and people did a lot of wireframe stuff because our GPUs didn't do polygons fast enough. As a result, GL has fairly competent lines. It doesn't have the dashed lines that I'd like, but we can work around that in the fragment shader. So I accelerate thin lines, and they are frighteningly fast as well. Now, the one benchmark you might see that draws lines is GtkPerf. GtkPerf draws one line per color, then switches colors and draws another line. So GtkPerf — it's not horrible, but it isn't scary fast. I'll show you x11perf, which draws a thousand lines of the same color, and that goes really fast. And then of course I accelerate the text we talked about before.

So, interesting results. Solid, tiled and stippled fills all run at about the same speed. Actually, tiled runs the fastest, and I really don't know why. Tiled has to fetch a texture value for every destination pixel it's drawing, and yet it seems to consistently run about 5% faster than solid, which uses one color for everything. I'd love to know why. The really awesome part is that Glamor accelerates all these corner cases that we've ignored for years and years and years, with no effort at all. All these operations like, oh, I have this 15 by 17 stipple with this weird raster op — before, I would have said: yeah, throw up your hands, it's going to go really slow, I'm so sorry, I don't want to spend my life writing every special case. With Glamor, because I'm able to use GLSL in this combinatorial way and dynamically program the thing, every operation runs as fast as it possibly can. There are literally no special cases anymore; everything goes at hardware speed.

The other thing I learned, having written a lot of assembly language code for my Intel GPU: writing this in a high-level language is way better. Never going back. Who remembers writing assembly code on, you know, your favorite microcontroller? Who remembers the day you got to use a real language? Ever going back to writing thousands of lines of 8051 assembly code? Never doing that. So the main things I learned there were that, A, it's easier to write, and B, because it's easier to write, I'm not afraid to do hard things in it.
So this crazy bitmap text fetching stuff, which would have been really complicated in assembly language — and in fact generates a lot of assembly language code — would have been frightening for me to try to deliver a reliable version of, and to keep rewriting for every new GPU generation. The advantage of using the high-level language is: poof, it all just happens.

Here are some performance results. This may be a little hard for you to read. This is every x11perf test, run on Glamor and on the two Intel acceleration architectures, SNA and UXA. What it is is a log-scale plot with 1.0 running right down the middle. Bars pointing to the right are where Glamor is faster, and bars pointing to the left are where one of the other acceleration architectures is faster. I think you see a fairly clear pattern here: Glamor is doing better on most operations, but there are still some standouts on the other side, and I'll discuss those in a second. I pulled this SVG out of a web posting I did about nine months ago — six months ago — and you can actually read it there. It's kind of big; some people complained about it being, I think, 500 kilobytes or something. No, bigger than that. Yeah. Your browser gets kind of torqued by it.

Some lessons from x11perf. Large batch operations are awesome. When you draw a thousand dots, a thousand lines, 80 glyphs, a thousand rectangles, GL gets up a good head of steam and really plows right on through those primitives. When you draw small operations — GL is a complicated library, and our current Mesa implementation has a lot of steps between me and drawing the first primitive. So the places here where we saw bars going to the left — is this really not going to go backwards? Oh, right, this is an SVG and LibreOffice is like, no! — these bars to the left down here, where you see all the blue and green, those are windowing operations. That's where the test is, say, resizing a single window. To resize an X window, the computation within the server is trivial: you adjust some clipping lists. But what happens on the screen is that you paint two tiny little rectangles of the new color — you make the window a little bit bigger, you paint a rectangle here and a rectangle here. Painting two rectangles takes almost as long as painting a thousand rectangles, so doing it in software is dramatically faster. There are some other operations — a few wide line paths, some wide arc paths and polygon paths — where you end up painting a couple hundred primitives with GL, small operations that just get totally swamped by the overhead of GL. I would love it if GL had no overhead and I could just blast stuff right into the hardware. That's a lot of what SNA does, the SNA acceleration architecture. The problem with SNA is that it's coded at the assembly-language level, and as a result it accelerates very few actual operations; but the operations it does accelerate are scary fast, because it doesn't have this large abstraction layer between it and the hardware.

So, some potential performance ideas to make this even better. One idea I came up with is: hey, I've got this queue of stuff heading for the hardware inside of Mesa. It doesn't actually get to the hardware until I tell Mesa, hey, I'd like to send that to the hardware now, and flush it out.
What if I kept that queue open inside the X server and noticed: oh hey, I'm drawing another couple of rectangles — oh look, I drew rectangles last time; let me stick these new rectangles onto the same list. That way I would get to amortize the cost of that one render operation across multiple primitives. Where this is really going to help is things like window operations and the all-important small operations. The other thing I'm thinking about is for applications like GtkPerf, which do want to draw a few primitives in each color, or for applications painting a complicated web page, filling in little regions of different colors — a couple of glyphs here, a couple of rectangles over there — rapidly changing these things that I think of as fairly static. I'm currently using the uniform mechanism in GL to pass those into the fragment shaders. What if I passed them in as vertex attributes instead — essentially a per-primitive attribute instead of a per-operation attribute? Would that let me do more batching? Would that let me amortize the cost of the GL operation across more primitives? I don't know.

Okay, so that was core X. Then we also wanted to accelerate the Render extension. Render was added to X in 2000. The reason I added the Render extension was that I wanted anti-aliased text, in an era when displays were about 96 DPI. With my 4K monitor at home now, I don't know if I need anti-aliasing anymore — I'll get that working in a couple of months and we'll find out. There were some conflicting goals when we did the Render extension in 2000. Not too many people had competent GPUs, so we wanted to be able to satisfy the 2D rendering requirements in software reasonably efficiently, and we wanted to perform well on existing hardware. After we did the Render extension, we discovered that actually using it is an enormous pain for applications, so we built a new library on top of it called Cairo. So if you've programmed in Cairo: Cairo was invented to solve the problem of how do I draw graphics using the Render extension? Oh, and it also works on Windows, and it prints.

Here's what Render does. It has three operands: a source, which can be a solid color, a gradient — linear or radial — or an image, of course; a mask, which is a shape; and a destination to paint to. You take the mask, which kind of clips out a piece of the source, and paint it onto the destination. Pretty simple. It looks a lot like core X, but instead of doing raster ops you're doing compositing. The special case we have for masks: you can compute a mask as just an image — you can ship an alpha mask up to the server and paint with that. You can construct geometry on the fly with trapezoids or triangles or rectangles. Or there's a magic special case for the important operation, anti-aliased text: an optimization for painting glyphs fast. Those are the operations: you can do a single rectangle, a bunch of glyphs, or some geometry. Render glyphs are kind of interesting. The client creates basically a bag of images, and you can throw as many glyphs as you want into this bag. Then when you're painting a sequence of glyphs, you pull glyphs out of the bags, put them into one big list, and paint them on the screen all at once. So the glyphs operation is kind of the batch operation in Render. Everything else is very much incremental and very small.
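To make the "color as a vertex attribute instead of a uniform" idea from a moment ago concrete, here is a rough, hypothetical sketch — this is not what Glamor did at the time, and the names are invented. A `flat` varying carries a per-primitive color from a vertex attribute down to the fragment shader, so primitives with different colors could share one draw call:

```c
/* Hypothetical sketch: per-primitive color as an attribute, not a uniform. */
static const char *per_primitive_color_vs =
    "#version 130\n"
    "in vec4 primitive;\n"
    "in vec4 fill_color;            /* previously a uniform */\n"
    "flat out vec4 color;\n"
    "void main() {\n"
    "    color = fill_color;\n"
    "    gl_Position = vec4(primitive.xy, 0.0, 1.0);  /* scaling omitted */\n"
    "}\n";

static const char *per_primitive_color_fs =
    "#version 130\n"
    "flat in vec4 color;\n"
    "out vec4 frag_color;\n"
    "void main() {\n"
    "    frag_color = color;\n"
    "}\n";
```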
All of that means Render really conflicts with OpenGL, because when you draw a couple of trapezoids, or a single rectangle, you're not taking advantage of any batching. So one of the reasons for doing that batching inside the X server was to be able to render a little more efficiently, because Render has almost no batching at all — except for glyphs, and glyphs are a pretty useful case.

Glamor stores glyphs in a fairly straightforward way. Glyph sets themselves are often bigger than the maximum texture size: if you have a font with 4,000 glyphs, each 40 by 40 pixels, that's a steaming pile of texture space. And applications, it turns out, often draw a single line of text from multiple fonts, so they use multiple glyph sets. So Glamor couldn't do the same thing for Render text that it did for core text — I couldn't just create a giant texture, dump all the glyphs from a glyph set in, and say go. Instead, I create a single cache and paint just the glyphs I need this time into that cache, then hand the whole thing to the hardware. The next time a text operation comes in, I say: oh, I already have some of those glyphs in the cache, and I need to add some more. Eventually the cache contains the glyphs you're actually painting on the screen. The other thing I do is when the cache is full — you'd think of a cache as something where you'd replace a glyph, right? No, not with GL. The thing you desperately don't want to do is replace an image you might currently be using in your texture. So instead, when the cache is full, I just pitch it: throw the cache away, allocate a brand new one, and start putting glyphs into that. And that works out — the code is really simple, the performance is really good, and I don't fight the way GL wants me to operate. Mostly I don't stall the GPU when I miss. Drawing a glyph is really simple: make sure the glyph is in the cache, then stick the glyph info into the list of vertices you're going to pass to the vertex shader.

So, future Render work. That's the set of operations I currently have accelerated, and there's a pile more Render stuff to be done. Last summer we had a GSoC student who promised to work on this; Eric asked if I would work with the student and collaborate to make sure the work got done in a sensible fashion — and our GSoC student evaporated. So I kind of waited around for GSoC to start, and then something happened and I got sidetracked by this stuff. So we need to figure out — Render has a lot of complicated sources. In core X you have solid colors, tiles and stipples. In Render we have solid colors and tiles, or textures, and we also have gradients. We need to figure out what to do with Render sources to make them work the way I did the core stuff. I think it's going to be possible. The gradient computation is kind of tricky, so your fragment shader may get kind of ugly doing gradients, but I think it's important to make sure I do all the computation on the GPU again. I need to add some compositing code, and trapezoids maybe, and I hope this is all going to work out. I have some infrastructure that I did for the glyph stuff that looks like it's going to work out okay, but I don't have the code yet, so I have no real idea.
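To give a sense of what a gradient source facet might look like — purely hypothetical, since this is future work and not code from Glamor — here is the simplest possible case, a two-stop linear gradient in a fragment shader. Real Render gradients allow an arbitrary list of color stops, which is where the fragment shader starts to get ugly:

```c
/* Hypothetical sketch of a two-stop linear gradient fill (not Glamor code). */
static const char *linear_gradient_fs =
    "#version 130\n"
    "uniform vec2 p1, p2;          /* gradient endpoints, pixmap coords */\n"
    "uniform vec4 color1, color2;  /* colors at the two stops */\n"
    "in vec2 pixmap_pos;           /* interpolated destination position */\n"
    "out vec4 frag_color;\n"
    "void main() {\n"
    "    vec2 d = p2 - p1;\n"
    "    /* project this pixel onto the gradient axis, clamp to [0,1] */\n"
    "    float t = clamp(dot(pixmap_pos - p1, d) / dot(d, d), 0.0, 1.0);\n"
    "    frag_color = mix(color1, color2, t);\n"
    "}\n";
```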
Measuring X performance. We have this tool that measures X performance, called x11perf. It kind of misrepresents real applications — not too many applications draw a thousand lines at once, or a thousand rectangles. We used to have this really awesome system called cairo-perf-trace that would trace an application, save the trace, and then replay it to get a good measure of that application's performance. The traces we have are kind of old and stale, Cairo itself is kind of in flux right now, and I don't know exactly how well it's working. The other thing is that the cairo-perf-trace replayer only draws off-screen, and I see huge performance differences between on-screen and off-screen drawing, so I need to be able to do both. We have GtkPerf. That's an awesome performance measurement — it shows how fast GTK moves widgets around. Oh, that's important. That's kind of useless. In the GL world, many of your GL games actually have a benchmarking mode built into the game. So there you have a real application that real customers actually want to play, and you have a way to use that game to construct a benchmark. It's like — yes! Actual benchmarking. Not so much in 2D land, because 2D is mostly fast enough. I'm not really worried about absolute performance; but I'm sharing the thermal envelope of my CPU and GPU with all of your application execution. The faster the graphics goes, the more time there is to spend doing other stuff — or the less battery I consume drawing. So the reason I'm interested in performance in this environment is not that 2D applications aren't fast enough, but that every joule I spend on doing graphics is a joule sucked out of the battery, a joule your application doesn't get to consume, heating up your lap.

GL versus X: sometimes it's not great. Most of GL is kind of like X, but sometimes there's a real mismatch. Texture formats: GL has these texture formats; X has pixmaps and pictures and windows and visuals — I've got a slide about that. There are a bunch of limitations in GLES that really suck and hurt a lot for X. And then there's older hardware. X pixmaps define only a depth — oh, I only have 5 minutes left. And GL textures only define the content: they only tell you whether you have red, green and blue, or maybe alpha only. They don't tell you how many bits per field you have. So you have X telling you how many bits you have and GL telling you what the data in those bits is. And in X you can take an arbitrary pixmap and say: now I want you to pretend those values are RGBA; or, no, I'm done using those as RGBA, take those same bits and pretend they're just BGR. GTK does this. Thanks, guys. So there's a bunch of work Glamor needs to do to convert back and forth between these pixel formats as you change how you're using them. Right now we occasionally end up making a temporary copy of the image in a new format and hoping things go okay. I really want to be able to use a new extension that GL has — texture storage and texture... what? Texture views. Yeah, ARB_texture_view. So I'm hoping to be able to take advantage of those.
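A texture view lets you reinterpret the same storage under a different format instead of making a converted copy. Below is a small sketch of the idea — the formats are just an example, and it assumes ARB_texture_storage and ARB_texture_view are available (the original texture must have immutable storage, e.g. allocated with glTexStorage2D, for glTextureView to accept it):

```c
#include <epoxy/gl.h>

/* Sketch: same bits, different interpretation via ARB_texture_view.
 * GL_RGBA8 and GL_RGBA8UI share a view class, so a view can expose the
 * raw bytes of an RGBA8 texture as unsigned integers (example formats). */
static GLuint
make_format_view(GLuint orig_tex)
{
    GLuint view;

    glGenTextures(1, &view);
    glTextureView(view, GL_TEXTURE_2D, orig_tex, GL_RGBA8UI,
                  0, 1,     /* first mip level, number of levels */
                  0, 1);    /* first layer, number of layers */
    return view;
}
```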
There are some limitations in GLES 2 — GLES 2 is what most of your embedded GPUs use, and I really want to be able to run X on those efficiently — and it's missing a lot of the stuff Glamor currently uses. X doesn't do triangles, those three-sided objects; X draws rectangles, like real men. The closest thing we have in GL is a quad, so I use quads all over the place. 32-bit integers — I use those for a lot of the transformations. GLES doesn't do instanced drawing, which I use to pull X objects into GL coordinates. And there's a bunch of texture format stuff it's missing as well. Oh, and logic ops. But again, any sane X app is only using copy, so maybe we'll do some more fallbacks there.

Older hardware. To be specific, Intel until last year was selling 915 and 945 hardware, which is basically the same thing. It doesn't have vertex shaders. Did you see those vertex shader programs I was writing that do all the complicated work on coordinates? Yeah. Not so much on this hardware. So in the current Mesa driver those execute on the CPU, and they're really slow. Glamor performance on the 915 sucks. Don't do that. I don't really know what to do. I guess we can keep the existing 915 drivers around? We could fix Mesa — who wants to sign up for that? Oh, I have a customer with a million 915s who will pay me lots of money to make Mesa go faster on them? Unlikely. We could construct a new drawing-only driver interface, and it's like, yeah, well, you're not going to run GL on that, we're going to construct a new API just for you — that seems like a winning plan. So that's a real problem. Question: are features like vertex shaders used so extensively in Glamor that you couldn't do it without them? So the question is, could we just not use vertex shaders, for instance, since they aren't supported on this old hardware — and the answer is no. The way Glamor works is that it takes the X coordinates and converts them into the actual polygon coordinates in the vertex shader. So yes, I could construct a new version of Glamor that worked like the old version and did all those transformations on the CPU, but that would mean essentially duplicating the entire Glamor stack again. So I'm really not... Eric said you'd ask that question. The problem with the other 915 driver is that it's buggy and it's slow — it's actually slower for these operations than the classic 915 driver. Go for it. Did I mention that one of the solutions is to fix Mesa? Yeah. Have a good time. Are you getting paid for that? Awesome. Crap. Thank you.

Okay, I wanted to spend a few minutes talking about the modesetting driver, but we don't have time. And I wanted to talk about Xephyr, but we don't have time for that either. But that's okay. Modesetting is in the server; the plan here is to get rid of all of those drivers. I do want to show you this next slide, because it's kind of awesome: modesetting saves a bit of code. I'm literally taking 368,235 lines of code in the X server and replacing them with 5,032. I think that's epic even on ajax's scale. So that's what I have to talk about today. I'll be here the rest of the week if you want to ask me questions. I'm sorry I don't have any more time today, but thanks very much for coming this morning. Have a great time at the rest of LCA.