So this talk is basically an incremental update, to put forward an idea for a new driver infrastructure for 3D graphics. I've included the first two slides here, and we'll start with that. It helps to start with a look at where things were previously, or perhaps still are, in terms of open-source 3D graphics drivers. We've got a thing called Mesa, and it's probably bigger than that block. But the drawing is out of proportion to emphasize the fact that actually it's not that much bigger: the DRI drivers over the years have gotten larger and larger as they have to deal with more and more aspects of the peculiarities of OpenGL, the peculiarities of our interaction with the X server, with the kernel module, and so forth. All of these competing influences are sort of boiling away in here, fighting with each other and making it hard to do interesting things with the drivers: hard to add features, hard to fix bugs, hard to get things stable, or ever really done.
So over the years we've written a few drivers, and we've noticed that there are probably some possibilities to cut this thing up and break it into manageable chunks. That's the starting point. We've looked at putting interfaces into the middle of the driver, taking off a chunk that's really all about GL, taking off a chunk that's all about the X server, the kernel module, that sort of stuff, and leaving this chunk in the middle, the interesting one, personally, which does the hardware interaction and really is the driver. So that's what we ended up with. This is what we set out to pursue, and this talk will be a bit of a status report on where we're at with it. We've introduced a couple of new things here, first of all the state tracker concept.
That state tracker looks a lot like an old-fashioned Mesa driver, but rather than targeting the particularities of hardware, it's targeting this interface here, which is really the interface to Gallium3D. So that's the glue that takes you from the old world of Mesa to a newer, cleaner set of interfaces. The second interface is this guy here, which we're calling at this point the winsys interface. It's the back-end interface that abstracts away things like [unclear], command submission, any sort of hardware interaction, talking to the kernel module; all those things are behind that interface. And this interface really looks a lot like a batch buffer submission interface, and perhaps a memory manager interface. So it's memory management on the card and command submission, and that's kind of it. The rest of the task of turning that into a working driver for the particular environment that you care about, be that DRI or something else, is all pushed into this module back here, the winsys.
That diagram is still pretty much correct. There's been evolution, but we've been pursuing this for a little while now, and it's looking like a good design. Those interfaces have evolved since then. We made our first stab at things, and there have been people out there tracking this work. I feel their pain, because it's been hard enough for us to keep up with it. If you're doing this as a spare-time thing, you've probably found that you've spent most of your time just dealing with the interface changes that occurred over the last few months. It's kind of hard to avoid that, because it's hard to jump right in with the right interface from the start. We've evolved as issues have come up, and it's inevitable. But hopefully now we're moving to a point where things are stabilizing.
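
The split described above, a hardware driver that only builds commands and a back end that owns memory management and command submission, can be illustrated with a small sketch. This is Python rather than the driver's own language, and every name in it (`Buffer`, `RecordingWinsys`, `ToyDriver`, `buffer_create`, `batch_submit`, the `DRAW_TRI` command) is hypothetical, chosen for illustration; it is not the actual Gallium3D interface.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    """A block of card memory handed out by the back end (hypothetical)."""
    handle: int
    size: int


class RecordingWinsys:
    """Hypothetical back end: buffer allocation plus batch submission.

    A real back end would talk to a kernel module. This one only
    records what it was asked to do.
    """

    def __init__(self):
        self._next_handle = 1
        self.buffers = {}
        self.submitted = []

    def buffer_create(self, size):
        buf = Buffer(self._next_handle, size)
        self.buffers[buf.handle] = buf
        self._next_handle += 1
        return buf

    def batch_submit(self, commands):
        # Store a copy of the batch and return a sequence number.
        self.submitted.append(list(commands))
        return len(self.submitted)


class ToyDriver:
    """Hypothetical hardware driver: knows commands and the back end only."""

    def __init__(self, winsys):
        self.winsys = winsys
        self.batch = []

    def draw_triangle(self, vertex_buffer):
        # Queue a made-up command that references the buffer by handle.
        self.batch.append(("DRAW_TRI", vertex_buffer.handle))

    def flush(self):
        seq = self.winsys.batch_submit(self.batch)
        self.batch = []
        return seq


winsys = RecordingWinsys()
driver = ToyDriver(winsys)
vbuf = winsys.buffer_create(36)
driver.draw_triangle(vbuf)
seq = driver.flush()
```

Because `ToyDriver` touches nothing except the two back-end calls, a different back end can be substituted without changing the driver code, which is the property the talk attributes to the winsys layer.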
We've got a pretty good idea of what we're doing with these interfaces. There are some things on the horizon, but I'm hoping that these guys' pain will be drawing to an end. We're getting there, basically. So that's all for the [unclear].
OK, so the thing that you'll miss from this slide: this is my view of the driver, and it's all about drawing. Some people care about allocating surfaces and doing swap buffers and things like that, all the stuff that GLX handles. GLX is just kind of off the horizon here. So probably the big change to this diagram is that there's a line going down through here that handles those things. What's here really looks like a 3D rendering context, and the bit that's missing really corresponds to a per-screen driver. So this plan is very good for a drawing context. It pretty much ignores how you create a context and what environment those contexts live in. And that's important for a couple of reasons. I think the key thing that trips us up is the fact that there is sharing of surfaces and various other data structures between contexts, and to really achieve that properly you do need some sort of external persistent entity that manages that. That was missing from my worldview, kind of deliberately, just to get to the core of what we really wanted to achieve. But I think in reality it has asserted itself, and we've got to deal with that a little bit.
OK, development. We've got hardware drivers running, all in various states of completeness. The 915 has been our workhorse. It's a simple piece of hardware. It's well-behaved. We love it. It's been kind of the mule that we've built this thing on the back of. Softpipe is the software rasterizer. As for status, these are pretty much in order of done-ness, I guess. So the 915, you know, it runs by itself, it does some work, it's there.
The biggest problem with it is that it's probably impossible for anybody in this room to build. At the time we started this work, we all had test machines that had the 915 running fine, and we just kept them in that state. Now nobody knows how to recreate them. They've all got various odd kernel modules, X drivers, patches, and things like that which just happened to work at that point. There's no way that I could give you a recipe to build that driver. That's something we're going to fix pretty shortly. Interesting stuff has happened on the head, on the trunk, since the time that we started that work. DRI2 has landed. What else is there? Evolution in the memory manager, et cetera. It's going to be very interesting just to get this driver into that environment, bring it up to speed, and see what the world looks like.
Softpipe is what we call the software rasterizer driver. Really it's the most complete of the drivers. I guess it's number two because it's not as interesting as the hardware drivers to most people. It needs some performance work. It'd be a very interesting project if anybody wanted to take it on. If you've got an interest in runtime code generation, or in doing cool stuff that people will love you for, Softpipe is that opportunity. OK, we'll spend a bit of time on the software rasterizer a little later.
And there's a 965 driver out there that, again, if you didn't like the 915 driver and the fact that you can't build it, you won't like the 965 either. And it's impressive that throughout this turmoil in the development process, some people out there have even tried to track it; at least two groups have done that. So, in the front row, there's been an effort in Nouveau, I believe, [unclear]. And Jerome has put some effort into hardware drivers as well.
So, software rasterizers, as I mentioned. You get some buzzwords like LLVM, which is a really cool compiler architecture that's useful for runtime code generation. We have code that compiles fragment shaders dynamically using LLVM. To really finish the project, you'd want to compile a lot more than just the fragment shader: you'd want to integrate that dynamic compilation process with something that looked at drawing a whole triangle, or a bunch of triangles, and code-generated the whole lot. If you took that on, you'd probably be rewarded with a very capable software rasterizer. That's a task for somebody. I don't think it's really on our horizon to do that, as cool as it is.
The 915 doesn't build, obviously. We're going to fix that. We're going to make it fast.
And this is the other thing, too. If you go back to this diagram, here we are, and think about dependencies: this piece here, which is really the driver now, the only thing it really has any knowledge of is how to drive the hardware. It really doesn't know anything about OpenGL. It doesn't know anything about the DRI. It doesn't know anything about window systems, generally. There are two nice things about that. One is that it's very straightforward to write one of these; you don't have to think about that stuff while you're working on this code. But as a side benefit, it means that code can run in a whole lot of different places. You can do that by swapping out the other components. So the thing that we have done, and TG is a business, we like to pay our wages, is that we've got it running on Windows. And it works. This thing actually exists, in some sense, as a DX9 driver. What can I say, you don't care, but it's helpful for us. The point is, it does that. It'll do whatever you like.
Whatever you care about, whatever obscure set of API and operating system and window system you've got an interest in, you have the possibility to port these drivers to that environment in a manageable way. It's feasible. And this is something that really wasn't possible, or was only vaguely possible, with the DRI drivers. It was a lot of work, and you probably would have ended up with an unsupported fork. This is a new thing. So the point is that if you care about portability, this is a win for you.
And you don't even need hardware. Again, this is probably not that applicable to everybody here. But the driver now no longer actually talks to hardware. It talks through a very stereotyped interface that you can put whatever you like behind. So if you want to analyze the hardware interactions by dumping files, if you've got a simulator that runs in a batch sort of way, if you just don't have the hardware to test the driver, whatever, that's possible. And this works as well. Probably the most interesting thing to do with this is capture and replay. If you're capturing files then you probably also want to be able to replay them, so that would be a useful thing to tack on at some point. I'm ripping through it. OK.
So probably the sexiest thing that's going on in Mesa with this architecture right now is the Cell driver. What have we done? Again, this is out there. It exists. You can pull it down. If you've got a PS3 you can install Linux on it, I'm sure you know that. This is a project to actually bring 3D graphics to your PS3. It's possible. Brian Paul has primarily been working on this. He's been hacking away at it for a little while now. It got its first triangle in December. As of that first triangle, rasterization is being done on the SPUs, with vertex processing currently being done on the PPU.
So we're really treating the system like a regular computer with a CPU here and this sort of very odd graphics card, a rasterization-only graphics card, here. That's a useful way to start bringing it up. I think the plan ultimately is to do everything on the SPUs, vertex processing and so on, but the initial task is rasterization. It now runs simple Mesa demos. Basic texturing is working. The main missing part is runtime compilation of the fragment shaders.
Interestingly, I probably didn't give a lot of the characteristics of this interface here. One of the characteristics of the abstraction it makes is to throw away the fixed-function view of the world. So lights and matrices and texture environment, all the classic GL stuff, is thrown away. Basically the hardware looks like a set of state, plus a vertex program and a fragment program. We have a pretty expressive language for describing those programs, and between them it's possible to implement all of the classic OpenGL transformation, lighting, [unclear], texturing, etc. through these fragment and vertex programs. This is a very clean way of describing graphics state. It means that instead of dealing with arcane concepts like lights, and how do you optimize a light, and how do you code-generate a light, which is odd and just doesn't make sense, you're dealing with sensible things. How do I code-generate a program at runtime? I have a program I want to compile at runtime. That's easy for people to understand. It means you don't have to be a graphics expert to work on the performance part of the driver. You can be a compiler expert, etc. Is that better? Yeah.
So what's missing currently is that dynamic compilation step. We've got a couple of hard-coded shaders that we switch between. It's enough to get simple lighting and shading and basic texturing working. It will get done.
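
The idea of replacing fixed-function state with generated vertex and fragment programs can be sketched as follows. This is a minimal illustration only: the `FixedFunctionState` fields and the instruction mnemonics (`MUL4x4`, `DP3`, `MAD`, `TEX` and so on) are invented for the example and are not the shader language the talk refers to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixedFunctionState:
    """Hypothetical snapshot of classic GL state (illustrative fields only)."""
    num_lights: int = 0
    texturing: bool = False


def build_vertex_program(state):
    """Turn fixed-function state into a list of made-up instructions."""
    prog = ["MUL4x4 out.position, const.mvp, in.position"]
    if state.num_lights == 0:
        # No lighting: pass the vertex color straight through.
        prog.append("MOV out.color, in.color")
    else:
        prog.append("MOV tmp.color, const.ambient")
        for i in range(state.num_lights):
            # One diffuse term per enabled light.
            prog.append(f"DP3 tmp.ndotl, in.normal, const.light{i}.dir")
            prog.append("MAX tmp.ndotl, tmp.ndotl, 0")
            prog.append(
                f"MAD tmp.color, tmp.ndotl, const.light{i}.diffuse, tmp.color"
            )
        prog.append("MOV out.color, tmp.color")
    if state.texturing:
        prog.append("MOV out.texcoord0, in.texcoord0")
    return prog


def build_fragment_program(state):
    """Texturing becomes a sample-and-modulate; otherwise pass color through."""
    if state.texturing:
        return [
            "TEX tmp.texel, in.texcoord0, sampler0",
            "MUL out.color, in.color, tmp.texel",
        ]
    return ["MOV out.color, in.color"]


lit_textured = FixedFunctionState(num_lights=2, texturing=True)
vertex_program = build_vertex_program(lit_textured)
fragment_program = build_fragment_program(lit_textured)
```

Once the state has been flattened into programs like these, the remaining performance problem is compiling a program at run time, which is the framing the talk gives for why compiler knowledge is enough to work on that part.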
But if anybody's interested, it would be a fun project as well, if you've got an interest in any of this stuff. So that's what the first triangle looked like back in December. I thought we'd better have a triangle. If you do a lot of graphics, this is your life. Basically, by the time you get to this, you're getting there. The nice thing about this is I'm coloring these squares slightly according to the SPU that's doing the work, and you can see they really do join up pretty nicely. That's pretty much still how it's working. It packages up bits of rasterization work, farms them out to the SPUs, gets the results back, and that's the job done.
Going forward there are probably a lot of ways to optimize that, probably two or three different strategies. The thing about the Cell chip is that it has no hardwired graphics architecture. You're not forced into implementing any particular graphics pipeline on that chip. You get to choose your pipeline. Most of these drivers, the software drivers, emulate or are modeled after some set of behaviors that's been prototyped or developed in hardware, because those guys have thought about that a lot more than anybody doing software rasterization. In terms of the Cell driver, you've got choices as to whether you model a tiled mode, a good one probably being the 915 in zone rendering mode, if that means anything, or whether you model something that's a bit more dynamic in the way it parcels work out, like a modern NVIDIA or ATI GPU.
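
The packaging of rasterization work into squares that are handed to workers and then joined back together, as described for the first triangle, can be sketched like this. It is an illustration under stated assumptions: the tile size, the round-robin assignment, and the use of a Python thread pool as a stand-in for the SPUs are all choices made for the example, not details taken from the Cell driver.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 4  # tile edge in pixels (arbitrary for this sketch)


def make_tiles(width, height, tile=TILE):
    """Cut the framebuffer into (x, y, w, h) rectangles, row by row."""
    return [
        (x, y, min(tile, width - x), min(tile, height - y))
        for y in range(0, height, tile)
        for x in range(0, width, tile)
    ]


def rasterize_tile(job):
    """Stand-in for real rasterization: fill the tile with the worker label."""
    worker_id, (x, y, w, h) = job
    return x, y, [[worker_id] * w for _ in range(h)]


def render(width, height, num_workers):
    tiles = make_tiles(width, height)
    # Round-robin labels play the role of "which SPU did this square".
    # The label is only a tag; the pool decides which thread runs each job.
    jobs = [(i % num_workers, t) for i, t in enumerate(tiles)]
    framebuffer = [[None] * width for _ in range(height)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Collect each finished tile and copy it back into place.
        for x, y, pixels in pool.map(rasterize_tile, jobs):
            for dy, row in enumerate(pixels):
                framebuffer[y + dy][x:x + len(row)] = row
    return framebuffer


fb = render(8, 8, 3)
```

Filling each tile with its worker label mirrors the slide's trick of tinting squares per SPU: any gap or overlap between tiles would show up as a missing or overwritten value in `fb`.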
There are no constraints on you, because the hardware is a completely general-purpose computation engine, and a very, very strong one. If you think about it, the latest IBM revision of this chip is pushing, I believe, six gigahertz. By comparison, GPUs have got a lot of flops but they don't really push a very high clock rate; they max out at about a gigahertz, I think. There's a possibility that a well-optimized Cell driver could be a pretty competent mid-range GPU, especially if you think about the fact that the Cell's been out for a few years now. If you're going to compare it against GPU hardware, you should probably compare it against the hardware that was released at the same time the Cell came out. I think in that cohort of GPUs you'd probably find that a well-optimized Cell driver is actually a contender. But we have to get there first, and that's happening.
So, if I can transition to some of the things we've learned along the way. If we go back to the original driver, I gave a pretty rosy view of it. What's missing from that simplified picture is this large other chunk of code sitting out there that basically implements a second set of drawing paths, which are used on fallback. A fallback is some set of GL state that we don't really know how to program the hardware to do, and basically the strategy that we've used for dealing with that is to say, well, we can't do it, let's get the software rasterizer to draw it. And that's pretty unsatisfactory. You get a correct image, you satisfy the conformance tests, but people assume the driver isn't working because it takes seconds or minutes to render a frame, and that's not what people are really looking for in their drivers. In fact, if you give them the option, most people will turn fallbacks off and accept minor rendering errors in order to have a usable application. So that's jumping ahead a bit. In terms of Gallium we did have, and this impacts it a little bit, we had a strategy for fallbacks that basically was
sketched out and never fully implemented, and that was basically along the same lines. It sounds like I'm saying this will never happen; it could still. But I'd like to think there was a better way than just coming up with another fallback path that used the same trick of the software rasterizer. So fallbacks are bad, we don't like them. Can we do better? I think we probably can. We've got hardware that's Turing-complete-ish. Well, that's nice. We've got hardware that's flexible, we've got a nice layered interface, and we can probably come up with ways of getting the rasterization that we need even though it doesn't obviously map onto the hardware.
So, things that were typically fallbacks in the past: anti-aliased points. In GL there are interesting rules for how these are supposed to be drawn that don't map natively onto the facilities provided in regular GPUs. GPUs are really optimized to draw triangles, to apply a shader to those triangles, and to stick the result out to memory. It's really hard to map round AA points onto triangles natively without playing tricks, and that's what we do: we play with the state behind the scenes. So imagine somebody drawing an anti-aliased point with a particular shader and a particular set of state. What we end up doing is drawing a quad with a second shader that discards all of the pixels that shouldn't be in the round point and applies the anti-aliasing coverage to that shape, and composing that shader with the shader the application has actually provided. We glue these two shaders together in the state, and we do it basically outside of the driver. There's a piece of code that's written once and works on any piece of hardware, that constructs a new GL state from the desired application state and what we need to do to implement AA points, and then just sends it down to
the hardware driver. The driver never knew it happened. The driver just finds it's being asked to draw a quad with a particular shader, eats it up, and you get an AA point. There's no fallback. Generally that's a better approach, and it's what we're trying to do more of. The 915 is a bit more classic, because it does have some of these warts and it does have fallbacks in the current driver, and our goal in the medium term is to demonstrate this concept by putting together a 915 driver that has no fallbacks, that will render anything in GL 1.whatever without resorting to a software rasterizer.
OK, this is the thing that I mentioned earlier. The lovely diagram omits any sort of information about how GLX is supposed to be implemented. This was obvious from the start, because swap buffers was a GLX concept and we ended up with this sort of weird layer-breaking jump to there. It isn't really a layer break; it's just that the components that implement that jump were absent from the diagram. And similarly other things, like creating surfaces, etc., really need a little bit more to be correct.
So, that winsys thing. This is where probably most of the evolution is happening, and that's a little bit surprising, because of all these boxes this is actually a very small piece of code. It's 3000 lines of code, I think, in the current repository. But what you'll see if you look at it is just a box of interfaces. There are five or six. Coming into here there are actually two interfaces, so one, two; the regular DRI driver interface, three; then it's talking to the DRM and to the DDX. There are five interfaces jammed into 3000 lines of code. And for this to be correct it needs this little bit: there's a line through there. Above the line we have per-context stuff, so command processing, drawing, et cetera. Below the line, a persistent per-screen object that handles surface management,
context creation, that sort of stuff. OK, so to illustrate how much is in there and how surprisingly complex it is, that's it broken out into its components. I think this is probably closer to the truth as to how this is going to be implemented going forward. I think we'll probably see these three parts of the winsys get split off into their own little modules. It's not as nice, but it works better. And the nice thing about that is, with the previous everything-in-one-place situation you end up with the same problem that we had with Mesa drivers as a whole: to create new ones of these, the new