Hello, everyone. I'm Erik. I've been working at Collabora on the graphics stack, and I want to talk about a fairly new project. I had an accident today and hit my head, so I'm a bit shaken up and might sway a little as a result. Let's hope I make it to the end of this presentation. So, that's me. The goal here, in general, is to implement OpenGL on top of Vulkan, to make a simpler graphics stack for the future of the Linux desktop. There are a couple of other attempts at this that already exist. Think Silicon has made something called GLOVE. It only implements OpenGL ES 2.0, and the CLA requires a copyright assignment, which can be tricky for some companies to work with. Google has ANGLE; they also implement only OpenGL ES. And then there's something called VKGL, which targets the OpenGL 3 core profile, so it's not going to do legacy OpenGL versions, no fixed function or anything like that. It's also a pretty slow-moving project: a spare-time project for a single developer, with a really long way to go before it can be very useful. So there's nothing really out there that solves this well. Did I go the wrong way? Yeah, I guess I swapped these slides, sorry about that. So why we want to solve this: OpenGL is a requirement for supporting desktop applications, but it's a pretty dated API. It's been around since 1992, and both the hardware and the software world have changed a lot since then, so there's a lot of things OpenGL isn't well suited for. My work came out of virtualization, being able to use GPUs in virtual environments. I'm working on virgl and Mesa for this, and there are some problems there that we're trying to circumvent by seeing whether this avenue works or not.
Vulkan is also becoming more and more proven technology, and it's pretty clear that it's becoming the dominant graphics API going forward. That means OpenGL has a somewhat less bright future, and it's better for the graphics community if we can all work on one API rather than multiple ones. But as I said, we still need to support all the applications that use OpenGL today. There are also other use cases this work can enable; for instance, mobile platforms that support Vulkan can get full OpenGL support, which they don't currently have. My solution to this is called Zink. It's a Mesa Gallium driver that takes the Gallium API calls and translates them into Vulkan. It's currently in what I'd call an early, out-of-tree prototype stage. The driver works reasonably well: it supports OpenGL 3.0 on both RADV and ANV, the open-source AMD and Intel Vulkan drivers in Mesa. We haven't tested much on anything else. Some people tried to get it working on NVIDIA, but there were some difficulties with the lack of DRI2. Dave Airlie has jumped on this and started contributing a lot of really cool features. The driver is written in a pretty naive, happy-go-lucky way, to see what could possibly go wrong, and it turns out it works a lot better than I feared. I can run a lot of games and other demo applications with pretty usable performance. I'm not going to talk much about performance because we haven't focused much on that, but for some simple benchmarks I get roughly half of what i965 provides, and that's with some pretty bad stalls that we're doing to please the window system. Currently I'm taking a step back and re-engineering it a little, because a bunch of the early design decisions I made turned out to be mistakes. I'm trying to build a smaller-featured version that I can upstream in Mesa, and then build more of the features on top of that.
It's a Gallium driver, for those who don't know Mesa. Gallium is a system that takes OpenGL calls and translates them into a lower-level API; Zink takes that and translates it to Vulkan. I'm communicating with the window system through a software path, which isn't great; we want a hardware path as well, something DMA-BUF-based. Here's a rough overview of the data flow of the driver for draw calls: there's a compiler that feeds a program cache, and a pipeline cache that takes render passes, creates framebuffers, and feeds those into command buffers. We do the shader work with NIR, which is an SSA-based compiler IR in Mesa, and I translate that into SPIR-V, the Vulkan shader IR. I chose that direction rather than something like TGSI mostly because SSA-to-SSA seemed like a great fit. It turned out to be a bit harder than I thought because of some annoying differences in how they treat some of the fundamental SSA constructs; I'll talk more about that. It's written as a reusable module, hopefully similar to the SPIR-V-to-NIR translator we already have, so it's written in a somewhat similar fashion. I'm hoping that maybe we can make an entire GLSL-to-SPIR-V compiler in Mesa out of this. That isn't an important goal, but it seems neat to be able to do it without pulling in other compilers, if we ever need to precompile shaders, for instance. It does not generate awesome code. The desktop Vulkan drivers seem to make up for the bad code we generate by optimizing it afterwards, but I fear this is not going to fly on mobile; mobile drivers are probably less eager to spend CPU time optimizing the code first. Now I'm going to talk a little about the different difficulties we've met. Some of these have solutions, some of them have ugly hacks. So, control flow: we don't support it. Pretty bad. This is one of the big-ticket things that needs to be fixed before I can upstream the driver.
I have a prototype that crashes and burns in some cases. It's trickier than it sounds, and the reason is those SSA differences I talked about. In NIR, jumps can occur from anywhere inside a basic block, which is a very creative way of defining a basic block, I think. So for instance, in NIR, jump, return, break, and continue can all do this, and some intrinsics also have implied control-flow modifications. In SPIR-V, all of these terminate the basic block, and that means that addressing phi nodes gets pretty hairy. It's probably not super hard to solve, but I think I'm going to have to accept that a direct translation won't work here and do something a little clever instead. I haven't really spent much thought on this since hitting that wall. Another problem is that NIR's SSA values are typeless; they're basically just a bucket of bits, which means we currently just bitcast everything into, and from, uint. This creates a lot of needless instructions in a lot of cases, and it's not great. Jason Ekstrand, of Intel, has been nice enough to create a prototype for a scanning pass that finds out which casts can be removed and replaces the affected SSA values with typed versions. It doesn't work for everything, but it doesn't need to; it only needs to do better than nothing. Some constants are still problematic with it, but I can probably extend it. I haven't tested this pass yet because of time constraints. Another thing that needs to be solved, and that we're not doing optimally, is how to bind shader resources to the shaders. For those who don't know Vulkan: the way you bind resources is through descriptor sets, so you can have N descriptor sets, and they have indexed resources inside of them.
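To illustrate the "bucket of bits" point: treating every SSA value as a uint means a float value only gets reinterpreted, never converted, when it crosses a typed SPIR-V boundary. Here's a minimal sketch of that bit-level reinterpretation in plain C; the helper names are hypothetical and this is not actual Mesa code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* NIR SSA values are typeless "buckets of bits"; SPIR-V values are typed.
 * The naive translation reinterprets the same 32 bits as uint or float as
 * needed. These helpers model that reinterpretation (hypothetical names). */
static uint32_t bits_of_float(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u); /* bit-exact reinterpretation, no conversion */
    return u;
}

static float float_of_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

Each such round trip is what shows up as a needless bitcast instruction in the generated SPIR-V, which is exactly what the typed-SSA scanning pass tries to eliminate.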
We currently just push all of this into one big descriptor set, which is a pretty easy approach, but it might not be great from a performance point of view; the Vulkan spec even suggests it's not a good idea. We stole that idea from DXVK, and it seems to work okay, but at some point we should probably look into doing this more properly. I think making one descriptor set per shader stage makes this easier, and we'd get much smaller indices there, which seems good. Then, as we draw, we also need to deal with descriptor set allocation. Currently we create a huge descriptor pool and just allocate from there until the allocation fails. Then we flush, wait for the GPU to finish, and reset the pool. That's obviously not a great idea, and it causes some validation errors and some frame-rate hiccups. It would probably be better to have multiple smaller pools, keep track of how many descriptors we've used from each, and use fences to wait only when the next pool isn't finished on the GPU yet. Pipeline objects, for those who don't know, are objects in Vulkan that encapsulate pretty much all of the draw state. They're relatively expensive to create, so we try to cache them and keep them around for the next draw that uses the same state. We currently do that naively by putting them in a hash map, but in the future we want to move to eagerly building a non-optimized version and then having a background thread create optimized pipelines instead. This is what DXVK does, and it seems to work well for them. If any of the Intel driver people are listening, it would be great if you started respecting the disable-optimization bit for this plan. This whole thing is similar to shader-variant caching, so it's not really a big unknown. We also need to deal with image layouts, which is one of the big differences from OpenGL.
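The allocate-until-failure-then-reset policy described above can be sketched as a toy model in plain C. The types and names here are stand-ins, not real Vulkan calls; the point is just the control flow, including the full-pipeline stall that causes the frame-rate hiccups.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a VkDescriptorPool: fixed capacity, bump allocation. */
struct toy_pool {
    int capacity;
    int used;
};

/* Models vkAllocateDescriptorSets: fails when the pool is exhausted. */
static bool toy_alloc(struct toy_pool *p)
{
    if (p->used >= p->capacity)
        return false; /* VK_ERROR_OUT_OF_POOL_MEMORY in real Vulkan */
    p->used++;
    return true;
}

/* Models the driver's fallback: wait for the GPU to go idle, then reset
 * the pool (vkResetDescriptorPool), recycling every set at once. */
static void toy_wait_idle_and_reset(struct toy_pool *p)
{
    p->used = 0;
}

/* Allocate one set, falling back to a full stall when the pool is empty.
 * That stall is the frame-rate hiccup the talk calls out; multiple
 * smaller pools plus fences would avoid it. */
static bool toy_alloc_with_fallback(struct toy_pool *p)
{
    if (toy_alloc(p))
        return true;
    toy_wait_idle_and_reset(p);
    return toy_alloc(p);
}
```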
For those who don't quite know what that is: an image layout is a hint to the driver about what you're going to do with an image. An image can, for instance, be in a state where you can only render to it, or one where you can only texture from it. Generally speaking, these layouts let the hardware use representations that are faster to access for that particular use. We just use the big hammer called VK_IMAGE_LAYOUT_GENERAL: we transition to it as early as we can and just keep images there. Every operation is allowed in that layout, but the performance isn't necessarily great. This isn't a big problem for us yet, mainly because we're mostly CPU-limited rather than GPU-limited; of course, it depends on the application. It does have some nasty implications for races with multiple contexts, though: resources can be shared, and the context doing a transition is the only one that sees when it happens, so we'd need to insert some fencing or something to make sure we're not racing. I haven't tried anything that uses multiple contexts yet. In addition to this, ANGLE is doing some really cool work on building something called a frame graph: you build a timeline of what's happening with things and then issue them so you can move your image transitions as early as possible, which is supposedly better for performance. I imagine we can just steal their ideas there, but that comes a bit further down the line from where we are now. Uniforms are also a little different. OpenGL has free-standing default uniforms, which form a default uniform block, and in Gallium these look different from uniform blocks in the IR. We basically just run a compiler pass that transforms them into a uniform block. There's already some support in Gallium to do this by default, but we can't use that yet for some technical reasons. There are some difficulties here.
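The "big hammer" layout strategy can be modeled with a tiny per-image layout tracker. This is a toy sketch with stand-in enums rather than real Vulkan types; it shows why parking everything in GENERAL is simple: after the first transition, no further barriers are ever needed for layout reasons.

```c
#include <assert.h>

/* Stand-ins for a few Vulkan image layouts. */
enum toy_layout {
    TOY_LAYOUT_UNDEFINED,
    TOY_LAYOUT_GENERAL,          /* every operation allowed, not optimal */
    TOY_LAYOUT_COLOR_ATTACHMENT, /* rendering only */
    TOY_LAYOUT_SHADER_READ_ONLY, /* texturing only */
};

struct toy_image {
    enum toy_layout layout;
    int barriers_emitted;
};

/* Models a layout transition: a pipeline barrier is only needed when the
 * current layout differs from the wanted one. */
static void toy_transition(struct toy_image *img, enum toy_layout want)
{
    if (img->layout == want)
        return;
    img->barriers_emitted++; /* vkCmdPipelineBarrier in real Vulkan */
    img->layout = want;
}

/* The "big hammer": move to GENERAL once, as early as possible, and never
 * touch the layout again, trading optimal access paths for simplicity. */
static void toy_use_general(struct toy_image *img)
{
    toy_transition(img, TOY_LAYOUT_GENERAL);
}
```

The multi-context race mentioned above is precisely that `img->layout` here would be shared state that only the transitioning context knows it changed.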
It's not really that interesting, so I'll just move on. One other issue is that for some OpenGL 3.0 features we depend on EXT extensions, like VK_EXT_transform_feedback. I'm not so sure we can rely on those forever, and probably not at all on mobile, so we may need to rethink some of these solutions. I can envision some of this being done with a giant compute shader that inlines all the stages apart from the fragment shader and builds queues; basically reimplementing the GPU pipeline in a compute shader. People have done this before with CUDA, pretty successfully, so it's an option. But maybe it's better to do smaller, more targeted things for simple features, where we maybe don't fully support the spec but get applications running. We'll see. There are other extensions with the same problem. We currently support OpenGL 3.0, but with a little asterisk: there are some features we're missing. Most of these are fixed-function details. Polygon mode is different between Vulkan and OpenGL: in OpenGL you can specify different polygon modes for front and back faces, while in Vulkan you can only specify one. We currently just issue a warning if they differ and use the front-face state. I haven't seen any applications fail because of this, but as we test more applications, I'm sure there are some CAD applications that won't render correctly. There are several possible emulation paths: we could try drawing all the back faces first and then all the front faces and assume the ordering isn't important, or we could write out the primitives to a buffer and stream it out, and use a geometry shader to construct triangles afterwards, maybe. I don't know; it's a pretty low-priority issue to fix. Texture borders are also a little different: in OpenGL you can have arbitrary texture border colors.
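The polygon-mode fallback just described is simple enough to sketch directly. This is a hypothetical illustration in plain C, not the actual Zink code: when the two GL per-face modes disagree, warn and use the front-face mode, since Vulkan's rasterization state has only a single polygon mode.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-ins for GL polygon modes (GL_FILL, GL_LINE, GL_POINT). */
enum toy_poly_mode { TOY_FILL, TOY_LINE, TOY_POINT };

/* Vulkan exposes a single polygonMode in its rasterization state, while
 * glPolygonMode can differ per face. Pick the front-face mode and warn,
 * matching the fallback described in the talk. */
static enum toy_poly_mode
toy_pick_polygon_mode(enum toy_poly_mode front, enum toy_poly_mode back)
{
    if (front != back)
        fprintf(stderr, "warning: per-face polygon modes unsupported, "
                        "using front-face mode\n");
    return front;
}
```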
In Vulkan there are only three border colors you can have: transparent black, opaque black, and opaque white. We just hard-code transparent black all the time, and I've only seen synthetic tests have problems with it. It's totally possible to handle this by injecting some shader code; it's not actually that bad to do, but it's not awesome either. If needed, we could propose an extension to support more modes, but I doubt it's going to be important. Point size: in OpenGL you can either write the gl_PointSize output in the vertex shader, or you can fix the point size for the whole draw call; in Vulkan you can only write it from the vertex shader. We haven't done any forwarding of that state yet. It's kind of boring code to write, but it's relatively easy, and I think some of the other drivers working towards upstream will need something similar, so maybe we can work with them on a shared solution. Alpha testing is, in theory, very simple to implement, but it requires control flow, which we don't support; once we fix the control-flow issue, this will go away very quickly. Currently, OpenGL 2.1 is the lowest feature set we can support. It requires Vulkan 1.0 and a bunch of different physical-device features, which we don't test for yet; we probably will before we land this driver, maybe with an override so you can run things even if support isn't perfect. OpenGL 3.0 requires two more EXT extensions. Those are both enabled on RADV and ANV, so on both of those drivers we get 3.0. I had some slides about future versions, but I decided to cut them; they're still in the slide deck if someone wants to download it and read more. They basically go through which Vulkan features we're going to require all the way up to 4.6. In the future, there's a bunch more to be done. The biggest problem is that the compiler is not as good as it could be.
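For alpha testing, what would get injected at the end of the fragment shader is just a comparison against the reference value, discarding the fragment on failure, and the `discard` branch is the control flow Zink can't emit yet. Here's the comparison logic as a hedged plain-C model (a subset of the GL compare functions, hypothetical names):

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of GL alpha-test compare functions (glAlphaFunc). Vulkan has no
 * fixed-function alpha test, so this comparison is what shader-injected
 * code would evaluate, discarding the fragment when it returns false. */
enum toy_alpha_func { TOY_NEVER, TOY_LESS, TOY_EQUAL, TOY_GEQUAL, TOY_ALWAYS };

static bool toy_alpha_test_passes(enum toy_alpha_func func,
                                  float alpha, float ref)
{
    switch (func) {
    case TOY_NEVER:  return false;
    case TOY_LESS:   return alpha < ref;
    case TOY_EQUAL:  return alpha == ref;
    case TOY_GEQUAL: return alpha >= ref;
    case TOY_ALWAYS: return true;
    }
    return true;
}
```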
I'm not really a compiler guy, so if someone who knows compilers would like to help out, that would be awesome. Then it's fixing rendering issues in applications and upstreaming into Mesa, and after that, cracking away on newer OpenGL versions. Currently I'm kind of the bottleneck here, and I'm really sorry about that for anyone who wants to contribute; I'd be very willing to discuss fixing that somehow. So yeah, any questions? [Moderator: Two minutes.] Two minutes, so one or two questions, I guess. [Question: How valuable would it be to take this project and run it outside of the Mesa framework? Apple is dropping OpenGL support on macOS, so it's a matter of time before we need to do something like this. How reliant are you on the existing codebase from Gallium, or how valuable would it be to ship something like this outside of the freedesktop environment?] If I were to have a crack at that myself, I think I would go the other way around and get Mesa to run on macOS. There are builds, like CI checks, in Mesa for macOS, and I think it works. Something very similar to Zink would be needed, but outputting Metal instead. [Audience comment about MoltenVK.] Yes, MoltenVK is interesting, but I think layering emulation on top of emulation is the way to insanity. If someone wants to try it out, they're free to, and if they report some bugs, that's great; I'll have a look at that. But I don't have a Mac, so I'm not going to give this a try myself. All right, any more questions? A quick one? OK, thanks a lot. Also, we're hiring.