Hi, I'm Alyssa Rosenzweig, and this is a talk on the open graphics stack. I'm a graphics developer at Collabora, leading the open source Panfrost driver for the ARM Mali GPUs, and in my spare time I'm helping out with the Asahi Linux project, porting Linux to Apple's M1 processor. I'm excited to talk about the open graphics stack here at the Embedded Linux Conference.

My goal for this talk is to help you understand what is involved in the Linux stack for 3D graphics nowadays, because there are quite a number of moving parts, and I hope that by the end you'll understand each of the parts and each of the different layers involved. I also want to talk about how open source fits into this, and why it matters.

First of all, before we can even talk about open source drivers, or 3D drivers at all, we have to ask the question: do embedded systems need 3D acceleration? Since this is the Embedded Linux Conference, the answer is yes, they can, sometimes. If you're talking about an 8-bit microcontroller, no, you're not going to want 3D acceleration for that. But for the more complicated embedded systems that we would be interested in running Linux on, we absolutely can need 3D acceleration, need GPUs.

There's a tension here. On one hand, you have embedded devices whose life cycles can be up to 20 years out in the field, just working as they should, total stability. On the other hand, you have these very complicated user space graphics drivers, and the closed source drivers reach end-of-life often as soon as five years after the product is released, which is quite a tension. Even if it's 10 years, no hardware vendor wants to support their driver on 20-year-old hardware. With open source drivers, you don't have to rely on the vendor, and that's one of the big perks for embedded systems that need 3D acceleration. Open source drivers are a necessity both because the drivers themselves can keep being updated and because they work smoothly with mainline kernels, allowing you to upgrade kernels and use the LTS releases of the kernel, all things which are greatly complicated, if not outright prohibited, by the various downstream proprietary graphics stacks on Linux.

Even if we convince ourselves we need this, we have to ask: are open source drivers here, are they ready, can we use them in our products today, can end users go use them? And the answer is an affirmative yes. I'm happy to say yes, because so many of these open source graphics talks over the years have been "coming soon", and this year, at all the talks I'm giving, the answer is a full yes. I feel confident with that answer, and I feel very glad that we've gotten to this point. For any vendor you can think of, whether that's Intel and AMD on the desktop side, Broadcom and ARM on the ARM side, or, falling back to reverse engineering, Qualcomm, Vivante and Apple, there are credible reverse engineering projects in the latter cases and credible drivers in all cases, all open source. That's a beautiful thing to me: that covers every GPU that matters nowadays. As for Imagination, last I heard there was a job posting about hiring Linux graphics people, which seems relevant. The bottom line is that open source graphics is the status quo, and that's a good thing for us, and it's a good thing for embedded Linux.

So what does this open graphics stack look like? As promised, it's rather complicated, but hopefully we can break it down.
At the very top of the stack you have the application trying to do some high-level rendering. This might be a game, a desktop, or a web browser; those are the more traditional applications you think of. In the embedded space this might be a user interface of some kind, perhaps an instance of Weston running as your Wayland compositor on a closed system. Closed as in contained, not as in proprietary; hopefully you can have an open closed system. At any rate, the application talks to a graphics engine, which mediates access to the high-level APIs. There are quite a few graphics APIs, but the most common on Linux are OpenGL and Vulkan. In both cases, the engine provides programs to run on the graphics hardware, known as shaders, and these pass through the API down to the compiler. The graphics compilers are themselves very complicated stacks, going through multiple representations of the program, essentially multiple compilers within the big compiler, but the net result is a binary that can run on the hardware.

In parallel with this compiler stack, we also have the commands being sent to the hardware themselves. These can take two different forms. In OpenGL, you're presented with a state machine, which some common infrastructure called Gallium has to deconstruct into much more rigidly defined state objects and descriptors, a more functional style of programming. Or you can specify those descriptors and state objects directly, which is how Vulkan works. In either case, these state objects get passed to the backend driver, which translates them into commands specific to the hardware and then passes those commands off to the kernel. A driver in the Linux kernel then simply hands those commands to the hardware, and you have your 3D rendering on screen.

In terms of which software does which pieces: the applications and the engines vary tremendously, and there are plenty of open source examples of both. For the hardware, obviously there's no software there; for the kernel, that's obviously the Linux kernel; and everything else on the slide happens within the Mesa umbrella project.

So let's talk about Mesa. Mesa, at its core, is a user space library for writing graphics drivers, and it provides the graphics drivers themselves. It implements a number of different APIs, OpenGL, OpenCL and Vulkan to name the most important three, and it has tons of common infrastructure used to implement these APIs. Most importantly, it has the Gallium infrastructure, as I mentioned. This consists of common code for dealing with the OpenGL and OpenCL APIs and translating them into something more amenable to hardware drivers, as well as all of the Gallium-based drivers themselves. Separately, there's the NIR compiler, where NIR stands for the New Intermediate Representation; well, it was new when it came out. NIR is our common form for shaders in the middle of Mesa, and then of course we have compilers that translate from NIR into the hardware instruction sets for all of the different hardware you can think of. The net result is that Mesa provides the user space component of the driver; the built artifact is libGL.so, or the equivalent for Vulkan.
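To make that difference between the two forms concrete, here is a rough sketch, not taken from any real application or driver, of the same depth-test setup expressed both ways: first as OpenGL state-machine calls, then as the kind of immutable state object a Vulkan pipeline is built from, which is conceptually close to the state objects Gallium hands to its drivers internally.

    #include <GLES3/gl3.h>       /* OpenGL ES entry points (the state machine) */
    #include <vulkan/vulkan.h>   /* Vulkan structure definitions               */

    /* OpenGL: a global state machine, mutated one call at a time. */
    static void set_depth_state_gl(void)
    {
        glEnable(GL_DEPTH_TEST);
        glDepthFunc(GL_LESS);
        glViewport(0, 0, 1920, 1080);
    }

    /* Vulkan: the equivalent depth state described up front as an
     * immutable object that gets baked into a pipeline. */
    static const VkPipelineDepthStencilStateCreateInfo depth_state = {
        .sType            = VK_STRUCTURE_TYPE_PIPELINE_DEPTH_STENCIL_STATE_CREATE_INFO,
        .depthTestEnable  = VK_TRUE,
        .depthWriteEnable = VK_TRUE,
        .depthCompareOp   = VK_COMPARE_OP_LESS,
    };

Either way, somewhere in the stack the mutable-state view has to be turned into the object view, because the object view is much closer to what the hardware wants.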
On the front end, Mesa implements this API, OpenGL or Vulkan, and on the back end, after all of this compiling and translation, it talks to the kernel using the standard system calls, which are graphics-flavoured ioctls. Mesa is the most complicated part of the 3D graphics stack on Linux, and it has become a sort of umbrella project; you can pick out several distinct subprojects within the Mesa monorepo. But this is the whole of our user space drivers. I'm a Mesa developer, and I'm quite happy to be in the project.

Of course, this is the Embedded Linux Conference, and the 3D graphics picture is not complete without talking about Linux kernel drivers, because Mesa has no direct access to the hardware. Any access that Mesa wants to the hardware has to be mediated through an appropriate kernel graphics driver. This is partly for security and performance, and partly just for stability. There was a time when user space display drivers ran the show and the kernel could sit back, and that didn't pan out so well; ask any X11 developer. So we have, at least in the embedded space, very small kernel graphics drivers that provide the common interface for the much larger, much more complicated user space Mesa driver.

The kernel drivers are built on a large body of common code, most importantly the Direct Rendering Manager (DRM) layer and the Graphics Execution Manager (GEM). These are, essentially, common libraries within the kernel's GPU subsystem used to implement all of our 3D rendering drivers. As an aside, they're also used, in conjunction with kernel mode setting, to implement display drivers. In the x86 desktop world, 3D rendering and display are intermingled into a single driver because they're implemented on the same hardware; oftentimes video decoding and encoding are mixed in too, all on the same thing a consumer would call a GPU. In the ARM world, in the embedded world, we don't see the same cohesion of the parts. Instead, you have separate blocks on the system-on-chip for 3D rendering versus display versus video codecs, and as a consequence you have different drivers for each of these, and a sort of plug-and-play aspect at the system-on-chip level, which brings some unique challenges to the graphics stack. My point in this whole digression is that this talk is really focused on the 3D rendering side, which is about DRM and GEM, not KMS.

For the next segment, I'd like to give a bird's eye view of what happens at each layer of the stack. I showed you the big picture, but let's go a little deeper and try to understand each layer separately. At the top, again, we have our application and our engine, which are using some API. These are thinking in terms of drawing triangles and in terms of programmable shaders. Generally, there are two types of shaders, and you need both. The first is the vertex shader, which calculates positions for all of the vertices in your geometry. The other is the pixel shader, or fragment shader, which calculates the colours. This is the programming model that most APIs implement and that hardware has native support for. It's not perfect: it doesn't map perfectly to all hardware, and it's not necessarily the most comfortable programming model, which is why people often use an engine instead of using these APIs directly.
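As an illustration of that two-shader model, here is a minimal, hypothetical pair of shaders of the kind an engine might hand to the API as C string literals; the names a_position, u_mvp, u_color and frag_color are made up for the example, not taken from any real project.

    /* Vertex shader: computes a position for each vertex of the geometry. */
    static const char *vertex_src =
        "#version 310 es\n"
        "layout(location = 0) in vec3 a_position;\n"
        "uniform mat4 u_mvp;\n"
        "void main() {\n"
        "    gl_Position = u_mvp * vec4(a_position, 1.0);\n"
        "}\n";

    /* Fragment (pixel) shader: computes a colour for each covered pixel. */
    static const char *fragment_src =
        "#version 310 es\n"
        "precision mediump float;\n"
        "uniform vec4 u_color;\n"
        "out vec4 frag_color;\n"
        "void main() {\n"
        "    frag_color = u_color;\n"
        "}\n";

These strings are what gets fed through the API into the compiler stack I was describing, ending up as hardware binaries.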
Still, it is a good trade-off between the requirements of every different piece of hardware and the requirements of every different application which might want to use the GPU. Of course, there's more to it on the hardware side, because hardware nowadays is not working in terms of this high-level abstraction. Instead, there tends to be a more descriptor-based setup, either literally, with descriptors and memory pointers everywhere, as in the case of the ARM Mali GPUs, or less literally, in terms of state objects that can be bound at will, which is a more typical design, all the way down to plain register writes from user space. The bottom line, though, is that there has to be some translation from the OpenGL concepts to what the hardware actually wants, and our front ends do this. Gallium helps with this; our drivers help with this. Instead of glViewport and glDepthFunc calls, you have a depth-stencil state object and a viewport state object, and you can bind and unbind these, which is a much more natural interface for the hardware. And of course, you still draw triangles. It's triangles all the way down.

In parallel, we have our compiler, which is organized the same way that any compiler is organized, even for a CPU. You have your source code that gets parsed into an abstract syntax tree. The abstract syntax tree gets translated into some intermediate representation, which can be optimized with common code. Then, once all of your common optimizations are applied, you can translate that into something hardware-specific, which again can be optimized and finally translated into the final binary. We'll talk in a bit about why we have so many layers and why that's a good thing.

The driver itself is doing all of this translation, as mentioned, and it also has the added wrinkle of having to deal with memory allocation, at least for OpenGL; for Vulkan, this is not so clear cut. Really, the user space driver has three roles: compile shaders, translate state, and manage memory. If you can get away without doing any memory management, you're down to only two-thirds of a driver, which is part of the appeal of Vulkan, I suppose: pushing that off into the application instead of having to do gymnastics in the driver to infer what an application meant to do in order to use memory resources effectively, which is one of the big challenges with OpenGL.

At any rate, all of this feeds into the kernel, which really has only two responsibilities. One is implementing memory allocation: user space just makes a system call to allocate memory when needed, and it's up to the kernel driver to actually do the allocation, typically by programming a memory management unit for the GPU in order to map chunks of graphics memory for a particular use. In the embedded world, we're using unified memory, so the same physical memory can be used by both the CPU and the GPU simultaneously, with a shared mapping, which can be made very efficient; it's just a matter of programming the memory management unit in the correct way. The other piece is, of course, actually executing the job, which in the simplest case is a matter of pointing the hardware at a particular place in memory and telling it to start executing commands, or start reading descriptors, or however the hardware is set up.
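All of that user space to kernel communication goes through ioctls on a DRM device node. As a rough, hypothetical sketch of just the very first step, here is how a user space process might open a render node and ask which kernel driver sits behind it, using libdrm; the device path varies per system, and the interesting driver-specific ioctls for allocating GEM buffers and submitting jobs are deliberately not shown.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <xf86drm.h>   /* libdrm wrappers around the generic DRM ioctls */

    int main(void)
    {
        /* Render nodes expose the 3D/compute side of a DRM driver to
         * unprivileged user space; the exact path depends on the system. */
        int fd = open("/dev/dri/renderD128", O_RDWR);
        if (fd < 0)
            return 1;

        /* A generic DRM ioctl, wrapped by libdrm: report which kernel
         * driver is behind this node (e.g. panfrost, msm, v3d, ...). */
        drmVersionPtr version = drmGetVersion(fd);
        if (version) {
            printf("kernel driver: %s\n", version->name);
            drmFreeVersion(version);
        }

        /* A real user space driver would continue with driver-specific
         * ioctls to allocate memory and submit jobs. */
        close(fd);
        return 0;
    }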
The other wrinkle here is that there can be many processes rendering at the same time on the same GPU, and if the hardware can only run one or two processes at a time, it's up to the kernel to schedule how the GPU's time is allocated. This is, again, a classic kernel responsibility, not just for graphics: on the CPU side, mapping memory and scheduling processes onto CPUs are the most common kernel responsibilities of all. So there's a sense in which graphics feels like having an operating system within the operating system, which is a lot of fun, for some definition of fun.

With that bird's eye view, we can talk about what Vulkan brings to the table, because there's a lot of buzz about Vulkan, from high-end games down to very low-resource embedded systems that need 3D rendering. What is the point of all of this? Ultimately, it's about lowering CPU overhead. The big idea is that OpenGL drivers have to do all of these gymnastics to maintain the state machine; if you can pull some of those responsibilities out of the driver and make them the responsibility of the application, then the application has the potential to be more efficient, because it has more knowledge about what it's trying to do than the driver, which at best has to fall back to custom heuristics, and as you know, "heuristic" in computer science is a euphemism for "wrong". So that's kind of the point of Vulkan.

The other change is that the GLSL parser is eliminated, which should excite the security people in the room, because parsing text is one of the most error-prone tasks in terms of subtle buffer-overflow-type bugs. And remember, the user space graphics drivers are giant chunks of C code. At one point you could say, well, they don't have any privileges, so it's fine, but now, in the era of WebGL, there is a real privilege escalation path where driver bugs can turn into security vulnerabilities and take over a system, which we don't want happening. So in theory, getting rid of this parser and using only pre-compiled, vendor-neutral shaders could be more secure, and it certainly makes for simpler drivers. There's a lot of excitement around Vulkan, and it's very clearly an improvement on OpenGL. But there's no need for me to do two separate presentations, one on the OpenGL stack and one on the Vulkan stack, because beyond this surface level they are essentially identical in terms of what's required from the kernel, the compiler, and so forth.
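As a minimal sketch of what "pre-compiled, vendor-neutral shaders" looks like from the application's side: in Vulkan the driver is handed a SPIR-V binary and never sees shader source text. The function below uses the real vkCreateShaderModule call, but the device and the spirv_words blob are placeholders standing in for whatever your application already has; producing the SPIR-V itself happens offline or in the engine, not in the driver.

    #include <vulkan/vulkan.h>

    /* Wrap an already-compiled SPIR-V blob in a Vulkan shader module.
     * spirv_words points at 32-bit SPIR-V words, size is in bytes. */
    VkShaderModule create_module(VkDevice device,
                                 const uint32_t *spirv_words,
                                 size_t spirv_size_bytes)
    {
        VkShaderModuleCreateInfo info = {
            .sType    = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO,
            .codeSize = spirv_size_bytes,
            .pCode    = spirv_words,
        };

        VkShaderModule module = VK_NULL_HANDLE;
        if (vkCreateShaderModule(device, &info, NULL, &module) != VK_SUCCESS)
            return VK_NULL_HANDLE;
        return module;
    }

No text parsing happens in the driver here; the binary goes more or less straight into the compiler's middle end.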
Speaking of which, what's the deal with these compilers? Why are there so many levels and layers, and what's going on here? This is essentially a response to a very deep tension in our compiler stacks, especially in the open source compiler stack, and it's a unique challenge for us. On one hand, we want separation of concerns: we want the different pieces of our compiler to be hardware-agnostic so we can share them across all of our different hardware. And believe me, there is very little in common between a modern Intel chip and an old ARM chip, but they share the same compiler in Mesa, at least a big chunk of it. So we want to be able to do this code sharing, and there is this fundamental tension: how do we have our cake and eat it too? The answer is turtles all the way down the stack. We have different intermediate representations, different representations of the program, for different purposes.

We have the representation of the GLSL code itself at the start, which is there to facilitate parsing and very early error checking; all of your syntax errors are handled here. So we have one GLSL compiler that can be used essentially identically for every piece of hardware ever, and this is one of the complicated parts of Mesa, so it's a good thing we only need one of them.

We have our common intermediate representations, NIR being the most important at this point, where the goal is not to represent the original program faithfully; the goal is to be easy to optimize. Whereas the GLSL intermediate representation is a tree-based representation, modelling what the source code of the program really looked like and preserving all the variable names and all of that, NIR is a flat intermediate representation in static single assignment (SSA) form. Translating from GLSL IR to NIR destroys the source-level structure of the program, but it puts it into a form that's much easier to optimize, in the same way that translating from Clang's abstract syntax tree into LLVM IR is required for all of the common LLVM optimization passes, if you have experience on that compiler side.

And finally, you have all of the different IRs for the different back ends. For example, I maintain the Bifrost intermediate representation, which models Bifrost and, more recently, Valhall hardware, both GPU generations from ARM. The Bifrost IR is much less suited to the sorts of general-purpose optimizations that NIR concerns itself with, but it models faithfully all of the quirks of the ARM Bifrost hardware, which means it's possible to write optimizations on the Bifrost IR that take full advantage of the hardware without having to pollute the common code, so to speak. Having this tree of intermediate representations that chain together is really important, and this composability is essential to being able to maintain a compiler stack that can handle multiple different APIs as input and a dozen different instruction sets as output, and still share code in the middle. This is something that Mesa does really well. LLVM for the CPU faces the same challenge with much the same solution, although LLVM is very different in other respects.
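To illustrate the SSA point from a moment ago: NIR is not C, and this is not Mesa code, just a plain-C analogy with made-up function names showing the same multiply-add written twice, first with a variable that gets overwritten, then in the single-assignment style that NIR enforces, where every value has exactly one definition and is therefore trivial for an optimizer to track.

    /* Mutable form: x is defined and then redefined. */
    float mad_mutable(float a, float b, float c)
    {
        float x = a * b;
        x = x + c;
        return x;
    }

    /* SSA-style form: each value is defined exactly once. */
    float mad_ssa(float a, float b, float c)
    {
        float x0 = a * b;
        float x1 = x0 + c;
        return x1;
    }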
To see what this actually looks like in practice, here's a diagram of the different compiler paths you would take with the Panfrost driver targeting Mali hardware. In the simple case that I keep talking about, you start with OpenGL: you pass in a GLSL program, that gets parsed into a GLSL abstract syntax tree and translated into NIR, the NIR gets optimized and translated into a Panfrost-specific intermediate representation, for example the Bifrost IR, and that gets translated into the hardware-specific code. Fine. Vulkan is just as simple, except instead of inputting GLSL and translating GLSL to NIR, we input SPIR-V and translate SPIR-V to NIR, which is a much simpler path. Either way, Panfrost is completely agnostic to GLSL and SPIR-V, because it just sees NIR as input, and the GLSL and SPIR-V front ends are handled in completely common code, shared across every driver, which means there is a considerable simplification and cost reduction in developing open source drivers in Mesa as opposed to proprietary drivers.

The OpenCL path is unfortunately much less nice. You have essentially C code as input, OpenCL C, which means we have to go through Clang, have Clang output LLVM IR, and then translate the LLVM IR into SPIR-V. That is complicated, some would say over-complicated, but you really don't want to write your own C front end if you can help it, and especially not a C++ front end, because yes, OpenCL C++ is a thing. So it's struggles all the way down, but we do get SPIR-V at the end of the day, and from SPIR-V we can use our common code path to translate into NIR, and once we have NIR, we can feed that into the back end, and the back end again remains completely agnostic. So this is a very nice way of seeing everything compose.

Unfortunately, looking at the diagram of every Mesa compile path, or at least every one I could fit on the screen (there are more, believe me, there are more), it's not so nice, but you can see the same sorts of code paths coming up. You have the common direction going from OpenGL to GLSL IR to NIR to the driver to the GPU, but there are other ways to input programs, including a number of legacy front ends that get handled totally transparently to the driver, which allows us to implement legacy OpenGL features with no added work per driver, because it's all just common Mesa code. There are older drivers that require older intermediate representations, for example TGSI, which predates NIR, and there are drivers that really want to do their compilation in LLVM instead of NIR and end up translating from NIR to LLVM, using an LLVM-based compiler rather than directly implementing a NIR compiler. This is fine. It's not going to come up in the embedded world, but I felt it was important to mention that we do have a diversity of stacks even within the common Mesa tree, and the common code is not imposed with an iron fist: it's there for your own benefit, and it's a lot of benefit.

In total, you get a functional graphics compiler stack with, objectively, very little code, that can target any of these different APIs and any of this different hardware. You can add APIs with very little work because of the joy of common code, and you can maintain this very easily, including over the long term. That's a very nice value proposition, and a very nice win for open source and for Linux, and I'm thrilled to be part of this and to see this through. So thank you for listening, and I would love to answer questions. Thank you.