All right, let's get started here. Mike Clark says it's one minute after, so hi, welcome to ELC, day two. I'm going to be talking today a little bit about the new Vulkan graphics API. For those of you who don't know who I am, my name is Jason Ekstrand. I work on the open source 3D driver team at Intel as part of Intel's Open Source Technology Center, and we develop OpenGL, OpenGL ES, and Vulkan drivers for Intel hardware going, depending on your API version, all the way back to the original Intel integrated graphics.

So today we're going to be talking a little bit about Vulkan. First, we're going to answer the question: what is Vulkan? Vulkan is a new 3D rendering and compute API from the Khronos Group. The Khronos Group is the cross-industry consortium that develops other APIs such as OpenGL, OpenCL, OpenGL ES, OpenMAX for media, and a variety of other things; WebGL is another one. They develop a bunch of different APIs that are usually available cross-platform and across hardware. OpenGL, for instance, is available on Windows, Mac, Linux, BSD, whatever you want, from all the hardware vendors. So the Khronos Group focuses on making these APIs in such a way that they're portable across platforms.

Vulkan in particular is a completely new 3D API designed from the ground up. It is not OpenGL++. OpenGL has a lot of concepts in it; Vulkan is a very different API, and even the way you call functions looks different in Vulkan. It's been specifically designed for modern GPUs and modern software. OpenGL is a little bit crufty; we'll talk about that. Vulkan is very much designed to take advantage of modern programming practices, modern GPUs, modern CPUs, et cetera.

It's also designed for both desktop and embedded use cases. One of the things we've had for a while now is two different GL-type APIs: OpenGL for the desktop world and OpenGL ES for the embedded world. If you're on the desktop, Intel has supported OpenGL ES on Linux, but AMD, I've noticed, does not, and I don't think NVIDIA does. In the mobile world you usually don't get desktop GL. So if you're going to write a GL application and you want to target both mobile slash embedded and desktop, you have to be able to run against two different APIs. With Vulkan, we have one API that targets both use cases.

Vulkan, even though it's a brand new API, is designed to specifically target currently shipping hardware. The hardware-level target is that anything that can do OpenGL ES 3.1 should be able to do Vulkan. They are different APIs, and there are some slight differences that make that not entirely true, but that's the general idea.

So, a little bit of history. OpenGL 1.0 was released by SGI in January of 1992. It was based on the proprietary IRIS GL API, which simply stood for IRIS Graphics Library. It was a heavily state-machine-based API. That made sense because the hardware of the day was a state machine, and so a state-machine-based API matched the hardware very well. It didn't have any real window system story. How do you actually interact with the window system? Well, SGI didn't really want to deal with that; they just left it up to the platform. The original IRIS GL contained a lot of stuff that OpenGL doesn't. It contained 3D drawing, it contained 2D drawing, it contained window system interaction, and I think it even contained input handling.
So there was a lot of stuff in this library, and they wanted to make the 3D part of it, which was really great, useful for everybody. So they stripped everything else out and said, well, your system will provide you with the other bits that you need, and they left that up to the platform. The result is that on Windows you have WGL, on X11 you have GLX, and on Mac you have, I don't remember exactly what it's called, some other way of getting at an OpenGL context. There was nothing unified about it at all.

In 2003, OpenGL ES 1.0 came out. It was based on OpenGL 1.4 but designed for embedded applications. A few things were different. One is that they removed some of the older OpenGL concepts and tightened things up. They added some restrictions to the way you did texture uploads and such, so that GL ES drivers did not have to implement quite as much as a desktop GL driver did and you could keep the driver a little bit smaller. They also added support for fixed-point types in a bunch of inputs where OpenGL previously required floats, because in a lot of embedded applications, especially in 2003 and even today, you still come across chips that don't have hardware floating-point support, and this provided a nicer graphics API in those scenarios. It also brought with it a unified window system layer called EGL. Part of the reason for EGL, again, was that it was designed for the embedded world, where you didn't necessarily want to be running X11, so the platform vendor would provide you with an EGL implementation and you would build everything on top of that. EGL has since become the standard way of getting hold of an OpenGL context, whether with X11 or in more of an embedded scenario, and it has really tightened up that story quite a bit.

In 2007, OpenGL ES 2.0 came out, and that brought with it a fully programmable pipeline. OpenGL ES 1 and OpenGL 1.0 through 1.4 were entirely fixed function: you couldn't write shaders, and you couldn't really change anything beyond the buttons and knobs that OpenGL gave you. ES 2 and GL 3 brought a fully programmable pipeline; actually, the programmability came to desktop GL in GL 2. One thing they did with OpenGL ES 2 that has never been done with any other OpenGL API is a complete backwards-compatibility break with ES 1. The reason is that if they were going to go modern, they wanted to go modern all the way. So now you have vertex buffers and shaders for everything and no fixed-function stuff, which, again, matters in the embedded world because it makes your driver smaller. Desktop GL vendors have to carry a lot of extra driver code just to handle all of the old 1.0 stuff, and this allowed for some reductions.

As a final data point, ES 3.2 came out in August 2015, with two other releases between 2.0 and 3.2. And that's kind of where we are today. OpenGL is a 25-year-old API, and for an API, 25 years is kind of ancient. There are some other ones that have stuck around that long, pthreads for instance, but it's really hard for an API to adapt to a changing landscape over the course of 25 years. OpenGL has done a really good job. Unfortunately, not everything has stood the test of time.
For one thing, I mentioned that OpenGL is a state machine API. That made a lot of sense back then, but it doesn't really match any modern programming paradigm, which makes it harder to learn, harder to work with, and has some other negative implications. Another thing is that OpenGL state is entirely tied to a single context. That has a lot of negative implications when it comes to multithreading, because you now have a singleton that you can't thread around. It also hides absolutely everything the GPU is doing. Back in the old OpenGL 1.0 days, you didn't know what the GPU hardware looked like at all; for all you knew, it was using some crazy 23-bit floating-point format. OpenGL intentionally hid everything so it could abstract across different hardware platforms, but that comes at the cost of not really knowing what the driver is doing under the hood.

The thing is, all of these decisions, as much as they don't make sense today, made sense in 1992. But a lot has changed. For one thing, multithreading is now commonplace. You don't want a big, fat singleton context in a multithreaded program; it just doesn't work. Back in 1992 you didn't really have threads. I mean, you had threads in the operating system, but you only had multithreaded scenarios on big giant compute machines. Nobody really thought much about doing multithreading on a desktop, and they certainly didn't think about a multithreaded UI application. And this state-machine singleton context doesn't thread at all.

Also, today off-screen rendering is a thing. A lot of applications, whether for reflections or just because they want to do some image processing on the GPU, want to render off-screen. Well, why in 2017 do I still have to go talk to X11 to get an OpenGL context? Because the API is kind of old. It's not entirely true that you have to talk to X11 to get a context; there are ways via EGL, with, I think, the surfaceless platform and a couple of other things, to finally get a context without X11. But that's only been possible in the last couple of years, even though off-screen rendering has been a thing for the last ten.

Another thing is that GPU hardware is a lot more standardized today. Back then it was just a grab bag: different bits of hardware accelerated different parts of the pipeline, not everybody accelerated everything, and lots of stuff was still done in software. Today, in a lot of ways, GPU hardware looks fairly similar, which means two things. One is that we don't need to hide everything anymore. Nobody has 23-bit floating-point texture formats; they don't exist. Everybody has a fairly standard set of texture formats, you know what they are, and so you don't need all of these conversion steps inside the driver. You can just give the driver the data in a format it can use. The other thing is that app developers don't want us to hide everything, because every single time we hide something, there's a potential performance issue inside the driver or the hardware that they can't see. They just see this opaque API, and all they know is that when they call glTexSubImage2D, the API does something and the pixels get where they're supposed to be.
But they have no idea whether the particular way they're calling it happens to stall the GPU, or does a bunch of CPU-side format conversion, or whatever else, and they don't want these kinds of hidden performance problems. OpenGL has adapted as well as it can. There are a lot of extensions to help with this stuff. Like I said, there are ways with EGL to get an off-screen context now, although it's a bit difficult. You can theoretically do multithreaded OpenGL, although it's kind of buggy everywhere. There's a bunch of work that's been done to standardize texture formats and make things a bit more explicit. But there's still a limit to what you can do with a 25-year-old API.

So Vulkan takes a completely different approach. For one thing, Vulkan is an object-based API with no global state. In OpenGL, every single time you want to do anything, you bind something, you mutate the thing that's currently bound, then you bind something else and mutate that. Vulkan is object-based, so you actually have handles to different objects and you can work directly on them. Also, those objects are not handles into a hash table; they're actual pointers, so you avoid that whole indirection. Most of the objects are immutable, which is another thing we've learned over the years of programming: mutable things are generally bad, and if you can avoid them, it's usually a good idea. With GL, everything is mutable. You create a texture, you upload some mip levels, and then you can decide you want to make the texture bigger and just start changing bits of it at will. With Vulkan, you create an image up front and it's immutable, and that solves a lot of the threading problems because you aren't mutating things all over the place; you just create them. Vulkan does have some state concepts, but they're entirely localized to what's called a command buffer, which is how you record commands to execute on the GPU. Command buffers are separate and can be recorded in parallel, and working on them is re-entrant in the sense that if you're working on two different command buffers from different pools, you can do it from multiple threads without any problems. The state is tied to the command buffer, so you get rid of the whole global-state concept.

Another thing is that window system integration, or WSI, is an extension of Vulkan and not the other way around. In the OpenGL world, first you go talk to X and say, get me a window. Great. Then you say, please get me a context on that window, and you go from there. With Vulkan, instead you go to Vulkan and say, get me a Vulkan device, and then you go ask your window system, can you actually present on this? Which means Vulkan is headless by default: as soon as you have a driver installed, you can do headless rendering. It also gives you a bit more flexibility in how you think about your Vulkan application.
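To make the headless point concrete, here's a minimal sketch, with error handling omitted and the queue-family choice assumed rather than queried, of bringing up a Vulkan device without ever touching a window system:

```c
/* A minimal sketch of "headless by default": create an instance, pick a
 * physical device, and create a logical device with a queue, with no window
 * system in sight.  Error handling and queue-family queries are omitted. */
#include <vulkan/vulkan.h>

VkDevice create_headless_device(void)
{
    VkInstanceCreateInfo inst_info = {
        .sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO,
        /* No VK_KHR_surface / VK_KHR_xcb_surface extensions requested. */
    };
    VkInstance instance;
    vkCreateInstance(&inst_info, NULL, &instance);

    uint32_t count = 1;
    VkPhysicalDevice phys;
    vkEnumeratePhysicalDevices(instance, &count, &phys);  /* take the first GPU */

    float priority = 1.0f;
    VkDeviceQueueCreateInfo queue_info = {
        .sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,
        .queueFamilyIndex = 0,      /* assumption: family 0 does graphics/compute */
        .queueCount = 1,
        .pQueuePriorities = &priority,
    };
    VkDeviceCreateInfo dev_info = {
        .sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO,
        .queueCreateInfoCount = 1,
        .pQueueCreateInfos = &queue_info,
    };
    VkDevice device;
    vkCreateDevice(phys, &dev_info, NULL, &device);
    return device;   /* ready for off-screen rendering or compute */
}
```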
Vulkan is also far more explicit about what the GPU is doing. Texture formats, memory management, synchronization: a lot of that is now client-controlled, and we'll get into some of these things a bit more. But we still hide some stuff, because Intel has an implementation, NVIDIA has an implementation, Qualcomm has an implementation, and not everybody's hardware is the same; not everybody's driver is the same. So there are still things we have to hide, but we try to hide a lot less, as little as possible, to give application developers more visibility and transparency into what's going on. Another thing is that Vulkan drivers do no error checking. We'll get back to this; it's actually a fairly important point for anybody who's worried about CPU constraints.

All right, so I'm going to focus on four things today: pipeline objects, render passes, multithreading and synchronization, and error handling. We're going to go through these one at a time and look at how Vulkan is a different way of thinking about graphics than OpenGL, and how that can help, especially with CPU overhead and with making the API more predictable.

So let's start with pipelines. This extremely complicated picture is the diagram of everything in the Vulkan pipeline. I stole it straight from the spec, and there's a lot of stuff in here, but I'm just going to point out a few things. All of the pink boxes are what we call fixed-function stages. These are bits of the pipeline that do a single job, and you just change a few settings here and there to control exactly what they do. The orange boxes are programmable stages; these are your shaders. You get inputs from the fixed-function stages, you do some calculations, you write the results out, and they go into another fixed-function stage. The blue and green boxes are all the different forms of memory that interact with the pipeline. So there's a lot going on here: you have these fixed-function bits, you have these programmable bits, it all links together, and that's a pipeline. The Vulkan pipeline object describes all of this. But I want to talk about a specific issue that happens a lot in graphics and that Vulkan pipelines are designed to solve.

So let's start by looking at a shader. This is a vertex shader, a very simple one. It has a texture coordinate and a vertex position as inputs, it has a matrix, and all it does is pass the texture coordinate straight through and multiply the matrix by the vertex position, writing the result to gl_Position. Okay, very simple vertex shader; you can imagine seeing this in an application. But there are a few questions that should come to mind. You have these inputs: where do they come from? You have outputs: where do they go? You just wrote to some variables. How does the data get out of the shader? How does it get from the previous stage into the inputs? What happens between the stages? This is something that isn't defined by the API; it's supposedly hardware-specific. But it really affects how the shaders work and how you have to think about them.

To give you an analogy in C: everybody's first C program is Hello World. You include stdio.h, you write main with some magic things called arguments, you call printf with "Hello World", hopefully you remember the backslash-n, you run it, and it prints Hello World at the terminal. So you wrote five lines of code and it prints Hello World. What actually happens to get Hello World to the terminal? That printf call does an amazing amount of stuff. There are piles of things happening in your standard library, and there are other things that happen before your main function ever gets executed to pass the arguments into it.
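For reference, here's roughly what a vertex shader like the one described above might look like, written as GLSL embedded in a C string the way a GL application typically carries it; the variable names are illustrative, not from the talk's slide:

```c
/* A minimal sketch of the vertex shader described above: pass the texture
 * coordinate through, transform the position by a matrix. */
static const char *vertex_shader_src =
    "#version 330 core\n"
    "uniform mat4 mvp;\n"              /* the matrix */
    "in vec4 a_position;\n"            /* vertex position input */
    "in vec2 a_texcoord;\n"            /* texture coordinate input */
    "out vec2 v_texcoord;\n"           /* handed on to the next stage */
    "void main() {\n"
    "    v_texcoord = a_texcoord;\n"
    "    gl_Position = mvp * a_position;\n"
    "}\n";
```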
It's the same kind of thing with GL shaders: you have these inputs and outputs, but GL doesn't actually define how they get there. It just happens. It's all implementation-dependent, and one of the issues you run into is that a lot of the time these fixed-function stages that are supposedly hardware are actually implemented in the shaders. One example is vertex fetch. AMD hardware does not have a fixed-function vertex fetch. Instead, they simply do a memory load inside the shader, and they emit shader code to convert from whatever bytes are in that buffer into the floating-point values you want in the shader, and off you go. They don't actually have a piece of hardware to do that. Another example is color blending. A lot of the mobile platforms do color blending directly in the shader: when you write from the fragment shader and blending is needed, they read the render target into the shader, do the color blending, and write it back out. And there are other things, like alpha testing, and more. The point is, each of these requires some bit of code in the shader that's not controlled by the shader you typed out in your program. It's inserted by the driver at the time it compiles the shader, and those bits are controlled not by your shader code but by bits of state. So exactly what the final shader binary looks like depends on both the code and the GL state at the time.

So here's what can happen. You're going to do some rendering, so you set up some OpenGL state and you call glDrawArrays. And what happens? The driver examines the currently bound shaders and their source code, it examines various bits of context state, and then it suddenly decides it needs to spend 100 milliseconds compiling a new shader. You just missed vblank, your app now stutters, and you have a big problem. Users hate it when they see jitter in their apps. This is what we call a shader recompile: the driver suddenly decides that, for some reason, it needs a new shader, and it goes off and compiles it. If your driver is using a big heavy-duty compiler like LLVM, it can have to spin up quite a bit of code to compile a new shader. This is a huge source of unpredictability in the API, because you have these recompiles that you can't predict as an app developer. The best you can do is provide your shaders up front and compile them, but you don't have any visibility as an application developer into which bits of API state are going to affect the final shader code, and you have no real way to predict it. There are some tricks you can do, but they all have their limits.
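To illustrate the mechanism, here's a hypothetical sketch, not any particular driver's internals, of what a GL driver effectively does at draw time: build a key from the bound shaders plus the relevant state, look it up in a cache, and compile a new variant on a miss. That miss is where the surprise 100 milliseconds comes from:

```c
/* Hypothetical illustration of why glDrawArrays can stall: the real shader
 * binary depends on GL state, so the driver builds a "key" at draw time and
 * may have to compile a new variant on the spot.  All names are made up. */
#include <stdbool.h>

struct shader_key {
    int  vertex_format;   /* how vertex fetch must convert the buffer data   */
    bool blend_enabled;   /* blending folded into the fragment shader        */
    bool alpha_test;      /* alpha test emulated in the shader               */
};

struct compiled_shader { int dummy; /* stand-in for a real binary */ };

static struct compiled_shader *lookup_in_cache(const struct shader_key *key)
{
    (void)key;
    return NULL;          /* pretend it's always a miss, for illustration    */
}

static struct compiled_shader *compile_variant(const struct shader_key *key)
{
    (void)key;
    static struct compiled_shader s;
    return &s;            /* stand-in for an expensive compile               */
}

struct compiled_shader *get_shader_for_draw(const struct shader_key *key)
{
    struct compiled_shader *s = lookup_in_cache(key);
    if (!s) {
        /* Cache miss: this is the recompile that makes you miss vblank. */
        s = compile_variant(key);
    }
    return s;
}
```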
So Vulkan has a different solution. It has a single monolithic pipeline object, VkPipeline, that describes everything. The entire picture I showed you, all of it, is described in the pipeline object. It contains all of the shaders for all of your stages: vertex, fragment, geometry, whatever else. It contains linkage information, such as the locations and formats of your vertex inputs. It contains information about your render targets and any other cross-stage information it may need to figure out how to get data in and out. It also contains information about your textures and UBOs, and most of the state you will need: all of your blending state, your depth-stencil state, and various other bits, like what kind of primitives you're using. So you have all of this state in a single object. The point is that because you have all of your state in a single object, you have absolutely everything the driver needs in order to compile the shader. When you create a pipeline, it compiles the shader, that's the actual shader you're going to use, and you don't have this recompile problem.

One question is: isn't this far less flexible? Because now I have to create all these pipeline objects, and they contain piles and piles of information. How do I deal with that? You do have to provide a lot more data up front, and many pipelines may have to be created, one per combination of shaders and state. One of the complaints we got while Vulkan was in development came from engine vendors saying, I'm compiling something like 40,000 pipelines when I start my game, because they have all these different possible state combinations to deal with. That's a real issue, but it comes with some advantages. One is that, like I said, you have everything you need to compile shaders up front. And for the 40,000-pipelines problem, we have a solution called VkPipelineCache. You create a pipeline cache and then create all of your pipelines against that cache, and the driver can use it to share as much data between pipelines as possible. You may still have some data duplicated across your pipelines for not a whole lot of good reason, but the pipeline cache at least reduces the shader compilation burden: any time two pipelines use the same shaders, they really are the same shaders, and you only do the compile once. It's proven to be very, very effective.

Another advantage of the pipeline cache is that it can easily be serialized and written to disk. With GL, there are APIs that let you pull shader binaries out of the driver and try to reuse them, but they're all still subject to the recompile problem, because unless the driver can somehow predict every single binary it will ever need, you always run the risk that it suddenly recompiles. Different drivers have implemented different methods of on-disk caching, and that has issues too. The pipeline cache works really nicely because it puts the application in control. It creates a pipeline cache, it compiles all 40,000 pipelines it wants, and then it serializes the cache. The next time, it loads the cache back in, deserializes it, and creates all of its pipelines, and this time there's no actual shader compilation because the work is already in the pipeline cache. It's worked out really, really well.
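As a rough sketch of that flow, with error handling and the pipeline create-info setup omitted, an application creates the cache, optionally seeded from disk, builds its pipelines against it, and then serializes it back out for the next run:

```c
/* A minimal sketch (not a complete program) of using VkPipelineCache. */
#include <vulkan/vulkan.h>
#include <stdlib.h>

VkPipelineCache create_seeded_cache(VkDevice dev, const void *data, size_t size)
{
    VkPipelineCacheCreateInfo info = {
        .sType = VK_STRUCTURE_TYPE_PIPELINE_CACHE_CREATE_INFO,
        .initialDataSize = size,    /* 0 and NULL on the very first run */
        .pInitialData = data,
    };
    VkPipelineCache cache;
    vkCreatePipelineCache(dev, &info, NULL, &cache);
    return cache;
}

void build_and_save(VkDevice dev, VkPipelineCache cache,
                    uint32_t count, const VkGraphicsPipelineCreateInfo *infos,
                    VkPipeline *pipelines)
{
    /* All pipelines share the cache, so identical shaders compile only once. */
    vkCreateGraphicsPipelines(dev, cache, count, infos, NULL, pipelines);

    /* Serialize the cache so the next startup skips compilation entirely. */
    size_t size = 0;
    vkGetPipelineCacheData(dev, cache, &size, NULL);
    void *blob = malloc(size);
    vkGetPipelineCacheData(dev, cache, &size, blob);
    /* ... write `blob` to disk ... */
    free(blob);
}
```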
So, to wrap up pipelines: a lot of the reason they're important is that they bring a lot of predictability to the API. Shader compilation happens exactly in vkCreateGraphicsPipelines; it does not happen at draw time. Drivers also have a lot less work to do at draw time. In the GL model, we have to crawl the entire GL state machine in order to decide what to do at draw time. With a pipeline object, we get the opportunity to pre-bake a lot of things. If you look at the difference between the draw command in our GL driver and the draw command in our Vulkan driver, they're worlds apart, because the Vulkan driver just looks at the pipeline, and if the pipeline is different from the one before, it does a memcpy and it's done. The GL driver, on the other hand, has to crawl through all sorts of state, look up shader keys, look things up in a cache, and lots and lots of CPU time is wasted just because we have to crawl the entire GL state machine. And the pipeline cache, like I said, has been pretty effective; it can basically remove shader compilation time from your app startup entirely, at least from the second run on. For instance, one of the things we've heard a lot from the embedded world, the IoT or automotive folks, is that they have these two-second or ten-second zero-to-backup-camera requirements, and if you have to spend a second of that compiling shaders, that's going to hurt. With a pipeline cache, you can do the shader compilation once, bake the cache into your image, and not worry about it on subsequent boots. It just completely solves that problem.

So the next thing I want to talk about is render passes. Render passes are a concept that's basically unique to Vulkan. Of all the other APIs out there, like OpenGL or DirectX, and even the newer APIs like DirectX 12 and Metal, none of them really have this. What it does is provide a way of structuring your rendering into passes and subpasses. Most of the time when you're rendering an OpenGL scene, you already have things structured into passes. You'll do one pass that draws all your geometry, maybe depth only, then another pass that uses that and runs your more complicated fragment shaders, then maybe a pass that applies some HDR gamma curve, and maybe another pass for your depth-of-field work. So you already structure your rendering that way; Vulkan just provides a way of giving that information to the driver so it has a better idea of what you're doing.

Each subpass has its own render targets. So you have a render pass, it's divided into subpasses, and each subpass has its own render targets. The render target information is declared completely up front: you create a render pass object with the information about all of your subpasses, and that render pass object is actually passed in at pipeline create time, so your pipeline knows all of this information too. All the dependencies between the subpasses are nice and explicit: you have to declare that, say, subpass two is going to use the render targets from subpass zero, so the driver has that information. It's nice in that it hands all this information to the driver, but it also forces applications to render nicely; we'll get to what I mean by that in just a second.
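Here's a hedged sketch of what such a render pass object might look like: two subpasses, where the second reads what the first wrote through an input attachment, with the dependency between them declared explicitly. The formats, layouts, and load/store choices are illustrative, not requirements:

```c
/* A minimal sketch of a two-subpass render pass; error handling omitted. */
#include <vulkan/vulkan.h>

VkRenderPass make_two_pass_render_pass(VkDevice dev, VkFormat color_format)
{
    VkAttachmentDescription attachments[2] = {
        { /* attachment 0: written by subpass 0, read by subpass 1 */
          .format = color_format, .samples = VK_SAMPLE_COUNT_1_BIT,
          .loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR,
          .storeOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
          .initialLayout = VK_IMAGE_LAYOUT_UNDEFINED,
          .finalLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL },
        { /* attachment 1: the final color target of subpass 1 */
          .format = color_format, .samples = VK_SAMPLE_COUNT_1_BIT,
          .loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR,
          .storeOp = VK_ATTACHMENT_STORE_OP_STORE,
          .initialLayout = VK_IMAGE_LAYOUT_UNDEFINED,
          .finalLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL },
    };

    VkAttachmentReference pass0_color = {
        .attachment = 0, .layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL };
    VkAttachmentReference pass1_input = {
        .attachment = 0, .layout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL };
    VkAttachmentReference pass1_color = {
        .attachment = 1, .layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL };

    VkSubpassDescription subpasses[2] = {
        { .pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS,
          .colorAttachmentCount = 1, .pColorAttachments = &pass0_color },
        { .pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS,
          .inputAttachmentCount = 1, .pInputAttachments = &pass1_input,
          .colorAttachmentCount = 1, .pColorAttachments = &pass1_color },
    };

    /* Subpass 1 consumes what subpass 0 wrote; say so explicitly. */
    VkSubpassDependency dependency = {
        .srcSubpass = 0, .dstSubpass = 1,
        .srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
        .dstStageMask = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
        .srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
        .dstAccessMask = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT,
        .dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT,
    };

    VkRenderPassCreateInfo info = {
        .sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO,
        .attachmentCount = 2, .pAttachments = attachments,
        .subpassCount = 2, .pSubpasses = subpasses,
        .dependencyCount = 1, .pDependencies = &dependency,
    };
    VkRenderPass rp;
    vkCreateRenderPass(dev, &info, NULL, &rp);
    return rp;
}
```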
So what do I mean by render nicely? Let's look at some rendering. Here are some GL calls; assume there are some state calls in between. We bind a framebuffer, we do a couple of draws, we bind another framebuffer, we do another draw, and then we decide we need a texture. So what do we do? We call glTexImage2D to upload our texture, and then we do some more drawing. What does this look like in Vulkan? In Vulkan, we have a render pass, and inside the render pass we have two subpasses, one for our first framebuffer and one for our second, and we do our draw calls inside those subpasses. One thing you'll notice is that the texture upload had to get moved up to the top. Vulkan does not allow you to do what's called a copy operation inside a render pass. The only things you're allowed to do inside a render pass are change state, bind different pipelines, and draw. You're not allowed to do blits or copies or anything like that.

So why do we require this structure? It seems kind of stiff. Part of the reason is that changing framebuffers can be expensive, especially if you're dealing with a tiling architecture, which most of the embedded world is; in fact, most of the graphics world is these days. Any time you change your render targets, you potentially have to flush the tiler, which means you have to actually make it do all of its rendering, and you may have just lost some parallelism you could have gotten back another way. And one of the problems with copy operations such as texture uploads is that they may implicitly change framebuffers. Sometimes a texture upload can be done by mapping the texture and copying on the CPU, but a lot of the time it actually ends up being: bind some framebuffers, do a draw call, and then reset back to what was there before. So doing a texture upload in the middle of your drawing is actually a pretty mean thing to do to a driver. Some drivers handle it very nicely; others handle it very poorly. So instead of allowing that magic, Vulkan just says: no, you're not allowed to do texture uploads in the middle of your drawing. You need to do those ahead of time.

This improves parallelism because it removes pixel dependencies. One of the things the render pass was specifically designed for is that the entire render pass can theoretically be run once per pixel. You don't have to do the first draw call for everything, then the next, then the next, out of fear of unknown pixel dependencies: the whole render pass can be run for one pixel and it's correct, and then run for another pixel and it's correct. One of the reasons this matters is, again, that most GPUs out there, especially in the embedded space, are tilers. Tiling architectures like to split your rendering into small chunks called tiles, and by enforcing that the entire render pass can be run a single pixel at a time, it means the entire render pass can be run a tile at a time, which really helps the tilers arrange the rendering in a way that's very friendly to them. It also removes a lot of driver guesswork.
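Here's a sketch of how the Vulkan version just described might be recorded into a command buffer, with the upload done as a copy before the render pass begins. The image barriers, copy region, and render-pass begin info are assumed to be set up elsewhere, and the draw parameters are placeholders:

```c
/* A hedged sketch: copies happen before the render pass, and the two
 * framebuffers become two subpasses of one render pass. */
#include <vulkan/vulkan.h>

void record_frame(VkCommandBuffer cmd,
                  VkBuffer staging, VkImage texture, VkBufferImageCopy *region,
                  VkRenderPassBeginInfo *rp_begin,
                  VkPipeline pass0_pipeline, VkPipeline pass1_pipeline)
{
    /* Copies are only legal outside the render pass, so do the upload first.
     * (The required image layout barriers are omitted for brevity.) */
    vkCmdCopyBufferToImage(cmd, staging, texture,
                           VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, region);

    vkCmdBeginRenderPass(cmd, rp_begin, VK_SUBPASS_CONTENTS_INLINE);

    /* Subpass 0: what used to be "bind framebuffer A, draw, draw". */
    vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pass0_pipeline);
    vkCmdDraw(cmd, 36, 1, 0, 0);
    vkCmdDraw(cmd, 36, 1, 0, 0);

    /* Subpass 1: what used to be "bind framebuffer B, draw". */
    vkCmdNextSubpass(cmd, VK_SUBPASS_CONTENTS_INLINE);
    vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pass1_pipeline);
    vkCmdDraw(cmd, 3, 1, 0, 0);

    vkCmdEndRenderPass(cmd);
}
```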
I keep saying tilers, but tilers aren't actually all that slow. I mean, we have all these phones running on Qualcomm chips or whatever, and they run pretty efficiently. The reason they can run efficiently is that the drivers do a lot of work to take the rendering you gave them, with all of these GL calls and texture uploads in the middle of things, and rearrange it in a way that doesn't change the resulting contents of your render buffer but is friendly to the tiler. So instead of requiring the driver to guess and try to figure out what you're doing, the render pass provides a nice explicit way of laying out your rendering that's friendly to everybody. It also lets you shed some CPU overhead, because tiler drivers spend lots and lots of cycles figuring out what order to put your rendering in, cycles that could be spent drawing things or running some other bit of CPU work. So the render pass provides a structure that really helps the implementation understand what you're doing. And I say tilers a lot, but it also helps those of us who work on immediate-mode hardware, because even if we don't have to worry about flushing a tiler, we still have cache flushes and such, and this helps us out quite a bit as well.

So let's shift over and talk about multithreading. In the OpenGL world, multithreading basically doesn't exist. You have the concept of a context, and that context is bound to the current thread. If you want to do something on a different thread, you unbind the context from the current thread and bind it to the other one. You can't draw on a single context from two different threads. There are things you can do with two contexts on two threads, maybe sharing textures between them, but it's really awkward and hard to deal with. Vulkan has this naturally, easily parallelizable object model, but parallelism comes at a cost, and the cost of parallelism is synchronization. That's as true on the CPU as it is on the GPU; you can't write parallel code without putting in a mutex here and there. So one of the things Vulkan does is hand synchronization off to the client, because once you have this parallel way of working, the driver no longer has the information to do the synchronization for you. Instead, the client has to do it.

One thing clients have to do is synchronize around a few key calls into the API. vkQueueSubmit is one of them, and there are a few allocation APIs where the client has to synchronize because they operate on an allocator object that is mutable. Clients also have to handle synchronization between GPU and GPU, CPU and CPU, and CPU and GPU, and there are objects called fences, semaphores, and events for that. Vulkan actually allows you to submit work in parallel on the GPU: you can potentially have multiple queues all working at once, so one queue can be doing compute work while another does graphics work, genuinely in parallel on the GPU, and you synchronize between them with these mechanisms. The client is also responsible for ensuring that all GPU resources remain alive as long as the GPU is using them.
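Putting those points together, here's a minimal sketch of the kind of synchronization the client now owns: each thread records into its own command pool with no locking, and only the shared queue is protected by a mutex around vkQueueSubmit. The helper structure and the single-queue assumption are illustrative:

```c
/* A hedged sketch of client-side synchronization; cleanup is omitted. */
#include <vulkan/vulkan.h>
#include <pthread.h>

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread recording: no locks needed, the pool belongs to this thread. */
VkCommandBuffer record_on_thread(VkDevice dev, uint32_t queue_family)
{
    VkCommandPoolCreateInfo pool_info = {
        .sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,
        .queueFamilyIndex = queue_family,
    };
    VkCommandPool pool;
    vkCreateCommandPool(dev, &pool_info, NULL, &pool);

    VkCommandBufferAllocateInfo alloc_info = {
        .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO,
        .commandPool = pool,
        .level = VK_COMMAND_BUFFER_LEVEL_PRIMARY,
        .commandBufferCount = 1,
    };
    VkCommandBuffer cmd;
    vkAllocateCommandBuffers(dev, &alloc_info, &cmd);

    VkCommandBufferBeginInfo begin = {
        .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,
    };
    vkBeginCommandBuffer(cmd, &begin);
    /* ... vkCmdBeginRenderPass, vkCmdDraw, etc. ... */
    vkEndCommandBuffer(cmd);
    return cmd;
}

/* The queue itself is shared, so the client must serialize submissions. */
void submit_locked(VkQueue queue, VkCommandBuffer cmd, VkFence fence)
{
    VkSubmitInfo submit = {
        .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
        .commandBufferCount = 1,
        .pCommandBuffers = &cmd,
    };
    pthread_mutex_lock(&queue_lock);
    vkQueueSubmit(queue, 1, &submit, fence);  /* fence signals when GPU is done */
    pthread_mutex_unlock(&queue_lock);
}
```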
In OpenGL, we did piles and piles of reference counting. Everything was reference counted, because every time the GPU went off and started doing something with an object, the GPU had to hold a reference, as it were, so that if the client happened to delete the object while the GPU was still using it, nothing bad happened. Vulkan passes that responsibility off to the client, so it's your job to make sure you don't delete a texture while the GPU is still using it. That looks bad in some ways, because you have a lot to juggle, but really, if you're writing a multithreaded application, you're dealing with synchronization anyway; you're thinking about a lot of these things anyway. And if you structure your rendering in a reasonable way, working on a frame-by-frame basis, it's not terribly difficult for the client to tag resources and keep track of what's still alive and what isn't. It's a little bit of work, but it's not bad.

So, error handling. A lot of APIs, and especially your standard computer science classes these days, teach a sort of lazy error handling approach: you just go along, and when you get an error, you clean some stuff up and work your way back up the call chain. A lot of people really love exceptions, because you can just throw them, catch them later, and all the cleanup happens for you. That doesn't work on a state machine. OpenGL is a state machine, and if your API is a state machine, then any time you have an error that isn't fatal, you have to leave the context in a well-known state, because if you didn't, error handling just wouldn't be possible; any error would leave the client having to completely reset and get a new context. So OpenGL errors just don't change state at all, because that's the only way to keep the context in a reasonable, well-known state. Most OpenGL API misuse is non-fatal: the way OpenGL is set up, you can do basically anything you want to the API, short of passing it an invalid pointer, and the API is supposed to handle it gracefully, do nothing to the state machine, and return an error.

That means a lot of up-front error checking and validation. Every time you make a GL call, we have to check for every possible mistake you could have made, right there on the spot, and return nicely. If you didn't actually make an error, all of those CPU cycles are completely wasted. There are also other things, such as texture completeness, that we have to check for, because textures may be in any random state; they're not immutable and you can set random bits on them. So there's a huge amount of work that happens when you call glDraw-whatever just to make sure you did it correctly. For well-behaved apps, if you take your average game and run it on a debug build of your driver that prints every single GL error, you won't see a single one, because they use the API correctly and they aren't causing errors. And when that's the case, you're wasting piles and piles of CPU cycles. If you profile any GL driver on some benchmark, you'll see the driver show up in a non-trivial way, and a lot of that is error checking.

Whoa, what just happened? I have my slides out of order. Oh well. So Vulkan takes a different tack: Vulkan drivers just don't handle errors at all. Any API misuse, it says in the documentation, may do anything up to and including program termination.
So if you abuse the API in any way, you might get a segfault, and it's your fault. Any invalid synchronization, or freeing memory before the GPU is done with it, may result in wedging the GPU. This is really, really nasty. It's kind of a playing-with-knives approach: we give you what you asked for, and it's your job not to cut yourself. But it means you aren't wasting those CPU cycles.

To solve this problem, what Khronos has done is provide a set of validation layers. These are layers you can insert while you're developing, and they give you all of the error checking you want. They do an extensive set of API valid-usage checks. They also do some of what we call deep validation checks, which introspect the state in ways that even a GL driver might not, and can warn you that you're doing something that might not be compatible across platforms, or help you find some subtle error. We can do all of this, and we can make the validation as expensive as we want, because you're never going to ship it in release mode. By letting this validation be used during development but not in release, you aren't wasting all those CPU cycles validating every single draw call. Instead, you write a well-behaved app, it passes the validation layers, and then you run it on the driver and it should be okay.

It also means that, because the validation layers are provided by Khronos and not by your hardware vendor, the errors you get out of them are consistent. If you run a GL app, NVIDIA might have a slightly different interpretation of the spec from Intel or Qualcomm or whoever else; you might get slightly different errors, some more useful than others. But this is an entirely consistent set of error checks, because it comes from Khronos, not the vendor. The validation layers are also open source, so if you don't know where an error came from, you can set a breakpoint and find out. They also provide much, much better error messages: when you get a validation error, it gives you a link to the location in the spec you just violated, instead of simply handing you GL_INVALID_ENUM from a function that takes 17 enums. So the validation layers are actually quite a bit better.

So, summing up, why is Vulkan better, and why should you care in the embedded world? Vulkan is designed to be lightweight and low overhead. Pipelines give us much more predictable performance and faster load times if you use the cache properly. Render passes provide structure that avoids driver guesswork, and they also avoid the application having to guess what the driver is guessing. Vulkan multithreads very nicely natively, and we aren't wasting CPU cycles on runtime error checks. So don't waste your CPU cycles on driver overhead. It's a bad plan.
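As a concrete example of how an application opts into that checking only during development, here's a hedged sketch of enabling the validation layers at instance creation. The layer name shown is the meta-layer the LunarG SDK shipped around the time of this talk; check vkEnumerateInstanceLayerProperties for what your SDK actually provides:

```c
/* A minimal sketch: request validation layers in debug builds only, so
 * release builds pay nothing for error checking. */
#include <vulkan/vulkan.h>

VkInstance create_instance(int debug_build)
{
    const char *validation_layers[] = {
        "VK_LAYER_LUNARG_standard_validation",  /* name circa 2016/2017 SDKs */
    };

    VkApplicationInfo app = {
        .sType = VK_STRUCTURE_TYPE_APPLICATION_INFO,
        .pApplicationName = "my-app",
        .apiVersion = VK_MAKE_VERSION(1, 0, 0),
    };

    VkInstanceCreateInfo info = {
        .sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO,
        .pApplicationInfo = &app,
        /* Only pay for validation while developing; release builds pass 0. */
        .enabledLayerCount = debug_build ? 1 : 0,
        .ppEnabledLayerNames = debug_build ? validation_layers : NULL,
    };

    VkInstance instance;
    vkCreateInstance(&info, NULL, &instance);
    return instance;
}
```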
So, a question: should you use Vulkan? This gets asked a lot, and I didn't have it in my slides originally, but I decided to add it at the last minute because I get this question from people all the time. There's been a lot of different messaging about Vulkan. Some people have come out and said Vulkan is great, you should use it for everything. Other people have said you should only use Vulkan if you write a major game engine with thousands of titles shipping on it. And really, it's an engineering decision. Vulkan pushes a lot more work onto the client. It's a little bit harder to use: you have to do your own memory management, you have to do your own synchronization, and it's a lot more code to bring up a triangle. On the other side, you get lower overhead and potentially better performance, and I think it's a nice API to work with. So in the end, it's a design decision. Some people will tell you that you should never use Vulkan unless you're an expert; I've seen a lot of people who aren't graphics experts use Vulkan extremely successfully. On the other hand, Vulkan does take this juggling-knives approach where we give you explicit control, it's your job to use it correctly, and if you don't, bad things happen. So there are trade-offs, and every team deciding what to do about a graphical application needs to make its own decision.

I wanted to save one last spot before questions to give a bit of an update on what Vulkan looks like in the open source world. February 16th, 2016, just over a year ago, was when Vulkan was released. At the time, there were four day-one conformant implementations: Imagination, Intel, NVIDIA, and Qualcomm. These are the companies that actually had the API implemented on day one and passed the conformance suite. Intel was there on day one with a conformant open source Linux driver. That's something I'm fairly proud of, because I'm part of the team that developed that driver, and it was a lot of fun and also rather harrowing. The tools, tests, and validation layers were all released as open source on the first day. We had two AAA game titles, Dota 2 and The Talos Principle; those weren't actually day one, I think technically Talos was supposed to be, but they were both out within a month.

So here's a little table to give you an idea of what Vulkan and open source looks like. One of the things that's cool about Vulkan is that in some ways it's the most open 3D graphics API there has ever been. With OpenGL, a lot of stuff was closed; a lot of the discussions were closed, because many companies in the graphics industry are very worried about IP leakage. With Vulkan, things have been much, much more open-source friendly. For instance, everything in that entire Linux column is checked off: basically everybody who has a driver has a driver for Linux. They're not all open, but they have a driver for Linux. The source for the Vulkan spec is available; it does not have full Git history, for legal reasons, but the source for actually building the spec is there. Our driver is open source: full Git history, community supported, the works. Other vendors' drivers are available on Linux, just not as open source. The Vulkan loader, SPIR-V, which is the binary format for shaders, all the tools for working with it, and the validation layers are 100% open source, and they're reasonably decent community projects available on GitHub. The conformance test suite is open source and Linux friendly, and the Git history is there.
Community contributions to the conformance suite are still being worked out, because the upstreaming process is still in progress, but it's most of the way to being a project people can actually contribute to. So it's a very open ecosystem in a lot of ways. Basically, instead of everything being closed except for us crazy people who implement in the open, now by default everything is open as soon as it's publicly available, except for most people's driver source code.

Over the course of the last year, a bunch of things have happened. We've gone from four conformant implementations to seven. We're still the only open source one; keep plugging my team. The validation layers and tools have gotten a lot better. When Vulkan was originally released, the validation layers and tools existed, but things were a little bit sketchy; you could still end up hanging your GPU with something the validation layers didn't catch. They've gotten a lot better, many more checks have been added, and the validation layers are in very good shape today. We've added another title to the list of AAA games: Doom released a Vulkan backend on Windows. A lot of game engines are porting to Vulkan, so even though we only have three games available today, all the big game engines are developing Vulkan backends; I think Unity actually just released theirs. So there's a lot of work going on, and once those engines have Vulkan backends, a lot of content will suddenly become available, because for a lot of people, especially if you build on something like Unity, Vulkan support should be mostly a checkbox. That will be really good to see.

I wanted to give a whole slide to the open source community, because they've been awesome. Since Vulkan was released, all sorts of open source projects have started up around it, and it's been really cool to see. There have been a lot of open source Vulkan demos. There's a guy called Sascha Willems who wrote a bunch of demos that have been very popular, and some of the other hardware vendors have provided demos in open source form, so there's a lot out there if you want to learn the API by reading some code. There's now a community-developed open source driver for AMD Radeon hardware, started about eight months ago, I think, by Dave Airlie and Bas Nieuwenhuizen, and that driver has gotten pretty good lately; in fact, some people are recommending it over the closed source one on Linux for certain things. There have also been open source games and engines. vkQuake is a Quake implementation on top of Vulkan. Why do you need Vulkan to run Quake? I don't know, but they decided to do it. Some open source game engines have added Vulkan support too. One of my favorite little open source projects was somebody who decided to implement Nintendo 64 and PlayStation 1 emulators entirely in Vulkan compute shaders, because apparently those are actually really hard to emulate with GL or whatever. So they wrote what is basically a software emulator, only they did it in compute shaders, and it's been really effective: you can run games at full frame rate on embedded graphics. It's been really cool. There's also a guy working on a Direct3D 9 implementation on top of Vulkan.
There have also been several open source libraries and tools. So there's just a lot going on, and it's been really cool to see it develop over the course of the last year, because you have this brand new graphics API that barely anybody had used except a few people inside the Khronos NDA, and the open source community has really taken to it. So that's the end of what I have in my slides, but I wanted to open it up for a bit of Q&A, because I know there are a lot of people who don't really know what Vulkan is. So if you have any questions, feel free to ask.

So the question was about Android sync fences. Android has had this thing called sync FDs for quite some time; they recently were essentially ported to the upstream Linux kernel in the form of sync files. These kind of map to the Vulkan semaphore concept, but not exactly, and there is work going on to let you work with sync files in Vulkan. It's not quite there yet, but it will be happening soon. The problem is just that sync files don't quite map to the Vulkan semaphore concept, so you have to massage things a little. But yes, we should be able to do sync-file-type things with Vulkan very, very soon.

I don't know if it's better. So the next question: for OpenGL ES, some manufacturers, like Vivante or PowerVR, have vendor-specific features inside the GPU, and in OpenGL ES they provide extensions you can use to enable those features. An example is that OpenGL ES only lets you upload, say, RGB textures, and some manufacturers said, hey, you can upload a YUV texture directly from your camera and we'll do the rendering. Does Vulkan have anything similar? Yes, Vulkan has an extension system similar to OpenGL's. The Vulkan extension system is a little more complicated, because Vulkan is designed to work against potentially multiple GPUs from different vendors, so the way you enumerate extensions is a little more involved, but the same basic concept applies. You have extensions, you have a string that says this extension is available, and as long as you check for it appropriately, you can call vkGetInstanceProcAddr or vkGetDeviceProcAddr and it gives you a function pointer you can call. So it's basically the same thing. It looks a little different in Vulkan, but we do still have an extension mechanism, and I think there are something like two dozen extensions published at this point. Some of them are Khronos extensions for window system integration, but a bunch are from NVIDIA and AMD, and I think there's an Imagination one. So yeah, it's got basically the same thing.

GPU hangs sound bad, yes; what about recovery, et cetera? So, Vulkan does provide a way to know whether you've had a GPU hang, and the application is able to recover from it. Even though GPU hangs are bad, Vulkan does provide certain guarantees. For one thing, Vulkan guarantees that if you use the API correctly, you won't hang the GPU. It also guarantees that even if you do hang the GPU, you're not going to affect some other process; you can't have one process stomp on another process's memory by hanging the GPU. And Vulkan has an error code, device lost, that you can get back from a bunch of different entry points, such as queue submit.
And if you do get a GPU hang, the driver is supposed to return device lost, at which point you have to go recreate all of your objects, re-upload all of your textures, and reset the state back to what it was before. But it does give the application a way of knowing that the hang happened and of recovering from it, instead of just dying a horrible death like it would otherwise. So it is something we're aware of, and we have mechanisms for dealing with it on the application side. But hopefully you write your application correctly and it passes the validation layers, and we write our drivers correctly, and it doesn't happen. But it always can.

Next question, about compositors: what's the state of the API for getting a pixmap from another process? So, the state of the API for getting pixmaps from other processes: this is something we're aware of and working on. It does not exist today, but it's something we're very well aware of. There are multiple people interested in using Vulkan for compositors, because it's a scenario where you don't want to be spending a lot of time, and compositors have very predictable paths; it's not that hard to implement a Vulkan compositor. The APIs don't exist today. Kristian and I have prototypes of what the Wayland extensions could look like. I don't think I've seen a prototype for X compositor extensions, but I would expect to see that sort of thing coming before too long, because it is a use case people want and care about. Most of the effort up to now has been getting everything working, and working well, before we try to add compositor interactions. It's on the to-do list; it just hasn't been at the top of the to-do list.

Yeah, if somebody wanted to contribute to the code in Mesa, what's the way to do that; do you need help? Yes, help is appreciated. Our driver today is very close to complete; there are only a couple of hardware features we haven't exposed yet, so in that sense we're in pretty good shape, but there's always stuff coming down the pipe, there's always stuff we can use help with, and there are always bugs to fix. And we are a completely community project. The Intel driver is just part of the Mesa tree, just like our GL driver, and we use the same compiler infrastructure and a lot of the same code between the two drivers. We have random community members who've contributed, and anyone can. Kavi probably wants me to tell people that we do hire people to work on it, if you want to come live in beautiful Portland. So yeah, it's very, very easy to contribute to. In fact, just recently we had a pretty nice open source success story where Feral Interactive, a game porting house in England that ports a lot of the AAA game titles, needed a Vulkan feature our driver didn't support, so they just sent us two patches, and it took about three days, I think, to get them merged. So we get contributions from random people quite a bit, and it's great. Other questions? Thank you very much, then.