Welcome everybody. This presentation is a status update on the Vulkan project. My name is Jeroen Bakker. I'm a developer and one of the module owners of the EEVEE & Viewport module inside Blender. Today I want to tell you about the main reasons why we are moving to Vulkan. I will say something about the project timeline and the milestones, and we will go over several technical challenges that I ran into when starting on this journey; most of those are not as obvious as you might think. When the presentation is over there is some time left for questions and answers.

But we will start with a history lesson on OpenGL. OpenGL is a graphics library that was created by Silicon Graphics. Silicon Graphics was a hardware manufacturer that created workstations for graphical use cases back in the 1980s and 1990s. They also provided 3D accelerator cards, as they were called back then. Their high-end GPU had 128 megabytes of memory; 104 could be used for texture memory, the rest was used by the driver. It could handle 48-bit integer colors, so 16 bits per channel and no alpha. In OpenGL 1.2 they also added texture mapping and per-fragment lighting to the API.

Accelerator cards became much more mainstream. They got into home computers and were used for gaming. Games had different requirements and they pushed for programmable shaders. Programmable shaders are the ability to write a program that runs on the GPU and alters the vertices and the fragments. We know them of course as vertex and fragment shaders.

That brings us to OpenGL 3.3. The shader-based programming model didn't fit well with the earlier fixed-pipeline programming model. In order to fix that they created a new programming model, called the core profile. So when you see "this is OpenGL core profile", that means it is a different programming model. This programming model was available next to the previous one, and it was the responsibility of the developer not to mix the two. Otherwise dragons appear.

GPUs got much faster and got more memory; 8 GB became normal. Internal bottlenecks shifted towards bandwidth: how fast can you get the data in your GPU memory to all the different cores and execute the program you want? OpenGL 4.3 mainly focused on improving performance and flexibility by introducing compute shaders and adding more efficient texture formats, texture compression and manipulation. In the upcoming Blender 4.0, OpenGL 4.3 will be the minimum requirement for starting Blender.

OpenGL was originally designed for GPUs with around 128 MB of memory, running on a host system with only a single core, and using a non-programmable pipeline. Something had to give. Nowadays it has two different programming models, and both are still supported in order to keep older applications running. OpenGL also has a lot of extensions. Most of those extensions were created by a single vendor and focus on how that vendor understood the problem, so similar extensions by different vendors appeared. Under the umbrella of the Khronos Group, efforts were made to standardize those different extensions into a single one, resulting in yet another standard. The typical problem: drivers might not have implemented the standardized one, so applications still have to support the older, vendor-specific ones. The specifications of the extensions are written as a human-readable change on top of the original specification. That's fun.
They can conflict with each other, and it is hard to read and understand what they actually do. Here's an example I ran into last week, when I was working on getting SPIR-V capabilities into OpenGL. This is such an extension specification. It starts with all the dependencies it has: I'm influencing all these other extensions. And in the end it tells you: take the specification, rip out those pages and those lines, replace them with this text, and if you then read it, that's what we want you to read. Which to me is really a diff.

This also puts a lot of weight onto the GPU vendors when writing OpenGL drivers. Due to the complexity of the drivers and misunderstandings of the standards, bugs appear and they often stick around. It normally takes around one year between a bug being detected and the driver with the fix being available to users. During that time we have to add a workaround to Blender to make sure that specific driver is handled differently. And that also means that in the Blender codebase there are quite a lot of workarounds.

Khronos started the next-generation OpenGL initiative in order to fix the issues I mentioned, and many others. Vendors were converging on similar features with similar solutions, and there was general consensus about the current bottlenecks and future GPU developments. The idea became to scrape away the high-level graphics API that OpenGL was, make a much more detailed and strict specification, and give the application developer the power over how to use it. When the specification becomes stricter, it also becomes possible to perform driver conformance testing: all the drivers from the different vendors go through that process, and at the end of it they know they are compatible with each other. If we had to name such a project, the EEVEE & Viewport module would have called it OpenGL Next. But the initiative came up with a different name and called it Vulkan.

I don't consider Vulkan to be just an API you program against; it is more an ecosystem. It starts with how the specifications are written. There is a single online specification that is written in a testable manner: everything you can do with the API is strictly written down and it tells you exactly how things work. And that makes it possible to do the conformance testing. The extensions are also included in the specification. You can download the specification without extensions, with all the extensions that are already standardized, or with all the extensions available from all the vendors. And that is one single document where all the diffs from the extensions are already applied, making it much easier to understand and develop against.

Due to the testability of the specification, drivers can also run validation tests. But usage validation, checking how an application uses the Vulkan API, is not part of the runtime. In OpenGL you could do anything and you got back an invalid-argument error. In Vulkan you don't get that; things just misbehave or your system crashes. But during development there are a lot of tools that help you with this. You can install a validation layer, which sits between the application and the Vulkan driver and does all the validation testing for you.
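Roughly, enabling that validation layer just means asking for the standard Khronos layer when you create your Vulkan instance; in a release build the layer list simply stays empty. A minimal sketch:

    #include <vulkan/vulkan.h>

    /* During development, ask the loader to insert the Khronos validation layer
     * between the application and the driver. */
    VkInstance create_instance_with_validation()
    {
      const char *layers[] = {"VK_LAYER_KHRONOS_validation"};

      VkApplicationInfo app_info = {};
      app_info.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
      app_info.pApplicationName = "Blender";
      app_info.apiVersion = VK_API_VERSION_1_2;

      VkInstanceCreateInfo create_info = {};
      create_info.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
      create_info.pApplicationInfo = &app_info;
      create_info.enabledLayerCount = 1; /* 0 in a release build. */
      create_info.ppEnabledLayerNames = layers;

      VkInstance instance = VK_NULL_HANDLE;
      vkCreateInstance(&create_info, nullptr, &instance);
      return instance;
    }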
So because the Vulkan specification is so strict and the drivers are validated, you don't need to do the validation at runtime, which of course increases performance. But it also means that the error codes you get from Vulkan are more of the kind "okay, it works" or "I'm out of memory". Those are roughly the two kinds of error codes you get.

Vulkan is also shading-language neutral. Everyone talks about the GLSL language, but it's not really one language; it's a lot of different dialects that look the same but are actually quite different. Vulkan solves that by saying: okay, we're only going to support SPIR-V, which is a binary format, and we give the application developer the freedom to choose the shading language they want to use. Do you want GLSL? Then there is one really strictly defined version of GLSL to target, and it's just one, because the application is doing the compilation; it's no longer decentralized compilation in each vendor's driver. But you can also exchange GLSL for HLSL or another high-level shading language. And then we get to the fact that we are now on a lower-level API, so memory management isn't part of the Vulkan driver; memory management is the responsibility of the application. Okay, that's something interesting.

I want to go over the history of the Vulkan project within Blender a bit. The main reason why we want to migrate Blender to Vulkan isn't performance; it is platform stability. Each year we're adding workarounds to the Blender codebase, and it takes around one week to one month to develop such a workaround and validate that we didn't break any other platform. When Vulkan was introduced, most vendors quickly focused on Vulkan and ignored that they also had an OpenGL stack to maintain, so the quality of the OpenGL stack went down and we received more bugs. The vendors are currently solving that by wrapping the OpenGL calls on top of Vulkan or DirectX, so from their point of view they only have to maintain one part: the Vulkan part. But that adds the complexity that some features in OpenGL aren't compatible with Vulkan, so it remains a challenge to keep OpenGL working. The second reason is that new GPU features are not available in OpenGL. If we want to use hardware ray tracing, mesh shaders or any other modern feature, we just can't with OpenGL. You have to use Vulkan. And hopefully, at last, we will get some performance improvements, but those don't come out of the box; we have to do a lot of work to make Blender's drawing comply with the way Vulkan runs in an optimized manner. For Vulkan to draw something on the screen you have to create a recipe; they call that recipe a pipeline. The pipeline is loaded on the GPU and you provide the data to draw with using that recipe. That's different from how Blender works, and I will come back to this at a later stage in the presentation.

The Vulkan project started around 2019. We encapsulated all the OpenGL calls we had in the different areas of Blender into a central component that we call the GPU module. From that point on, GL code was only allowed inside the GPU module. There was also an initial prototype of a Vulkan backend context, but it wasn't able to draw anything on the screen. It could create a context and that's it. Still, it was a starting point for the future developments.
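To make that SPIR-V point from a moment ago a bit more concrete: because the application owns the shader compilation, turning GLSL into SPIR-V is just a library call at runtime. This is a rough sketch using the shaderc library as an example compiler; the exact options a real backend would set are left out here.

    #include <shaderc/shaderc.hpp>
    #include <cstdint>
    #include <string>
    #include <vector>

    /* Compile a GLSL vertex shader to SPIR-V on the application side.
     * Returns an empty vector when compilation fails. */
    std::vector<uint32_t> compile_glsl_to_spirv(const std::string &glsl_source)
    {
      shaderc::Compiler compiler;
      shaderc::CompileOptions options;
      options.SetTargetEnvironment(shaderc_target_env_vulkan, shaderc_env_version_vulkan_1_2);

      shaderc::SpvCompilationResult result = compiler.CompileGlslToSpv(
          glsl_source, shaderc_vertex_shader, "example.vert", options);

      if (result.GetCompilationStatus() != shaderc_compilation_status_success) {
        /* result.GetErrorMessage() contains the compiler output. */
        return {};
      }
      return {result.cbegin(), result.cend()};
    }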
For the encapsulation we created a lot of abstract classes; in the end you create a concrete version of them and your GPU backend works. For the Vulkan project we would just have to implement those classes and we would be ready.

Sadly for the Vulkan project, in 2021 the priority shifted towards EEVEE Next, and a lot of work went into the Asset Browser and the texture painting project. The Vulkan project only received some minor, not really noteworthy changes in 2021. But in 2022 Apple worked on adding the Metal GPU backend to Blender. They used the same approach, so they implemented concrete classes inside the GPU module and fixed the API where it could be improved, maturing it, and that is something that benefits the Vulkan project as well. Also during this time I started convincing people that we should actually invest in the Vulkan GPU backend, resulting in an official commitment from Ton. I would be working on this project as one of the main developers, and it became one of the main development targets for 2023.

Looking at the tasks that needed to be done, we set a realistic goal for the project. We target Vulkan 1.2, not 1.3. The reason is that most of the GPUs we currently support also support Vulkan 1.2, so there wouldn't be that much difference in platform support. We support Windows and Linux. The initial prototype also ran on Apple devices, but that is too hard to maintain as that platform is not fully Vulkan compliant. And from a performance point of view we want to be as close as possible to the OpenGL backend we currently have. We know that we need several iterations to get to that speed, but that was the goal we set back then.

I looked at the approach Apple took for the Metal backend implementation and quickly found out that I would not follow the same approach. Their approach was to implement every class, make sure that everything works, and then at the final stage of the project you can start Blender and see that you did a good job. That's nice, but that's not how I want to work. I want to work in small steps, being able to test every step that I take, and therefore I looked at what the quickest step would be. The quickest step was to get compute shaders working. Blender already had a unit test for compute shaders, but before we could actually use them we needed the GLSL to SPIR-V cross-compilation and some basic memory management, otherwise we couldn't do anything. The first task was to get that up and running, and afterwards we extended that, again driven by a unit test, to implement the graphics pipeline.

The project team is, most of the time, basically just me. Bastien is acting as an admin on the project, which means he keeps an eye on whether it depends on other areas of Blender that need to be informed or where decisions need to be made, and Clément brings amazing experience with GPU development. There were also some community people involved, most noteworthy Kazachi. At the beginning of this year he already presented a Blender version that ran on a Vulkan backend, and he is still working on that branch. We learned a lot from him: how he approached his project, what kind of challenges he ran into, and it was a real inspiration for how we could design some of the features of the Vulkan backend so we can land it in main.
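Coming back to that encapsulation for a moment: the pattern is simply that the GPU module exposes abstract classes and each backend provides the concrete implementation. As a very rough, simplified sketch, with purely illustrative names rather than the actual Blender classes:

    /* Simplified sketch of the backend abstraction in the GPU module.
     * Class and method names are illustrative only. */
    class GPUTexture {
     public:
      virtual ~GPUTexture() = default;
      virtual bool init(int width, int height, int format) = 0;
      virtual void update(const void *pixels) = 0;
    };

    /* The OpenGL and Metal backends already provide concrete classes like this;
     * the Vulkan backend has to do the same. */
    class VKTexture : public GPUTexture {
     public:
      bool init(int width, int height, int format) override
      {
        /* Create a VkImage, pick a memory type, create an image view, ... */
        return true;
      }
      void update(const void *pixels) override
      {
        /* Stage the pixels and copy them into the image. */
        (void)pixels;
      }
    };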
So where are we now? In the beginning of October this year we added the Vulkan backend as an experimental option to the Blender 4.1 alpha builds. The Vulkan backend will not be available in the release yet. We do this to get feedback: does it start on your computer, or did we miss something we didn't know about? During development I test the Vulkan backend on 20 different computer configurations, but there are thousands out there in the world, so we can't cover them all. And it's very easy to test, even via Steam. Every day a new version of Blender is uploaded to Steam, so you don't have to stick to the release versions; you can just set the GPU backend launch option to Vulkan and start Blender. It's just Blender, just like on OpenGL. Or is it? No, it's Vulkan.

That's the current state of the Vulkan backend: we have a running Blender, but we aren't there yet. Some of the bigger Blender features are not working yet. EEVEE Next and Cycles use GPU features that we haven't implemented. EEVEE uses really advanced rendering techniques, which we are handling one at a time. For Cycles, sharing memory between CUDA and Vulkan works differently than we anticipated: Cycles needs to allocate the Vulkan textures, while the current Cycles does it the other way around. Development currently focuses on stability and features, not yet on performance. It is better to do the performance part at the end, when we have all the features in place and we know what needs to work together. Doing those optimizations from the start would lead to a lot of rework at the end of the project, which we don't want. Eventually the focus will shift to performance. Once the performance is similar to the OpenGL backend we will hand over the project to the EEVEE & Viewport module, which is largely the same people. That will also make it possible for the Vulkan backend to be available in a released version of Blender. When we are there we will select the version, not now.

This was the status update of the Blender Vulkan backend project. Now I want to continue with some of the gotchas that you might come across when you work on something like this. You don't need to be an expert in Vulkan to start such a project; this is basically my first Vulkan project, and I took on something like "let's migrate Blender to Vulkan". In the next sections I go over some of the gotchas that I came across, which I was aware of, wasn't aware of, or didn't fully understand when I started the project. Let's see what we can learn from them.

An area that is often taken for granted is memory management. In Vulkan you can allocate two types of resources. You have images, and during the allocation of an image you already define it: it has these dimensions, it uses this pixel format, it has a layout (ignore that for now), but it also has a usage. You have to declare: I'm going to use this image in this manner. Buffers are the second resource type. They are simpler, except that buffers are not typed to a data format; but they do have a usage. When performing an allocation in Vulkan you have to take the physical layout of the GPU into consideration. GPUs have texture memory, they can have cache units, and they have the GPU cores themselves. But how are they physically attached to each other? That matters. In Vulkan you can query what kind of memory heaps there are. I checked on an AMD Vega GPU and got a list of four memory heaps back, each with a certain visibility: am I accessible from the host or not? And the same from the GPU side.
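That query is a standard part of the API. Roughly it looks like this; a minimal sketch that just prints the heaps and memory types together with their visibility flags:

    #include <vulkan/vulkan.h>
    #include <cstdio>

    /* List the memory heaps and memory types of a physical device, and show
     * which ones the host can see. */
    void print_memory_layout(VkPhysicalDevice physical_device)
    {
      VkPhysicalDeviceMemoryProperties props = {};
      vkGetPhysicalDeviceMemoryProperties(physical_device, &props);

      for (uint32_t i = 0; i < props.memoryHeapCount; i++) {
        printf("heap %u: %llu MB%s\n",
               i,
               (unsigned long long)(props.memoryHeaps[i].size / (1024 * 1024)),
               (props.memoryHeaps[i].flags & VK_MEMORY_HEAP_DEVICE_LOCAL_BIT) ? " (device local)" : "");
      }
      for (uint32_t i = 0; i < props.memoryTypeCount; i++) {
        VkMemoryPropertyFlags flags = props.memoryTypes[i].propertyFlags;
        printf("type %u on heap %u:%s%s\n",
               i,
               props.memoryTypes[i].heapIndex,
               (flags & VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT) ? " device-local" : "",
               (flags & VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT) ? " host-visible" : "");
      }
    }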
Some memory is just not accessible from your host system. So in order to upload something like an image to that part of memory, you first have to upload it to a part that is visible to both the host and the GPU, and from there it can be copied to the non-host-visible part of the texture memory. Typical GPUs have a small part of memory that is used for that data transfer, but that can differ per system. The performance of writing to that part of memory, or reading back from it, can also differ. So the choices you have to make during the allocation of memory are really important for the final performance you get. Luckily a large part of this problem is common to every application, and there are libraries out there that can help you implement this part. The Vulkan Memory Allocator is widely used in games, applications and tutorials everywhere to solve this problem. It solves most of it, but you still have to say: okay, I'm going to use this texture in this way, so please select the right memory for it.

Now let's allocate a depth texture using an OpenGL-compatible data format: 32 bits, where 24 bits are used to store the depth and 8 bits can be used as a stencil buffer, and we want to use it as a framebuffer attachment only. NVIDIA: success. Intel: success. AMD: not supported. So let's talk about data formats. AMD doesn't support any 24-bit depth texture formats; everywhere it says false. In Vulkan, support for any combination of data format, usage and layout is optional. Vendors typically only support a data format in a way that is also supported by the actual silicon on the GPU. If a vendor wants to use less physical space on the GPU, it can remove some of that silicon, and then the combination is simply not supported in Vulkan. Games and tutorials mostly select the combinations that are widely supported. For Blender this is a bit harder, because we also have add-ons that rely on specific texture formats, so we always need to add a fallback for any texture format out there. It's not that hard; so far I found out that mainly the 24-bit and the 3-channel texture formats are not well supported, but you always have the 32-bit and the 4-channel texture formats that you can use.

Even when a texture format is supported, it might still not support every usage. Blitting, for example, is a special hardware feature that can copy between different texture formats while applying some scaling and filtering. Blitting already existed in typical home computers in the 1980s, so it's really old technology. But in this case we're able to use the texture format as a blit source but not as a blit target, because it has the blit-source bit but not the blit-destination bit. If you look at other cases like this one, there are also data formats that are not supported for a texture but only for a buffer, and then it sometimes can't even be used as a storage buffer or a uniform buffer, only as a vertex buffer. So there are quite some things you have to take into consideration when you create a texture or buffer.
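Asking the driver what a format can do, and falling back when the preferred format isn't supported, is a plain query in Vulkan. A rough sketch of the depth-format case I just described, simplified from what the backend really has to track:

    #include <vulkan/vulkan.h>

    /* Pick a depth/stencil format the device actually supports as a depth
     * attachment, preferring the OpenGL-style 24/8 layout. */
    VkFormat choose_depth_stencil_format(VkPhysicalDevice physical_device)
    {
      const VkFormat candidates[] = {VK_FORMAT_D24_UNORM_S8_UINT, VK_FORMAT_D32_SFLOAT_S8_UINT};
      for (VkFormat format : candidates) {
        VkFormatProperties props = {};
        vkGetPhysicalDeviceFormatProperties(physical_device, format, &props);
        if (props.optimalTilingFeatures & VK_FORMAT_FEATURE_DEPTH_STENCIL_ATTACHMENT_BIT) {
          return format;
        }
      }
      return VK_FORMAT_UNDEFINED;
    }

    /* The same properties also tell us whether a format can be used as a blit
     * source and/or a blit destination. */
    bool supports_blit_destination(VkPhysicalDevice physical_device, VkFormat format)
    {
      VkFormatProperties props = {};
      vkGetPhysicalDeviceFormatProperties(physical_device, format, &props);
      return (props.optimalTilingFeatures & VK_FORMAT_FEATURE_BLIT_DST_BIT) != 0;
    }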
When uploading data, the data must be written in exactly that format. That's no problem for data formats that are also natively supported by the CPU, like a 32-bit float, but what about half floats, encoded formats or really GPU-specific formats? Blender has to do the conversion of the data it wants to store in those textures itself. We added a function that can convert any host buffer to a device buffer during the upload to the GPU, to save some cycles; otherwise you would first do a conversion loop and then an upload loop. This one does both in the same go, between all the different texture formats. For float conversions we looked at the available libraries: there are libraries that are really dedicated to a single float format, like the half data type, and there are libraries that are really generic and can be used in any case but add a lot of overhead. We used some C++ templating where we can generate the conversion function on the fly when we need it and still be fast. So something in between. Something that we also do here is clamp the value if it doesn't fit in the texture format. In OpenGL, if you upload a value that doesn't fit in the half data type, it becomes an infinite value, which shows up as really incorrect renders. We normally solved that by first clamping to the maximum value that fits in that format and then uploading. But because we do the conversion at the same time as the upload, we can also do the clamping in that same process. So the data is read on the CPU once, converted and clamped, and then sent to the GPU in a single go, just to increase the performance.

In the beginning of the presentation I mentioned that the performance of Vulkan relies on the recipes you can reuse. Those recipes are stored in pipelines. Vulkan has two kinds of pipelines: the compute pipeline and the graphics pipeline. This section is about the graphics pipeline, as compute pipelines are really light and don't have the issues that graphics pipelines have. A pipeline consists of a lot of settings you can set. These are all the different settings that Blender sets inside a pipeline. If any of those settings changes, the previous pipeline is thrown away and a new pipeline needs to be created. So if you change the shader: new pipeline. If you change the blend mode of your framebuffer: new pipeline. If you change the framebuffer: new pipeline. There are extensions in the making that change this part of Vulkan, really to support cases like the one we have in Blender. Sadly these extensions are only supported by about 4% of the platforms and would always require a fallback that basically does what Vulkan does now. To reduce the code complexity we will, for the first version, stick to just creating pipelines for the time being.

In the upcoming period I want to improve the graphics pipeline handling. We already got it working, but it doesn't perform well. I want to start with reducing the number of pipelines that we have to create when we start up Blender. I counted, from clicking the Blender executable to the first time we see the default cube: it creates around 1000 pipelines. Most of those pipelines are similar or the same, so I definitely think we can reduce the number of pipelines there. Vulkan also has a pipeline cache, but I only want to add that for the specific cases that we can't handle up front. It is better to add caches only in places where it matters. That's true for many things in Blender.
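The general idea behind reducing those recreations is straightforward, even though the real implementation has to cover far more state: treat everything that goes into a pipeline as a key and only build a new pipeline when that key hasn't been seen before. A rough sketch with a deliberately tiny key; this is not Blender code, and it is distinct from Vulkan's own VkPipelineCache:

    #include <vulkan/vulkan.h>
    #include <cstdint>
    #include <functional>
    #include <unordered_map>

    /* Deliberately tiny pipeline key; a real one would cover all state that goes
     * into VkGraphicsPipelineCreateInfo (shaders, vertex layout, blend state,
     * render pass, ...). */
    struct PipelineKey {
      VkShaderModule vertex_shader;
      VkShaderModule fragment_shader;
      VkRenderPass render_pass;
      VkBool32 blend_enabled;

      bool operator==(const PipelineKey &other) const
      {
        return vertex_shader == other.vertex_shader &&
               fragment_shader == other.fragment_shader &&
               render_pass == other.render_pass && blend_enabled == other.blend_enabled;
      }
    };

    struct PipelineKeyHash {
      size_t operator()(const PipelineKey &key) const
      {
        size_t hash = std::hash<uint64_t>()((uint64_t)key.vertex_shader);
        hash = hash * 33 ^ std::hash<uint64_t>()((uint64_t)key.fragment_shader);
        hash = hash * 33 ^ std::hash<uint64_t>()((uint64_t)key.render_pass);
        hash = hash * 33 ^ size_t(key.blend_enabled);
        return hash;
      }
    };

    class PipelinePool {
     public:
      VkPipeline get_or_create(const PipelineKey &key)
      {
        auto it = pipelines_.find(key);
        if (it != pipelines_.end()) {
          return it->second; /* Reuse instead of recreating. */
        }
        VkPipeline pipeline = build_pipeline(key);
        pipelines_[key] = pipeline;
        return pipeline;
      }

     private:
      VkPipeline build_pipeline(const PipelineKey & /*key*/)
      {
        /* A real implementation would fill VkGraphicsPipelineCreateInfo from the
         * key and call vkCreateGraphicsPipelines() here. */
        return VK_NULL_HANDLE;
      }

      std::unordered_map<PipelineKey, VkPipeline, PipelineKeyHash> pipelines_;
    };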
One reason why we have so many pipeline recreations is that during Blender 2.8 we created an immediate-mode wrapper that translates the old legacy OpenGL calls to core-profile GL calls. We call that IMM. That's really convenient for developers who don't know exactly how to develop against the GPU module. But it now gives us a problem: we have to solve something there. IMM code is really readable and is used for creating editors and drawing the user interface of Blender, so it is widely used. So, need a fun project?

The Vulkan code is already in main. What you saw, what I showed, that is just the main branch of Blender, no special compilation options. Just any 4.1 build you can download, and you can start it with the Vulkan backend. I use an iterative task approach: find a problem, discuss that problem, analyze it, write a PR, get into contact and find out how to get it fixed. One tool that we always use for GPU development is RenderDoc, so getting familiar with RenderDoc would be a good starting point. If you need some help with that, we are always available to help out. You can actually compile Blender with a RenderDoc build option, which gives a tighter integration between RenderDoc and Blender: you can select the code you want to debug and see only that part in the debugging tool. This is the end of the presentation part.