Hello. I am going to present an overview of the development of the Vulkan driver for the Raspberry Pi 4.

Let me start with a summary of the contents of this presentation. I will begin with the development story of the driver, then move on to its current state: what we support, what is working and what is not. I will then mention some of the challenges we ran into during the implementation, and I will close the presentation with the future plans for the driver and with how anyone can contribute to it.

So, starting with the development story. The name of the driver is V3DV. That is basically the same name as the OpenGL driver for this device, with a V for Vulkan added at the end. We thought about other names, but in the end we decided that reusing the OpenGL name plus a V would help avoid confusion.

The development started in a fork of Mesa. In fact it started as a private fork in the early stages because, as we were starting from scratch, there were a lot of moving pieces. Then we moved to a public fork, and this month, last week in fact, the driver was merged into Mesa master. Right now the development happens in Mesa.

The driver reuses several existing pieces from Mesa. Specifically, it reuses the window system interface implemented in Mesa, and it extends the compiler that was initially created for the OpenGL driver. Most Mesa drivers at some point represent shaders in an intermediate representation called NIR, and we already had a backend that compiles from NIR to the assembly of the GPU on the Raspberry Pi 4, so we were able to mostly reuse it, extending it a little for Vulkan specifics. We are also using the same kernel interface as V3D. As a starting point, our plan was to use the existing kernel interface in order to reduce the number of pieces we needed to modify, but in the end, now that the driver is more functional, we found that we were able to do everything needed for Vulkan 1.0 without modifying the kernel interface. Perhaps in the future we will extend it to get an interface that fits Vulkan better, but for now keeping the same kernel interface is working for us.

Now let me try to explain how the driver evolved over time, through its milestones. The development started around November. The first month was mostly about analyzing all the effort that would be needed, because our plan at the beginning was to start with the core functionality for Vulkan 1.0 and only the mandatory features. So we listed what we wanted to do, with priorities, and started to investigate, looking at the OpenGL driver and the specification of the GPU, how it could be done. The real coding work started around December. By January we got our first classical triangle demo working. Then, between January and May, we were working mostly with individual tests; specifically we were trying to use the CTS as much as possible, and I will talk about that later. So we were not rendering or visualizing much on screen, we were just using tests that tell you 'it works' or 'it doesn't', and that way we could focus on really specific features. Around May we tested the driver with the Sascha Willems demos. The Sascha Willems demos are really popular Vulkan demos; they are used both as tutorials and as a way to exercise drivers, and it is really common, if you want to show something with your Vulkan driver or if you are trying to learn Vulkan, to use those demos as a reference.
So around May we tested the demos for the first time and we already got several of them working.

As I mentioned before, we started in a private repository, mostly because we started from scratch, so we were moving a lot of things around; moving in the sense that we would start some development and then realize that the design of that solution didn't cover all the cases that were needed. In some cases we had to rewrite, and we rewrote a lot. So we preferred to first work privately rather than doing open development over an unstable base. Around June we moved the development to a public fork. At that point we already knew more about what we wanted to do and how, but we still had a lot of things we were changing back and forth, so we thought it was not yet the moment to ask for the driver to be merged upstream and to move to the usual Mesa merge request workflow.

Around July one of our colleagues asked if we had tried some real applications, specifically the Quake games, basically because they are popular. We checked, and the advantages of those games were that they are small enough, open source, still well maintained, and they have a good Vulkan port. So around July we tested them and found that we didn't need too much effort to get them working; after working on them a little we got them running. Around August we had all the features needed for Vulkan 1.0, and since then we have been mostly focused on passing all the tests needed to get conformance. And around October, this month, we moved the driver to Mesa upstream, so right now it is not in a fork but part of the upstream project.

Going back to the beginning: the initial milestone was to render something on the hardware, and the objective was to use the simplest test possible. For that, the usual choice is the most basic Vulkan test, the famous triangle demo that just renders a colored triangle. But in our case we decided to go even simpler. The thing is that for that triangle you still need the compiler and you still need to define the vertices. So we found that it was possible to write a Vulkan test even simpler than that, one that just performs a Vulkan clear. The advantage is that it also allowed us to work in parallel: this project was mostly Iago and myself, so while Iago was working on getting this really simple Vulkan clear test running, I was working on plugging in the shader compiler. The other advantage of going so simple is that if you try to get a somewhat more complex test working, several pieces of the driver need to work at the same time, so if it fails that could mean you need to debug all of them. We wanted a test whose code path touched as few pieces as possible. After that, yes, we moved to the triangle and then to other tests.
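Just to give an idea of how small that first clear-only milestone is compared to the triangle, here is a rough sketch in C against the standard Vulkan API. It is not code from the actual test; the function name is made up, and it assumes the instance, device, queue, command buffer and image have already been created and the image is already in a layout that allows clears.

    #include <vulkan/vulkan.h>

    /* Sketch only: record and submit a command buffer that just clears an image. */
    static void
    clear_image(VkQueue queue, VkCommandBuffer cmd, VkImage image)
    {
        const VkCommandBufferBeginInfo begin_info = {
            .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,
        };
        vkBeginCommandBuffer(cmd, &begin_info);

        /* Clear the first mip level / layer to a solid color. */
        const VkClearColorValue clear_color = { .float32 = { 1.0f, 0.0f, 0.0f, 1.0f } };
        const VkImageSubresourceRange range = {
            .aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
            .levelCount = 1,
            .layerCount = 1,
        };
        vkCmdClearColorImage(cmd, image, VK_IMAGE_LAYOUT_GENERAL,
                             &clear_color, 1, &range);

        vkEndCommandBuffer(cmd);

        /* Submit and wait: no render pass, no pipeline, no shaders involved. */
        const VkSubmitInfo submit = {
            .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
            .commandBufferCount = 1,
            .pCommandBuffers = &cmd,
        };
        vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);
        vkQueueWaitIdle(queue);
    }

Unlike the triangle, nothing here goes through the shader compiler or the vertex pipeline, which is exactly why it was a useful first target.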
After getting some basic tests working, our objective was to move to using the Vulkan CTS as a reference. For those who don't know it, the CTS is the official test suite from Khronos, the consortium that defines Vulkan. It is a really detailed test suite that, in addition to being mandatory for a driver to be declared conformant to the spec, also helps a lot to drive development. The thing is that it requires some minimal functionality first, basically in order to check the output. For example, if you are testing that a given operation works, you render using that operation, then you need to read back the images and compare them against a reference. The same goes for buffers: if the operation under test writes its outcome to a buffer, you need to read that buffer back. So after those basic tests we worked on the minimal functionality needed to get CTS tests running: UBOs, SSBOs, copying between images and buffers, and so on.

Additionally, we were also using the CTS for regression testing. Our workflow was that, as soon as we had some feature working, every time we added more features we also added more tests, and we didn't allow commits to regress other parts of the driver. At the same time, the CTS is a really big test suite, around half a million tests, so what we did was pick the tests most relevant to the features we were developing and build a subset from them. We were also using a tool called parallel-deqp-runner, which lets us run the tests in parallel and also handles crashes better than the CTS runner itself. So, as I mentioned, our workflow was: work on a feature, and when we had the patches, test them against our subset of the CTS; if it passed, integrate the patches and also grow the subset. Right now, with Vulkan 1.0 almost ready, we have around 10,000 tests in that subset, and they run in around 10 minutes. But since that subset doesn't include all the possible tests, we also run the full CTS two or three times per week, more or less. The good news is that right now, as we don't have too many crashes, the time needed for that is not too long. As I mentioned, one of the reasons to use parallel-deqp-runner is that it handles crashes better, but in any case, if you do a full run with all the tests and you get too many crashes, the times are really bad. At the beginning, around May or June, you needed something like 10 hours for a full run; right now, as we are passing almost all the tests, a full run takes about 4 hours.

During the development we also had a philosophy of asserting as much as possible. By that I mean mostly asserts for features that we are not implementing, or for code paths that we know we are not implementing yet. The reason is that, for any error, problem or failure in a test, we want the test to fail as soon as possible and with as little code run as possible, because that makes it easier to track down. And, at the same time, with a GPU it is generally a bad idea to try to run something that is not properly or fully implemented, because that normally ends in a GPU hang. In the full-run case I mentioned before, if those crashes, instead of being crashes in our code, had been attempts to run the tests to the end, a GPU hang would just stop the run in the middle.
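As a purely illustrative example of that fail-fast approach (this is not actual v3dv code, and the structure and field names are made up), an unimplemented code path simply hits an assert instead of reaching the GPU:

    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical driver-internal description of a copy operation. */
    struct copy_op { bool uses_unsupported_format; };

    static void
    emit_copy(const struct copy_op *op)
    {
        /* Fail immediately on a path we have not implemented yet, rather than
         * emitting partially-correct commands that would likely hang the GPU. */
        assert(!op->uses_unsupported_format &&
               "FIXME: copies for this format are not implemented yet");

        /* ... normal command emission would go here ... */
    }
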
Besides the development itself, we were providing updates through blog posts. It is true that at the beginning of the project we didn't write many, but that is basically because we were doing a lot of the basic groundwork and the outcome was not really eye candy; in general, in my opinion, blog posts work better when you have images. But from around March to May, when we started to get things working, especially the Sascha Willems demos, we started to provide more updates, around one or two blog posts per month.

So, the current state of the project. As I mentioned at the beginning of the presentation, our objective at the start of the project was to get Vulkan 1.0 complete, and we can say that we are fulfilling that: right now the mandatory feature set is complete. In addition, we are also supporting some optional features, mostly ones that were easy enough not to consume too much time. In some cases, as I mentioned before, because we reuse the V3D compiler, specific shader operations that are optional in Vulkan were already supported by the compiler we share with the OpenGL driver. Even with all the features implemented, we still have some failing tests and bugs when we do a full CTS conformance run: right now we are passing around 100,000 tests and we have only four failures to go, and for two of those four we already have patches that are under review right now.

As I mentioned, we have done some testing with the Vulkan ports of the original Quake trilogy, and they work with reasonable performance. Also, although we didn't test it directly ourselves, the Raspberry Pi community has tested the driver with PPSSPP, a PSP emulator that can use Vulkan as a backend. In addition to the Quake games it also works with OpenArena, a game based on Quake 3. And, as mentioned, we have several of the Sascha Willems demos working; in fact right now I think only around 10 are failing, so most of them are working.

Until now we haven't done much performance work. Most of the performance work we did was with the Quake games. When we first tested the Vulkan ports of the Quake games, we got them working, but the initial result was something like 5 or 10 frames per second, so there were obviously some performance issues. But as soon as we started to work on it, we solved the most important parts of the problem; we have patches that took it from 10 frames per second to 40, and from 40 to 80. As an example, we wanted to compare against the Vulkan port of Quake 3, which includes two renderers in the same project, the OpenGL one and the Vulkan one, and comparing both we got very similar, equivalent performance; in fact it is likely somewhat faster with the Vulkan renderer.

In any case, we are aware that there are still some slow paths in the driver, specifically around transfer operations. The GPU in the Raspberry Pi 4 includes a TFU, a texture formatting unit that can do some copies directly. The thing is that to use it you need to fulfill some constraints, and when you can't, we fall back to a blit shader, basically an internal shader program that we use just for copying. Although we are already using the TFU, we know we are probably underusing it, so one of the plans for the future is to try to use the TFU more often.
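Roughly, the structure is a TFU fast path guarded by its constraints plus the blit-shader fallback. This is only an illustrative sketch with made-up names, not the actual v3dv code, just to show the shape of the decision:

    #include <stdbool.h>

    /* Hypothetical description of a copy request; field names are illustrative. */
    struct copy_request { bool whole_mip_level; bool tfu_compatible_format; };

    static bool tfu_can_handle(const struct copy_request *req)
    {
        /* The TFU only covers certain formats and copy shapes. */
        return req->whole_mip_level && req->tfu_compatible_format;
    }

    static void emit_tfu_job(const struct copy_request *req)     { (void)req; /* fixed-function copy */ }
    static void emit_blit_shader(const struct copy_request *req) { (void)req; /* draw with an internal shader */ }

    static void copy_image_region(const struct copy_request *req)
    {
        if (tfu_can_handle(req))
            emit_tfu_job(req);       /* fast path when the constraints are met */
        else
            emit_blit_shader(req);   /* generic fallback otherwise */
    }
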
Now, a little about the implementation challenges that we found while working on the driver. One of the first things is that Vulkan expects everything to be executed on the GPU, in order to make the most of it, but in our case that is not quite possible everywhere. It means we need to do some of the work on the CPU and add some coordination, and usually that coordination means flushes, which means waits. This is not ideal, and we think it would probably be possible to avoid some of those cases, but we are not sure that, on this architecture, all CPU-GPU coordination during the process can be avoided.

Another challenge is related to the linear images used for display. The issue is that V3D cannot sample from linear images, so for now we don't support sampling from them; since that is an optional capability, we simply disable it. In theory, when rendering to a window that goes through the compositor, sampling could work, so it would be possible to allow sampling in one case and not in the other. But we are not sure it makes sense to do that, because it can be confusing; developers might not realize, or might be confused by the fact, that a feature is supported in one case and not in the other. So for now we keep it simple and leave that feature disabled.

Another challenge we found is that the Vulkan pipeline state is not always sufficient. In Vulkan, the idea is to avoid having to recompile shaders while you are drawing. That happens in OpenGL: since it is a state machine, in some cases a state change means you need to recompile a shader to reflect the new state. Vulkan tries to avoid that, so when you create a pipeline you provide, among other things, all the state needed to build the shaders. But we found some cases where that is not enough. The clearest case is textures: depending on the texture format, the return size when you access the texture can be 16 or 32 bit, but you don't know the format until the descriptors are bound. So what we do, in order to avoid compiling during drawing, is precompile two shader variants in advance. We compile an optimal variant that assumes the return size of the textures will be 16 bit, which in general will be the case for most applications and games, and we also precompile a 32-bit fallback, so when the descriptors are bound, if we find that we need a 32-bit format, we don't need to recompile, we just switch from one shader variant to the other.
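Schematically it looks like the sketch below. This is only an illustration with invented names, not the actual v3dv data structures, just to show the idea of selecting between precompiled variants once the descriptors are known:

    #include <stdbool.h>

    struct shader_variant;  /* opaque here: a compiled shader for one configuration */

    /* Hypothetical pipeline holding the two precompiled fragment shader variants. */
    struct pipeline {
        struct shader_variant *fs_16bit_return;  /* optimal: 16-bit texture returns */
        struct shader_variant *fs_32bit_return;  /* fallback: 32-bit texture returns */
    };

    /* Called once descriptor sets are bound and the texture formats are known,
     * so no shader compilation ever happens at draw time. */
    static struct shader_variant *
    select_fragment_variant(const struct pipeline *p, bool needs_32bit_return)
    {
        return needs_32bit_return ? p->fs_32bit_return : p->fs_16bit_return;
    }
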
The other challenge we found is that, as mentioned at the beginning, we reuse some pieces from Mesa, and one of them is the window system interface for Vulkan. The thing is that in the current implementation the optimal path requires a PCI GPU, through a specific extension, but the Raspberry Pi display device is not PCI. Right now we have a merge request with a proposed solution that is still under discussion. It is worth noting that this is not one of the reasons we are failing tests; it will improve performance, but things still work without it. So we are still working on a solution, with the driver functional in the meantime.

So, about the future plans for the driver. As I mentioned, right now we have implemented all the features for Vulkan 1.0 core, so our short-term objective is to get all the CTS tests passing so we can get conformance from Khronos. That basically means fixing the remaining failures, making a full run of the whole test suite and then submitting it to Khronos so the driver can be declared conformant to 1.0.

After that we have several to-do items. As mentioned, for transfer operations we have two main code paths, one using the texture formatting unit and the other using a blit shader, and we want to explore whether we can use the TFU more often. As I just mentioned, we also still have the window system issue that we want to improve. Another item to work on is improving the implementation of input attachments. Input attachments are one of the features that were added to Vulkan with tile-based GPU architectures very much in mind, like the one used in the Raspberry Pi 4, but our current implementation basically treats them as a special texture, without really getting the most out of the tile architecture, so we would like to improve that. As mentioned, we already support some optional features; if we want to support more features and extensions we will probably need to evaluate which ones are the most "popular", so to speak. And maybe, although we still have a lot of testing to do, evaluate whether it makes sense to start working on, or at least planning for, Vulkan 1.1.

Longer term, we are still thinking about improving code reuse with the OpenGL driver. As I mentioned, we already reuse some pieces that the old driver uses, but for features that are similar yet not exactly the same we didn't try to do much refactoring, because we didn't want to enter a cycle of refactoring that went like: "OK, this is really similar between Vulkan and OpenGL, let's refactor the OpenGL implementation", only to find we then needed to generalize another bit here and another bit there. So we have some features implemented in the Vulkan driver that are similar to the OpenGL driver's. Now that things are more stable and we know better what the differences between the two are, we are probably at the point where we could refactor both solutions and try to share just one.

In the same way, as I mentioned before, for Vulkan we focused on the mandatory features for 1.0. That includes some features that are also part of OpenGL ES, or that have equivalents there, but that we hadn't implemented yet. So in those cases we implemented the feature in Vulkan before OpenGL ES, and it would be possible to port it back to OpenGL ES; for example, hardware-assisted multisample resolve, sample-rate shading and robust buffer access. But for the long term, one of the most important things we need to do is more real-world testing. As I mentioned, we evolved the driver based on the official test suite from Khronos, and we did some testing with the Vulkan ports of the Quake trilogy, but we still need to do more testing with other applications that use Vulkan, because in the end it is really likely, almost certain, that we will find bugs doing that.

For the people interested in contributing, we are trying to provide enough context to make contributing as easy as possible. One of the issues is that the documentation for the V3D GPU is not available to the general public, but the good thing is that there is already an OpenGL ES 3.1 driver available, and we think that makes up for the lack of documentation, because that driver was implemented by people with access to the documentation, so there is a lot of relevant information to extract from it. For example, the packets: the packets that we send to the GPU, included in the submissions to the kernel, are the same for the Vulkan driver, so the OpenGL driver is a really good reference for anyone who wants to contribute. Additionally, while working on the driver we left several FIXMEs in the source code.
They are FIXMEs of different kinds, but in general we wrote down the things that are pending; that is what a FIXME is by definition, but we also tried to write down the details of what is missing, so anyone who wants to jump in on one of them knows the context. For example, a FIXME on an algorithm that we know in advance could probably be implemented in a better way, or a FIXME marking where an optional feature would need to be implemented if we wanted to support it. So for someone who wants to contribute, one way to start is to grep the source code for FIXMEs and pick one they think they can work on.

As mentioned, we focused on the 1.0 core features for Vulkan, but there are several optional features pending. So another thing anyone who wants to contribute can do is use vulkaninfo to list the features that are still pending, pick one and try to implement it. The good thing is that the CTS already has tests for those, so they can jump straight to the code without needing to write tests.

And, as I mentioned, one of the things we need to do more of is testing the driver with real applications, beyond the ones we already tried. So one more thing people who want to contribute can do is test the driver with applications and provide feedback on the behavior: whether there is any bug, or any performance problem at some given point. For feedback, we are usually on the #videocore IRC channel on Freenode; anyone can also send an email to the mesa-dev mailing list; and, as I mentioned before, the driver is now an official driver in Mesa upstream, so if anyone finds an issue they can use the GitLab issues directly, which is how the project tracks its issues. The good thing about the move upstream is that if a bug turns out not to be in our backend but in one of the pieces we use from the Mesa project, it is easier to ping the developers of those pieces.

Now that I am finishing the presentation, I want to end with some special thanks. As I have just mentioned, we reuse a lot of work already done in Mesa, so we want to thank all the people that have been working on NIR, on the SPIR-V to NIR translator, on the window system integration pieces, and so on. We also want to thank the other Mesa Vulkan driver developers. Right now there are four Vulkan drivers in Mesa and, although obviously things don't match exactly from one GPU to another, it was really useful to have other Vulkan drivers as a reference and to get ideas about how to implement specific features, so thanks a lot to all those people. And we really, really want to give special thanks to Eric Anholt. Eric was the original maintainer of the OpenGL V3D driver, so, for example, the V3D compiler that we use for the Vulkan driver was written basically by him and we have just been extending it; he also helped us a lot during the process, answering our questions, and he was the main reviewer for the patches we sent for the parts that needed upstream review. And finally we would like to thank Dave Emmett, who was our contact at Broadcom and was really helpful each time we had specific doubts about the Broadcom GPU, answering with really detailed emails. So that's all, I hope you liked this presentation, and if you have any questions I am happy to answer them.