Hello, FOSDEM. I'm Dzmitry Malyshau. I'm a graphics engineer at Mozilla, and I'm going to dive into the history of WebGPU with you, and if you survive, I'm going to show you how we use Rust in it. There are also quite a few bright colors here, so beware if you have a history of epilepsy or something.

This is a fairly long agenda. I hope we are going to get at least to point 4 here; the most interesting stuff is at point 5. So we'll try to keep the pace.

This is a screenshot from Red Dead Redemption, and what I want is for us to be able to write graphics code that would produce a similar picture on all platforms, including the web. Today this is just not possible. If you use a commercial engine like the one Rockstar uses, then you may be able to reach consoles and some of the desktop platforms, but not all of them, and you will not reach the web with it. So this is a very, very unreachable goal today, and this is what we are trying to fix in the WebGPU working group.

The API that is portable today is OpenGL. Unfortunately, OpenGL is dying: every vendor implements their Vulkan drivers in such a way that OpenGL is just an afterthought, nobody optimizes for OpenGL, and it has very many limitations that I'll try to explain. The only good thing about OpenGL is that you can target the web with it today, with some limited subset of it. Many concepts of OpenGL survived from its very beginning, and they are very far from either the way hardware works or the way we think about graphics programming. So it's not a good protocol for us graphics engineers to talk to modern hardware today. Microsoft is not officially supporting OpenGL, and it's forbidden in the Universal Windows Platform. WebGL actually works via D3D11 on the most popular platform, and Apple officially deprecated OpenGL. So you can draw your own conclusions from that.

This is a more technical slide. We can't use multiple cores. We can't predict where the driver is going to try to freeze us for hundreds of milliseconds. Optimizing for mobile is a challenge. Data transfers are an art, and WebGL 2 is not even portably supported. OpenGL evolved in a way that allows you to skip parts of it and do more on the GPU, but that path does not allow rich worlds like the one I showed you in the beginning, with thousands of small objects with different materials. This is just not acceptable.

So the first question: has anyone heard that Apple started WebGPU? Yes, two people. Great. This is not correct; it couldn't be further from the truth. The only thing about WebGPU that started with Apple is the name, which the group took from Apple.

This is how it started: summer of 2016, Vancouver. We gathered the browser vendors, Intel, a few more folks, and we didn't know what we were doing. We were trying to figure out: is there a way to design an API that wouldn't be as complex as Vulkan, while still being portable and maybe able to target the web? Because WebGL is technically a Khronos project, and OpenGL and Vulkan are both Khronos projects, this was in a Khronos meeting, and the browser vendors came with their own proposals for what this was going to look like. We, Mozilla, brought something called WebMetal; it worked on top of Vulkan. Apple brought something that worked very similarly to Metal, and they quickly implemented it in Safari Preview. All of the code they prototyped is now gone. Our code was implemented on top of Servo, and it's also gone.
Google brought something original that they called NXT, and there were some heated discussions about where to go from there. The main conclusion was that we couldn't work with Khronos, because of reasons — reasons we cannot really explain. So Apple created a W3C group and invited pretty much the same people that had been working on this. We moved to that group and started debating the basic principles of how the API should be designed.

In 2017 we agreed on the binding model, which was critical for us, because we didn't want to agree on the shading language yet, and having the binding model allows us to decide that later — or never, I don't know. Maybe even next week, we'll see. In 2018 we agreed on implicit barriers, which is important for setting the direction of how simple the API is going to be to use, because Vulkan and D3D12 require you to explicitly transition your resources between states and synchronize between pipeline stages. Also in 2018 we started a Rust project called wgpu, which implements WebGPU for native platforms. In 2019, a year later, we started integrating it into Gecko and using it as the basis for the WebGPU implementation there. Gecko is the engine of Firefox, in case you haven't seen it.

The basic principle is that we try to intersect the APIs and see which parts are portable, which parts can be executed on all of them. In the end we converged to basically Metal with the Vulkan binding model. But again, this was not coming from a single API; it was coming from all sides, trying to find the sweet spot.

You know the comic about the 15th standard? That was the main feedback we got from developers: please don't do any more APIs, there are too many already. Sorry, we did. It is going to happen, and we believe in it, and we believe this new API is actually needed, because it's going to be the new-era OpenGL sort of thing. It's going to be something you can target all the platforms with without much effort, which you can explain to your kids and they can understand your code. Today, as I said, only OpenGL is close to that, but it's dying. That's what WebGPU is trying to be.

These are the principles we followed when designing it. Obviously performance — everybody says performance, nobody knows how to weigh it against the others. Security means that we validate all the inputs; we don't allow anything to reach the driver that the driver doesn't know what to do with. In the case of Vulkan validation, we require the workloads to reach the driver with zero validation warnings or errors. It also means we need to sanitize the shaders — no out-of-bounds access and lots of other things — which may potentially make your programs slower, but we're trying our best to reduce this overhead to the minimum. Portability is the trickiest part. It took me a while, maybe a few years, to understand what it means. It means that programs not only compile on all platforms, they behave similarly. If you have a use case and the API tells you there is an optimization for your use case, then you can expect that it's not going to be slower on some platform because of that optimization. This is not the case for any other API. Any other API requires you to discover the features of the platform you are running on and use those features. So we expose features in such a way that if a feature matches your use case and you use it, we can guarantee that it's going to be no slower, and likely faster. And obviously we want it to be usable as well. We want it to be a teachable API.
We want people to understand what they write. These are early native benchmarks of the Aquarium demo — many people may have seen it in the WebGL world — done by Intel. Not very convincing, but it's basically 10 to 30%. It's very early; benchmarking is one of the areas where we are lacking, and we're going to catch up on that. The other numbers, from Apple, are more convincing, but I don't know if we can take them seriously. You can see that performance sometimes depends on the number of triangles, and I'm not sure why the number of triangles would matter, because the work reaches the GPU either way and the GPU is going to do it — but those are the numbers they have.

Now we're going to go through an example of a Rust program written against wgpu-rs, explaining what it means and how it works. This is initialization, which means we create an adapter and we create a device. An adapter represents sort of a physical instance of your hardware, you could say, and the device is a logical instance that you work with, which owns all the resources. Here we say we have no power preference and we want any kind of backend — the backend can be D3D12, Vulkan, or Metal — and we request the device with some features, and we get the device. In the API these operations are asynchronous; in our wrapper they are synchronous so far, and we're going to change that.

The swap chain initialization is completely separate from the device. For any of you who only know OpenGL this can be weird, because you don't have to have a swap chain at all. You can do compute-only workloads without even caring about presenting anything, so this is optional. If you need to present something, you create your swap chain based on the surface that you get from the window.

Then we create the data. The important bit here for buffer creation is that we specify what this buffer is going to be used for — in this case VERTEX means it's going to be used as a vertex buffer. And this is a descriptor of the vertex data that we're going to use later on. Note that the descriptor is separate from the actual data. In OpenGL, where you provide those layouts, you provide the pointer to the data, which doesn't make sense for the hardware.

Whether this is an explicit API is a long topic, but I guess by this point you realize that the answer is no, despite all the headlines. As an example, in Vulkan, in order to create an image you have to go through four or five steps: you figure out what the requirements are, you create a logical image, you find your memory pools, allocate memory, and then bind the memory to the image. There's a lot going on, and it's not simple; people make mistakes in it all the time. In WebGPU it's just one command. Of course, by hiding this we introduce some level of uncertainty — maybe you won't have enough memory and we'll have to allocate more — but this is the balance we've reached so far.

And this is a highly subjective scale that lets you reason about what WebGPU is and why we are doing this: performance versus ease of use. Vulkan is the most performant; it's close to the hardware, though not exactly the hardware. OpenGL is sort of neither — it's easy, but it's not performant. Metal is pretty good; it's both, and I can talk more about Metal after the talk. And WebGPU is trying to get the best of both worlds here.

So we're separating data from the layout of the data, and this is the bind groups. A bind group is sort of a descriptor set in Vulkan, or a range of descriptors in D3D12: a number of resources that you bind at once — that's why it's bound as a group. We create a bind group by providing the exact buffer that we just created.
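To make that concrete, here is a minimal standalone sketch of the idea using stand-in types (these are illustrative, not the real wgpu-rs definitions): the layout declares what kinds of resources the shader expects, and the bind group plugs concrete resources into those slots so they can all be bound with one call.

```rust
// Stand-in types, not the real wgpu-rs API: they only illustrate how the
// layout (what the shader expects) is declared separately from the bind
// group (which concrete resources fill those slots).
struct Buffer { size: u64 }

#[allow(dead_code)]
enum BindingType { UniformBuffer, Sampler, SampledTexture }

struct BindGroupLayoutEntry { binding: u32, ty: BindingType }
struct BindGroupLayout { entries: Vec<BindGroupLayoutEntry> }

enum BindingResource<'a> {
    Buffer { buffer: &'a Buffer, offset: u64 },
}
struct BindGroupEntry<'a> { binding: u32, resource: BindingResource<'a> }
struct BindGroup<'a> {
    layout: &'a BindGroupLayout,
    entries: Vec<BindGroupEntry<'a>>,
}

fn main() {
    // The layout only says "binding 0 is a uniform buffer" -- no data yet.
    let layout = BindGroupLayout {
        entries: vec![BindGroupLayoutEntry { binding: 0, ty: BindingType::UniformBuffer }],
    };
    // The bind group plugs an actual buffer into that slot; the whole set of
    // resources is then bound with a single call at draw time.
    let uniforms = Buffer { size: 256 };
    let _group = BindGroup {
        layout: &layout,
        entries: vec![BindGroupEntry {
            binding: 0,
            resource: BindingResource::Buffer { buffer: &uniforms, offset: 0 },
        }],
    };
}
```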
This is the explanation of why we need bind groups: the shaders communicate with bind groups at the bottom, they get input from the vertex buffers at the left, and they render to the render targets at the top.

Now we're going to talk about pipelines. Why do we need pipelines? If you have only used OpenGL: historically, graphics hardware just had a fixed-function pipeline. It simply knew how to shade some triangles. Then some of the stages became programmable, then they became even more programmable, then they became unified into compute stages, and fewer and fewer fixed-function stages were left. Internally, the drivers started compiling and linking together the whole set of shader stages they are given. So when you switch your blend mode in OpenGL, it is not a simple switch: the driver may need to internally rebuild your complete program, reallocate registers, and do a crazy amount of work just to let your next draw call go through. The pipeline is a concept that takes the shader together with all the dependent state it needs to work with and forces you to declare it in advance. So when you create the pipeline, you expect it to be slow, and you have to specify what your vertex buffer layout is, but not the data; what your render target format is, but not the data; and again, what your bind group layouts are.

By the way, vertex buffers are something we considered not having at all. Try to imagine a programming model without vertex buffers — maybe you will have to, I don't know. The problem is that it's very difficult to make them secure: out-of-bounds access for vertex buffers is not covered by the robustness extensions, and it's complex. So we're still trying to figure this out; we will try to keep them.

This is how we create a pipeline. As I said, we have to specify all of the state that your shader depends on: your render target information, your vertex buffer information, your shaders, and the pipeline layout, which includes all the resources that your shader sees, based on the stages. You declare all of that at once. Again, this is Rust code: it uses enums, it uses references; for me, it's pretty readable to work with.

Now, rendering happens in render passes, which means that you don't just bind a target and draw. Render passes are needed because of mobile devices, where there is fast on-chip memory and the GPU doesn't want to work with the whole screen at once. Loading pixels into this fast on-chip memory is heavy and slow; processing them there is cheaper. So, typically, the GPU goes tile by tile: it loads all the pixels you have in the target, runs your shaders on them, and then stores them back out from the tile. Loading and storing are memory-intensive operations and very slow, and therefore you can control them. You can say: don't load anything for me, don't store anything for me, just clear to this color — which means you're saving a lot of memory bandwidth. This is a thing that is acknowledged by all of the native APIs today. It was there from the beginning in both Vulkan and Metal, and lately it was added to D3D12 as well; the latest version also has render passes. You can see here how we are creating a render pass and specifying the load operation to be "clear", which means we don't want the target to be loaded, we just want it cleared, and we specify the clear color.
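As a rough illustration — again with stand-in types rather than the exact wgpu-rs descriptors — the attachment description is where you declare up front whether the old contents should be loaded and whether the results should be stored:

```rust
// Illustrative types only (not the real wgpu-rs descriptors) showing the
// load/store controls that tiled GPUs care about.
#[derive(Clone, Copy)]
struct Color { r: f64, g: f64, b: f64, a: f64 }

#[allow(dead_code)]
enum LoadOp {
    // Don't read the old contents from memory at all: the tile simply
    // starts out filled with this color, saving a full read of the target.
    Clear(Color),
    // Read the previous contents of the target into the tile first.
    Load,
}

#[allow(dead_code)]
enum StoreOp {
    // Write the tile back out to memory when the pass ends.
    Store,
    // Throw the results away (useful for transient depth buffers).
    Discard,
}

struct ColorAttachment {
    load_op: LoadOp,
    store_op: StoreOp,
}

fn main() {
    // "Clear to a color, keep the result": no load, one store per tile.
    let _attachment = ColorAttachment {
        load_op: LoadOp::Clear(Color { r: 0.1, g: 0.2, b: 0.3, a: 1.0 }),
        store_op: StoreOp::Store,
    };
}
```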
Inside the render pass we do all the rendering: we set all the state — the pipeline, the bind group, the index and vertex buffers — and issue the draw call. The beauty of this setup is that by the time we call draw_indexed, all the state it depends on is already set. Draw indexed is blazingly fast; there is nothing going on in it other than maybe some of the vertex buffer validation I mentioned. And you are supposed to be able to issue tens of thousands of draw calls in big, fat render passes. This is the goal that would allow us to produce those shiny pictures.

Next, multithreading is something that all of the new APIs support. In this example there are three threads involved. On the first thread we record the big render pass we just talked about. On the second thread we record some compute work, and at the point where we record it, we don't know when exactly it's going to be executed; we just describe what to do. Then we send it to the thread that owns the queue, and we say: submit the first, then the second. This is where the order is established. So all of these can be on separate threads, which is especially important on mobile devices, where we have lots of threads but maybe not very powerful ones. This is how it's done in code: you create an encoder, the encoder can then record some compute or render passes, you finish the encoder, you get the command buffer, which you can send to the queue or just hand to the queue directly, and submission takes an array of them.

Implicit barriers are something I mentioned as well. Here you can see that we have four different pieces in the command stream, and the usage of the texture and the buffer changes between the operations. Some of the usages can be combined — those are read-only usages, like in D3D12. Mutable usages cannot be combined: if you try to combine them within a render pass, which is the synchronization scope for us, we will trigger an error and you will not get what you wanted.

WSL — is that our shading language? I mentioned that we haven't figured that out yet. No, it's not; it's just a proposal from Apple, and mostly we're discussing whether we can accept it or not. As I said, we don't have a shading language yet. We don't support multiple queues, which would be important for async compute — you kind of can do it already, but multiple queues would make it nicer. And our data transfers between CPU and GPU are not optimal yet. They are hard to figure out in terms of security: we can't allow the CPU and GPU to race to change and inspect the data, because that racing introduces non-portable and potentially insecure behavior.

This is the main question: do you think it's only for the web? Who thinks it's only for the web? Nobody. Great, because that's the main point of confusion I've seen. Every person that has heard of WebGPU says: oh, it's another web thing, I'm not going to touch it. Yes, the working group is designing it for the web. But what we realized, at least at Mozilla and Google, is that it's a very good API for native platforms as well — as I mentioned, the new kind of OpenGL thing. So we are working on shared C headers that you will be able to link against, so that you can use our implementation or Google's implementation interchangeably on native platforms. In fact, people have already shipped applications on the App Store using our implementation, native only, without any web stuff.
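Going back to the recording/submission split described a moment ago, here is a small self-contained sketch of the pattern using plain threads and stand-in types (this is not the wgpu-rs API, just the shape of the idea): two threads record work independently, and the thread that owns the queue decides the execution order only at submission time.

```rust
use std::sync::mpsc;
use std::thread;

// Stand-ins for the real command buffer and queue types.
struct CommandBuffer(&'static str);
struct Queue;

impl Queue {
    fn submit(&self, buffers: &[CommandBuffer]) {
        // The execution order is fixed here, at submission time.
        for buf in buffers {
            println!("executing {}", buf.0);
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();

    // Thread 1: record the big render pass.
    let tx_render = tx.clone();
    let render = thread::spawn(move || {
        // ... encode thousands of draw calls here ...
        tx_render.send((0, CommandBuffer("render pass"))).unwrap();
    });

    // Thread 2: record some compute work; when it runs is not decided yet.
    let compute = thread::spawn(move || {
        tx.send((1, CommandBuffer("compute pass"))).unwrap();
    });

    render.join().unwrap();
    compute.join().unwrap();

    // This thread owns the queue and establishes the order:
    // first the render work, then the compute work.
    let mut pending: Vec<_> = rx.iter().take(2).collect();
    pending.sort_by_key(|&(order, _)| order);
    let buffers: Vec<_> = pending.into_iter().map(|(_, buf)| buf).collect();
    Queue.submit(&buffers);
}
```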
So — oh, this is supposed to be demo time. Oh, oh, oh. That's going to be fun. Okay, here. This is generating mipmaps — I will explain why that's complex later. This is a demo with some shadows; you can see some validation errors, which turned out to be validation layer bugs, which is fun. And this one — I wonder if it's loaded on my machine. Great, how do I show it to you? Yeah. Oh, great, that didn't go well. Okay. So this is something like 10,000 objects with some ray tracing and a lot of other good stuff and physics, and it's all running on Vulkan on my machine. It runs equally well on D3D12 and Metal. Thank you. Now I hit something that I wasn't supposed to.

If you don't know what gfx is, I encourage you to rewind time two years back, when on this track Marcos from the gfx-rs team and I were describing what it is on a very similar slide. One problem we had with gfx, which is solved here, is that your rendering depends on the backend — we use generics for the backend — and that becomes contagious: all of your game ends up depending on the backend type, which is very inconvenient, and you can't switch backends at runtime even if you want to. What we came up with is a macro. There is a C function — we expose the C API — that takes the ID, figures out which backend it needs, checks the backend in a match, and calls the appropriate generic function, device_create_buffer here. This is what we use internally; it's just a macro. From your perspective, nothing is generic: you just work with wgpu, and you don't even know which backend it is until you ask. It will choose the one that works best for you automatically.

This is what an ID looks like. You can see the last three bits are the backend. In the middle there is an epoch, so that we can reuse indices in case your resources perish and we need to recreate them. And the tables of actual data that we have are backend-specific: we have a table of buffers for Vulkan, a table of buffers for Metal, and so forth.

The usage tracker helps us figure out how resources transition from one usage to another. In the scheme we came up with, each tracker knows a mapping from indices to states, and a state is in turn a map from sub-resources to usages. Why do we need sub-resources? A texture can have many mipmap levels, each mipmap level can have many layers, and each of those can have its own state. If we go deeper, depth and stencil can, in some variations of the APIs, also have different states — that's the third dimension. This sounds trivial, but it's fairly complex to implement; at least Google hasn't done it yet, and we have. That's also why I mentioned that mipmap generation is sort of hard: you change the usage, and the usage of one mipmap level does not equal the usage of another while you are generating it.
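A toy sketch of that tracker shape — not the actual wgpu-core types, just the structure described above: an index (taken from the ID) maps to a state, and the state maps each sub-resource to its current usage, so recording a new use can report whether a transition is needed.

```rust
use std::collections::HashMap;

// Illustrative usage flags; the real set is richer.
#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum TextureUse {
    Sampled,
    Storage,
    OutputAttachment,
}

// One sub-resource: a single mip level of a single array layer.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct SubResource {
    mip_level: u32,
    array_layer: u32,
}

// Per-texture state: every sub-resource can sit in its own usage.
#[derive(Default)]
struct TextureState {
    usages: HashMap<SubResource, TextureUse>,
}

// The tracker maps resource indices (from the IDs shown earlier) to states.
#[derive(Default)]
struct TextureTracker {
    states: HashMap<u32, TextureState>,
}

impl TextureTracker {
    // Record a new use; returns the previous usage if a transition
    // (i.e. a barrier) needs to be inserted for that sub-resource.
    fn change(&mut self, index: u32, sub: SubResource, new: TextureUse) -> Option<TextureUse> {
        let state = self.states.entry(index).or_default();
        match state.usages.insert(sub, new) {
            Some(old) if old != new => Some(old),
            _ => None,
        }
    }
}

fn main() {
    let mut tracker = TextureTracker::default();
    let sub = SubResource { mip_level: 0, array_layer: 0 };
    // Render into mip 0, then sample from it: the second change reports
    // that a transition from OutputAttachment to Sampled is required.
    tracker.change(7, sub, TextureUse::OutputAttachment);
    let barrier = tracker.change(7, sub, TextureUse::Sampled);
    assert_eq!(barrier, Some(TextureUse::OutputAttachment));
}
```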
This is how we define the scopes. The arrows I inserted are where we try to insert synchronization into the command stream. The interesting bit here is that a render pass is a whole scope and a copy operation is a single scope, but a compute pass is not a whole scope. That matches the default behavior in Metal, and it's really tricky, because we are trying to minimize the risk of races: what we do is synchronize between dispatches if the same resource is used as storage, for example. You can use resources for different usages within a pass, but you could argue that within a single dispatch call there may still be multiple races on the GPU, and we don't protect you from those. So there's always this scale of where to draw the line, and in this particular case we decided to draw it between dispatches. We may introduce an option to treat a compute pass as a single scope; it's not a big deal for us, but it affects you. So that's how it looks today. When we patch between those scopes, we know what usage each scope expects the resource to be in — that's the expected value — and we patch between them.

So the bind group and render pass usages are all combined — they can be mutated inside — while the command buffer and device usages are transitioned from one to another. These are different logics, and what we tried very hard to do was express that difference using the Rust type system. Why don't you use generics, you ask? We had three attempts to implement it using generics, and you can get it to work, but it doesn't look pretty at all. So what we ended up doing — this slide should not be here, I think I explained the mipmaps part before — what we ended up with is just a runtime thing. As you can see, the initial state is an Option, which means that when we transition to a different usage, we remember what the initial state was. You may wonder: why is it so simple, what am I showing you here? As I mentioned, there were multiple tries to use the Rust type system, and they ended up as very complex, hardly maintainable code. This solution is super simple, it's straightforward, everybody can see what it's doing, and it's more generic. It's an example of how solving a more general problem gives you a simpler solution, and I think it's a good one.

We track resources for the user. The user has one link to a resource, the device has one link to a resource, and then everything else — commands, bind groups — has its own links. Once those other links, including the user's one, go away and the device is the sole owner of the resource, we figure out what the last submission the resource was used in was, and we associate it with that submission. Then periodically we check the submission to see: is it done yet? And if it's done, that's when we actually release the resource and try to recycle its memory. This is all done automatically for you.

This is the structure of WGPU. There are many paths here; I'm trying to explain the main ones. The one we have now is the only one: it goes from wgpu-rs, which is our Rust wrapper, to wgpu-native, which is the C API implementation, which goes into the core, which goes into gfx. What we will have is that wgpu-rs will also call web-sys and will allow you to target the web directly, without going through Emscripten or anything. Alternatively, you can use Emscripten to target the web, or you can use the shared path between wgpu-native and Dawn, because they share the same C headers, so you can run on top of Google's implementation.

Enums are beautiful — I think everybody understands that — and they work very well for us. Render pass resources: the problem we have is that within a pass we don't want to track references to resources, but the API requires us to hold them alive, and that's tricky because we don't want that overhead. As I said, you can have 10,000 draw calls in the same render pass; we don't want to lock those resources and keep them alive for each draw call. So we enforce it with lifetimes. Rust gives that to us basically for free — at least for us — but you pay with the headache, because you're going to use those lifetimes, and lifetimes are not something people love to see errors about. But actually it's fairly straightforward.
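A minimal sketch of how that looks, with dummy types rather than the real wgpu-rs ones: the render pass borrows what it uses, so the compiler guarantees the resources outlive the pass without any per-draw-call reference counting.

```rust
// Dummy types standing in for the real resources.
struct Buffer;
struct CommandEncoder;

// The pass borrows the encoder and every resource it touches for 'a,
// so nothing it references can be dropped while the pass is alive.
struct RenderPass<'a> {
    encoder: &'a mut CommandEncoder,
    vertex_buffers: Vec<&'a Buffer>,
}

impl CommandEncoder {
    fn begin_render_pass(&mut self) -> RenderPass<'_> {
        RenderPass { encoder: self, vertex_buffers: Vec::new() }
    }
}

impl<'a> RenderPass<'a> {
    fn set_vertex_buffer(&mut self, buffer: &'a Buffer) {
        // Just remember the borrow; nothing is locked or refcounted per draw.
        self.vertex_buffers.push(buffer);
    }
}

fn main() {
    let buffer = Buffer;
    let mut encoder = CommandEncoder;
    {
        let mut pass = encoder.begin_render_pass();
        pass.set_vertex_buffer(&buffer);
        // Dropping `buffer` here would be a compile error, not a GPU crash.
    } // the pass ends, all borrows are released
}
```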
So what we ended up with is a system where Rust allows us to express those optimization guarantees for you, and if you need something to be held alive longer, you do it on your side. Typically you don't, so you don't pay anything for it.

Command encoders have a special marker that does not let them be sent or shared across threads automatically, because Send and Sync are auto traits and the marker suppresses those auto traits. This allows us to enforce the guarantee that while you're encoding, you're not moving between threads, or at least not doing so unsynchronized. We use a lot of borrowing, and the nice thing is that when we generate the C header, we can still use extern "C" functions with borrowed references — a reference translates to a pointer — but only plain references, not slices.

This is one of the most complex bits we have. As I mentioned, our low-level structures are placed in arrays, and we lock those arrays in order to access them. You can imagine a case where the locks are taken in the wrong order — that did happen. So we ended up with type-level lock order protection, which is what I tried to highlight here, including the one that failed to get highlighted. These are all different types, as in they depend on a different generic parameter, and as that parameter changes it tells the code what is still allowed to be locked. If you try to lock something weird, your code will not compile; if your code compiles, it doesn't have race conditions or deadlocks.

We use parking_lot, which was mentioned already — great stuff — various arrays, and cbindgen, which I mentioned. For the examples we use winit, and of course we are based on gfx and a bit of Rendy — just the descriptor and memory allocation parts of Rendy. As I mentioned, you can have an extern "C" function with a borrowed pointer, but you can't have slices, and that's very annoying: we have a lot of slices, and constantly converting between slices and pointers is not pretty, so I would love that to be better. And the one lesson that I want to share with you, as the Rust community, is that we concluded that generics are not always good. Maybe it's obvious to you — we used a ton of generics in gfx, and in wgpu we're trying to minimize that. Generics slow down compilation time and they limit the user quite a bit; they make it difficult to work with the code. So we're trying to use fewer of them.

As for what's coming: I mentioned we are going to target web-sys directly; that's one of the things. Error handling is bare-bones — we have asserts, in some places Results — but all of that is going to be replaced by the proper error model of the API, which is internal nullability. It means that if you try to do something illegal to an object — say you created a buffer with a wrong descriptor — then it may be internally invalid. You can ask whether it's valid or not, but you might as well just continue your recording and pretend that it's valid, only to discover the problem later. The reason it's done this way is that we can't report the error to you synchronously, and we can't let you block. So we let you go on, and then, when you're ready, you check whether your workload is valid or not. On our side we're not going to execute any invalid work, of course: if we see during command buffer encoding that you used an invalid buffer, your whole command buffer becomes invalid.
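To show what that model looks like in spirit, here is a toy sketch — not wgpu's actual code — where a failed creation still returns an object, and using an invalid object silently poisons the command buffer until the application checks later:

```rust
// A toy model of "internal nullability": objects may be internally invalid,
// and invalidity propagates into whatever uses them instead of being
// reported synchronously.
enum Tracked<T> {
    Valid(T),
    Invalid,
}

struct Buffer { size: u64 }
struct CommandBuffer { commands: Vec<String> }
struct Device;

impl Device {
    fn create_buffer(&self, size: u64) -> Tracked<Buffer> {
        // Validation happens here, but the caller gets an object either way
        // and can keep recording without blocking on the result.
        if size == 0 { Tracked::Invalid } else { Tracked::Valid(Buffer { size }) }
    }
}

fn record_copy(cmd: &mut Tracked<CommandBuffer>, src: &Tracked<Buffer>) {
    match (cmd, src) {
        (Tracked::Valid(cb), Tracked::Valid(buf)) => {
            cb.commands.push(format!("copy {} bytes", buf.size));
        }
        (cmd, _) => {
            // Using an invalid buffer poisons the whole command buffer;
            // the application only discovers this later, e.g. at submit time.
            *cmd = Tracked::Invalid;
        }
    }
}

fn main() {
    let device = Device;
    let bad_buffer = device.create_buffer(0); // an illegal descriptor
    let mut cmd = Tracked::Valid(CommandBuffer { commands: Vec::new() });
    record_copy(&mut cmd, &bad_buffer);
    assert!(matches!(cmd, Tracked::Invalid));
}
```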
We would very much like to target OpenGL as well. It's technically a target for us, but it's rough; we don't recommend people use it, and we don't even expose it via the Rust wrapper yet. So more work is coming — there is a Summer of Code project related to that — and maybe in half a year the situation is going to be better. Ideally we would even target WebGL 2.0 from WebGPU, which is going to be fun as well.

These are some links to the projects. At the top there is the spec repo — I'm one of the editors of the spec — there is the WebGPU headers repo, and at the bottom there is our previous talk about gfx.

So now we have time; we're going to talk about browsers. This is how Gecko is structured today — you can see it in Nightly. We have a small crate called wgpu-remote, which lets the content process create IDs and record passes, and the GPU process actually execute them. It's a fairly simple structure, but the point is that WebGPU is used across multiple processes. This is an example of how it's done: the client creates an ID and sends it to both JavaScript and the server, and only afterwards does the server actually create the buffer. Render pass recording, as I mentioned, is entirely on the client side, so while you record your render pass it's super lightweight; it doesn't cost anything. We use the peek-poke serialization library. I know everybody uses Serde, so peek-poke is kind of a dark horse alternative. It's not really recommended for general use; today it's only used by WebRender and WebGPU. We'll try to make it more accessible — it's just slightly faster. Then, when the render pass is done, we send the whole thing via shared memory to the GPU process, which actually executes the commands.

There is a lot going on here that you don't see. The boundary between processes involves a code generation phase in C++; the boundary with JavaScript involves a code generation phase in the JavaScript bindings; and between each of those there is a C-to-Rust boundary. So we cross something like four different boundaries across this stack. It's fairly annoying, and we're trying to make it better.

This is a subjective measure of completeness between browsers. Chromium is fairly good: they have a large team and a lot of the API surface implemented. We just got compute working — you can get Nightly, go to the prefs, enable WebGPU, and run the compute example that Chrome has. We will work on presentation and rendering next. Servo is also catching up, and for Servo catching up is fairly simple because it's pure Rust: you don't have as many boundaries as the ones I mentioned, but it's using the same infrastructure, it's using wgpu.

I'd like to thank some people: my gfx team, the wgpu community — Josh especially — and Corentin from Google, for helping out on the spec and with the slides. So that's it; I think we've even finished early.

You mentioned that you don't have any final proposal for shading languages. So what are the proposals, the most likely ones, and which of them are you considering?

The question is: what are the proposals for shading languages? We are considering three different things. WSL is what Apple is proposing; it has gone by two different names — I think they call it "whistle" now. It was originally called WHLSL and based on HLSL; now it's something original. The other alternative is SPIR-V. The problem with SPIR-V is basically the same problem as why we are not doing this in Khronos: because it's a Khronos thing, we are concerned about forking SPIR-V, whether we will be able to stay synchronized with the main development of SPIR-V, how our interests are going to be represented, and so forth.
And we are exploring something that would be like SPIR-V — semantically close to SPIR-V — but textual, so that you can see the text, yet you can convert it to or from SPIR-V in a single pass. That may be the compromise we are looking for. Yes?

Suppose I was at Bungie or something, thinking about writing the kind of thing you showed at the beginning, like Destiny, where you've got multiple passes to do all your lighting — I'm going to ask you where the diffed descriptor sets are. Metal doesn't allow you to diff the sets, which Vulkan does, and that's one of the key performance things for generating your next frame.

Sorry, is your question about descriptor sets? Yeah, the diffs — when you can diff the sets, so you can say which state has changed and which hasn't, and then update just that without having to sync the whole thing. Metal doesn't have that, Metal 2 doesn't have that, but Vulkan does.

Oh, your question is about diffing the sets for the pipeline? Yeah, exactly. No, we don't have diffed sets. I think that's a very, very low-level thing; we would need to go through the process of doing the benchmarks, seeing which platforms it affects and how we could port it to the other platforms, and we have much bigger things on the table right now than this one. But yes, we keep it in mind. The API is not finished; it's still a work in progress. Later this year we are hoping to get a minimum viable product of the API that people can try, implemented in three browsers, or maybe four with Servo. So we're still working on that.

Is there any interest in standardizing texture formats? We are going to have a list of supported formats, but all of the compressed ones are going to be extensions, because they are not supported across all the platforms. And yes, we are interested; we are going to be looking at more elaborate texture formats as a follow-up.

What would be the alternative API for providing vertex data? The shader sees resources in bind groups, and you can technically just bind your buffers in some bind groups and read them directly, which is no worse. In fact, we did some benchmarking, and on many platforms this is what the drivers actually do, so we don't see any performance penalty; but on a very few, especially older, platforms, which still have some fixed-function hardware, this is slightly slower. The main problem is that it's just inconvenient: nobody is used to a workflow where you don't have vertex buffers.

The GLSL for WebGL 2 is much simpler than whatever you're probably going to make for WebGPU, and its compilation performance is still absolutely terrible, pretty much unusable. I talk to people from Chrome and Apple and they all promise it's going to be solved, but do you think real progress is being made? I know you have a lot of validation around compilation, but do you think there are new discoveries or ways of actually implementing that? Because in my opinion that's the biggest blocker for doing something quite large on the web. So that's my question.

Okay, so the first note was about the number of bind groups: we have the limits structure that you get from the adapter, and you can see that the adapter supports more, and you can request more. So you can design your application so that it's not going to run on mobile, if you want to. But if you go with the defaults, you only get four bind groups.
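A sketch of that limits negotiation, with illustrative types (the real WebGPU and wgpu-rs structures differ in detail): the adapter reports what it supports, the default stays at the portable baseline, and you can opt in to more at the cost of running on fewer devices.

```rust
// Illustrative types only; the real limits structure has many more fields.
#[derive(Clone, Copy)]
struct Limits {
    max_bind_groups: u32,
}

impl Default for Limits {
    fn default() -> Self {
        // The portable baseline you get if you don't ask for anything.
        Limits { max_bind_groups: 4 }
    }
}

struct Adapter {
    supported: Limits,
}

impl Adapter {
    fn request_device(&self, requested: Limits) -> Result<Limits, &'static str> {
        // You may request more than the default, but never more than the
        // adapter actually supports.
        if requested.max_bind_groups <= self.supported.max_bind_groups {
            Ok(requested)
        } else {
            Err("requested limits exceed what this adapter supports")
        }
    }
}

fn main() {
    let adapter = Adapter { supported: Limits { max_bind_groups: 8 } };
    // Opting in to six bind groups: fine here, but such an app may refuse
    // to run on a device that only offers the default four.
    let limits = adapter.request_device(Limits { max_bind_groups: 6 });
    assert!(limits.is_ok());
}
```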
The other question was how we handle slow pipeline creation performance, because of the shader building and because of the fact that we need to transform the shaders to do bounds checking and a lot of other safety transformations. Our measurements so far show that, at least in the case of SPIR-V, the time spent doing the transformation passes on our side — even converting from WSL to SPIR-V — is negligible compared to the time the driver spends building the actual microcode for the shader. There are measures we are going to take internally, like aggressive use of pipeline caches and that kind of thing. But the other thing, on the API side, is that pipeline creation will be asynchronous: you request the pipeline to be created, but you don't get it right away; you get a notification when you can use the pipeline. I don't think there is much we can do beyond that — once we pass the SPIR-V or MSL to the driver, there is nothing more we can do. But we have very good relationships with the vendors from Khronos, so we are going to be talking to them as well. Thank you very much. Let's see.