I will try to make an effort to speak loud; I have a very soft voice. I'm going to make my presentation a bit different: I'll start by asking three questions, and hopefully I can get some feedback from you. Who here does not know what a GPU hang is? [Carlos, speak a bit louder — we're not hearing you.] OK. The microphone is OK — it's OK for the audience. So, I was saying: who here does not know what a GPU hang is? OK. I mean, this is the graphics room, so I would expect a bunch of graphics people. A GPU hang is basically something that happens, due to an application or the driver, where the GPU just hangs. It's very simple. Now, who here has ever faced a GPU hang? A lot of people. And how many of you have faced one on Intel hardware? OK, good — quite a few of you. So maybe this presentation will be of some interest to you.

My name is Carlos Santa. I work at the Intel Open Source Technology Center, and I support graphics drivers for Chrome OS. The topic of my presentation is a low-latency, GPU engine-based reset mechanism for a more robust UI experience. It's a very long title, but from it you can infer the key pieces: low latency, GPU engine reset, and UI experience. What I'm trying to do is improve the user experience by introducing a different reset mechanism into the graphics driver.

The agenda for today: a little bit of the problem statement and the limitation of the GPU driver; then the proposed solution, which is called Timeout Detection and Recovery (TDR) — I'll explain that in a bit. It's a low-latency, timer-based reset mechanism, which means there's a timer somewhere, so I'm going to talk about how low that latency can be. When we implement this mechanism, we also need to think about preemption, so I'll talk a little bit about preemption.
And the last item will be the status of TDR in open source.

So, the problem: a frozen UI, a black screen, or both — or, in the worst case, a whole reset of the system. We would run the use case — basically a texture-streaming video use case — and what made it really hard was that the issue could happen anywhere from minutes to hours in. We could be running this thing for 10, 20 hours and eventually see the issue. That made it a really, really bad problem, and I was tasked with fixing it. So I dived in by trying to understand the graphics architecture in Chrome OS, because that was the platform I was working on, and then seeing how I could solve the problem.

Now, I know this is a very busy slide, but this is an overly simplified graphics architecture for Chrome OS. The architecture is divided between the server side of things and the client side of things. In Chrome OS, we have two main processes: the GPU process and the renderer process. The GPU process is a special process in Chrome OS in that it's the only process that can program the hardware — as you can see here, it's the one making the GL calls into the GPU driver. The contexts of the 3D clients also reside in this process. On the renderer side, we have the compositor as well as the client side of the app. In between them we have a shared memory buffer, which acts as a proxy between them.

Now, this architecture sort of works, but it assumes that nothing bad happens to the GPU process. If something bad happens to that process, the whole thing breaks down — which was actually happening. We were running the video app, which is here: a very simple texture-streaming app, but one of those apps where you use the graphics APIs and the video APIs at the same time, and that was causing issues. That application was crashing.
Once that happens, the default mechanism in the driver is to do a full reset of the whole GPU. Because the GPU process is a 3D process, if you reset the GPU, the context goes away — and if that goes away, so does your UI. That was basically what was happening. We then realized there was not really a need for us to do a full reset if the use case was a video use case. Looking through the logs and all the investigation I did, it turned out the problem was this guy: something in the stream was damaging the media engine. And even though that was the issue, we were still resetting the whole thing, and that was causing problems.

That was when we realized that the solution for this particular application was to detect that it was, in fact, media-engine work — that it was the media engine causing the hang — and at that point just reset that particular engine. I looked internally — I work at Intel, we have so many people — and it turned out the solution was already in open source, on the DRM mailing list. It just so happened that those particular patches were not being merged, for whatever reason. That's where I came in: I took those patches, and I'm trying to make the case that by using them I can actually improve the user experience in Chrome OS. I talk about Chrome because that's what I work with, but this solution can apply to Ubuntu or any Linux-based operating system.

The proposed solution is called Timeout Detection and Recovery. It's a new feature, still a work in progress. It can increase stability and robustness by allowing applications to detect when an individual batch buffer has corrupted the GPU. That is the main thing we are bringing here.
From the application side, you can detect early on whether the batch buffer you're sending is causing the hang, whereas before, by the time we realized we were hanging, it was already too late in the pipeline.

This is the basic implementation. Generally speaking, it introduces a new IRQ handler in the i915 driver as well as two new GPU watchdog command instructions. I'll talk about that in a bit, but basically on each batch buffer start sequence we inject a GPU watchdog start command, and after that batch buffer we inject a GPU watchdog cancel.

Here's TDR step by step. That dotted line is the division between user space and the kernel. The media driver acts as the application side, and it is the one that sets the watchdog threshold for each and every batch buffer. So imagine you're decoding a video stream — VP9 or whatever. On each of those batch buffers, you can set the watchdog threshold and enable the watchdog timer. The media driver then injects a watchdog timer start and a watchdog timer cancel. The start instruction kicks off the timer, and the threshold is set by the media driver — so you, as an application developer, are in charge of that threshold. You have knowledge of what the work is going to be, so you can set it to match that workload. If the timer reaches the threshold — say it counts down from 50 milliseconds to zero — then when it reaches zero, an interrupt gets fired and handled by our IRQ handler, and at that point we've detected a GPU hang. That's basically how we catch that the batch buffer caused the hang.
If, on the other hand, the batch buffer completes before the threshold value you've set and reaches the cancel instruction, the watchdog timer gets canceled and nothing happens. That means the batch buffer is fine and there's no hang.

So, going back to the Chrome OS solution: GPU process, media driver, the video application crashes. Now, instead of resetting the whole thing, we just reset the media engine. Because we're setting the threshold to a much smaller value than the default value in the graphics driver, we can come back from that reset fairly quickly — we're talking milliseconds instead of the seconds that were the default before this implementation.

Now, it's a GPU reset mechanism based on a timer, but how low can it be? As I was saying, the media driver can set that value, and you, as a developer, have the option to set it. But you can't set it too low or be too aggressive, because then you end up with too many false positives. On the other hand, if you make it too big, you're defeating the whole purpose, because the driver itself has a default reset timeout outside of the GPU watchdog. As a guideline, these are the values we are using right now: for 1080p, 15 milliseconds; for 4K, 100 milliseconds; 8K, 500; and 16K, 2,000 milliseconds. Of course, the last one is just theory — we don't have such a big display right now — but these are the values we think we should use. This is still under investigation, and if people have other ideas or values we should use, feedback is welcome.

Now, about preemption. Going back to the watchdog step by step: the media driver sets the watchdog timeout, we flush the batch buffer, we initiate the watchdog timer, and then the graphics driver starts processing the batch buffer.
Now, what happens if, during the execution of the batch buffer, we get preempted while the watchdog is enabled? When we switch to the other process — the one that preempted us — that process would then have the watchdog enabled, which is not what we meant. So what we're saying is that the driver itself must cancel the timer during the preemption sequence.

Then another question: OK, you've enabled the watchdog timer, that process got preempted, you've disabled the timer; now you come back to that process — what do you do? Do you enable it again? What happens to the watchdog timer you started with? Say you started with 50 milliseconds, the timer has ticked down to, let's say, 30 milliseconds, and then you get preempted. When you come back from preemption, do you resume at 30, or do you restart the timer? Those are the questions we are discussing right now.

The last thing is how a compositor could benefit from this feature. I kept this very general — it doesn't follow any specific compositor. The only requirements are that the compositor uses 3D acceleration and has access to the hardware through the KMS/DRM API. As I was saying before, the VA-API driver now has the knowledge to detect when a batch buffer has gone bad, and that knowledge is exposed to user space, so the compositor can detect earlier in the sequence when a particular batch buffer has hung the GPU. The way I could see this working: say you're running a video, you're decoding it, and the compositor is sending those frames. Now, at some point in time, the video application crashes.
But instead of the compositor blindly displaying the batch buffer that got damaged, it can use the information we're giving it and show, for example, the previous good frame it was able to render.

This is the latest status of this feature in open source. Basically, it's a work in progress, but the discussion is happening upstream. I was able to prototype the solution on Ubuntu using the i965 media driver, which is the legacy media driver we have and the one we're using today. At Intel, we also have a new open source media driver called iHD, which is going to be used on future generations — it's open source, but it's a separate driver — and I was able to get it working on both software stacks. I validated it using the ffmpeg binary by decoding — I believe it was VP9 I was using. And on Chrome OS, I developed a very simple dummy video app that would decode VP9 again, through the Android-on-Chrome stack. If this interests you in any way, these are the links to the current open source implementation: there's work happening in the kernel, and there's also work happening in the i965 media driver, and I'm trying to upstream both. Of course, any help you can give would be really appreciated. That's it — I think I have five minutes, so, questions? Yes?

"So the video application is in some ways a very nice one, because you can predict quite well how long your submission takes, based on the size of the image, right?" Right. "And have you looked into this at all for 3D submissions?" 3D submissions, I have not. To be honest, I have not, because the use case we had was video, and that was the one affecting products. And also, if you go back to the Chrome OS architecture, that GPU process is 3D. This mechanism works for the different engines we have — it could be the 3D engine or the media engines.
But if you reset only the media engine, that GPU process doesn't have to go away — you see what I'm trying to say? So, with this solution you can reset these engines independently. We focused on the media engine because that was the use case we started with. You're asking if we can reset this one — the answer is yes, we can. But going back to my use case in Chrome: the GPU process is a 3D app. If there's an issue with that process, you can reset and come back earlier, because you're setting the watchdog threshold to a smaller value — I guess that's the main benefit you get there. Any other question? Yes?

"Is there a plan to do this recovery in the hardware itself, instead of doing it in the driver?" In the hardware itself? I don't know — I don't think it would be simpler. Maybe, but I don't know. Yes?

"Could you explain why it requires different delay times for different resolutions? For the user, it doesn't matter whether he's using 4K or 8K." Right — you're right, for the user it doesn't matter; it's transparent to them. But for us, who know what's happening, it matters, because from 1080p to 4K you have four times as many pixels, and going up to 8K you have four times as many again. There are more pixels to process. "But you're decoding the frame in the media engine, right? Isn't it still the same amount of time to display at 60 FPS?" No — from 1080p to an 8K stream, of course it's going to take more time to decode. "But 500 milliseconds more?" Yeah, as I was saying, those are just the values we think are right, but if you have better ideas, feel free. "There is also offline transcoding, where you don't have to do it in real time."
So you can use the media engines for offline transcoding as well. Yes, but here we're not talking about offline — we're talking about real-time playback. The user is playing something, and it just hangs. Whether the user experiences that on a 4K or a 16K display — I don't even know if there is one on the market — 500 milliseconds for 8K is small, in my opinion. Before this implementation, the default value was 12 seconds, which is the time it would take the GPU to come back. So going from a 12-second default to 500 milliseconds for 8K is a straight improvement. Any other question? All right, that's all. Thanks.