Thanks for coming to my talk. It's a really nice opportunity to come to the Embedded Linux Conference. This is my second time talking here. I came to the Embedded Linux Conference for half a dozen years in the early 2000s when I was building embedded Linux products, and I have to say, it was such a helpful conference. Every year we travel around the sun, we come back to the same spot, and Tim Bird is still here, welcoming us back and running the Embedded Linux Conference. It's been going for a long time, and I got a lot of good advice here. Yesterday, a colleague I worked with on a distributed real-time embedded audio system reminded me that we actually did the audio system in this room. There's a little controller back there that we made, and like all embedded products, it is unlocked or has the default password. So if you want to go back and Rickroll me, you could probably plug in and play some music for us. But you definitely get a lot of great advice here. And it's great to see Lukas Rusak here. Last year, I talked a little bit about why and how you would want to use open source graphics drivers in embedded products, and we talked about how, if you don't need compositing, you can just bring up a full-screen rendering context on KMS. He works on the Kodi project and went and did that, and it simplified a lot of things for him. I saw him at FOSDEM, where he presented on what a great simplification it was for their multi-platform media solution, which is used by tons and tons of users. So the advice you get here is real; hopefully I can give you some good advice too. Like I said, I've been working on Linux for well over 10 years, but just on graphics for the past five years or so. I worked on the Graphics Performance Analyzers (GPA) team, starting around 2011, to help them target Android. And my former technical lead is here as well.
And I'm terrified, because of all the people here, he can totally destroy my talk with some tough questions. To give him credit, everything I've done has really been along the lines of the product that he envisioned and created. His work, I think, has enabled Linux Intel graphics in a way that few products can claim, so he's really a critical person, and hopefully I haven't done him wrong. We'll see what he thinks of my demo. I've been working on Mesa for several years. Mesa is a very successful open source driver for 3D graphics, used by millions and millions of people. Ten percent of the market in China pre-ships with Mesa as the graphics solution, so it's big numbers; we get a lot of users. I've been working on this performance tool, but also on automating the developer process for the team, which has been very helpful for them as well.

OK, so just before we start: my tool is really focused on deeply investigating performance problems in a frame that you're rendering with the GPU. You really don't want to investigate at that level until you're confident that that's where your bottleneck is. All of the graphics rendering happens with two asynchronous processors, where the CPU enqueues a whole bunch of work for the GPU to then consume. If your CPU cannot produce work for the GPU fast enough, then your GPU will be idle, and there's no sense in going and looking at your GPU performance. You need to use the standard tools on the CPU side to fix your CPU bottlenecks first. But if you are bottlenecked on the GPU, that's when you want to use a tool like FrameRetrace. You can use top to see what your CPU utilization is. There's a new tool called gputop that OTC has been making that works on Linux, and there are similar tools for every graphics card out there that will tell you how busy the GPU is. RAPL is an interface that will tell you where your power is going.
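That triage order — check CPU utilization, GPU utilization, and clock throttling before reaching for a frame analyzer — can be sketched as a rule of thumb. This is a hypothetical decision helper with made-up thresholds, not output from top, gputop, or any real tool:

```python
def triage(cpu_util, gpu_util, gpu_freq_mhz, gpu_max_mhz):
    """Rule-of-thumb bottleneck triage for a rendering workload.

    Inputs are utilization fractions (0.0-1.0) and clock speeds, as you
    might read them from top, gputop, and a RAPL-based frequency monitor.
    The 0.9 thresholds are illustrative, not authoritative.
    """
    if gpu_freq_mhz < 0.9 * gpu_max_mhz and gpu_util > 0.9:
        return "thermal/power: clocks are throttled, check RAPL first"
    if gpu_util > 0.9 and cpu_util < 0.9:
        return "GPU-bound: use a frame analyzer like FrameRetrace"
    if cpu_util > 0.9 and gpu_util < 0.9:
        return "CPU-bound: use perf/top, GPU analysis will not help"
    return "mixed: reduce CPU work first, then re-measure"

print(triage(cpu_util=0.4, gpu_util=0.98, gpu_freq_mhz=1100, gpu_max_mhz=1150))
```

The point of the sketch is only the ordering: throttling masquerades as a busy GPU, so rule it out before interpreting utilization numbers.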
So typically with embedded processors, you might have a thermal budget, and the BIOS will turn down your clocks if you use too much power. So you need to look at RAPL: you might have 100% GPU utilization, but in fact your clocks are running at half speed because your chip can't support that amount of load. So then you go look at the GPU side. If you have 100% GPU utilization and your CPU utilization is low, you can probably start with some simpler things, like setting MESA_DEBUG=perf, which will output a whole ton of advice when you've made bad rendering decisions.

We all know how to optimize performance problems on the CPU. The tools there are fantastic; they're one of the reasons why embedded Linux, and Linux as a whole, is successful, because it's so wonderful to be a developer on the Linux platform. But on the GPU side, you have a completely different story. Typically you have vendor-specific GPU analysis tools that each vendor made to help developers target their own platform, and they're not interested at all in helping you improve performance on another vendor's hardware. In fact, they'll often try to disrupt each other. You might see one large GPU vendor provide a free tool that helps you generate a scene, and it does all of this metaprogramming for you; but what it's actually doing is injecting hundreds of millions of vertices into what is a flat wall, because they know their competition can't deal with that many vertices. So when you want to improve the performance of your GPU workload, you're sometimes dealing with tools that serve the vendor's interest more than yours. apitrace is an exception: an open-source, multi-platform tool that you can use to investigate and debug a frame. Another great exception is RenderDoc; Valve has been investing heavily in supporting RenderDoc.
And really, because of their investment, I think a whole lot of GPU vendors have been pushed to participate in that tool as well. These tools all do the same thing, though. If they're performance analyzers, they leverage the hardware counters on the GPU to help you understand the cost of the asynchronous work that you're enqueuing for the GPU. Without that, you really have no hope of understanding why your workload is slow, because all you get is the frame rate. They will often have other features, like live experimentation, so you can twiddle the workload and try to figure out how to fix a performance bottleneck. But generally, they let you dive right down into the details of a graphics workload.

The real problem with most of these tools is, like I said, they're hardware specific. Most of them come from a Windows background because, to be frank, that's where 3D is: all of the high-end AAA titles run on Windows, that's what the vendors want to support, and Linux is really an afterthought. So maybe you'll get a silicon vendor to port part of their tool to Linux, but they might port just the collection side, and you'll still need a Windows host to run a forms-based UI to instrument your embedded target. Another thing that can happen: in DX, tracing and retracing is perhaps easier, but when you get into OpenGL, especially beyond OpenGL ES, there's a ton of extensions that can be optionally supported. If you have the task of capturing the state of the GPU so you can retrace it, you might be relying on extensions that aren't available everywhere. The tool might work on NVIDIA, for example, but if you go to AMD, the tracing breaks horribly, because the extensions it uses for querying state are not available there.
And that happened to a tool called VOGL, which I think struggled partly because of that tracing support. The other thing is that, frankly, there are just low numbers of users. Graphics programming is a niche within computer science, but Linux graphics programming is a micro-niche. If you're programming graphics on Linux platforms, you are the exotic few who are so intriguing. All six of my users... actually, I don't know. I do think it's going to grow: the fact that the Google Play Store is coming to Linux platforms on Chrome OS means hundreds of thousands of 3D titles are going to be running on Linux. For the first time, there's a stable ABI where commercial vendors can write applications, charge for them, and have them running on a Linux platform. Another reason these tools haven't worked on Linux is that the counters haven't been there. We really haven't been enabled, until recently, to expose this information, for whatever reason. That's been true for other vendors as well, but Radeon is now exposing GPU performance counters in Mesa, and so is Intel. That's what enables this work.

So I have written a tool I'm calling FrameRetrace. It's based on apitrace, which is perhaps the most widely used OpenGL debugging tool out there. It has really high quality retracing, and it's used by tons of vendors. It was written by VMware; they use it all the time to verify their DX-to-GL translation when they're running a Windows guest on a Linux host, and for that reason it's a really healthy project. Valve might find, for example, that Dota 2's engine does not trace and retrace properly, and because of that they cannot debug their own application. So they'll enhance apitrace, or they'll change their own application to enable that. apitrace is cross-platform; like I said, VMware is using it for Windows analysis. And because it's cross-platform, my tool is also cross-platform.
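The core idea behind apitrace — record every API call with its arguments so the frame can be replayed deterministically later, possibly on different hardware — can be shown in miniature. This is a toy illustration with a fake GL backend, not apitrace's actual trace format or API:

```python
class Tracer:
    """Toy call tracer: records (function name, args) tuples in order."""
    def __init__(self, backend):
        self.backend = backend
        self.log = []

    def call(self, name, *args):
        self.log.append((name, args))              # record the call
        return getattr(self.backend, name)(*args)  # forward to the real API

def replay(log, backend):
    """Re-issue every recorded call against a (possibly different) backend."""
    for name, args in log:
        getattr(backend, name)(*args)

class FakeGL:
    """Stand-in for a GL state machine, for illustration only."""
    def __init__(self):
        self.cleared = None
        self.draws = 0
    def glClearColor(self, r, g, b, a):
        self.cleared = (r, g, b, a)
    def glDrawArrays(self, mode, first, count):
        self.draws += 1

gl = FakeGL()
t = Tracer(gl)
t.call("glClearColor", 0, 0, 0, 1)
t.call("glDrawArrays", "GL_TRIANGLES", 0, 3)

replayed = FakeGL()
replay(t.log, replayed)  # the replayed backend ends up in the same state
```

Everything else in the tool — metric queries, shader swaps, state overrides — hangs off this replay loop: because the log is complete, the retracer can stop at any call, modify state, and re-run the frame.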
And that's a huge benefit, to driver developers particularly. It's hardware agnostic. It was written for Intel, because that's where I work, but Intel is now shipping AMD GPUs on-package in the Kaby Lake G processor, and AMD has been enabling FrameRetrace to work on their hardware, which allows us to analyze that platform. There are other platforms too: I gave a similar talk at the X Developers Conference in the fall, and other Mesa driver teams are adding support so they can instrument their drivers in this tool as well. It's already been used by the Mesa team pretty heavily to find gaps in our driver; for reasons you'll see in the demo, it's very easy to use this tool to find the gaps in the driver. So it's effective, and it's in use.

Here's a list of some of the features in the tool. I'm not going to go through them, because the demo will cover them, but if people are looking online they can see this in the notes. All right, let's switch to a demo. I should have done this before. So, let's open a file. You can specify the host that you want to connect to: if you're working on an embedded platform, you typically don't want to render the graphics for the tool on the local host; you'll want to connect to the target. This is a really crusty old benchmark, which unfortunately people still pay attention to. But because it's old, it's been hacked in tons of ways, and there are plenty of performance problems to look at, so I'll show you some of those. Just a shout-out to my former tech lead here: he will recognize this UI immediately, because I totally stole it from his very successful product, and I hope it's an homage to him. So here are some of the metrics that we support. You can see there are tons of metrics, almost too many to look at; typically you'll want to filter them down.
So if you want to know how much things cost, you graph the number of clocks. You can see right away that this is the most expensive render. The render-target view will show you the render at each draw: if you select some renders and highlight them, it will tell you what those renders are coloring. You might stop at a render to iterate through the frame and see how it's being composed with each draw, or clear before each render just to see the pixels that that specific render is touching. This helps you figure out exactly what's going on in the frame just by looking around. There's an API-calls view; that's pretty standard stuff, but if you were looking for clear calls, you might type in "clear", and these are the renders that call clear. I think the UI could use some work; it'd be nice to have different colors for the bars when they're clears so you can spot them easily.

Some other nice features: for every render, I capture the batch disassembly. That's the actual binary command stream sent down to the GPU, and I don't think this feature is available for any other driver, where you can instantly get a dump of all that information. There are still a lot of features that need to be added to this tool, but driver developers can already find anything they need to understand the details of a specific draw call, like whether it's using one texture format or another. In the shader view, we capture the vertex and fragment shaders, but we also display the intermediate representation in static single assignment form, and then the SIMD8 or SIMD16 binary compiled shader that's sent down to the GPU. That can be helpful for driver developers or compiler developers who want to understand why a shader compiled a certain way and is slow. What else can I show you? We should see the metrics. As I said, there are tons of metrics, and they have longer descriptions that you can look at.
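The "cost of a selection relative to the rest of the frame" view mentioned above boils down to summing one metric over the selected renders and dividing by the frame total. A sketch with made-up per-render clock counts (not numbers from the tool):

```python
def selection_share(clocks_per_render, selected):
    """Fraction of the frame's total GPU clocks spent in the selected renders."""
    total = sum(clocks_per_render.values())
    chosen = sum(clocks_per_render[r] for r in selected)
    return chosen / total

# Hypothetical per-render GPU clock counts for a four-draw frame.
clocks = {"clear": 1000, "terrain": 40000, "character": 30000, "blur": 29000}
print(round(selection_share(clocks, ["blur"]), 2))  # the blur pass's share: 0.29
```

That one number is what tells you whether a render is worth experimenting on at all.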
If you don't really know what a DS stall is, you can see from the description that it's about domain shaders. That's not really relevant for the T-Rex demo, but there are tons of them. And if you're looking at a specific render or a set of renders, it'll tell you the cost of that selection relative to the rest of the frame. So this is a pretty helpful way to quickly narrow down what's expensive and how expensive it is, and then you can proceed to experiment. For example, take this expensive shader. If you want to replace it with a simple shader, let's look at the shader quickly first. The vertex shader is nothing; you can tell it's just drawing two triangles to present a rect to the screen. But the fragment shader is a little more complicated. If you replace it with a simple shader, one that just draws every pixel pink, that's about as simple as you can get, and the render target is now completely pink. The metrics show that the cost is much cheaper. Not that cheap, though; it's a lot of pixels. So that's one kind of experiment. Or you can disable the draw completely and see what the overall frame rate would be if you weren't doing that draw at all.

The reason I picked this render is that this particular shader has an expensive motion-blur effect. Here's the render target; if you look at the renders before it, what you'll see is kind of an ugly green screen. What that pass is doing is encoding, in each pixel value, how fast the object at that pixel is moving. Then, in the final shader, it samples from those pixels, and based on the value it figures out how much it wants to smear the texture samples around to make a blur effect. The problem is, if you looked at that texture, it was mostly zero. Most of the scene is not moving.
And what this shader does, for every pixel, is take four samples from adjacent pixels, weight each by 25%, and add them up, with offsets depending on that motion value. So there's a really simple hack you can do, and I'll just paste in another shader. You can edit the shader here: if I type garbage and compile, it gives me a syntax error. But if you select it all and paste in a new shader that says: if the motion value is zero, don't bother sampling four times from exactly the same location and averaging them to get the same number you started with; just sample once. Watch the cost of this render when I compile it: it goes down by 30%. That one shader change is worth 7% or 8% on this whole benchmark.

Now, this is not something we can put in Mesa, because it's an optimization designed specifically to alter the score of a benchmark, to make our hardware look like it can process this bad shader. But if you have a proprietary driver from a vendor who's giving you a large 40-megabyte binary blob, you have to ask yourself what's in that blob. Probably what's in it is a lot of rewritten benchmark shaders: if I happen to be running benchmark X, don't do what it says; let's do it the right way, avoid our own glass jaw, and rewrite the shader completely. Like I said, there are lots of dirty tricks you can play with a proprietary driver stack. And in fact, if you look at the render target, it's pixel-identical; it's equivalent. We did have a bug when we first tried to implement this: we weren't getting the performance benefit, and it turned out we did our math wrong. That's an example of how you can experiment quickly: you say, OK, let's just make those pixels black when I'm in this if-case, hit compile, and look at the render target.
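The rewrite above amounts to an early-out: when the motion value for a pixel is zero, all four blur taps land on the same texel, so one sample gives the mathematically identical result at a quarter of the sampling cost. A sketch of that arithmetic in Python (not the benchmark's actual GLSL), using a 1D "texture" for simplicity:

```python
def blur_sample(texture, x, motion, taps=4):
    """Average `taps` samples offset by the motion value.

    Returns (color, samples_taken). With zero motion, every tap reads the
    same texel, so a single sample produces an identical result.
    """
    if motion == 0:
        return texture[x], 1                         # early-out: one tap
    total = sum(texture[x + i * motion] for i in range(taps))
    return total / taps, taps

tex = [10, 20, 30, 40, 50, 60]
slow = sum(tex[2 + i * 0] for i in range(4)) / 4     # the original 4-tap path
fast, taps = blur_sample(tex, 2, motion=0)
assert fast == slow and taps == 1  # same pixel value, a quarter of the work
```

This is also why the experiment had to confirm the render target was pixel-identical: the optimization is only valid because the two paths compute the same value.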
And it's actually interesting: from the green render target alone, I wouldn't have noticed that the waterfall was also moving slightly. But you can quickly change the shader, recompile, re-render, see the performance difference, and find the bugs in your shader. So this is a performance tool, but I think it's even more a shader debugging tool, and that's really what's been missing in the Linux graphics space. All right, I think that's enough; let me just make sure I'm OK on time. Let's move on quickly to a more complicated demo, because that workload is not that interesting. For this one, I'm going to enable the shader cache. We have a new transparent shader cache in Mesa, which makes things much faster, and that's lovely for all of our users. Unfortunately, when the driver isn't actually compiling the shaders, this tool can't collect all the intermediate assembly and give you the SIMD16 binary; but we'd still be waiting right now if I didn't have the shader cache enabled. For those of you running Mesa master, the shader cache is there for you, and I think it will stop your games from stuttering as you enter a room where new things are rendered.

OK, so this is Aztec Ruins. It's the newest GFXBench benchmark that people are working towards. Again, you can see in the profile that the performance is totally dominated by a few renders, specifically this one, and as usual it's the one that composes the final frame. What is this draw doing? I'm going to zoom in a little bit. If you look at the renders leading up to it, what you'll see is different blurriness: it's rendering the same thing over and over again with different blur effects, and then in this draw it's composing them. You can see, maybe in the back here, there's a little thing that's far off; and if you look at the previous render, it's a little sharper. It's a depth-of-field effect.
So it's trying to make it look like a camera is focused. Still expensive. If you look at the shaders, the vertex shader is nothing, but in the fragment shader it appears that GFXBench has noticed that everyone is replacing their shader and making it more efficient, because in this case they actually do check first, and it's only in the else case that they iterate through and sample for the blur effect. So that's good. I really want to look at this frame, though, more to show you some of the other debugging features. There are experiments; it's interesting that if you disable this draw and look at the render target, it's the same. And there's also a memory barrier here. If you're the author of the program, or a real expert in GL, you can read into this stuff and dissect the program. There are similar tools for DX, and it's actually pretty interesting to go online and watch someone dissect Grand Theft Auto and explain how all its shaders work and what they're doing. So this tool is a great way to learn OpenGL, just as other shader debugging tools are great for learning DX.

All right, I didn't show you uniforms. These are the constants that are attached to every program and parameterize its execution; typically they might rotate a triangle in space and put it in the right spot for rendering. Before I do this, I want to locate the character on the screen, so let's look at the number of triangles; that will give us a hint where lots of triangles are being drawn. OK, so that's her face. I also didn't show you multiple render targets: typically, more complicated applications will render to more than one framebuffer at a time and use those later in the frame to compose things, so it's interesting to see that. But let's zoom in on this. One more. OK, so this is the character, and if you look at the uniforms, here's the projection.
And if I change a value and hit Return, it re-executes. You can see her head has actually been moved over to the side, because I've changed where those triangles are transformed. And it's kind of eerie: her eyes are still in place, which is a little creepy; something else must position the eyes, I don't know. So if you have a problem with uniforms and you're writing a game, you can go and figure out what they're doing, change them, and find your bugs. So we'll put her head back on. That's uniforms.

The other thing I've been adding is entries for the GL state. This is not a complete collection of GL state, but it's what I've been able to add so far. I kind of wrote my own QML hierarchical tree widget, and the problem with putting things in folders is that you can never find what you're looking for. So if you're looking for something like the scissor state, you can just filter to find it. Some state has multiple components, like red, green, blue, alpha; some has near/far. All of the indices for the state are correctly displayed, and if a value is an enum, the correct choices are offered for selection, and you can change them. So let's go down and look at the write mask. If red is enabled, that means that when you're rendering a pixel, the red channel is written. If you disable it, red is not going to be written; each pixel keeps whatever red value it had before. So look at the render target: the character is kind of see-through, because the red in her frame has not been written; it's got the same value it had before. So that's one example. Culling is another. Culling is on, and it's culling the back faces of triangles. In a model there's a bunch of triangles, and if she turns around, a bunch of triangles are facing the opposite direction.
And you don't need to run the shader on those triangles, because they're facing away; so it culls the back faces. If you cull the front faces instead and go back to the render target, it pretty much turns her around, except for her eyes, which is also creepy. She's decided that it's too dangerous to go after the gem, and she's going to turn around and walk out. So you can change any of this state, and I think it's pretty easy to add new state items: you just look through the GL spec, collect the state, and override it when retracing. If you go to the end of the frame, it's interesting: she hasn't turned around in the final frame. But that turns out to be because, if you remember, we turned this shader off in the memory-barrier experiment; and if I turn it back on, in the render target she's turned around. So you can hack on this stuff. I think there was something else I wanted to show — the scissor state — but forget about it; let's move on in the interest of time.

OK, so I want to show you a really complex frame. Not a benchmark, not something simplified just to exercise your GPU, but an actual Unreal demo. We're looking at frame 29. This is OpenGL 4.4, I think, and it's not the fastest thing; it's intended to provide a lot of photorealistic effects. These are the workloads that a lot of retracers are going to fall over on. If your tool isn't built on something like apitrace, and you're Unreal and you want to go look at this complicated workload, it's not going to work. It takes a little while for the tool to iterate over the frame and give you all the metrics. Again, you'll see it's dominated by a few bars; this is where the time is going. Give it a second... here we go. If you go and look at the shaders, what you'll see is that there's actually not a fragment shader or a vertex shader; it's a compute shader.
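Those two state experiments — the color write mask and face culling — are fixed-function behavior that's easy to model. A software sketch of what the overrides do (the real experiment flips the state behind glColorMask and glCullFace; this is just the logic, with GL's default counter-clockwise front-face convention assumed):

```python
def write_pixel(old, new, mask):
    """Apply a glColorMask-style write mask per RGBA channel: a masked-off
    channel keeps the value already in the framebuffer."""
    return tuple(n if m else o for o, n, m in zip(old, new, mask))

def is_culled(winding, cull_face):
    """Cull a triangle by its screen-space winding; with glFrontFace(GL_CCW),
    clockwise triangles are back-facing."""
    back_facing = (winding == "cw")
    if cull_face == "GL_BACK":
        return back_facing
    if cull_face == "GL_FRONT":
        return not back_facing
    return False

# Red channel masked off: the red value from the previous contents survives,
# which is why the character looked see-through.
old, new = (0.8, 0.1, 0.1, 1.0), (0.2, 0.9, 0.9, 1.0)
print(write_pixel(old, new, (False, True, True, True)))  # (0.8, 0.9, 0.9, 1.0)

# Flipping the cull face discards the triangles that used to be drawn,
# which is what appeared to "turn her around".
print(is_culled("ccw", "GL_FRONT"))  # True: front faces are now skipped
```

Modeling the state this way is also roughly what adding a new state item to the tool involves: know what the spec says the state does, then override it during retrace.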
And the reason why it's expensive... do you guys have limitations on how far you can indent code when you're contributing? Because this one is a monster. So that's why it's expensive. I think the other expensive one is the last one. This is not a compute shader; it's actually just one call, and it's a flush. So again, if you want to find the performance bottleneck, you simply cannot look at the frame rate. You really have to understand what's going on in the frame, experiment, and get that feedback. If you have a flush, it might mean you've got two different threads ping-ponging back and forth: one is preparing the GL commands for one frame, the other prepares the next frame, and then they're rendered in a single context. They're probably trying to work around underpowered CPUs, and I would recommend they use Intel processors if that's a problem for them, because we have really powerful CPUs and our GPUs are not as powerful, so you can get by pretty well without this kind of crazy programming technique.

So let's disable these draws and bring up the other view. There are not a lot of live demos at ELC, because it's risky, but I'm living on the edge here. All right, if we look in this range, what we'll see, if we wait long enough, is that this is the part that renders the fight scene. These are the renders that draw out the actual geometry. And this is another example of when you want to, let's say, stop at a render: you can see it draws the floor and the background, moves forward, starts drawing some of the props, and then eventually draws the characters, the neat little weapons they have on their belts, and their heads. And this is really an elementary rendering performance mistake that is easy to make.
And if you program GL, maybe you can recognize what's costly or wrong about the way they're rendering the scene. Does anybody know? They're rendering back to front, right? When you render back to front, you have to draw all the pixels, and then when you render the thing in front of them, it draws all of its pixels on top of the pixels you already drew. GL has this thing called the depth test, and they've got the depth test on. If they had started at the front, drawn the characters first, and then drawn the wall behind them, most of the pixels in the wall would not have passed the depth test: the hardware would have recognized that those triangles are further away, and you wouldn't have had to execute the fragment shader for them, so it would have been faster. Interestingly, that's called overdraw, and an overdraw view is exactly the feature Unreal was asking for from this tool: they want an overdraw representation so they can figure out when they're doing things in the wrong order. OK, I don't want to spend too much time on this; the features are there, but it's an example of a more complicated workload that you can investigate and analyze, and find mistakes in, as a game developer. Let me see, is there anything else I want to show? OK: things that maybe this demo did not show off. I talked previously about the Windows support.
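Coming back to the overdraw point for a moment: the cost of submission order can be counted directly with a toy depth buffer. This sketch counts fragment-shader invocations for full-screen layers drawn in the two orders (a simulation of GL_LESS depth testing, not anything the tool outputs):

```python
def shade_count(draws, width=4):
    """Count fragment-shader invocations for full-width layers at given depths.

    `draws` is a list of depths in submission order (smaller = closer). A
    fragment only runs the shader if it passes a GL_LESS depth test against
    the depth buffer, mirroring early depth rejection on real hardware.
    """
    depth_buffer = [float("inf")] * width
    shaded = 0
    for depth in draws:
        for x in range(width):
            if depth < depth_buffer[x]:  # depth test passes
                depth_buffer[x] = depth
                shaded += 1              # fragment shader executed
    return shaded

# Same scene, two submission orders: wall (depth 3), props (2), characters (1).
print(shade_count([3, 2, 1]))  # back to front: every draw shades every pixel
print(shade_count([1, 2, 3]))  # front to back: later draws are depth-rejected
```

The back-to-front order shades three times as many fragments here for an identical final image, which is the waste an overdraw view would make visible.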
This is critical for the driver team. If you can envision taking a single trace, on either Windows or Linux, playing it back on both platforms, and having the exact same UI with precisely the same set of metrics, you can easily find out where one driver stack or the other has a gap. The cross-platform, multi-platform nature of the tool is going to enable a lot of competitive analysis of these workloads. For Mesa developers, we can both make recommendations to the Windows driver team, like "hey, there's an optimization you're missing", and find that, hey, we totally missed a feature in our hardware enabling, and that's why we're slow on a certain platform. So it's been the key to finding and fixing problems in our driver.

Things that I still need to add, for which I'm really taking the Graphics Performance Analyzers tool as a template (I think it's just a fantastic product that Matt made): displaying and experimenting with the texture state; displaying the geometry mesh, so that people can see the vertices and understand what each draw is drawing; depth buffer visualization (you can mess with the depth state, but actually seeing the values in the depth buffer is very helpful); and overdraw and hotspot views — if you can think of the render target as something where the color really represents how expensive each pixel was, that's the thing you need as a game developer to figure out where to look. Also UI improvements: I'm not a Qt/QML developer, this is my first attempt, so I'm trying all the time to make it a little better. And adding support for more hardware, and for Android specifically.
apitrace dropped support for Android because it didn't seem like anyone was using it, but it's really the thing: Google has signed up to re-enable Android support in apitrace, and that will let people like Mesa developers or game developers figure out how to fix their performance problems. It's not a huge project right now; it's mostly me, but I have a bunch of people who've helped me, especially Robert Bragg and Lionel Landwerlin. They enabled the performance counters that this is all based on, and that's great. I'm actually going to try to show Lionel's tool next as well, because it's pretty neat. One known limitation: when you replace a shader in a program, you have to recompile all the source, make a new program, and re-attach every single thing to that program, all the constants and all the vertex state. That can be kind of intricate; it works on the workloads I've looked at, but it's going to be a process of getting more and more users to use the tool and report things that I can fix. The other thing is that not all workloads have single-frame run loops. Dota 2 is like this: with the way they ping-pong their threads back and forth, if you just choose a single frame and iterate over it, it won't render properly, so that's another thing I need to fix.

So, what's new: AMD_performance_monitor support. Engineers from AMD have helpfully implemented this interface, and getting it into FrameRetrace was pretty trivial, so now we have a whole pile of metrics that will help us look at Kaby Lake G. Unfortunately, they didn't implement the extension quite as written: what they did is expose the raw registers from their performance fabric. So if you ask how many counters it has, it says something like "I have 50,000 counters", and you need a third-party library to interpret those and turn them into real metrics.
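The gap between raw registers and "real metrics" is a derivation step: a library like GPA combines raw counter values into the percentages a human actually wants. A hypothetical example of that kind of derivation (the counter names here are made up for illustration, not AMD's real register set):

```python
def derive_busy_percent(raw):
    """Turn hypothetical raw cycle counters into a human-readable metric.

    `raw` is a counter-name -> value dict, the general shape in which a
    performance-monitor interface hands back sampled registers.
    """
    return 100.0 * raw["gpu_busy_cycles"] / raw["elapsed_cycles"]

sample = {"gpu_busy_cycles": 450_000, "elapsed_cycles": 600_000}
print(derive_busy_percent(sample))  # 75.0
```

Real metric libraries encode hundreds of such formulas, one per derived metric, which is why exposing only the raw registers pushes that whole burden onto tooling.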
Luckily they've open-sourced that as well. It's called GPA, which is a little bit confusing for those of us who worked on the GPA team at Intel, but since they put it on GitHub, it was not too hard to go and fix it. What I found out when I went and tested it, though, is that that GPA library from AMD has never, ever run on Mesa; it has only ever run on the Catalyst driver for Linux. This is why we need open source tools: you can't depend on your upstream producers to solve all your problems for you. You need to be able to take their implementation, put it in your embedded product, figure out why it doesn't work, and then go and fix it. That's really critical for everyone here. So that's what's happened with GPA. This is fresh out of the oven; it'll burn your fingers. I think I'm the only person who's ever gotten AMD metrics out of Mesa, so if you go right now to my frameretrace repository and look for the GPA branch next to it, you could be the second person in the world to look at Radeon metrics for a graphics workload on Linux. As a side effect, though, Raspberry Pi and Nouveau have both added support for AMD_performance_monitor, so all those developers are eager to get the same support for their platforms, and that will be arriving soon. Okay, I have a little bit of time, so I want to talk a little bit more. If you remember, I mentioned you use gputop to see the system load, and I'd like to show that for you. Since it's collecting metrics from the whole system, it needs to run a server as root, so you start the gputop server and then start the gputop UI. And Lionel has done a fantastic job of adding tons of features to this. gputop used to be kind of a web JavaScript thing you could connect to and view in your web browser, but he decided that this UI library called Dear ImGui is the coolest thing, he loves it, and it's amazing how fast he can get things done with it.
So if you haven't seen Dear ImGui, especially for the embedded world, it really rocks, so I would take a look at that. All right, so if we connect to the server (again, you'd want to run this on a different system, because Dear ImGui has its own graphics and that's going to cause GPU work to run on your system), my system is a Kaby Lake GT3, so there are three subslices, each with eight EUs. So let's see if I can remember. If we choose RenderBasic, those are kind of the general metrics you want, and then we start sampling. Up top, this is just the CPU utilization, but if we show the live counters, there it is. This is pretty much the same counter set as in the frameretrace tool; you can choose a different counter set for different tools, and you can see whether the EUs are active or stalled and how busy the GPU is. There's not a lot going on. There's also a timeline. What is it? Oh yeah, timeline. If you add EU Stall, EU Active, and GPU Busy, it graphs them over time, and so I'll go and start a workload. Again, this is our benchmark where she wants to go and get the gem, and you can see that the GPU is much busier. We actually have quite a bit of stall here, and so it seems to me that we have some opportunity to figure out, on our hardware platform, what exactly is stalling and optimize this workload quite a bit. So you can see, oh, she's got the gem, but now bad things are going to happen. One other thing, though: if you notice over on the CPU side, the CPU is pegged at 100%. The GPU, in fact, is not at 100%, and so this is a classic case where you don't want to go and optimize the GPU workload too hard, because it's not going to improve your frame rate. It's just not going to improve your frame rate, because that's not what's slowing you down.
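The EU Active and EU Stall counters being graphed there are ratios over a sample window. As a toy illustration of what those percentages mean (this is made-up aggregation logic, not what gputop or the i915 driver actually computes):

```python
# Toy sketch: the kind of ratio behind "EU Active" / "EU Stall" counters.
# Real OA counters are aggregated across all EUs and subslices by the
# driver; this just shows the arithmetic of one sample window.

def eu_utilization(active_cycles, stall_cycles, total_cycles):
    """Return (active%, stall%, idle%) for the EUs over one sample window.

    active_cycles: cycles where EU threads were executing instructions.
    stall_cycles:  cycles where EU threads were waiting (e.g. on memory).
    total_cycles:  all cycles in the window.
    """
    active = 100.0 * active_cycles / total_cycles
    stall = 100.0 * stall_cycles / total_cycles
    idle = 100.0 - active - stall
    return active, stall, idle

print(eu_utilization(400, 350, 1000))  # (40.0, 35.0, 25.0)
```

A high stall percentage next to a busy GPU is the signal in the demo: the EUs are occupied but waiting, which is the optimization opportunity being pointed at.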
So you have to understand this is an apitrace capture of the benchmark; it may be that running the benchmark itself is not GPU bound. I just wanted to show you an example of why you need to look at the whole system a little first. All right, so this is the part that's really interesting. This is what Lionel's working on now, and it's really cool: these are events that are happening, and if you stop sampling, you can zoom in and see what's going on in different rendering threads. So more to come here; I think this is really cool. If you're missing a vsync event and you're stuttering because you can't get the graphics workload done in time, there are going to be events like: when was the work queued, when did the context switch happen, when was the last frame flipped. That will give you hints as to why you're missing that deadline. In order to actually show that, I'd need to run a brand-new kernel which is not upstream, and I did not want to risk my demo on that particular kernel. It's still in the oven, it's not cooked, and I don't want the raw chicken to make my demo puke all over, but I think you get the idea that there are a lot of features there and there's going to be a lot of information for you to look at. All right, that's it. Those are the tools that we have, and I hope you find them interesting, or that they help you think of things to look at with your driver stack. So, any questions about the product? Yeah. So the question is: is it just the performance counters that we need in order to make the tool run on other platforms? I mean, the bar graph is the most important part of that tool if you're looking at performance. If you don't have performance counters, you click on the metric drop-down and the number of metrics you get to graph is one; it's called "No metric", and it's just a flat bar graph.
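The missed-vsync case described above comes down to comparing flip-to-flip intervals against the display's refresh period. Here's a toy version of that check (this is illustrative logic only, not how gputop's timeline works, and the tolerance value is an arbitrary assumption):

```python
# Toy sketch: spotting stutter by finding page flips that arrived more
# than one vsync period after the previous flip. Real tools correlate
# this with queue/context-switch events; this only shows the deadline test.

def missed_vsyncs(flip_times_ms, refresh_hz=60.0):
    """Return indices of frames whose flip-to-flip interval exceeded one
    vsync period, i.e. frames that missed their deadline and stuttered.

    flip_times_ms: page-flip timestamps in milliseconds, in order.
    A 10% tolerance (arbitrary choice here) absorbs timestamp jitter.
    """
    period = 1000.0 / refresh_hz          # ~16.67 ms at 60 Hz
    tolerance = period * 0.1
    missed = []
    for i in range(1, len(flip_times_ms)):
        if flip_times_ms[i] - flip_times_ms[i - 1] > period + tolerance:
            missed.append(i)
    return missed

print(missed_vsyncs([0.0, 16.7, 33.3, 66.7, 83.3]))  # [3]: that flip was ~two periods late
```

Knowing *which* frame missed is the easy half; the events in Lionel's timeline (work queued, context switch, flip) are what tell you *why* it missed.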
You can still use it for debugging, because OpenGL is OpenGL and this intercepts at the OpenGL layer. So I know that the Broadcom driver developer has fired it up to go look at workloads just for debugging. Anything else? All right, well, thanks for your time. I hope it was interesting, and I hope you enjoy the rest of the conference.