Good afternoon. Thank you for coming. I'm Justin Novosad. I'm a software engineer on the Chrome team. And this afternoon, I'll be talking to you about ways of cranking up the performance of your graphics-intensive web apps and games. For those of you who haven't read the abstract, just to make sure we're clear, this talk is going to be mostly about Canvas and a few other things.

Here's what we're going to be covering today. We're going to look at ways to get smoother image loading, because image loading is often a source of jank in web apps. We'll look at the problems we face when we're aiming for 60 frames per second and how to solve them, and at moving web apps closer to the performance we get with native apps. We'll also look at ways to do memory-efficient and CPU-efficient background rendering. And finally, we'll look into streamlining some important use cases with WebGL, in particular multi-view rendering and WebVR.

The first topic: synchronous image decodes. The current practice when we load an image asset into an app is to start using it as soon as we get the signal that the image resource is loaded. That's what we do with the onload handler. onload guarantees that trying to use the image is going to work if we use it immediately. But it does not guarantee that it's going to be fast. The first time we use an image resource, there's usually a delay due to the decode overhead. And that can cause apps to be janky, especially if we start loading images in the middle of an animation. Examples of such use cases are the first draw of an image to a 2D canvas using the drawImage method, and a texture upload to a WebGL context using the texImage2D call.

The solution to this is to decode the images in the background. Image decode is not something web developers are used to controlling, because we kind of just leave that up to the browser and it just happens. But you can take control of this using a new API called ImageBitmap. It was released earlier this year. It's available in Chrome already, and it's also available in Firefox. Here's how the spec defines what an ImageBitmap is. There are a few words in there that are a little interesting, towards the end of that sentence: "without undue latency." What exactly does that mean? It's kind of subject to interpretation. But what is clear is that this is a performance primitive, and what we do know is that image decode is a source of latency that we want to get rid of. So in Chrome, we interpret this as: we want to get rid of decode latency.

Here's how it works. The use case we're looking at is a web app that runs an animation, and while we're in the middle of the animation, we have a new image that we want to load from the network and start using. We can do that fetch asynchronously using the Fetch API or using XMLHttpRequest. And while that fetch is pending, the animation loop can keep running. Then, once we get a blob back, we can call createImageBitmap, this new API. That will initiate the image decode. The decode is going to happen on a separate thread inside the browser, so your animation loop can keep running while that's happening. And finally, when we get the ImageBitmap back, we can use it in a WebGL context or in a 2D context.

When you're done using an ImageBitmap, like once you've uploaded it to a WebGL texture, it's a good idea to call close. This is not mandatory. But if you close the object, it's going to free the resources behind it immediately instead of waiting for garbage collection. So this can help keep down the memory footprint of your app. It can also help reduce the frequency of garbage collection, because if you have all these dangling large objects, they're going to have to be collected more frequently. But once you clean up an ImageBitmap object, it has a tiny footprint, and you can collect these over many frames.

So here's the old way of doing things. This should look pretty familiar: just using the Image interface, loading it, and once it's loaded, we use it. The new way is not a lot more complicated; it's maybe just an additional line of code. In this example, we're using the Fetch API, but you could also use XHR. There's nothing new about how we fetch the blob here. Just once we have it, we call createImageBitmap. That function returns a promise, and the promise resolves once the image is completely decoded.
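Here's a minimal sketch of both approaches, assuming `gl` is an existing WebGL context and 'sprite.png' is a stand-in asset URL:

```js
// Old way: the decode happens lazily, on first use.
const img = new Image();
img.onload = () => {
  // The first texImage2D with this image triggers a synchronous decode.
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, img);
};
img.src = 'sprite.png';

// New way: the decode happens on a background thread, before first use.
fetch('sprite.png')
  .then(response => response.blob())
  .then(blob => createImageBitmap(blob))  // resolves once fully decoded
  .then(bitmap => {
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, bitmap);
    bitmap.close();  // optional: free the decoded pixels right away
  });
```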
So here are some videos; I'll just let them speak for themselves. In the left one, there's a bit of a pause when the image changes. That's the decode that's causing the jank. The right one is smoother. These were captured on a Nexus 5X device, a phone from last year, and we were using one-megapixel images. They're not that large, just 1K by 1K.

Now let's look at the DevTools timeline and see what was going on. In the first case, where we're using an image element, there's this long image decode. On the top row of this graph, we see frame times. What's interesting is there's one frame that took 233 milliseconds to render. That's terrible. That's definitely noticeable by the user. What's causing this is the synchronous decode inside the call to texImage2D. That's what this flame chart is showing: the image decode is nested inside. And this is what happens the first time you use the image resource. If we were using a 2D context with a drawImage call, the chart would look essentially the same. It would just be drawImage instead of texImage2D, and we'd have the same problem. The second time we use the same image resource, the texture upload would be a lot faster. But in WebGL, you don't really acquire an image and then upload it multiple times. You really just need to do this once. So we're not really taking advantage of the decode cache.

Also, images are not decoded in advance for a really good reason. People often ask this: when I load an image, clearly I'm expressing intent to use it, so why doesn't the browser just go ahead and decode it right away so that it's ready when I need it? Well, if the browser always did this with all images, it would run out of memory. A lot of web pages have tons of image resources, and we can only really afford to decode them as they're used. So that's why we're stuck with this problem.

Now, this is what happens when we use the createImageBitmap API. On the top row, on the main thread, you see a bunch of small little blue lines. That's the overhead for fetching the blob from the blob store. And the big green rectangle at the bottom, that's the image decode. It's happening on another thread that's called the worker pool. The important part here is that while this decode is happening, the main thread continues. We can see all those little animation frame ticks.
So it keeps doing its work while this long task is happening on the worker pool. And the little blue arrow you see right there is pointing at the texImage2D call. This thing that was the big source of jank before is now reduced to about three milliseconds. So it's no longer a big issue.

Now, another use case where we experience jank from image decodes is image injection: when you have a page that's adding images to its existing content by injecting them. It's not really a good idea to use ImageBitmap for that. Examples of pages where this would be an issue are really long scrolling web pages with lots of images, like a news website or a social media feed. If we started using ImageBitmap for all the image resources, we could run out of RAM, because there could just be too much content in there, especially if we're decoding it on a mobile device.

So there's another solution for that use case. This is currently under development, so it's not yet in the browser, but expect to see it soon as an experimental feature in Chrome. It's a new function on the HTMLImageElement interface: we're adding a decode method. When you call it, it triggers a decode that can also happen on a separate thread. And when it's done, a promise gets resolved. That tells you now is a good time to use the image, for example to append it to the document, and that won't cause jank. The reason we want to use this instead of ImageBitmap is that this API does not pin the image resource in RAM. The decoded version of the image is evictable. So it's a lot friendlier for situations where we're under memory pressure. Expect to see that soon behind a flag in Chrome.
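A minimal sketch of how that would look, assuming the experimental decode method behaves as described above ('article-photo.jpg' is a stand-in URL):

```js
const img = new Image();
img.src = 'article-photo.jpg';
img.decode().then(() => {
  // The image is decoded (but still evictable under memory pressure),
  // so injecting it now won't trigger a janky synchronous decode.
  document.body.appendChild(img);
}).catch(() => {
  // decode() rejects if the image can't be fetched or decoded.
});
```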
So, next topic: animation smoothness. UI performance is always a concern in all types of applications. For graphics-intensive apps and for games, it's a constant battle. We add features, we make things prettier, and our rendering time increases. The app gets slower. We lose our beautiful 60 frames per second. So we have to reel it back in. On mobile, it's even harder: we're dealing with slower CPUs and GPUs, less RAM, and also touch interfaces, which make lag more perceptible to the user.

This is what it looks like when we're trying to drive animation using the DOM. We have our 16.7-millisecond budget, if you're targeting 60 frames per second. The blue rectangle there, the JavaScript slice, is the time your app has to do its work, whatever it needs to do at each animation frame. We also have to keep a big chunk of time at the end where we're not planning to do anything. It's just idle padding. And we need this because there's a lot of stuff that your app cannot plan for that the browser just does. For example, a garbage collection could happen at any time. There could be thread scheduling issues: if you're on a CPU that doesn't have a lot of cores and there's a lot of work to do, the main thread could get descheduled, and all of a sudden you have one frame where everything takes a bit longer to execute. And you can have event listeners that receive a burst of events all of a sudden, and you have to deal with it. This is tricky. So you need to have some padding if you really want to be able to keep that 60 frames per second rate.

Then there's the browser overhead. There's style and layout. This is just the cost you have to pay whenever you're touching the DOM directly in JavaScript. And then there's the painting and compositing overhead. This is the cost of just re-rendering the content of the web page, and there's no way around it. On mobile, the fixed costs that are imposed by the browser get stretched because you're on a slower device. That squeezes your JavaScript time slot even more, and it makes your frame deadline even harder to respect.

When we're animating with Canvas, that orange rectangle we had for style and layout just goes away, because we're not touching the DOM. So things get a little bit better. This timeline is also valid for use cases where we're using certain types of CSS animations: if you're animating opacity or simple 2D transforms, the style and layout calculations don't need to be recomputed at each frame. Then there's the painting and compositing step. It's usually a little bit shorter when we're using Canvas, because the browser has less work to do. In general, when we're doing sprite-based animations, we don't really need all that extra support that the DOM and CSS give us, and we can get a significant performance boost by using Canvas. This is what a lot of 2D game developers have been doing since HTML5 Canvas appeared several years ago.

Now, this is what your timeline looks like when you're making a native app. First of all, you have more predictability, so much less idle padding is required, because you won't get any surprise garbage collections. The threads are under the control of the developer: you control how many threads you have and what their workload is, so you don't have to worry so much about scheduling if you're doing things right. And your input events can be coalesced, so the time they consume is more predictable. That's nice.

Now, how do we get the web app experience closer to what we have on native? The first thing we want to do is get rid of some of that overhead. The painting and compositing overhead is necessary with Canvas animations because we have to go through the same presentation process as the DOM does; we're keeping all of our content synchronized. But many apps that use Canvas aren't touching the DOM. So what if we just broke that synchronization constraint in order to get rid of that step? Well, there's a new API for that, and it's called OffscreenCanvas.

OffscreenCanvas is kind of like a canvas element, except that it's not a DOM element; it's just a plain interface. It has no style and no layout. It's not directly paintable, and it's not compositable. This means it can't be part of a document. By now, you're probably wondering: if it's not paintable, how do I do anything useful with it? The whole point is I'm animating something that I want to show on screen. So how do we display it? We use a placeholder. A placeholder is a regular canvas element that lives inside the web page, and the OffscreenCanvas is connected to it. So we can inject the content that we rendered in the OffscreenCanvas into this placeholder location on screen that's reserved by the canvas element. The commit process is very lightweight: it bypasses the DOM update mechanisms, and in Chrome's implementation, the presentation all happens in the browser process. So it doesn't consume any time on the main thread, which is where we're trying to save time for your application.

This is what it would look like to use OffscreenCanvas in JavaScript. First, we're not constructing the OffscreenCanvas directly. We're calling transferControlToOffscreen. This is a method on canvas elements that creates that placeholder connection: you get a new OffscreenCanvas that's connected to the placeholder canvas. Then we create a rendering context the same way we would on an HTML canvas. In the animation loop, things look a little bit different: there's this commit method that we call, and that's what pushes the content update to the placeholder. You'll notice there's no requestAnimationFrame. That's because requestAnimationFrame is tied to the DOM graphics update mechanism, and we're trying to bypass that. So we're using something else. commit returns a promise, and the promise resolves when it's time to render the next frame. So commit pushes the frame, and it also plays the role of requestAnimationFrame at the same time.
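Here's a minimal sketch of that pattern, assuming a placeholder canvas with id 'view' in the page, a hypothetical drawScene function, and the experimental commit-based frame loop described above:

```js
// Main thread: create the placeholder connection.
const placeholder = document.getElementById('view');
const offscreen = placeholder.transferControlToOffscreen();
const gl = offscreen.getContext('webgl');

function renderLoop() {
  drawScene(gl);  // hypothetical: whatever your app draws each frame
  // commit() pushes this frame to the placeholder, and the promise it
  // returns resolves when it's time to render the next frame.
  gl.commit().then(renderLoop);
}
renderLoop();
```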
So let's see how using this impacts our CPU usage on the main thread. This graph shows two devices: a powerful workstation and a typical phone. The orange bars represent the time it takes to present a 300 by 300 pixel canvas to the screen. What we see is that in both cases, presenting with an OffscreenCanvas is faster by about an order of magnitude, regardless of the device. On mobile, we were taking 2.7 milliseconds out of our 16-millisecond budget just for pushing pixels to the screen, and that overhead is not necessary in most cases. So this kind of seems obvious: we should just always use OffscreenCanvas. But it's not that simple. It's not a free win, because we lose synchronization with the DOM, and some apps really need that. But if you're doing a game that runs in a full-screen canvas, for example, you most likely don't care about that. So you can just go use this and get a perf win for free.

Now, what happens if your main thread is busy? For example, there's a lot of event traffic happening on the main thread, and certain events have to happen there, because that's where the DOM interfaces live. Well, OffscreenCanvas can be used in a worker. This way, we give the animation loop its own CPU core, and it can just run without being disturbed. We're reducing the idle time that we need. So that's another step that brings us closer to native performance. We're giving our rendering loop an isolated execution environment. We get more predictable run times for each animation iteration, and therefore less idle padding.

Here's how it works. The first part, creating the OffscreenCanvas, is the same as before: we just call transferControlToOffscreen. And at the bottom of this slide, the animation loop is also the same as before, except that it's running in a worker. The difference is what happens in between. After we create the OffscreenCanvas, we need to send it over to a worker, so we use postMessage for that. You'll notice the way postMessage is called here: we're using the second argument, the transferable list. That's really important, because the OffscreenCanvas object is not cloneable, but it is transferable. And when you transfer it, it preserves that connection with the placeholder.
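A minimal sketch of that handoff, assuming a placeholder canvas with id 'view', a worker script named render-worker.js, and a hypothetical drawFrame function:

```js
// main.js
const offscreen = document.getElementById('view').transferControlToOffscreen();
const worker = new Worker('render-worker.js');
// OffscreenCanvas is transferable but not cloneable, so it must go in the
// transfer list (the second argument).
worker.postMessage({ canvas: offscreen }, [offscreen]);

// render-worker.js
self.onmessage = (event) => {
  const ctx = event.data.canvas.getContext('2d');
  function renderLoop() {
    drawFrame(ctx);  // hypothetical per-frame drawing
    // Frames committed here still end up in the placeholder canvas.
    ctx.commit().then(renderLoop);
  }
  renderLoop();
};
```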
So even though we've moved the object into a new realm, if we commit from the worker, the content still ends up in the placeholder canvas.

Now, here's a simple example that... oh, there's something wrong with the video. Let's retry that slide. OK. What we're supposed to see here, I'm going to picture it. So everyone close your eyes, try to imagine this. On the left, there's a canvas with an animation that runs really poorly, and it's happening on the main thread. And on the right, there's another animation. Oh, here we go. Thanks. All right. So the animation on the right is running nice and smooth, and the one on the left is janky. The one on the left is being rendered on the main thread, and the reason it's so choppy is that we deliberately inserted long tasks on the main thread. Those long tasks are preventing requestAnimationFrame from reaching its target frame rate. The point this makes is: even though requestAnimationFrame isn't running at full speed on the main thread, the worker can still pump out frames at 60 frames per second with no issue. And that's the whole point of isolating it. So that's what that showed. Can we switch back to the slides now? Thank you.

All right. So by now you're probably thinking: this all seems nice, but there's one little problem. I'm already using a framework, and it just uses Canvas. It doesn't use OffscreenCanvas. Well, maybe. But is that really a problem? Something really interesting is that the web platform is highly hackable, right? People do polyfills. With OffscreenCanvas, often we can just drop it into frameworks that usually use an HTML canvas element and hope for the best. And maybe things just work, or maybe you have to do a little bit of bootstrapping to get things working. Kind of like a reverse polyfill mechanism, if that makes sense.

So I'm going to walk you through an example of this, using the three.js library. For those of you who don't know it, three.js is a really popular framework for doing 3D rendering on the web, and it's meant to work with canvas elements. What we're going to see is that not only can it easily be set up to work with an OffscreenCanvas, but we can also use it in a worker without too much work. A quick little disclaimer: the steps I'm going to show in the following slides are not a complete, full-featured solution. They're just some ideas to get the ball rolling, if any of you would like to try this.

First step: getting it to work with an OffscreenCanvas. The way most people use three.js renderers is that we let the renderer create its own canvas. We can override that behavior, because the renderer constructor can take a creation parameter called canvas. We use that creation parameter to specify our own canvas: OK, don't create your own, I have one, go and use this. But this parameter expects to receive an HTML canvas element. If we send it an OffscreenCanvas instead, will it just work? Well, it almost just works. There's one little thing we need to do. OffscreenCanvas doesn't have style, right? It's not a DOM element; why would it have style? And there's a bit of code in three.js that sometimes tries to write the width and height style. So here, I just created stubs, as in the sketch below. You could do something a little more clever: you could create a getter and a setter that forward the style to the placeholder canvas. That might make sense for your application. Maybe, maybe not.
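A minimal sketch of that setup, assuming a placeholder canvas with id 'view' (the stubbed style object is just enough to keep the renderer's width/height writes from throwing):

```js
const offscreen = document.getElementById('view').transferControlToOffscreen();
// OffscreenCanvas has no style; stub out what the renderer touches.
offscreen.style = { width: '', height: '' };
// Hand the renderer our canvas instead of letting it create its own.
const renderer = new THREE.WebGLRenderer({ canvas: offscreen });
renderer.setSize(300, 300, false);  // false: don't try to update CSS size
```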
But anyway, the stub is enough to just get started and show stuff on the screen.

The other problem is: how do we get three.js to work in a worker? three.js has dependencies on some DOM interfaces, the most important one being HTMLImageElement. But we can work around that, using the same texture loading mechanism we saw in the first part of the talk, with createImageBitmap. That's what the diagram at the bottom of the slide shows: use XHR or fetch to get a blob, then from the blob use createImageBitmap to get an ImageBitmap, then upload that to a texture.

Here's one way of doing it. This is a really simple, quick-and-dirty way. Of course, we could just tweak the three.js library to make it do things the way we want. But here, I'm not touching the three.js distro, and there's a good reason for doing it this way: if you don't hack it, you can keep getting three.js from your favorite CDN. What we're doing is hacking the prototype by redefining what ImageLoader.load does. It turns out that three.js already provides a FileLoader interface, and FileLoader can be used in workers because it uses XMLHttpRequest under the hood. So we can just wrap that. But that's not enough; we also need to produce an image. So we use createImageBitmap, and once we get our ImageBitmap object back, we send it to the onLoad handler. The onLoad handler expects to receive an HTMLImageElement, but we're tricking it: we're giving it an ImageBitmap instead. And this happens to just work, because if we're using the image for a texture upload, the GL texImage2D call doesn't care whether what you're giving it is an HTMLImageElement or an ImageBitmap. So we can just make this substitution and it works.
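A quick-and-dirty sketch of that monkey patch, under the same assumptions as the talk (it skips cross-origin settings and everything else a complete solution would need):

```js
// Redefine ImageLoader.load so texture loading works in a worker:
// FileLoader fetches a blob (XHR under the hood), createImageBitmap
// decodes it, and onLoad receives an ImageBitmap instead of an <img>.
THREE.ImageLoader.prototype.load = function (url, onLoad, onProgress, onError) {
  const loader = new THREE.FileLoader(this.manager);
  loader.setResponseType('blob');
  loader.load(url, (blob) => {
    createImageBitmap(blob).then(onLoad);
  }, onProgress, onError);
};
```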
Now, if you're making a game engine, you're probably thinking: to get everything working in a worker, I also need input events, and there are no input events in workers. So how do I get those mouse clicks and keyboard actions? The solution is to forward events: you keep your event listeners on the main thread, and when they receive input events, they postMessage them to the worker where they can be used. But this is not a great solution; it has a serious downside. If your main thread is busy, you'll be introducing input latency. So this is still not ideal. It's an unsolved problem, but we're determined to make this better in the future. It's a future next step for this class of solution.

Now, let's look into some other uses of OffscreenCanvas. Background rendering. This is an interesting one. People are already doing background rendering today. These are, for example, situations where you have a game that needs to do some asset preparation on the client side. Maybe you're rendering text labels that you're going to use as textures. If you're doing a lot of this work with today's technology, you're probably doing it using an HTML canvas element that is not attached to the DOM. That way, it's all hidden; it doesn't appear on screen; you're just doing your stuff. But we can do a little bit better. We can move that work over to a worker. That way, you can batch all of your background rendering. You don't need to worry about dividing it into small chunks so that your UI remains interactive, or anything like that.

And this is how you do it. In your worker, you create an OffscreenCanvas, but this time we're creating it directly using the constructor, without a placeholder. When we're done doing our background rendering, we want to capture the results to send them back to the main thread. That capture can be done using a method called transferToImageBitmap. As the name implies, this is a transfer. That means we're not making a copy; we're just tearing off the pixel buffer that the canvas rendered to, which is basically just copying a pointer. What this means is that it's going to be faster, because there's no copy, and it's going to be leaner in memory, because we don't have duplicate copies of the same image in RAM. That's really going to help with memory bloat. And for sending the rendered results back to the main thread, or maybe to another worker if that's where your rendering loop is, we use transfer semantics there too. It's important to note that ImageBitmap is transferable, but it's also cloneable. So if you don't specify that second argument to postMessage, it's still going to work, but it's going to be slower. Remember to do this if you don't mind what the transfer implies: that you can no longer use that image object in the current context. And you'll get some good performance gains.

Just so you all know, we've been getting feedback from app developers telling us that it's really hard to write applications that do image manipulation and that work on mobile. The second you try to use a high-resolution image, like a picture from a DSLR camera, it's really easy to start running out of memory. And this is due to all the multiple copies of the same image data that we end up with in memory, just because of how the current APIs are designed. So we're really trying to move towards zero-copy interfaces, and that's what these transfer semantics are about. This should really help with imaging apps. And going forward, we're going to continue to provide new zero-copy APIs.

Another use case for OffscreenCanvas: multi-view rendering with WebGL. Imagine CAD software, or 3D modeling or animation software, where you have multiple views of the same object, like what we see here on screen. When we're rendering multiple WebGL views that use the same resources (the same vertex buffers, the same textures, the same shaders), it's useful to share resources between those views. Now, WebGL does not allow resource sharing between different canvases; it's just designed that way. So the solution most people use is to have a single canvas in the background that we use for rendering all the views, and when a view is ready, we copy it to the presentation canvas. This can already be done today using an HTML canvas element in the background that's not attached to the DOM. But we can make it a lot more efficient by using transfer semantics. Here's how it works. From the background canvas, we can use transferToImageBitmap; that's a zero-copy operation. And to present it, we can use a new type of rendering context that's called a bitmap renderer. The bitmaprenderer context is a very simple canvas rendering context that has only one method in its API: transferFromImageBitmap. That, again, is a zero-copy method. The alternative would be a drawImage to a 2D canvas, and then you're paying the cost of compositing and you have an extra buffer in memory.
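A minimal sketch tying both of these together, assuming a worker variable on the main thread, a hypothetical drawLabel function, and a placeholder canvas with id 'view':

```js
// In a worker: render in the background with a standalone OffscreenCanvas.
const offscreen = new OffscreenCanvas(512, 512);  // no placeholder needed
const ctx = offscreen.getContext('2d');
drawLabel(ctx);  // hypothetical: render a text label, sprite sheet, etc.
const bitmap = offscreen.transferToImageBitmap();  // tears off the pixels
self.postMessage({ label: bitmap }, [bitmap]);     // transfer, don't clone

// On the main thread: present a finished frame with zero copies.
const presenter = document.getElementById('view').getContext('bitmaprenderer');
worker.onmessage = (event) => {
  presenter.transferFromImageBitmap(event.data.label);
};
```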
And this is yet another use for OffscreenCanvas: WebVR. This is still a work in progress. We're not sure exactly what this API is going to look like in the end, but it's being discussed right now. What we do know is that we want to get WebVR to run in a worker, because we're looking for that advantage of having an execution context that's isolated from the rest of the browser, so we can hit that 90 frames per second, or 75 frames per second, on desktop. And in general, we want to be able to squeeze in more content to get those rich user experiences, and the worker can help with that.

So finally, if you want to try OffscreenCanvas, we recommend trying it in Chrome 60. It is there in earlier versions of Chrome, but it's not quite ready for consumption; in Chrome 60, it's starting to look pretty good. You can get it for Android on the Play Store: just search for Chrome Canary. You can also find it on the web to download onto your favorite desktop platform. And you can install it alongside an existing Chrome installation, so it's perfect if you just want to test it in a sandbox. Once you have Chrome 60, you have to turn on the experiment: navigate to chrome://flags, find the item called Experimental canvas features, click Enable, restart the browser, and you're good to go.

Now, if you're interested in following the developments with OffscreenCanvas, here's a link to the bug. Just follow that link, and there's a little star in the corner. Click that star, and you'll get email updates. So you'll know what's going on: whether it's ready to ship, what the outstanding bugs are, that sort of thing. And if you do try it out, please provide feedback. If you find issues with it, submit bugs on crbug.com. And not just about OffscreenCanvas, by the way. Please, web developers, help us help you. Go submit bugs. Help make Chrome better.

And this is the last slide. Let's just summarize what we've learned today. Yes, I see a few people with their cameras there; this is the one slide you want to take home with you. We learned that you can get smooth asynchronous decodes with ImageBitmap, reduce your rendering overhead using OffscreenCanvas, isolate your rendering in a worker to improve your smoothness, and improve your throughput and reduce your memory bloat by using transfer semantics. And we saw examples of use cases where transfer semantics come in handy, like background rendering and multi-view WebGL. So that's it. Thank you for listening, and I'm looking forward to seeing what you're all going to make with these APIs.