My name is Shubhie Panicker. I'm a software engineer working on the web platform in Chrome. And I'm Jason Miller. I'm a DevRel for Chrome. Our talk today is about a key strategy for runtime performance of web apps, and that is scheduling of JavaScript on the main thread, as well as approaches for moving script off the main thread. Jason and I have both been deep in this space, exploring gaps and APIs for what we are calling achieving responsiveness guarantees. We're excited about the opportunity here, both with existing primitives as well as the new APIs we will show in our talk. Right. So to get started, let's illustrate this problem space using a demo. This is a simple application that searches through photos as you type. And you can see here, with this JavaScript-controlled red spinner animation, it's doing a fair amount of blocking work on the main thread. And while this is happening, the app can't respond to input, so typing gets queued. Looking closer, what we'd see if we pulled up the profiler is something like this: a sequence of long tasks that block the main thread and cause that input queuing. We can see this in a simplified view here. If we receive input and start doing some processing in response to that input, say, searching photos and rendering some list items, we're skipping frames already. But in addition to that, if we receive more input while that task is running, it gets queued, and it can only execute once that task has completed. So this is data captured from real users on real websites in the wild. It shows a breakdown of where Chrome was spending its time while handling input. There's a lot of interesting data here, but we don't really have time to get into it. The main thing to look at is the amount of time spent in this v8.execute task. That is Chrome running JavaScript during touch handling. And it's clearly the biggest contributor to touch input latency, both on average and in the worst case.
So a problem with our search-as-you-type example app is that there are a lot of different types of work happening here. And all of these different kinds of work have different deadlines. For example, the user is typing in that search box; their input has to be responsive. There are ongoing animations on the page; they have to render consistently and smoothly. And then there's the heavy lifting of fetching search results, post-processing them, and preparing and rendering those results in time, so that they're relevant to the user's typed query. The difficulty is that it's hard for apps to balance these competing needs, to reason about all these different deadlines and keep everything on time. Right, so we have a bunch of different types of work, and each of those types of work has a different deadline. And what we need to be able to work through this is priorities. So there are a couple of high-level approaches to try and achieve responsiveness guarantees. The first approach is just doing less work. And there are ways of doing this; in an infinite feed, for instance, you might only render what's visible. We just saw a strategy like that in the virtual scroller talk. Now, this is not always possible. Modern apps often just have a ton of work to do. So a second strategy is chunking up work and prioritizing those chunks. In practice, though, this is also very difficult. It can be impractical to achieve this manually on your own as an app developer. And we think there's a real opportunity here for frameworks to step in and help their users. Frameworks are in a great position to ensure chunking and prioritizing of work. So stepping back a bit, what we need here is some way to provide our chunks of work, our tasks, to a system that can hold them, say, in a task queue. And then this system can make good decisions about when to take tasks out of the task queue and execute them at an appropriate time, based on everything that's going on.
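To make the chunking strategy concrete, here is a minimal sketch using only primitives that exist today. Everything in it (`processChunked`, the 50-millisecond budget, `yieldToEventLoop`) is illustrative rather than a proposed API; the idea is simply to yield back to the event loop between chunks so that queued input and rendering get a chance to run.

```javascript
// Yield control back to the event loop. setTimeout(0) is the classic
// workaround; in a browser you might prefer requestIdleCallback or,
// eventually, a real scheduling API.
const yieldToEventLoop = () => new Promise(resolve => setTimeout(resolve, 0));

// Process a large list of items, yielding whenever we exceed a time
// budget so the main thread is never blocked for long stretches.
async function processChunked(items, workFn, deadlineMs = 50) {
  const results = [];
  let sliceStart = Date.now();
  for (const item of items) {
    results.push(workFn(item));
    // If this chunk has used up its budget, yield so queued input and
    // animation callbacks can run before we continue.
    if (Date.now() - sliceStart >= deadlineMs) {
      await yieldToEventLoop();
      sliceStart = Date.now();
    }
  }
  return results;
}
```

A scheduler, as described next, is essentially this pattern generalized: something has to decide when each yielded-to chunk runs, and at what priority.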
And this is the definition of a scheduler. Google Maps is a really great example of an application that uses a scheduler to keep its interactions smooth. The app has to manage multiple different types of interactions and events, and these can all happen concurrently. It does this by scheduling all work and giving a much higher priority to input response tasks. We can see that here. Let's say I'm panning the map, and as I'm panning, additional tiles are coming into the viewport and need to be loaded. However, if I stop panning and I pull up the drawer at the bottom, all of a sudden that is far and away the highest-priority task to execute. So those map tiles being loaded need to be deprioritized. A key aspect of a good scheduler is its ability to execute work at the best time. And that's an appropriate time based on everything that's going on and various factors: what's the type of the task, what's important to the user right now, what's the overall state of the application, what's the internal state of the browser, et cetera. To understand this notion of the best time, we have to step down a level and look at the browser's rendering pipeline. The browser is periodically pumping frames, typically every 16 milliseconds for a 60-frames-per-second display rate. And each frame has a set of things that happen in sequence. For instance, we have requestAnimationFrame callbacks, followed by style, layout, and paint. In Chrome, input handlers run right before requestAnimationFrame callbacks. So the point here is that there is limited time to do the urgent work that needs to happen in the current frame, and then the app has to immediately start thinking about preparing for the next frame. And the third type of work here is idle work, which might be left over in the current frame, or there might be plenty of idle time if no frames are being rendered. So this is the terminology we are using for these three types of work.
We have user-blocking tasks for the current frame. This is typically to provide the user an immediate acknowledgment of what they're doing. In our example app, this might be keeping that typing interactive in the search box, keeping those animations going, keeping the page responsive overall; buttons should be toggleable. Default work is the next category. This is typically user-visible, and it's preparing for the next frame or a future frame. In our example, this would be the work of post-processing, preparing the search results, and rendering them in time. And finally, the third category is idle work. This is typically work that is not user-visible. It can happen at the end of the frame, or when no frames are being rendered: things like analytics, backups, syncs, or indexing. So on the right here, we've listed some existing primitives, existing ways a developer can submit work to the browser to target these priority levels. For user-blocking work, input handlers and requestAnimationFrame are great. It's also worth noting that microtasks are only suited to urgent, user-blocking work, because they do not yield to the event loop. And we've seen some bad cases where developers are accidentally doing non-urgent or large amounts of work in microtasks without realizing it's blocking rendering. For the second category, default, we have things like setTimeout(0) and postMessage. These are really hacks and workarounds; there isn't a real primitive here, and we are working to fill this gap. And finally, for idle work, requestIdleCallback is a great API. So JavaScript schedulers can be built today using these primitives. Now, while it's possible to build a scheduling system in JavaScript, such schedulers suffer from gaps, primarily because they don't have enough control or signals to properly drive scheduling. So we'll go through some examples. For instance, we've seen JavaScript schedulers trying to estimate the frame deadline.
So they're doing a whole bunch of bookkeeping, trying to guess at it. But they're doing it poorly, because it's just not possible to do this well without knowing browser internals. So we're considering exposing an API for that. isInputPending is another really useful signal for schedulers, and we are actively exploring an API for it. Then there's other coordination work. For example, handling fetch response priorities is pretty relevant: if you're doing urgent work for the current frame, you don't want your low-priority fetch responses to come in and interrupt that. In practice, though, there's a lot of other work happening in the browser. The browser might initiate various callbacks, such as readystatechange for XHR, or a postMessage might come in from a worker. There's internal work, like garbage collection. And it's just not possible to codify priorities for all of this and expose signals. So this got us thinking: how about moving the scheduler one level down and integrating it directly with the browser's event loop, where we already have most of these signals and a lot of great information? This would also solve an additional problem, the coordination problem between multiple parties in the app. If you have third-party content, or embedded libraries, or legacy code, or even other frameworks, they can all coexist and use the same scheduling system with consistent priorities. So this is a very early sketch of what an API might look like. The key thing here is a set of global task queues targeting each priority level. This is really simple and straightforward compared to using a myriad of different APIs. The second thing is we think it would be useful to have a notion of user-defined task queues, or virtual task queues. This would give developers more control over managing a group of tasks and doing bulk operations, like updating priority, canceling all the tasks, or flushing the task queue if the app is going away.
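As a rough user-land approximation of that global-task-queue idea, here is a toy scheduler. The queue names and the `postTask` method echo the spirit of the proposal but are assumptions for illustration only, and `setTimeout(0)` stands in for the real event-loop integration a browser-level scheduler would have.

```javascript
// Priority levels, highest first; the proposal sketches one global
// queue per level.
const PRIORITIES = ['user-blocking', 'default', 'idle'];

class Scheduler {
  constructor() {
    this.queues = new Map(PRIORITIES.map(p => [p, []]));
    this.scheduled = false;
  }

  postTask(task, priority = 'default') {
    this.queues.get(priority).push(task);
    if (!this.scheduled) {
      this.scheduled = true;
      // setTimeout(0) is a stand-in for real event-loop integration.
      setTimeout(() => this.drain(), 0);
    }
  }

  // Run one task per turn, always from the highest-priority non-empty
  // queue, so a late-arriving urgent task jumps ahead of queued
  // lower-priority work.
  drain() {
    const queue = PRIORITIES.map(p => this.queues.get(p)).find(q => q.length);
    if (!queue) {
      this.scheduled = false;
      return;
    }
    queue.shift()();
    setTimeout(() => this.drain(), 0);
  }
}
```

Running one task per turn, rather than draining a queue to completion, is what lets input-driven tasks preempt queued default and idle work between turns.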
So here we can see a simplified version of that map scheduler using this task queue API. First we hook into the user-blocking and default task queues, just to give ourselves a high- and a low-priority queue. Then we start listening for pointermove events. Each time we receive one of these events, we enqueue a pan task with the coordinates of that pointer move. The pan task translates the map tiles, obviously. And then it might, let's say, enqueue a low-priority task to detect any tiles that have moved into the viewport and potentially load those tiles. The thing to note here is, if we receive a new pointermove event before we've invoked this load-more-tiles task, the pan task would be given a higher priority than loading more tiles. And that's exactly what we want: we give higher priority to input-driven tasks. And let's say the team behind Maps needed to track analytics or do something in response to pan gestures; that would be a good case for something like an idle-priority task. So here you can see most of the frames are in green and getting rendered in time. This is what a well-scheduled system looks like. The work is chunked up. There is high-priority work happening at the beginning of every frame, followed by style, layout, and paint in purple and green, immediately followed by default-priority work in yellow to prepare for the next frame, as well as idle-priority work being properly interleaved so that idle time is adequately utilized. So, all the API proposals we showed today are super early stage. We actually don't know what the end game here is going to be. This is a really great time to give us feedback and help us chart the course. For web developers, we really think there is an opportunity here with improved scheduling, even just by properly using existing primitives.
For framework authors, we want to urge you to consider a scheduling system and collaborate with us now to develop the right set of APIs in this space. React's work on Concurrent Mode and time slicing has proven that frameworks can really play a big role in helping apps improve responsiveness. We're already working with React and actively looking to form partnerships with other frameworks and apps. This is a link to our GitHub repo. Filing issues on the repo is a great way to get that feedback dialogue going. Right. So what about work that can't be chunked? What if we have a bunch of JavaScript we need to execute and it's really difficult, or even potentially impossible, to break that work up? Here's an example that illustrates what I mean. Let's say we have a text editor that does something like live JavaScript bundling as you type. If I load in a decent amount of code here, things start to get a little bit slow. Every time the bundling process kicks in in response to my input, it blocks the main thread. This causes the cursor to freeze and queues up my text input until bundling is completed, and that really disrupts the typing experience. You can see that in the CPU profile here on the right. It would be really difficult to break that work up into 50-millisecond chunks, and that's for two reasons. First, I didn't write any of the bundler code, so modifying it would be a lot of work, particularly for me. Plus, there's a whole bunch of different libraries being used to actually make this happen, and downloading, parsing, and evaluating those dependencies blocks the main thread. So using background threads lets us offload that work, getting it off the main thread so the main thread can just keep handling input. There are a few use cases that lend themselves extremely well to this approach.
If you're building a computer-aided design tool or a game, or doing encoding, these are great places to just start with threads. Same thing for AI, machine learning, and crypto. If these are the types of things you're doing, you should start here. In the browser, our primitive for threading is the worker. If you haven't used workers, or haven't used them in a while, they're basically threads. They have a simple messaging interface, so you can send a message to the worker and you can receive a message back. They have no DOM access whatsoever and a very limited global scope, kind of just fetch and modules stuff. They shipped around 10 years ago and they're available essentially everywhere. So the API for workers looks like this. You instantiate the Worker constructor and pass it the name of a script, and then you can listen for messages coming back out of that worker and send messages down to it. Here, we're sending in a message that's an object, and it describes that we would like to, say, invoke a computeHash function, and we're going to pass it the contents of a file, expressed here as an ArrayBuffer. The second argument to postMessage is interesting: it tells the browser, rather than structured-cloning the ArrayBuffer, to transfer it in. Finally, once computeHash has completed, the worker will postMessage back to our thread, and the result will be dropped into the message handler we registered. So under the covers, this postMessage of the data incurs a serialization on the main thread, then the message gets queued up and hops over to a worker thread, followed by a deserialization there. End to end, this is called a thread hop. And this thread hop has a cost, primarily from the data being subject to what is called structured cloning, which is a copying behavior that recurses through the JavaScript object. So the size of the data is relevant to the cost of the thread hop.
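The slide code isn't in the transcript, so here is a hedged reconstruction of the pattern just described: posting a message that names a function (`computeHash` is the hypothetical example from the talk) with an ArrayBuffer in the transfer list. A MessageChannel stands in for the worker boundary so the snippet is self-contained; with a real Worker you would call `worker.postMessage(...)` with the same two arguments.

```javascript
const { port1, port2 } = new MessageChannel();

// This side stands in for the worker script: it would dispatch to the
// named function, e.g. computeHash(args[0]), and postMessage the result
// back when done.
port2.onmessage = event => {
  const { name, args } = event.data;
  console.log(`${name} received ${args[0].byteLength} bytes`);
  port1.close(); // done with the channel in this sketch
};

const buffer = new ArrayBuffer(8); // stands in for the file contents

// The transfer list (second argument) moves the buffer to the other
// side instead of structured-cloning it; the sender's copy is detached
// afterwards, so its byteLength drops to 0.
port1.postMessage({ name: 'computeHash', args: [buffer] }, [buffer]);
console.log(buffer.byteLength); // detached: 0
```

Transferring avoids the copy, but note that the serialize/queue/deserialize round trip of the message itself, the thread hop, still happens.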
So one downside of the postMessage API is that it doesn't have a notion of statefulness between the request and the response. If you make a whole bunch of requests, you'll get a whole bunch of responses back, and it's hard to correlate those responses to the requests. Right, so we've seen how to communicate with a worker using postMessage. There are actually a number of ways you can do this. A second way would be to use MessageChannel. A MessageChannel is something you can instantiate, and you get back two ports. You can pass one port to some other context, like a tab or a frame or a worker, and you can message between the two. The ports have the same interface we just saw. Another one would be BroadcastChannel. This is sort of like a MessageChannel that's shared with all contexts associated with an origin, so all tabs, frames, workers, and the service worker. All you do is instantiate a BroadcastChannel with a channel name, and you can message without having to pass ports around. Soon we're actually going to have a fourth way to communicate, and that is transferable streams. It lends itself really well to things like audio and video, where the format you would want to use to express these things is streaming. The thing with all of these APIs is that they're message-based. And based on some of the common usage patterns that we've seen, and what we've heard from developers, we think there might be a case here for a higher-level API. We've seen solutions to this in user land through libraries like Comlink, Greenlet, Workerize, and via.js. These all help coordinate messaging across boundaries by abstracting postMessage using something called proxying. Proxying certainly improves over raw postMessage, but it comes with a number of downsides. Every method call on a proxied object incurs the cost of a thread hop, and this can come as a surprise to developers. Platform gaps can cause memory leaks in these APIs.
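For context on how these libraries bridge the correlation gap mentioned above, here is a stripped-down sketch of the core proxying technique: tag each outgoing message with an id, and resolve the matching promise when the reply with that id comes back. `wrap` and the message shape are illustrative, not Comlink's actual API.

```javascript
// Wrap any postMessage/onmessage endpoint (a Worker, a MessageChannel
// port, etc.) in a promise-returning call() function. Each request
// carries an id so the response can be matched back to its caller.
function wrap(port) {
  let nextId = 0;
  const pending = new Map(); // id -> resolve function

  port.onmessage = event => {
    const { id, result } = event.data;
    const resolve = pending.get(id);
    if (resolve) {
      pending.delete(id);
      resolve(result);
    }
  };

  return function call(method, ...args) {
    return new Promise(resolve => {
      const id = nextId++;
      pending.set(id, resolve);
      port.postMessage({ id, method, args });
    });
  };
}
```

Note that this sketch also illustrates the downside being described: every `call()` is a full thread hop, and the `pending` map is exactly the kind of bookkeeping that can leak when a response never arrives.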
These APIs also don't really have a notion of a backing thread pool, or a concept of managing threads and resizing the pool. Embedded libraries are not able to share the same thread or thread pool. And for complex APIs, it can be impractical to recreate the API surface across threads. So this raises the question: is there an opportunity here for better integration with the browser? Is there an opportunity to provide a more compelling API? Right. So we think there might be a case here for something that looks like this. Here we're passing the name of a function in some other context, and some arguments, to a theoretical postTask method. This postTask method would return a promise that eventually resolves to the return value of that function somewhere else. This abstract code helps us move from a message-passing model to a more task-oriented model. In looking at the requirements for a better API, we considered other platforms. We looked at iOS and Android, which have plenty of precedent for usage of background threads. In particular, iOS has an API called Grand Central Dispatch, which is a very stable, well-proven API that's loved by iOS developers. Android, amongst other things, has something called AsyncTask, which is a very minimal, clean API. We talked to framework developers and experts in the usage of these APIs who are deeply familiar with the pitfalls, and we learned some key things in terms of the basic requirements we want for our model. Number one: good ergonomics, a way for developers to offload work by just thinking in terms of submitting tasks versus coordinating over threads. Secondly, a native thread pool that's shareable with embedded libraries and other parties in the app. And finally, system-controlled thread management, where the system can be in control of making decisions about resizing the thread pool, or about where to run which tasks.
So we set off on a path towards building a basic task-queue-based API, inspired by Grand Central Dispatch, and a naive API might look like this. Let's say we have three tasks, A, B, and C, and each one depends on the result of the previous task. We can submit these tasks from the main thread over to worker threads, and then we'll start getting responses back. So here, for three tasks, we paid the cost of six thread hops. There are a few downsides and gotchas here. For one, these thread hops can be expensive on lower-end devices; depending on the data size, a hop can take up to 15 milliseconds. And this can add up. It means that if these hops are in the path of user interaction, they can add up to multiple frames' worth of latency. On Android, we've actually seen this in practice, in the real world, in the usage of AsyncTask. So one conclusion here is that posting results back to the main thread by default is not a good idea. Besides the latency issue, it can cause congestion from queue buildups. And, as you might remember from the main-thread scheduling portion of our talk, we're doing all this work to carefully chunk up our work and execute our user-blocking and default-priority tasks, and all these post-messages coming in at random times mess with main thread scheduling. A second thing to note is that posting by default even to the current thread can be pretty bad. We saw this in Grand Central Dispatch with its dispatch_get_current_queue API. Right, so this brings us to a new proposal that incorporates some of our learnings from other platforms. It lets developers avoid sending data back to the main thread. It lets you chain tasks together without data transfer and pay the return cost only once. And it minimizes thread hops using a built-in sticky thread pool. What we want is the experience that you see up here on the right. So let's dive into that.
If we revisit the code editor that we showed earlier, the one that bundles JavaScript as you type: if we do this using Task Worklet, we can leverage some of these features to improve performance fairly considerably. Because Task Worklet avoids transferring data between threads, the bundling and minifying tasks in this demo can all reuse the same AST that is generated by the initial parse task. In the end, only the resulting minified code, which is a relatively small string, actually gets sent back to the main thread. The implementation looks something like this. First, we create a worklet module, and that registers named task processors; these are just classes with a process method. Then, over on the main thread, we can coordinate the data flow using postTask. We're going to parse the code and then pass the resulting AST through the bundle and minify tasks. The important thing to note here is that none of these variables are actually holding values. They are just pointers to data that exists in the thread pool. Data transfer back to the main thread only happens when we await the result property of that last task. Doing this in a typical workers implementation would normally take six hops, as we saw: we executed three tasks, and we'd need to pass a message down and back up for each of them. In Task Worklet, this is only two thread hops, because we can transfer data between tasks. Task Worklet is also backed by a thread pool. So let's say we start off with a task that produces a large set of images. When we post a task with some of those images as its argument, it will attempt to run in the thread where that data is already available. So data is never transferred between threads in this case, and that leads to fewer thread hops. To take advantage of pooling, though, if there's no optimal thread available, we will resort to transferring data between threads in order to get parallelization.
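The Task Worklet slide code isn't in the transcript, but the handle-based data flow it describes can be sketched in plain JavaScript. This single-threaded toy models only the API shape (handles as placeholders for pool-resident data, transfer only when `.result` is awaited); `TaskHandle` and `postTask` follow the proposal's spirit rather than a shipped API, and the real implementation runs processors on a thread pool, which this sketch does not.

```javascript
// A handle returned by postTask: the promise stands in for a value
// that lives on a pool thread and has not been transferred back yet.
class TaskHandle {
  constructor(promise) {
    this.result = promise;
  }
}

// Run a processor, accepting either plain values or handles as
// arguments. Handle arguments are resolved "inside the pool", so
// chaining tasks never bounces intermediate data through the caller.
function postTask(processor, ...args) {
  const promise = (async () => {
    const values = await Promise.all(
      args.map(a => (a instanceof TaskHandle ? a.result : a))
    );
    return processor(...values);
  })();
  return new TaskHandle(promise);
}
```

With this shape, a chain like `postTask(minify, postTask(bundle, postTask(parse, source)))` builds the whole task graph up front, and only awaiting the final handle's `.result` pulls a value back to the caller.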
And then finally, let's say the result that we're looking for here is actually just a comparison of the number of cat photos versus the number of dog photos, since that's what's important in the end. In this case, the only thing we ever transfer back to the main thread is a single integer, and as you can imagine, that's extremely cheap. So we've been thinking a lot about what the future of web development off the main thread might look like. Today, we have libraries like Comlink that use reflection to emulate the interface of some code running in a worker so that it can be called from the main thread seamlessly. In the future, we think we might move towards a Task Worklet model, where developers approach multi-threaded web programming in a different way: you have a thread pool that's managed automatically, named tasks, and this concept of a task graph that optimizes execution and data flow. This is a really early proposal. We are looking for feedback, and we are looking for real-world use cases. There's an implementation available in Chromium behind the Experimental Web Platform Features flag, and we also have a polyfill, source code, and demos available at this GitHub repo. There'll be a link at the end of the presentation as well. So, there's been a lot of interest in this idea of multi-threaded JavaScript over the last couple of years. There have been several independent explorations by various frameworks and apps. So we dug into this over the last few months to understand how far we can get with just using the worker API as a way to achieve threading. To set some context here: a new worker doesn't just spin up a raw OS thread. It actually creates its own JavaScript environment on top of it. Part of that is what's called a V8 isolate, which has a non-trivial weight in addition to the weight of the OS thread.
A key implication here is that the worker, by creating its own JavaScript environment, is not able to share data or code with the main thread. This is fundamentally different from background threads on other platforms and in other languages. And it has implications for using workers in a mainstream way, by which I mean when the worker is in the path of user interaction. In particular, we looked at two app development models using workers. The first one is doing state management in a worker. This is where you do sort of the heavy-lifting, business-logic-y stuff in a worker. The second model goes even further and does the bulk of rendering in the worker. Now, while the worker doesn't have access to the DOM, there are libraries like WorkerDOM, so you can do virtual DOM updates in the worker and then ferry the diffs back to the main thread. Real apps have been built using these models. However, there are some significant challenges that we want to highlight if you're planning to go down this route. The first is that it's hard to have synchronous access to a worker, but real apps need synchronous access to their app state. Sometimes you want to look up your app state on the main thread, and sometimes you might update that app state in a worker. This means you now have to maintain and replicate this app state in both places and synchronize it continuously, and that has a cost in terms of thread hops. The second thing is that the worker has to be bootstrapped with all the script and modules that it needs, because, like we said, it cannot share code with the main thread, and this has implications for startup delay. So we ran benchmarks to dig into this base cost of a worker, and these are some numbers from a mid-range Android device. Startup takes upwards of 10 milliseconds. This is, again, Chrome on Android.
A thread hop varies anywhere from one to 15 milliseconds, depending on the device and the size and type of the data. Look out for a blog post that will be accompanying this talk in the next week or two; we will have some detailed links to our benchmarks and data there. We also set up more realistic benchmarks. We built apps representing the app development models we mentioned, that is, state management in a worker and rendering in a worker. And we did a ton of runs on real mobile devices, both with and without a worker, and looked at a variety of metrics: everything from loading metrics and memory metrics to rendering metrics such as frame rate and input latency. We approximated input latency using cycle time. Again, the blog post will have more details on this, but I do want to highlight one bit of interesting data. This is showing runs with an app that is representative of rendering in a worker. The red runs are with a worker, and the blue are runs without a worker. What we are seeing here is that with the worker, we get a higher, improved frame rate. But on the flip side, we are also seeing higher input latency. So there's a fundamental trade-off here between improved smoothness and input latency. Workers are able to free up the main thread by offloading work, so the main thread can focus on rendering, and less work there means fewer long-task hiccups. On the flip side, input latency suffers from the thread hops. And the worker environment is a limited environment; it's not just the DOM that's missing, there are many other APIs that are still not available, like media and audio, et cetera. So the key thing to take away is that workers might be able to make your rendering smoother, but they might do it at the expense of a bit of input delay. There are cases, though, where this is completely worth it. AMP Script is a great example.
AMP Script renders using workers in order to sandbox potentially misbehaving JavaScript. Slow or problematic code that's running in the worker, in this emulated DOM, can't negatively impact the AMP document. And so for AMP, the benefits they get out of sandboxing untrusted code far outweigh the latency they incur from transferring events. So we wanted to summarize when to use workers, but it turns out there's no perfect rubric for this. There are a couple of hints you can use, though. If you have code that blocks for a long time, code with small inputs and outputs, or something that follows a simple request-response model, you might be in a position to start off with workers. However, if you have code that relies on the DOM, or is directly in the path of input response, or just code that needs really minimal overhead, you might want to start off with a different solution. You could approach workers later. When adopting a threaded approach to state management, make sure that your state management and business logic outweigh the cost of creating a worker and sending and receiving messages. Make sure that your worker is pulling its own weight. So, we're at the beginning of a fairly major shift in how applications are developed for the web. We're excited to explore new possibilities for effective scheduling and threading, and we hope that all of you are too. We just want to leave you with some of the key messages from our talk today. It's hard to achieve responsiveness guarantees because there's so much work happening in modern apps, and we think scheduling is a compelling strategy for tackling this. There's an opportunity here for improved scheduling with existing primitives as well as new ones, and frameworks are in a good position to play a big role. In terms of offloading work from the main thread, you can think of using workers as an extension of better main thread scheduling.
Some types of work are better suited to workers than others, and we think new APIs like Task Worklet are going to make workers compelling to use for this kind of scheduled work. So that's about it. We'll have a blog post coming with more details. These are, again, the links to the GitHub repos. Issues on the repos are very welcome and appreciated, and a great way to keep the feedback loop going. And don't hesitate to reach out to us on email or Twitter. Thank you.