Back in the old days, when jQuery had just taken the web development world by storm, most developers had a very clear idea of what a good build process for the web looked like. An application had a list of scripts to be included on the page, they all got concatenated into one big bundle, and a static script tag in the HTML pointed to the latest version. But over time, user expectations changed. Apps were expected to do more and, at the same time, they were still expected to load fast, even on slow mobile connections. Lately, everybody seemed to agree that the answer is to split the big bundles of the past into smaller and smaller chunks. But is smaller always better? Is there a limit to how small we can go? Is there such a thing as a minimum viable chunk?

Hello everyone. My name is Jan. Some facts about myself: I'm an occasional collaborator on Node.js, and inside of Node core, I'm working on better support for ES modules. I'm gainfully employed as a software engineer at Google, and if you want to find me on Twitter, my handle is at Jan Krems. Before we get started, two disclaimers. First, there is some practical advice in this talk, but it is not a step-by-step instruction for the perfect webpack setup. If that is what you're looking for, I'll be linking a pretty good guide at the end of this talk. And secondly, these are not the opinions of my employer. And with that out of the way, let's jump in.

Since this is about finding a minimum viable chunk, let's quickly recap what a chunk is. For that purpose, here's a very crude illustration of building a web app. The build process collects all the source files that belong to the application, analyzes them, and then distributes the code across a number of output files. We call each one of those output files a chunk, because it contains a chunk of the application code. Let's stick with this illustration a little longer though, because you might have noticed something interesting. If we remove the colors, the left side and the right side are eerily similar. It does look like we are taking code that has already been split up into nice individual units and combining it, just to then split it up again into various files. We could make our lives a lot easier if we kept the separation already present in the source files. Why is that not our minimum viable chunk?

To answer that question properly, we'll have to look at what makes a set of output chunks viable. What makes a set of chunks viable? Fortunately, the answer is simple, and it's my two favorite words: it depends. I know, it's not a very satisfying answer, and I think we can do better. On what does it depend? We could look at how the way the code is split into chunks affects user experience. There are many different factors that influence the final experience, but here I want to focus on three. Each of them is directly affected by the way we decide to bundle the app, and each of them could lead to an unacceptable user experience if we ignore it. The three are download efficiency, cache hit rate, and code execution time.

Download efficiency determines the time it takes to transfer the application to the user's browser, or at the very least the parts of the application that the user needs right now. If this is too slow, the user may not stick around long enough for anything else to matter. But no matter how fast the download is, we don't want the user to load everything from scratch if we can help it. To provide a great user experience, we want to serve responses from cache as often as possible.
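As one concrete illustration of what that usually means in practice (this webpack snippet is my own sketch, not part of the talk): chunks get content-hashed file names, so a chunk's URL only changes when its content changes, and everything that didn't change can keep being served from the browser or CDN cache.

```js
// webpack.config.js (sketch, assuming a webpack-based build)
module.exports = {
  output: {
    // [contenthash] changes only when the chunk's content changes,
    // so unchanged chunks keep their URL and stay cached across deploys.
    filename: '[name].[contenthash].js',
    chunkFilename: '[name].[contenthash].js',
  },
};
```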
That cache hit rate will determine whether our app is fast enough on average, not just in the worst case. After the code has been loaded, it still needs to run. It doesn't really help the user if the code loads super quickly but then takes too long to execute. This part is even relevant when all the code came from the cache; it's all about how quickly we are done running it. And there are two aspects to this. On one hand, we need access to all relevant code: if we only discover that we need additional code halfway through execution, that will count against the execution time. On the other hand, we really don't want to accidentally run code that's not needed right now. That will cause delays as well.

Download efficiency, cache hit rate and code execution time are, at least to some degree, mutually exclusive. Fun fact: this kind of diagram is called a ternary plot, which I learned while making the slide. The point is, we can't fully optimize for one of them without sacrificing something about the others. If we are somewhere in the middle of this triangle and we want to move closer and closer to the download corner, eventually we'll have to move away from both execution and cache hits.

What does that mean in practice? Let's ignore execution and just look at download efficiency and cache hits. If we are focused on getting more cache hits, we might come to the conclusion that smaller chunks are always better. We'd start with a chunk of any size. We'd see that a change towards the end of the chunk invalidates the cache for the rest of the chunk, and so we'd split it up. As long as there's any way to split chunks into smaller pieces, we would have to continue. At the end, we'd have the perfect cache hit rate: every tiny section of code gets its very own chunk, cached independently from the rest of the program.

But what happens when we don't hit the cache? What happened to our download efficiency? It got real bad. And one reason is compression. We usually don't send the actual file contents over the network, especially for text files like JavaScript code. What we send instead is a compressed version. The principal idea is to find patterns in the input to reduce the size of the output. And that is a problem as the inputs get smaller and smaller. If an input contains 10 function declarations, there's an obvious pattern that can be used; if it's a single function, not so much. Which means the same amount of code, split up into many small chunks, will need to transfer more bytes over the network. Each of the chunks will be harder to compress individually than when they were still part of the same response. And so eventually we get to the point where we have to make a choice. Do we keep splitting chunks to get higher and higher cache hit rates? Or would the download efficiency become unacceptable if we do? A viable solution will have to fall somewhere in between.

We can repeat this exercise with the next edge of the triangle, the line between cache hits and execution. Let's take this example. We have two snippets of application code: on the left side, A, and on the right side, B. We'll assume that each one gets loaded on a different page. Because they both reuse the same logic, we put that shared code into its own chunk, C. Whenever A or B is loaded, C will be loaded as well. Let's check how we're doing on cache hits and execution. Things are looking good for cache hits: if anything changes in one of the three chunks, only that one chunk gets invalidated in the cache.
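To make that setup a little more tangible, here's roughly what the source could look like. The file names and the shared helpers are made up for illustration; the point is just that A and B each import two pieces of shared code, and the build puts that shared code into the common chunk C.

```js
// shared/format.js: ends up in the common chunk C
export function formatPrice(value) {
  return `$${value.toFixed(2)}`;
}

// shared/track.js: also ends up in the common chunk C
export function trackEvent(name) {
  console.log('event:', name);
}

// a.js: entry for page A, becomes chunk A
import { formatPrice } from './shared/format.js';
import { trackEvent } from './shared/track.js';

// b.js: entry for page B, becomes chunk B
import { formatPrice } from './shared/format.js';
import { trackEvent } from './shared/track.js';
```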
And actually, there's no issue from an execution point of view either. Whether we run A with the common chunk or B with the common chunk, we're only running the code that's necessary. And even if we have client-side routing and both of them eventually run in the same window, there's no line of code executed twice or unnecessarily. But the structure of the code may change over time, and that is where it gets tricky. Let's make a small change: A removes one of the two imports. Now we are faced with a conundrum. We didn't change B or any of its dependencies, so if we're interested in cache hits, we would want to preserve the cache of B and of C; neither of them has changed, after all. But wearing our execution-time hat, we don't want to run code that's not necessary. And if C doesn't change, then whenever we load A, we will run code that isn't necessary. So we have to choose again. Do we change or restructure the common chunk so it always reflects the latest state of what is actually shared? Or do we keep it stable because we care more about cache hits? There's an inherent friction here between allowing global optimizations for fast execution on one side and stable chunks that stick around in the cache on the other.

If you kept track, there's one more edge we haven't talked about, and that is the one between execution and download. At first glance, they may seem like the same thing: if we download more code, we execute more code. Intuitively, they're the same. But it's not always a one-to-one relationship. Sometimes it can take downloading more code to execute less code.

Let's take these two modules. Entry is an entry point into the application, and sometimes that entry point imports M. But it is only known at runtime whether it will import M or not. In real code, this could be because it depends on certain browser features or because it depends on data that was loaded from an API. For this example, we'll say it's random. And to clarify, this example assumes that using dynamic import isn't good enough: in the cases where we need M, we can't wait for another round trip to get it. It's a crucial part of the initial user experience. Our build process assigned both of these files to the same chunk, and because we wanted the fastest possible download, we decided to use a technique called scope hoisting. Both modules are merged into one combined module. We saved all the bytes from setting export properties and calling require, and it certainly looks like execution should be cheaper as well. But in this particular case, we introduced a problem. Before we merged the two modules, calculateValue was only executed when the value was actually needed. Now we're always running it. If that function is expensive, we just made the average execution time a lot worse. A practical example would be top-level code that builds a complex data structure: think a big static JSX fragment that gets rightfully moved out of a component's render function.

If we needed a lower execution time, we might want to preserve the module boundaries. This is a lot more code to download, but because we are running the module body of M only when it's actually needed, on average it will execute more quickly. Now, this was very situational. Usually, scope hoisting reduces the download size and also leads to faster execution times. Also, in this example, we were talking about running the entire module body of M lazily, but the same idea applies to any kind of lazy value. For every lazy value, additional code has to be shipped to handle the lazy calculation.
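Here's a minimal sketch of that trade-off. buildBigFragment is a made-up stand-in for any expensive top-level computation, like the big static JSX fragment mentioned above.

```js
// Eager: the work happens at module load time, whether or not the value is ever used.
export const bigFragment = buildBigFragment();

// Lazy: we ship a few extra bytes of wrapper code,
// but the expensive work only happens on first access.
let cached;
export function getBigFragment() {
  if (cached === undefined) {
    cached = buildBigFragment();
  }
  return cached;
}

// Stand-in for the expensive computation (illustrative only).
function buildBigFragment() {
  return Array.from({ length: 10000 }, (_, i) => ({ id: i }));
}
```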
But if the value isn't needed, or until it is needed, execution can wrap up more quickly.

What did all of this tell us about what's viable? We've seen that having more fine-grained chunks can mean that the download becomes too inefficient. We've seen that maintaining high cache hit rates may prevent us from applying important global optimizations. And we've seen that sometimes we need to download more code to ensure execution is fast enough. So any viable solution will have to make some trade-offs between those extremes.

With that, we are ready to draw some conclusions. We know what a chunk is. We know what makes a set of chunks viable. So how small can we go without running into issues? Let's start with a throwback to our definition of a chunk. We saw source files on the left and chunk files on the right, and it seemed awfully convenient to use the existing source file boundaries as our chunks. In development, doing just that can be absolutely viable. If you're using ES modules, you can try it out today. You might have already heard about Snowpack or es-dev-server; both of those tools effectively treat your source files as chunks, so there's no need to run an extensive build process. And in development, build speed is often more important than a realistic user experience.

If we want a minimum chunk that's viable for end users, we have to take one more look at the triangle. It turns out the triangle is actually a pyramid; in the earlier slide, one of the corners just wasn't quite visible. You might say there was some hidden complexity. I'm so sorry. This new corner represents how concerned we are about introducing more complexity into our system. If we want to get the smallest chunks that are viable in production, we'll have to accept some additional complexity.

But it starts with small changes. This is an example of a manifest or digest file: a file that lists all entry points into the application and maps them to a fingerprinted file to be loaded in production. In this case, there's also an explicitly listed common chunk that will be loaded for all entry points. So on the homepage, there'll be two script tags, one for the common chunk and one for the homepage chunk. The problem is there are only two ways to deal with code that is needed for multiple entry points: either it has to be put into the common chunk, or the same code has to be duplicated for each entry point that needs it.

A huge improvement is to remove the assumption that there's a one-to-one relationship between an entry point and a production chunk. This can be done incrementally. Step one: add square brackets, so each entry point maps to a list of chunks instead of a single file. Step two: move the common chunk into each entry point. This is also where we remove the two hard-coded script tags and use a loop instead. Step three: time to cash in. Now we can update the build config to create more granular chunks. For webpack, this could mean setting splitChunks to 'all'. I have some good news at this point for users of frameworks like Next.js or Gatsby: all of this may already be taken care of for you. It may seem like a small change, but when Next.js and Gatsby rolled this out, many larger websites saw their total JavaScript size drop by 20 to 30%.

So far, we haven't touched the application code itself. But what if we did? We can go one step further on the complexity scale and design our application for progressive fetching. Many applications already use route-level code splitting, which is a basic form of progressive fetching.
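In code, route-level splitting often boils down to dynamic import() at route boundaries. This sketch assumes hypothetical page modules that each export a render function; the bundler then turns each import() target into its own chunk.

```js
// Hypothetical client-side router: each route's module becomes its own chunk,
// and the import() for a route only runs when that route is visited.
const routes = {
  '/': () => import('./pages/home.js'),
  '/settings': () => import('./pages/settings.js'),
};

async function navigate(path) {
  const load = routes[path] ?? routes['/'];
  const page = await load(); // fetches (or reuses) the chunk for this route
  page.render(document.getElementById('app')); // assumes each page module exports render()
}
```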
Route-level splitting groups similar pages together and builds a special entry point to be used when loading that kind of page. This works well as long as all pages of that type are very similar. But when the pages are assembled dynamically using a variety of components, it isn't quite optimal anymore. Maybe there's a form that is only visible to logged-in users. Maybe there's an optional video player. Maybe the page has a comment section that's below the fold. Progressive fetching means designing our page in a way that loads the code we need as early as possible, but only the code we actually need. In this case, we may want the initial page to reference the form code, but only if the user is logged in. And we don't want to load the code for the comment section until we run out of other things to load, or until the user actively scrolls down. Doing this requires actively designing the application to allow for this kind of fine-grained control of the load order. Adding dynamic import calls is a start, but to prevent waterfall behavior and unnecessary delays, it's likely not sufficient. Without that kind of architectural change, we'll quickly run out of meaningful chunks to create.

But let's say we did all that. We have all these small chunks and our application can leverage them effectively. If we stop here, downloading the resources for our website may still be highly inefficient. Unless we get very lucky with caching and also ignore first-time visitors, we need to deal with the download side somehow. We've touched the build, we've touched the application code; time to take a closer look at how we are serving chunks. The simplest solution is to serve each chunk as a static file from a CDN: one script tag per chunk, one HTTP request per chunk, and one HTTP response per chunk. As we covered before, transferring each of these small files on its own isn't the most efficient way to download the contents.

Enter dynamic bundling. In its most basic form, the idea is quite simple. Instead of serving each file individually, there's one HTTP endpoint that accepts the IDs of multiple chunks, and all of those chunks are then sent back in one response. Congrats, we just reinvented the big bundle we started with. But not quite. Since the bundled response only contains the chunks the client asked for, we are not over-fetching code, so we are doing well on the execution side. And with the way we're combining the chunks, it's about as efficient to download as we can make it. But we did sacrifice our cache hit rate. One way to make up for it is to use a service worker. As long as the service worker understands how this HTTP endpoint works, it can compare the list of chunk IDs in the request against the cache and then only request the chunks that aren't cached yet. And once it gets the response back from the server, it can extract the chunk contents and cache them individually. In the future, we may also be able to use web bundles for this purpose. With that in place, we have a solution that runs exactly the code that is absolutely necessary, downloads what it needs efficiently, and can achieve a high cache hit rate. And that may be the minimum viable chunk.

Thank you for watching. I hope you enjoyed this exploration of taking code splitting to the extreme. As promised, here's the link to the guide on setting up granular chunks. And if I made you at all curious about dynamic bundling, the folks from Netflix gave some great talks on how they use dynamic bundling to run A/B experiments at scale.
Also, my Twitter handle again, just in case. If you want to chat about ES modules or novel ways to bundle web apps, that might be the best way to reach me. Cheers.