OK, let's get started. I'm Steve Kobes, and this is Life of a Pixel. I've been working on Chrome for about six years, and when I first started, I was a little overwhelmed by the complexity of the code base. There were a lot of moving parts, and there weren't a lot of resources that really explained how it all fits together. I remember I spent a lot of time just tracing through the code: this piece calls that piece, and that piece calls another piece. But where's the part that actually puts the pixels on the screen? It was hard to see the complete chain of causality. But I figured it out, and I wrote it down, and it became this talk.

My goal is to present the entire pipeline to you in 45 minutes and show you what each step looks like and why it's there. The slides have a lot of pointers to actual code, which I think helps make things more concrete and can serve as a future reference. But it means they're pretty dense, and I won't try to explain every class name on every slide in complete detail. If you want to explore further on your own time, the slides are at bit.ly/life-of-a-pixel, and you can email me there if you want to know more about something. One of the challenges of a talk like this is that the architecture is constantly evolving, so just be aware that I'm focusing on the current state of the code that is shipping today in Chrome's Canary channel with the default flags. In many areas, there are efforts planned or underway that change the picture dramatically. I'll mention some of them in passing, but I'm not going to talk too much about the history or the future directions.

So this talk is about rendering, which is the process of turning content into pixels. I'm going to talk a little bit about content, then say a little bit about pixels, and then we'll dive into the magic in the middle. From a Chrome browser perspective, content means everything in the red box, where the content of the page appears. Outside of content are the browser UI elements like the tab strip, the address bar, the navigation buttons, the menus, et cetera. In Chrome, the content area is represented by a class called WebContents, which is the primary public interface that the content namespace exposes to the embedder. As John mentioned in the last talk, a key part of Chrome's security model is that rendering happens inside a sandboxed process. An evil website might exploit a vulnerability in the rendering code, but the sandbox keeps the damage contained: the browser itself is safe, and the other tabs are safe. WebContents encapsulates creating and managing renderer processes, and most of the things we're going to talk about today happen inside the renderer process.

You've probably heard of Blink. We call Blink the rendering engine, but it's really a subset of the code in the renderer process that sits underneath the content layer. The lines around what's in Blink versus what's outside of Blink have some historical baggage and are evolving over time, but essentially you can think of Blink as the place where we implement the semantics of the web platform and all of the concepts and logic that are defined in the web specs. From a rendering perspective, content is the generic term for all of the code and assets inside a web page or the front end of a web application. Obviously, that includes HTML, which is text.
Markup surrounds the text, like the paragraph tags in this example. Content also includes CSS, or style, like this style rule, which selects all the paragraph elements and assigns the value red to the color property. So that tells the engine to render all the paragraphs with red text. And then there's JavaScript, and pretty much everything about the state of the rendering can be modified on the fly with JavaScript. You can change the text, put new markup in, change style values on specific elements, change style rules, et cetera. And of course, you have external resources like images, which can be referenced from the HTML. Those are the basic building blocks of a website. There are many other kinds of content that are rendered in special ways that I'm not going to go into today.

So just to emphasize, a real web page is just thousands of lines of HTML, CSS, and JavaScript delivered in plain text over the network. Web pages don't require any sort of compilation or packaging like you might find on more traditional application platforms. The implication of that is that the source code for the web app is literally the input to the browser's rendering pipeline. And that simplicity was key to the success of the web in the early days, because you could just write some markup in a text editor and suddenly you have a web page.

At the other end of the pipeline, Chrome has to put pixels on the screen using the graphics libraries provided by the underlying operating system. On most platforms today, that's a standardized API called OpenGL. On Windows, there's an extra translation to DirectX, and in the future we may support newer APIs such as Vulkan. These libraries provide low-level graphics primitives, so you see things like "draw this polygon at these coordinates, using this buffer of pixels." But obviously, they don't understand anything about the web or HTML or CSS.

So I've explained where we're starting and where we're trying to go. The overall goal of rendering is to turn HTML, CSS, and JavaScript into the right OpenGL calls to display the pixels on the screen. But as we go through the talk, keep in mind a second goal, which is that we want the right intermediate data structures to update the rendering efficiently after it's produced, and to answer queries about it from script or other parts of the system. What I'm going to describe is a pipeline, or a lifecycle, and it's broken into a bunch of stages: it'll turn content into something, and then turn that thing into another thing, and so on. That's partly because rendering is too complicated to express as a single operation. But it's also because those intermediate data structures let us update the rendering more efficiently later on; that's goal number two from the previous slide, because maybe we don't need to run all of those stages for every update. Once we've finished describing the first version of our pipeline, I'm going to come back to this notion of updating the rendering and introduce some new concepts that help us optimize it.

With that, let's jump into the first stage. We're in Blink code now, in the renderer process. The first resource that comes down from the network connection is typically HTML. The other resource types, like CSS, JavaScript, and images, are either embedded in the HTML or brought in as secondary resources. So our starting point for rendering is the HTML parser, which receives that stream of tags and text.
HTML tags impose a semantically meaningful hierarchical structure on the document. For example, that div might contain two paragraphs, and each paragraph might have some text inside it. So the first step in rendering is just to parse those tags to build an object model that reflects this structure, with parent, child, and sibling pointers. We call that the document object model, or DOM. The DOM is a tree of the sort found in computer science, where the trees are upside down. This is the first of many trees that we'll encounter in the engine, and the reason they're all trees is that they're all based on the DOM, which is based on the structure of the HTML. When we talk about nodes, we usually mean the nodes of the DOM tree.

The DOM serves double duty as both the internal representation of the page and the API exposed to JavaScript for querying or modifying the rendering. The V8 JavaScript engine exposes the DOM web APIs, like createElement and appendChild, as thin wrappers around the C++ DOM tree, through a system called bindings. There can actually be multiple DOM trees in a single document, because HTML supports something called custom elements. You can build these fancy reusable widgets whose internal DOM is encapsulated in something called a shadow tree, and the shadow tree can have placeholders called slots for the embedder to inject some markup. That's just something to be aware of if you're doing any DOM traversals. Instead of walking all these separate trees, what you often want is something called a flat tree traversal, which descends from the custom element into the shadow root, and descends from a slot into its assigned node in the host tree.

Now that we've built the DOM tree, or DOM trees, it's time to look at the styles. Each style rule has a selector, in this case 'p', and a list of property values. The selector selects a subset of DOM nodes that the rule should apply to; in this case, the two paragraph elements. The style properties are the control knobs that web authors use to customize the rendering of DOM elements. There's a style property for just about any setting you might imagine for changing formatting or colors or margins or positioning or backgrounds. Sometimes a style influences only that particular node, and sometimes it influences the rendering of the entire DOM subtree below the element it's applied to. For example, with a rotation transform on a node, you're rotating not just that node, but everything rendered by the descendants of that node.

Not only are there lots and lots of style properties, but it can be non-trivial to determine which elements a style rule actually applies to. Like this one, which is every alternating paragraph inside any div that does not have the class named foo. And some elements may be selected by more than one rule, with conflicting declarations for a particular style property, and there are complicated precedence semantics. So the style engine's job is to sort all of that out. When we first encounter a style sheet, we parse the CSS text into an object model of the style rules, which has rich representations of the selectors and the property-value mappings. These style rule objects are indexed in various ways for more efficient lookup. Another thing to note, in the upper right corner of the slide, is that style properties are defined sort of declaratively by this giant JSON file in the Chrome code base, and a lot of the C++ classes that represent specific properties in the object model are auto-generated by Python scripts at build time.
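Going back to the DOM for a moment, here's a minimal sketch of the kind of tree we just built: nodes with parent and child links, plus a pre-order traversal. This is a toy model with hypothetical names like ToyNode, not Blink's actual Node class (which, among other differences, uses explicit first-child and next-sibling pointers rather than an owned child vector):

```cpp
#include <iostream>
#include <memory>
#include <string>
#include <vector>

// Each node owns its children in order and keeps a back-pointer to its
// parent, mirroring the parent/child/sibling structure described above.
struct ToyNode {
  std::string tag;  // e.g. "div", "p", "#text"
  ToyNode* parent = nullptr;
  std::vector<std::unique_ptr<ToyNode>> children;
};

ToyNode* AppendChild(ToyNode& parent, const std::string& tag) {
  auto child = std::make_unique<ToyNode>();
  child->tag = tag;
  child->parent = &parent;
  parent.children.push_back(std::move(child));
  return parent.children.back().get();
}

// Pre-order traversal: visit a node, then recurse into its children in order.
void PrintTree(const ToyNode& node, int depth = 0) {
  std::cout << std::string(depth * 2, ' ') << node.tag << "\n";
  for (const auto& child : node.children)
    PrintTree(*child, depth + 1);
}

int main() {
  // Roughly <div><p>hello</p><p>world</p></div>, like the slide's example.
  ToyNode root;
  root.tag = "div";
  ToyNode* p1 = AppendChild(root, "p");
  AppendChild(*p1, "#text: hello");
  ToyNode* p2 = AppendChild(root, "p");
  AppendChild(*p2, "#text: world");
  PrintTree(root);
}
```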
Once we have an object model representing the parsed style rules, we have to figure out how those styles actually apply to our DOM elements. Remember, the style rules are selecting complicated, overlapping sets of nodes, with precedence semantics. But at the end of the day, we want to know the resolved value of any given property for any given DOM element, like what's the top margin of this particular div, or what's the background color of the body. So style resolution takes all the style rules and walks the DOM tree in a pre-order traversal, producing a computed style for each element. Computed style is just a giant map of property to value. It hangs off the element and says: this element is red and italicized and has a two-inch margin, or whatever. And that's the output of the style engine. Chrome Developer Tools lets you actually play around with this: you can highlight an element and look for the tab that says "Computed", and the values there are mostly coming straight from the Blink computed style object. They're not exactly the same, though, because a few properties actually get augmented with layout information. For example, the default width of a div, as far as the style engine is concerned, is just "auto", but DevTools will show you the actual pixel width, like 800 pixels.

So now that we've built the DOM and computed all the styles, the next step is to determine the visual geometry of all the elements. You can think of each element as occupying one or more rectangular boxes inside the content area. The job of layout is to compute the coordinates of those boxes. In the simplest case, layout places blocks one after another in DOM order, descending vertically. We call this block flow, because the blocks flow down the page. The content inside a block is broken into lines, and individual runs of text and inline elements like span generate boxes within the lines. In Western languages, the inline boxes flow from left to right, but in languages like Arabic or Hebrew, they flow from right to left. To figure out where text runs start and end, and where to break a line of text onto the next line, we have to measure it using the fonts from the computed style. Layout uses a library called HarfBuzz to select glyphs that correspond to the characters in the text and compute the size and placement of each glyph, accounting for fancy things like kerning and ligatures.

Layout also computes multiple kinds of bounding rects for a single element. For example, if the insides of an element are larger than its declared border box, we have a situation called overflow, and layout has to keep track of both the border box rect and the overflow rect. The interesting thing about overflow is that you can make it scrollable. So another side effect of layout is computing scroll boundaries, that is, the min and max scroll offsets, and reserving space for the scroll bars. The most common scrollable DOM node is actually the document itself, which is the root of the whole tree, but CSS lets you make any element scrollable. There are more complex layouts for things like table elements, styles that specify breaking content into multiple columns, floating objects that sit to one side with content flowing around them, or East Asian languages that have text running vertically instead of horizontally.
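To make the simplest case concrete, blocks stacking vertically in DOM order, here's a toy layout pass under the assumption that every child has a fixed declared height. It places the boxes one after another and computes the container's overflow, which is what determines the max scroll offset. The names (ToyBlock, LayoutBlockFlow, and so on) are hypothetical, and this leaves out everything hard: line breaking, fonts, floats, tables:

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

struct Rect { float x = 0, y = 0, width = 0, height = 0; };

struct ToyBlock {
  float declared_height = 0;  // from computed style, e.g. "height: 60px"
  Rect border_box;            // output of layout: where the box ends up
};

struct ToyContainer {
  float width = 0;
  float declared_height = 0;
  std::vector<ToyBlock> children;
  float overflow_height = 0;  // how far content extends inside the box
};

// Block flow: stack children vertically in DOM order, then compute overflow,
// which is what makes a container scrollable (the max scroll offset).
void LayoutBlockFlow(ToyContainer& container) {
  float y = 0;
  for (ToyBlock& child : container.children) {
    child.border_box = {0, y, container.width, child.declared_height};
    y += child.declared_height;
  }
  container.overflow_height = std::max(y, container.declared_height);
}

int main() {
  // An 800px-wide, 100px-tall container with three 60px-tall children.
  ToyContainer div{800, 100, {{60}, {60}, {60}}};
  LayoutBlockFlow(div);
  for (const ToyBlock& child : div.children)
    std::cout << "block at y=" << child.border_box.y << "\n";
  // Content is 180px tall inside a 100px box, so the max scroll offset is 80.
  std::cout << "max scroll offset: "
            << div.overflow_height - div.declared_height << "\n";
}
```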
But in all of this, note how the DOM structure and the computed style values, like float left, are the inputs to the layout algorithm. Each pipeline stage is using the results of the previous stages and producing outputs that influence future stages. Layout operates on a separate tree linked to the DOM, which we call the layout tree. The nodes in this tree implement the layout algorithms, so there's a bunch of layout classes, like LayoutBox, LayoutInline, LayoutTable, et cetera, depending on what layout algorithm the element needs to use. But they all inherit from the common base class LayoutObject, so every node in the layout tree is a layout object. Before we can do layout, we have to build the tree, and that actually happens at the end of the style resolution stage. The layout stage is then just traversing that layout tree, figuring out all the geometry data, the line breaks, the scroll bars, et cetera.

The last slide showed DOM nodes as being sort of one-to-one with layout objects, and that's true in simple cases, but there are some notable exceptions. For example, if you set display: none on a node, it does not create a layout object, because we're not going to render it and it's not occupying any space in the content area. And sometimes you can have a layout object without a node. For example, a layout block and a layout inline are not allowed to be siblings, so if your inline and block DOM elements are siblings, we create an anonymous layout block to wrap the layout inline. It's even possible for a node to have more than one layout object. And finally, we talked about shadow DOM a little bit earlier; the layout tree is based on the flat tree traversal, which includes all of the shadow roots. So a layout object might even be in a different DOM tree from its layout container.

The layout engine is in the middle of a rewrite called NG, for next-generation layout. So right now, the tree has a mixture of legacy layout objects and NG layout objects. The biggest change in NG is that we have a cleaner separation of inputs and outputs. We used to put all of it on the layout objects, and the layout object had a lot of state that got modified during layout, and that was hard to scale. With NG, we have immutable objects for the inputs, the algorithms, and the results of the algorithms. This lets us be a lot smarter about caching, and it makes it easier to build new layout algorithms.

OK, we understand the geometry of our layout objects; now we're ready to paint them. Paint records a list of paint operations, and paint ops are where we start to get things that look like graphics API calls. A paint op might be something like draw a rectangle, or draw a path, or draw an image, or draw a blob of text, and it has parameters for the coordinates and the colors and so on. These paint ops are wrapped in things called display items, which have pointers back to the layout objects, and the whole thing is wrapped up in a container called a paint artifact. The paint artifact is the output of the paint stage. But so far, notice we're just building a recording of the paint ops that can be played back later. We're not actually executing the paint ops yet, and we'll see why that's useful in a minute.
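Here's a minimal sketch of that record-now, play-back-later idea: paint fills a list of operation records with their parameters, and nothing is actually drawn until the list is replayed at raster time. It's a toy display list with hypothetical names, not Blink's real display items or paint op types:

```cpp
#include <iostream>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

struct DrawRectOp { float x, y, width, height; unsigned color; };
struct DrawTextOp { float x, y; std::string text; unsigned color; };
using ToyPaintOp = std::variant<DrawRectOp, DrawTextOp>;

struct ToyPaintArtifact {
  std::vector<ToyPaintOp> ops;  // recorded in paint order, replayed at raster
};

// "Paint": record operations without executing them.
ToyPaintArtifact PaintParagraph() {
  ToyPaintArtifact artifact;
  artifact.ops.push_back(DrawRectOp{0, 0, 800, 20, 0xFFFFFFFF});    // background
  artifact.ops.push_back(DrawTextOp{0, 15, "Example", 0xFFFF0000}); // red text
  return artifact;
}

// "Raster" (the next stage): play the recording back against some canvas.
void Replay(const ToyPaintArtifact& artifact) {
  for (const ToyPaintOp& op : artifact.ops) {
    std::visit([](const auto& o) {
      if constexpr (std::is_same_v<std::decay_t<decltype(o)>, DrawRectOp>)
        std::cout << "drawRect " << o.width << "x" << o.height << "\n";
      else
        std::cout << "drawText \"" << o.text << "\"\n";
    }, op);
  }
}

int main() {
  ToyPaintArtifact artifact = PaintParagraph();  // paint stage: just records
  Replay(artifact);                              // raster stage: executes ops
}
```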
It's important to paint elements in the right order so that they stack correctly when they overlap. So paint uses something called stacking order, which is a little different from DOM order, because it can be controlled by a style property called z-index. In this example, the yellow box paints after the green box, even though it comes first in the DOM. It's even possible for an element to be partially in front of and partially behind another element. That's because of the way CSS is defined: paint runs in multiple phases, and each paint phase does its own traversal of something called a stacking context. So now the blue box is painting after the green box within each phase, but the background phase paints all of the backgrounds before the foreground phase paints any of the text.

So we've built some paint ops, but we haven't executed them yet. The paint ops in the paint artifact are executed by a process called rasterization. Raster turns some or all of the paint ops into a bitmap of color values in memory, where each cell has some bits that encode the color and transparency of a single pixel. Raster also includes decoding any image resources that are embedded in the page. The image comes off the network in a format like JPEG or PNG, and the paint op references the compressed data; raster is where we invoke the appropriate decoder to decompress the image into a raw bitmap.

OK, so we said raster produces bitmaps in memory, and usually it's GPU memory, referenced by a texture identifier in OpenGL. In the past, we would first raster into main memory and then upload the result to the GPU, and we still do that in some cases, but modern GPUs can run shaders that produce the pixels directly on the GPU. This mode is called accelerated raster. In either case, the result of raster is a bitmap of pixels in some kind of memory, not yet on the screen, so there are still some steps ahead of us.

Raster issues OpenGL calls through a library called Skia. Skia provides a layer of abstraction around the hardware, and it understands some more complex things like paths and Bezier curves. Skia is open source and maintained by Google. It ships in the Chrome binary, but it lives in a separate code repository under third_party, and it's also used by other products, such as the Android OS, to do their own rendering. So when it's time to raster our display items, those paint ops are making calls onto an SkCanvas object, a Skia canvas, and that goes through some more abstractions inside Skia. Skia's hardware-accelerated path actually builds another buffer of drawing operations, which gets flushed at the end of the raster task. During that flush is where we get to what look like the actual GL commands that are building the texture.

But there's a problem: those GL calls aren't quite what they look like, because of the security sandbox. This is in the renderer process, and Skia can't make system calls directly from the renderer. So what actually happens is that when we initialize Skia, we give it a table of function pointers and say, here is the GL API for you to use. But instead of real OpenGL, those pointers are backed by stubs in Chrome that proxy the calls into a different process, in a special data format called a command buffer. The GPU process decodes the command buffer and issues the real GL calls through its own set of function pointers. John talked a little bit about the GPU process.
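To make the earlier point about bitmaps concrete, here's a toy software rasterizer: a buffer of color values plus a routine that executes a single fill-rectangle paint op by writing pixels into it. In reality this work goes through Skia, and usually the GPU, as just described; the names here are hypothetical:

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

struct ToyBitmap {
  int width = 0, height = 0;
  std::vector<uint32_t> pixels;  // one 32-bit ARGB color value per pixel

  ToyBitmap(int w, int h) : width(w), height(h), pixels(w * h, 0xFFFFFFFFu) {}
};

// The raster step for a single "draw rectangle" paint op: write the color
// into every pixel the rectangle covers, clipped to the bitmap bounds.
void RasterFillRect(ToyBitmap& bitmap, int x, int y, int w, int h,
                    uint32_t color) {
  for (int row = std::max(y, 0); row < std::min(y + h, bitmap.height); ++row)
    for (int col = std::max(x, 0); col < std::min(x + w, bitmap.width); ++col)
      bitmap.pixels[row * bitmap.width + col] = color;
}

int main() {
  ToyBitmap tile(256, 256);  // e.g. one 256x256 raster tile
  RasterFillRect(tile, 10, 10, 100, 50, 0xFFFF0000u);  // a red rectangle
  // A pixel inside the rectangle now holds the red color value (ffff0000).
  std::cout << std::hex << tile.pixels[11 * 256 + 11] << "\n";
}
```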
The GPU process exists, first, to escape the renderer sandbox for these GL calls, and secondly, because your actual graphics drivers might be unstable or insecure, and isolating the GL calls in the GPU process gives the browser some protection from that. If the GPU process crashes, Chrome can just start it back up again without you even noticing. There's also a newer mode where, instead of shipping command buffers from below Skia, we ship the original paint ops to the GPU process and run Skia on the GPU side. This has some performance benefits and will make things easier when we move to Vulkan, which is the next-generation replacement for OpenGL. But that's still being built. On most platforms, the GL pointers inside the GPU process are the real OpenGL; they're initialized by dynamic lookup from the system's shared OpenGL library. On Windows, we have yet another indirection, where they come from a library that we provide called ANGLE, represented by the red box that's at an angle. ANGLE's job is to translate OpenGL to DirectX, which is Microsoft's API for accelerated graphics on Windows. There are native OpenGL drivers for Windows, but apparently they are not reliable enough.

OK, let's review. We've gone all the way from content through DOM, style, layout, paint, raster, and GPU to pixels in memory. But it's about to get a little more complicated. To motivate this, first remember that the rendering is not static. There are all kinds of things happening during a browsing session that can change the rendering dynamically, and running the full pipeline is expensive, so we want to avoid as much unnecessary work as possible. So think about change over time. We have this concept of animation frames, where each frame is a complete rendering of the state of the content at a particular point in time. When we want smooth motion for scrolling or zooming or CSS animations of page elements, the gold standard is 60 frames per second, which corresponds to the v-sync interval on typical display hardware. And that means if we take more than one sixtieth of a second to render a frame, the motion will stutter and look janky.

One obvious optimization is to keep track of what's changed and reuse the things that haven't changed. So each pipeline stage has its own concept of granular invalidation, where you can say, for example, this node needs its layout recomputed on the next frame. The next layout pass will go through the tree, but it will only run layout on the nodes that have been marked as needing layout. But that only gets you so far, especially if you're transforming a large visual region, which is common for animations and scrolling. Rerunning paint and raster for the entire region after every scroll event would be pretty expensive. The other thing to keep in mind is that everything on the main thread competes with JavaScript. So even if your rendering pipeline is super fast, you can still get jank if script is doing something expensive before the rendering even starts.

This sets the stage for an optimization that we call compositing. The compositor introduces two fundamental ideas: break the page into layers, and combine them on another thread. A layer is like a piece of the web page that can be transformed and rastered independently of the other layers. If you've ever played with layers in Photoshop, it's a little bit like that. DevTools gives you this 3D view of the layers, and you can see that we're building them on the main thread and sending them off to another thread called the impl thread. "Impl" means the compositor thread, for reasons that are obscure and historical.
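Before going deeper into layers, here's a minimal sketch of the dirty-bit idea behind the granular invalidation mentioned a moment ago: mutations mark a node as needing layout, and the next frame's pass walks the tree but only redoes work on the marked nodes. It's a toy model with hypothetical names; Blink's real invalidation also propagates flags up the ancestor chain and tracks many more states:

```cpp
#include <iostream>
#include <memory>
#include <vector>

struct ToyLayoutNode {
  bool needs_layout = false;
  std::vector<std::unique_ptr<ToyLayoutNode>> children;

  // Called when script or style changes something that affects geometry.
  void MarkNeedsLayout() { needs_layout = true; }
};

// The layout pass for one animation frame: walk the tree, but only do the
// expensive geometry work on nodes that were actually invalidated.
int LayoutIfNeeded(ToyLayoutNode& node) {
  int laid_out = 0;
  if (node.needs_layout) {
    // ... recompute this node's geometry here ...
    node.needs_layout = false;
    ++laid_out;
  }
  for (auto& child : node.children)
    laid_out += LayoutIfNeeded(*child);
  return laid_out;
}

int main() {
  ToyLayoutNode root;
  root.children.push_back(std::make_unique<ToyLayoutNode>());
  root.children.push_back(std::make_unique<ToyLayoutNode>());

  root.children[1]->MarkNeedsLayout();  // e.g. script changed a style value
  std::cout << LayoutIfNeeded(root) << " node(s) re-laid out\n";  // prints 1
  std::cout << LayoutIfNeeded(root) << " node(s) re-laid out\n";  // prints 0
}
```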
The orange border here shows that we've created a layer for a DOM element so that we can animate it efficiently. Now, instead of redoing raster on every animation frame, which would be slow, we raster the layer once, and then we just move that bitmap, the rastered output, around on the GPU. And if a container has a layer, then its children become part of that layer, so you can think of a layer as capturing a subtree of content for independent raster. It turns out all the important cases of fluid motion of part of a page can be expressed in terms of composited layers. We can have layers that are transformed by animation or scrolling or pinch-zoom, and scroll containers can clip a layer of scrolling content so that it's only visible within a certain region. And that means the compositor thread has everything it needs to handle scroll input events while the main thread is busy with other things like JavaScript. So the browser process gets an event from the operating system that the user moved their finger on the touchscreen, and it forwards that to the renderer process, where the compositor thread gets the first crack at handling it. If it's just scrolling a composited layer, then the compositor can do that without even talking to the main thread, because it already has the rastered output of that layer. But in some cases the compositor might say, no, I can't handle that. If you scroll something that doesn't have a layer, or you have blocking JavaScript event listeners, then the compositor thread will forward that input to the main thread, and that takes longer because the main thread is busy.

Composited layers are represented by this GraphicsLayer class in Blink, which wraps a cc::Layer; CC stands for Chromium compositor. Today, the layers are created from the layout tree by promoting things with certain style properties, like an animation or a transform. The other thing to note here is an intermediate step called the paint layer tree. Paint layers are sort of like candidates for layer promotion; not every paint layer necessarily gets a composited layer. Elements that are scroll containers actually create a whole set of special layers for things like the borders, the scrolling contents, the clip, and the scroll bars. These are all managed by a class called CompositedLayerMapping. Building this layer tree is a separate lifecycle stage called the compositing update. Today it happens after layout and before paint on the main thread, and each layer is painted separately, so a layer has its own display item list with all of the paint ops that were produced when that layer was painted.

When the compositor draws a layer, it applies various properties to the layer, like the transformation matrix, the clip, the scroll offset, and effects like opacity and reflection. The data for those properties is stored in what we call property trees. We used to store them on the layer itself, but we've recently decoupled properties from layers in order to make the compositing architecture more flexible. In theory, the compositor thread could apply these properties to any chunk of paint ops that shares the same values for the properties, even if it doesn't have a layer, but we're not quite at the point of doing that. So in the same way that the compositing update builds the layer tree, we have a stage called pre-paint, which builds the property trees.
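To make the property-tree idea concrete, here's a toy transform tree: each node stores its own offset and the index of its parent, and the transform for a piece of content is the composition of all the nodes on its path to the root. Real compositor property trees use full 4x4 matrices and also include clip, effect, and scroll trees; the names here are hypothetical:

```cpp
#include <iostream>
#include <vector>

struct ToyTransformNode {
  int parent = -1;       // index of the parent node, -1 for the root
  float tx = 0, ty = 0;  // this node's own translation (real trees use matrices)
};

struct ToyTransformTree {
  std::vector<ToyTransformNode> nodes;

  // Compose a node's transform with all of its ancestors up to the root.
  void ToScreen(int index, float& out_x, float& out_y) const {
    out_x = 0;
    out_y = 0;
    for (int i = index; i != -1; i = nodes[i].parent) {
      out_x += nodes[i].tx;
      out_y += nodes[i].ty;
    }
  }
};

int main() {
  ToyTransformTree tree;
  tree.nodes.push_back({-1, 0, 0});    // 0: root (the viewport)
  tree.nodes.push_back({0, 0, -300});  // 1: a scroll offset of 300px
  tree.nodes.push_back({1, 50, 20});   // 2: an animated element inside it

  // A chunk of paint ops associated with node 2 draws at this offset, so
  // scrolling only updates node 1; nothing has to be re-rastered.
  float x, y;
  tree.ToScreen(2, x, y);
  std::cout << "screen offset: (" << x << ", " << y << ")\n";  // (50, -280)
}
```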
In the future, we're going to create layers after paint instead of before paint. That's a project called Composite After Paint, or CAP. The goal of CAP is to let us make more flexible and fine-grained compositing decisions, and it builds on the decoupling of properties from layers that we just talked about, because previously, if an element became composited, that had all these other effects on its properties. CAP requires that decoupling because paint, of course, needs to know about the paint properties, but in the future it won't know anything about the layers, because the layers will be created in a later stage, the paint artifact compositor.

After paint is finished, we run something called the commit, which just updates copies of the layers and the property trees we just talked about on the compositor thread, to match the state of the main thread. The commit runs on the compositor thread with the main thread blocked, so that it's safe to read the main thread's data structures.

Let's revisit what raster looks like now that we have compositing, because remember, the point of layers is that they raster independently. And layers can be big. Remember, for a scroll container we have a layer for the entire scrolling contents, and most of it might not be visible, so it would be expensive to raster the whole thing if we're only going to see a small part of it. So the compositor thread actually divides the layer into tiles, and tiles are the basic unit of raster work. Tiles are rastered by a pool of dedicated raster threads, and the compositor thread has this thing called the tile manager that creates the tiles and schedules raster tasks on a worker pool, prioritized based on distance from the viewport. If a tile is far away from the viewport, it's not as important to raster it right away, but if it's just below the viewport, it might become visible really soon, so we should raster it sooner. Each raster task produces a bitmap for that particular tile. This slide doesn't show it, but a layer actually has multiple tilings for different resolutions, and those are all combined into a tiling set.

Once all the tiles are rastered, the compositor thread generates draw quads. A quad is just an instruction to draw a tile in a particular location on the screen, taking into account all the transformations and effects that were applied by the property trees, and each quad references the tile's rastered output in memory. All the quads are wrapped up into something called a compositor frame object, which gets submitted to the GPU process, and we'll talk about that in a minute. But we've now reached the output of the renderer process. Remember, we said the renderer was producing animation frames, ideally at 60 frames per second, or at whatever speed it's able to produce them. These compositor frames are the frames it's producing.
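Here's a toy version of the tiling and prioritization idea from a moment ago: divide a layer into fixed-size tiles and order them by how far they are from the viewport, so the tiles that are visible or about to become visible get rastered first. The real tile manager also juggles multiple resolutions, memory budgets, and occlusion; these names are hypothetical:

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

struct ToyTile {
  int x = 0, y = 0;    // tile position within the layer, in pixels
  float priority = 0;  // smaller = raster sooner
};

// Divide a layer into 256x256 tiles and prioritize by vertical distance from
// the viewport, so tiles just below the visible area get rastered first.
std::vector<ToyTile> CreateTiles(int layer_width, int layer_height,
                                 int viewport_top, int viewport_bottom) {
  constexpr int kTileSize = 256;
  std::vector<ToyTile> tiles;
  for (int y = 0; y < layer_height; y += kTileSize) {
    for (int x = 0; x < layer_width; x += kTileSize) {
      float distance = 0;
      if (y + kTileSize < viewport_top)
        distance = viewport_top - (y + kTileSize);
      else if (y > viewport_bottom)
        distance = y - viewport_bottom;
      tiles.push_back({x, y, distance});
    }
  }
  std::sort(tiles.begin(), tiles.end(),
            [](const ToyTile& a, const ToyTile& b) {
              return a.priority < b.priority;
            });
  return tiles;  // the front of the list gets scheduled on raster workers first
}

int main() {
  // A 512x4096 scrolling layer with only the first 600px currently visible.
  for (const ToyTile& tile : CreateTiles(512, 4096, 0, 600))
    std::cout << "tile (" << tile.x << "," << tile.y << ") priority "
              << tile.priority << "\n";
}
```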
So we've seen that raster and drawing are both happening on the compositor thread's layer tree after the commit, but we've also seen that raster happens asynchronously on a pool of worker threads. This creates a little problem when a new commit comes in: we're not ready to draw it yet, because it hasn't been rastered, but we'd like to continue drawing something. Ideally, we'd like to continue drawing tiles from the previous commit while we're waiting for the new commit to finish raster. So we solve this by having two copies of the tree. The pending tree is the one that's receiving the commit and rastering itself, and when it's finished rastering and is ready to draw, we have a step called activation, which just copies the pending tree into the active tree.

OK, so now we understand how complete animation frames are produced by the renderer's compositor and submitted to the GPU process. But remember, there are multiple renderers submitting frames. You can even have a website with an iframe that embeds another website from a different origin, and those sites will get separate renderer processes, but of course their output eventually has to be stitched together. And actually, the browser process has its own compositor generating frames for the browser UI outside of the content area. So all of these surfaces are submitting their frames to a thing in the GPU process called the display compositor, which runs on the Viz compositor thread. Viz is the service inside the GPU process that handles this; it's short for visuals. The display compositor synchronizes these frames as they come in, and it understands the dependencies between them, where surfaces are embedded inside each other. Viz is also what issues the OpenGL calls that ultimately display the quads from the compositor frames. The output of Viz is double buffered: it draws those quads into a back buffer, and then it issues a swap command to make it visible on the screen. Display actually runs on its own thread in the GPU process, so those GL calls are proxied from the Viz compositor thread to the GPU main thread over command buffers, similar to what we did for raster. And the end result of display compositing is that the pixels are finally on the screen, where the user can see them.

So, quick recap. We've taken web content, built a DOM tree, resolved its styles, updated the layout, created compositing layers and property trees, painted the layers, committed the layers, paint ops, and property trees to the compositor thread, broken the layers into tiles, rastered the tiles on worker threads, copied the pending tree to the active tree, generated draw quads from the rastered output, submitted those quads to Viz, and displayed them on the screen in the GPU process. Most of the pipeline runs in the renderer, but the GL calls for raster and display are in the GPU process. The core rendering stages like DOM, style, layout, and paint are in Blink code on the main thread, but the input events for scrolling and zooming can hit the compositor thread and update layers there while the main thread is busy.

OK, we are reaching the end of Life of a Pixel, also known as death of a pixel. Don't worry if you didn't follow all the details; we know rendering is huge and complicated, and probably more complicated than necessary in some places. I hope we can make it simpler over time. In fact, a lot of the refactorings that are currently underway actually have that as a goal: making things simpler, even though they're maybe adding complexity in the short term. So things might get better, but I think we do need to invest engineering resources into simplifying the system, paying down our technical debt, and explaining and documenting the complexity, which I hope I can make some contribution towards. Anyway, there's the link to the slides again, and if you have feedback about the talk, especially if there's anything I got wrong, please email me, because I'm always thinking about ways to improve it.