So, Ilya Grigorik has, like I said before, been with us at every GoGaRuCo. It's great to have him back. He's one of the speakers we enjoy most. When he started, he was the founder of his own startup, and last year they were acquired by Google, where he now works on the Make the Web Fast team. So he's going to drop some knowledge on you.
Yes, the creatively titled... let's make that a little bigger. All right, so I think many of us were watching a one-hour infomercial earlier this week done by Apple. And I think everybody took something different away from that. But to me, the number that stood out the most was that Apple announced they've activated more than 400 million devices, which is remarkable. And of course, not to be outdone, Android announced on the same day that they've activated half a billion devices. So you put those two numbers together, and you realize that we have almost a billion smartphones. And the reason that's interesting to me is that all of those devices run WebKit. They have a browser. And not only that, but we have about 2 million new activations of these devices every day, which is amazing if you think about it.
And what's so interesting about a browser, right? I've tracked my time, and I spend more than 50% of my time on my computer within the browser. From an engineering point of view, it's that thing that you throw HTML into, and sometimes it gets it right. Most of the time it lays it out wrong. But there are a billion devices running WebKit, and it is, at this point, the largest deployment platform available to all of us. So I don't care which platform you're developing for, iOS, Android, whatever: at some point you are using WebKit, whether that's UIWebView or WebView, or just a native browser app.
And one thing that I feel we're missing today is actual education, an understanding of the fundamentals of how the browser works. Because when you look under the hood, it's actually an entire operating system. You could quite literally teach the entire computer science curriculum on the things that go on within the browser. We have everything. We have graphics. We have high performance networking. Believe it or not, we have machine learning. And with things like WebRTC, we even have distributed computing. Virtually every branch of computer science is in there. And not only that, but the browsers are pushing the frontiers of a lot of the research in these fields.
So every year, I like to take a step back and examine what I'm doing. What do I need to do as a developer to make good progress this year? And one of the things I realized early this year was that I don't really understand the browser. I mean, I write apps for the browser. I work with it all the time. But I don't actually know what's happening underneath. And over the last six months or so, I've spent time to dive in, understand, and look at the source code.
And this is actually kind of a remarkable thing. Until very recently, the browser was a black box. If you think about IE, we could not see what was going on. But these one billion devices, it's WebKit. The code is out there. If you prefer, you can use Firefox. We can actually look at this code. And the problem is that the educational institutions, let's say universities, don't teach you anything about the browser today. Because, frankly, we just advanced too fast. It'll take another decade before we see "how to build a browser" as a CS course. That course will come. But today, I think we need to fill in those gaps ourselves. And that's something that I've done over the last six months. And frankly, I think it's been one of the best investments I've made in many years.
So understanding this stuff pays really, really high dividends. And what I want to do in the next 25 minutes or so, and it's kind of an ambitious target, is to help you guys get started on this journey as well. So a browser is a big thing. There's a lot of code. If you do a checkout of WebKit or something like Chromium, that'll come out to be about 4.5 gigs. So don't just run that on the Wi-Fi here. Do that at home. And there are also many major moving blocks, or major components. As we said, there's graphics. There's networking. There's everything else. So first observation: the browser is not a black box. If you ever paid attention to the WebKit logo, it's a white box. And not only that, it's an open white box. So there's already a couple of hints in there.
So let's dive in. What is WebKit exactly? It's kind of this weird breed. WebKit is not a browser. You can't actually build WebKit and render a page. It's a browser engine, which needs a lot of other moving components. So the way to think about it is you have the core, which is, let's say, this WebKit thing. And you need to provide two things, one on the top and one on the bottom. There's the WebKit embedding API, which is where you put the chrome, the actual browser chrome, on top. How does the browser look? Do you have tabs? Do you have other things? A bookmark manager is a good example. And then there's the platform API, which provides access to the capabilities of the actual machine. For example, on your phone, you may have GPS, so you need to provide a bridge to WebKit to connect these things.
So WebKit by itself ships with many components, but you can decide to swap them out. You can completely eliminate some parts and say, I'm just not going to provide it. But WebKit at its core is this WebCore component. That's where a lot of the stuff that I think most of us will be interested in is happening. And this is the part that is reused by all of the WebKit browsers.
This is really what we mean when we say, this is a WebKit powered browser. We're really saying, we're using WebCore. And the observation behind WebCore is that doing things like parsing HTML, constructing the DOM tree, and getting the CSS object model right are all hard, and it takes a lot of engineering effort. So let's stop this insanity of building different object models and build one. That is effectively what WebCore is: resource dispatch and loading, parsing, DOM construction. Depending on your perspective, this is either the fun or the hard stuff. I thought it was fun. Then I realized it was hard.
The other big part that ships with WebKit is the JavaScript engine. WebKit by itself does come with JavaScriptCore, which is based on the KJS engine, which came out of KDE. But it's been revamped and rebuilt many, many times. It provides a JIT and generational garbage collection. So by default, if you build WebKit, you will get a very well performing JavaScript runtime, which is exactly what you have when you run Safari. But of course, you can take that out and swap in something else, which is exactly what Chrome did with V8. You can replace it with any other engine, and in fact, many browsers do.
The platform APIs we touched on briefly, but this is where all of your components come into play. For example, the network stack. The network stack differentiates many different browsers. The graphics engine, how you actually output stuff to the screen: WebKit by itself will not render anything to the screen. You need to provide those components yourself based on the platform. How you handle fonts: it turns out this is actually a very hard problem. I recently started looking at the code for that and I ran away in horror. There are things like device capabilities: location, storage, and sensors. How you provide your database implementation within the browser can be very different.
You can use SQLite, you can roll your own, or you can do something else entirely. So the point here is that a browser is a combination of all of these things. And how you combine these different components will dictate the performance and the capabilities of the browser.
So just to highlight some differences, and this is not a complete list, and I hope you guys can see that, let's take something like Chrome on OS X. Chrome tries to reuse as many components as it can across all the different platforms just to keep it sane. So for example, for rendering, we use Skia. For networking, it's its own networking stack. For fonts, it depends on the platform; on OS X, it'll be Quartz. JavaScript, of course, we have V8. Now compare that to something like the Android browser: those guys went out and implemented most of these components themselves. So once again, depending on how you combine these things, you'll get very different behaviors, which is why, even within WebKit powered browsers, you can have visual output that is different. The rendering system may composite the layers differently, and all of a sudden you have these visual artifacts. So just because it's WebKit doesn't mean it's uniform and all the same.
So that's the architecture in two seconds. But what does it actually take to put together a page in WebKit? The W3C performance working group came up with this really crazy and scary-looking diagram. And I like to use it just because it shows all of the different components that come into play. Each one of these black labels is a timer that the browser tracks for the lifetime of a page, or the rendering of a page. And there are three major components: the network, the server, and the browser execution. The server, you guys know. This is your Rails app server. Then there is the network. And these things are not drawn to scale here.
It doesn't mean that the network occupies most of the time. But it shows you all the different moving components within the network stack. We'll take a closer look at some of the things that Chrome specifically does to address some of these challenges in the network. And then there's browser execution. So there's a lot of different stuff going on. Let's dive in.
Network stack. I spent a lot of time diving into the network stack, I learned a lot, and I think it's very, very interesting. First of all, it turns out that the pages that we build today, on average, are over a meg in size, connect to over 30 different hosts, and send over 80 different requests per page. That's pretty heavy. And yet, we demand that the pages load in 300 milliseconds. Now if that's not a miracle, I don't know what is. So what does it take to make this miracle? Well, it takes a lot of careful engineering, as I found out.
First of all, the browser is getting increasingly smart. It actually learns your behavior as you use it. It quite literally gets faster as you use it. Some examples would be DNS prefetch, and this is the simplest one. Let's say we render a page and it has a bunch of links on it. We can look at those links and say, well, there's a chance that you may click on that link, so let me go ahead and pre-resolve that hostname, because a DNS lookup on average takes anywhere from 50 to 200 milliseconds. And if you're on mobile, it will perhaps be even higher than that. So we can do that. Now that's a cool optimization, until you come to Wikipedia and you realize that you would need to do about 500 DNS lookups on an average page. So now we need an algorithm to figure out which of those hosts we should pre-resolve. So, okay, that's interesting.
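To make the "which hosts do we pre-resolve" question concrete, here is a minimal Python sketch. It is not how Chrome decides; the function name `hosts_to_preresolve`, the `limit` parameter, and the "most frequently linked host first" ranking are all assumptions made up for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def hosts_to_preresolve(page_url, links, limit=8):
    """Pick up to `limit` hostnames worth pre-resolving for a rendered page.

    Hypothetical heuristic: rank external hosts by how often they are linked.
    """
    own_host = urlsplit(page_url).hostname
    counts = Counter()
    for link in links:
        host = urlsplit(link).hostname
        # Relative links have no hostname, and the page's own host is
        # already resolved, so neither needs a DNS lookup.
        if host and host != own_host:
            counts[host] += 1
    # Most frequently linked hosts first; the cap keeps a link-heavy page
    # from triggering hundreds of lookups.
    return [host for host, _ in counts.most_common(limit)]
```

The cap is the important part: without it, a page with hundreds of outbound links would turn a latency optimization into a burst of wasted lookups.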
TCP preconnect, that's another layer of optimization, where we're saying, well, we pre-resolved the hostname, but perhaps we can also open the connection and just keep it idle, such that when you click on the link, we can just send the GET request. We've already performed the TCP handshake, which takes a round trip off. So that's another 50 to 100 milliseconds. Interesting.
Pooling and reuse: we need to reuse connections where possible. We know that web developers in general don't tend to think about that, so we need to fix that problem for them to the extent possible. And of course, caching: what's a good caching strategy, especially on something like a mobile phone, which doesn't have a lot of space?
I should say that prefetch, preconnect, pooling, and caching are implemented in almost every browser in some variant. They will differ in some implementation details, but this stuff is basically in every implementation. Chrome does something specific, which is that it actually learns the sub-resources as well. What I mean by that is you go to, I don't know, cnn.com. The first time you come to cnn.com, the browser has to load all the resources, so it has to contact over 30 hostnames. It'll actually remember all of those hostnames. And the next time you come back, it'll say, hey, before we even get the HTML back, I know that last time I had to connect to these five hosts to get the static images, so let me try and preconnect to those as well. So it basically builds this giant hash map and keeps track of all of the requests and whether each of them actually succeeded. There's a chance that we may fail, that we may open a connection that's not needed. We keep track of that, and then we have confidence intervals and estimations for every connection that we make. So all of these optimizations are there, and they're trying to help hide some of the latency.
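The "giant hash map" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Chrome's implementation: the class name `SubresourcePredictor`, its methods, and the use of a plain used/seen ratio in place of the confidence estimates mentioned above are all hypothetical.

```python
from collections import defaultdict


class SubresourcePredictor:
    """Remembers which hosts a page needed and how often that held up."""

    def __init__(self, min_confidence=0.5):
        self.min_confidence = min_confidence
        # page host -> {subresource host: [times_used, times_seen]}
        self.stats = defaultdict(dict)

    def record_visit(self, page_host, hosts_used):
        """Update the counters after a page load finishes."""
        known = self.stats[page_host]
        for host in set(known) | set(hosts_used):
            entry = known.setdefault(host, [0, 0])
            entry[1] += 1              # one more visit where this host was a candidate
            if host in hosts_used:
                entry[0] += 1          # the connection was actually needed

    def hosts_to_preconnect(self, page_host):
        """Hosts whose used/seen ratio clears the threshold, best first."""
        known = self.stats.get(page_host, {})
        scored = [(used / seen, host) for host, (used, seen) in known.items()]
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [host for confidence, host in scored
                if confidence >= self.min_confidence]
```

A host that was needed on every visit keeps a ratio of 1.0, while a host that showed up once and never again decays toward zero and stops being preconnected, which is the "we may open a connection that's not needed" case being tracked.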
So some examples that I'd like to show. This is a snippet from the actual source code, where we have an enum for ResolutionMotivation, and the name should kind of give you a hint: this is when we will do the DNS pre-resolve and even the TCP preconnect. Look at some of the reasons. Mouse over: it turns out that if you mouse over a link, we can preconnect. That's usually a pretty good bet, because you're about to click, and it takes the user about 100 to 150 milliseconds to do the click. So before you even click, we can already hide some of the latency. Omnibox: if you're typing something in the Omnibox and we have a good estimate that you may actually hit enter once you type this in, and we know what you're going to type, let's resolve that. Referrals, and then things like self-referrals. So these are internal signals that Chrome uses.
So I'll show you this. For example, chrome://predictors. And this is where I hope that I haven't been browsing anything that you guys shouldn't see. As you type into your Omnibox, you can go to chrome://predictors, and this is a map, basically a tree, that shows you our estimate of where you're going to go based on what you've typed. So for example, if I type in A and A, there is an 87% chance that I'm going to analytics.google.com. And these green bars tell you that Chrome estimated this guess is good enough that the DNS resolution may actually happen. Right, so if I type in "git h", I'm probably going to githubarchive.org. You can look at this data, and there's a lot of interesting patterns in here.
The other one that I like to show is chrome://histograms. Not a lot of people know about this, but there's probably about 50 pages of histograms in here, for just about every type of Chrome performance metric. So for example, image resampling, that's not interesting. Let's see, wide out duration.
So the way to read this is, let's see, 68 milliseconds, and 3.2% of all of the completions happened within 68 milliseconds. This is kind of a hard one to understand, so let's look at DNS. You can look at DNS resolution and TCP connection latency. All of these metrics are available. You can peek inside of your own browser and understand how it's performing on your network or even with your app.
So all that's to say there's a lot of stuff going on, right? And of course, we try to optimize requests and we try to hide all this latency, but the best request is a request that you, as a web developer, don't make. And the worst request is a request that blocks the parser, which of course raises the question of what the hell blocks a parser. Here's a minimal example that will work. We have our very simple HTML5 page, and I should say a valid HTML5 page too, for the picky ones.
And here's the magic trick, right? This document starts arriving from the network. We've already passed the network stack, and we start parsing. We see the meta tag and we see the title tag, and we start constructing this DOM tree right here. And then we hit the script tag, and all of a sudden the world stops. The world stops because you've threatened to do something very bad. You've threatened to do a document.write. The browser has this very simplistic but very convenient concurrency model, which allows you to modify, at any point, what's going to come next in the DOM tree. So not knowing anything else, we have to stop the world, wait for that script to come down and execute, and only then can we proceed. If we go back here, this link tag cannot be parsed until this application.js file is downloaded from the network and executed. That is a source of a lot of latency, and this is why you see the blank page while your JavaScript is loading. So that's not a good deal.
The only two ways to work around this are these two attributes called async and defer. I'm not going to go into the details of what those are. I encourage you guys to look them up. But basically what both of those attributes say is, trust me, I won't do a document.write. At which point the parser says, okay, all right, I can move forward.
But what if we can't, right? What if you weren't kind enough to provide one of these async or defer attributes? Well, take a look at the document parser. By the way, most of the code is in C++, but it's actually very readable C++, so don't be scared. Here's an excerpt from the document parser, and this line right here, isWaitingForScripts, should be a pretty good tip-off. Like, we have a problem. We're waiting for scripts. And what it says is, you know what? If I'm waiting for scripts, let's start a preload scanner.
So what the heck is a preload scanner? Well, a preload scanner is optimized to do one thing and do it well, which is to identify critical resources. A critical resource is, once again, another script, or maybe a style sheet, or something else that could block rendering. Technically, an image is not a critical resource, because it won't block rendering, although we do still prioritize it. All that the scanner does is move ahead in the incoming stream and scan for these attributes. It doesn't even do a smart parse of the document. It literally just looks for, like, angle bracket img. Okay, I need that, let me extract that, because we don't want to be spending time parsing something just to throw it away later. So that's the preload scanner.
And the preload scanner allows you to do very interesting things. For example, we have a page here that's taking 1.5 seconds to return. But you can see that the style sheet was actually loaded about 200 milliseconds in. It also took a second, but it came in parallel with the actual HTML.
So this is an example of the preload scanner working. And I encourage you guys to take a look at some of the code, for example in this gist file, to understand what's going on. This is why things like early flushing, and some of the performance work that's gone into Rails 3 to allow you to flush your templates earlier, are so important: they allow the parser to forge ahead and start downloading all these resources, not wait until the entire page is complete.
So some lessons learned from this. We have the network stack. The network stack feeds data into the tokenizer. We of course don't wait for the entire page, so we feed it byte by byte. And two things happen there. We construct the DOM tree, as we saw, right? That DOM tree is what you eventually will see. And if the DOM tree is blocked, the preload scanner moves ahead and tries to find the blocking resources. So that's a fairly efficient way to get your resources scheduled.
The slowest way to get your resources scheduled is through script execution. While it's very popular right now to move a lot of your dependency management into JavaScript, that is, in the long run, probably the worst thing you can do from the perspective of a browser, because you're hiding all of that information away from the browser. It can't help you, right? Think of this as your JIT, and you're hiding all of the information from the JIT. So the JIT can't help you. That's a trade-off. There are many benefits to using a good script loader, but there are some downsides as well, and the biggest one is that once you move it into JavaScript, we can't help you.
Not unlike the document preload scanner, there are things like CSS preload scanners and other similar parsers that you can find in the source tree for WebCore. So take a look. There's a lot of interesting stuff in there.
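As a rough illustration of what a preload scanner looks for, here is a small Python sketch. It is a stand-in, not WebCore's scanner: it leans on Python's standard `html.parser` tokenizer rather than a purpose-built lightweight scan, and the `PreloadScanner` class, the `scan` helper, and the resource tuples are names invented for this example.

```python
from html.parser import HTMLParser


class PreloadScanner(HTMLParser):
    """Collects URLs worth fetching early, without building any DOM tree."""

    def __init__(self):
        super().__init__()
        self.resources = []  # (kind, url, blocks_parser)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("src"):
            # Without async or defer, the parser must stop until the
            # script has been downloaded and executed.
            blocks = "async" not in attrs and "defer" not in attrs
            self.resources.append(("script", attrs["src"], blocks))
        elif tag == "link" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.resources.append(("style", attrs["href"], False))
        elif tag == "img" and attrs.get("src"):
            self.resources.append(("image", attrs["src"], False))


def scan(html):
    """Return the resources found in `html`, in document order."""
    scanner = PreloadScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.resources
```

Running `scan` over a page shaped like the minimal example from earlier flags the plain `application.js` script as parser-blocking, while the style sheet that follows it is still discovered, which is what lets it download in parallel instead of waiting for the script.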
If you do a lot of design work, you should definitely at least read through the document parser code and understand the comments there.
So speaking of which, let's build a render tree, right? Now we have a DOM tree. We've gone through the network. We've built this DOM tree. What is this render tree? Well, it turns out it's actually not a render tree, it's an entire render forest, and it's a scary one. We have the DOM tree and we have the CSS object model, and those two things come together into what we know as a render tree. And when I talk to the people actually building this code, they constantly correct me: well, are you talking about the render object, the render layer, the graphics layer, or one of the other 15 variants of this tree?
So the thing to remember is that there are many trees within the browser, and the way to think about it is you have the DOM tree. The DOM tree has a lot of stuff that, frankly, we don't care about in the visual representation. Things like meta tags. I'm not going to paint that on the screen, so that doesn't even need to be in, for example, the render object tree. It's only stuff that's visible. Then, depending on the type of element, we may have a render layer, which is to say some objects will get a dedicated layer. A good example is the video tag, which is perhaps GPU backed, and that by itself has a different tree.
And the most important thing in this diagram is that these are different trees. They do share objects where possible, but as with any system where you have multiple objects that are being balanced on the fly, this is effectively a concurrent system. The moment you have to synchronize between all of these trees, all bets are off when it comes to performance. The best example of this is something like, let me just query for offsetWidth or offsetHeight, which is to say the position of this element on the page.
That basically signals to the rendering code that, okay, I need to flush all of my trees, stop the world, synchronize everything, and this is where it lives. It lives at this specific offset. It's probably the worst thing that you can do from a graphics perspective. You're not going to get your 60 frames per second doing that.
So speaking of which, 60 frames per second, right? We're building web pages, what gives? Turns out that's not the case. In Chrome we just recently added this frames view, which allows you to look at how much time it took to render a specific frame. So if you do the math, right? You're trying to have your buttery smooth scrolling on a page. For that you need about 60 frames per second. At 60 frames per second you have about 16 milliseconds to render each frame. That's your budget, right? The framework has to take some time to do its work, so let's be lenient and give it a couple of milliseconds. So really you should be counting on doing all of your work within, let's say, 10 milliseconds just to be safe. Preferably even faster than that.
Now if you look at this diagram here, you can see that this one frame right here takes 46 milliseconds to render. So I'm not going to name the offender... Mashable. But if you go to their site and start scrolling, what you'll find is that on every single scroll event, and I kid you not, on every single one, they're executing some standard banner JavaScript that's taking on average 20 milliseconds or more. These events are not even accumulated, and in fact sometimes we have to do it multiple times in a single frame. So if you ever felt like you're scrolling on a page and it's just really dragging, that's probably what's happening. It's not your computer getting bogged down because of all the tabs. It's the JavaScript that's executing, likely due to some handler that's not properly set up.
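The frame budget arithmetic is easy to check. The Python sketch below is just the division spelled out; the helper names `frame_budget_ms` and `frames_consumed` are made up for this example, and rounding work up to whole frame slots is a simplifying assumption about how missed deadlines show up on screen.

```python
import math


def frame_budget_ms(fps=60):
    """Milliseconds available to produce one frame at the given rate."""
    return 1000.0 / fps


def frames_consumed(work_ms, fps=60):
    """How many frame slots a chunk of main-thread work occupies."""
    return max(1, math.ceil(work_ms / frame_budget_ms(fps)))
```

With these assumptions, the 46 millisecond frame from the slide occupies three frame slots at 60 frames per second, and a single 20 millisecond scroll handler already spills into a second slot, while work kept under the suggested 10 milliseconds fits in one.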
So we've done a lot of testing with this and have run some experiments, and one counterintuitive thing that I've discovered is that it's actually better to be consistent than to jump all over. The best case is that you have 60 frames per second. The second best case is that maybe you're running at 15 frames per second, but you should run at 15 frames per second consistently. It's much worse to be jumping between 15 and 60, where you kind of get this, look, I'm going fast, now I'm going slow, now I'm going fast again. We can actually perceive that. So pick one, assume your budget, and stay within that budget. There's a really good talk on this from Google I/O called Jank Busters that you guys should take a look at.
So moving right along: hardware acceleration. This one is kind of interesting. There's a nice "sprinkle this on your code and everything will go faster" trick that you'll find a lot of people talking about. It's called -webkit-transform: translateZ. And I kid you not, if you search for this, you'll find many blog posts saying, I just put this on all my divs and my pages are fast. It's awesome. I'm not sure why the browsers don't do this automatically. Well, there's a reason why we don't do this automatically.
So what does this do? Remember when we looked at the render objects, we talked about the render layer. And as I said, some elements get their own render layer, which is to say they have a backing store on the GPU. What happens when you say transform translateZ is that you're promoting the contents of that element into its own layer, and that layer is then propagated to the GPU. Now there is a gotcha with that. Pushing stuff off to the GPU is not exactly free either, because to do that, first we have to paint the actual buffer, the view that you're trying to push, into a texture. Think of it as a bitmap. That bitmap then needs to be transferred from main memory, on the CPU side, into GPU memory.
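To get a feel for how much data that transfer involves, here is a back-of-the-envelope Python sketch. It assumes a 720 by 1280 screen, four bytes per pixel (RGB plus alpha), and a full-screen texture re-uploaded on every frame at 60 frames per second; the function name `texture_upload_gbits_per_sec` is made up for this example.

```python
def texture_upload_gbits_per_sec(width, height, bytes_per_pixel=4, fps=60):
    """Bandwidth needed to re-upload one full texture every frame, in Gbit/s."""
    bytes_per_frame = width * height * bytes_per_pixel
    bits_per_second = bytes_per_frame * fps * 8
    return bits_per_second / 1e9


# One full-screen layer on a 720 x 1280 phone, repainted every frame:
# 720 * 1280 * 4 bytes = 3,686,400 bytes per frame.
full_screen = texture_upload_gbits_per_sec(720, 1280)
```

Under those assumptions a single full-screen layer works out to roughly 1.77 gigabits per second, and every additional layer that has to be repainted each frame adds its own share on top.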
And that takes a lot of bandwidth. If you do the math for, let's say, a regular smartphone at 720 by 1280 resolution, we need RGB plus alpha, so that's four bytes per pixel. You multiply that out and you'll find that we're talking about gigabits per second to do 60 frames per second. So it's very easy to saturate your graphics bus. And if all you do is promote everything into its own layer, then for every frame you're going to be saturating your GPU, and you'll probably destroy your battery in a span of a couple of minutes.
So this stuff's not free. And not only that, but the GPU is very good at certain things, like moving stuff along, right? The GPU is very good at doing basic matrix transforms like translate or rotate or alpha. Anything else, all bets are off. It needs to be repainted on the CPU first and then pushed to the GPU. A good example is something like changing colors. Changing colors requires that the CPU recompute the entire texture and upload it to the GPU, and then the GPU can display it. It's actually much faster to just do it on the CPU.
So the only free lunch that you get with this kind of stuff is CSS3 animations. And the reason this is interesting is that this actually makes sense. You're saying, you know what? I'm going to have a spin animation on hover on this class right here. Spin is not some magic keyword in WebKit; I'm just defining this keyframe right here. It's going to rotate this logo 360 degrees, and it's going to do this infinitely. This kind of thing the GPU is very good at, right? It's a basic image that needs to be transformed. Very fast, no CPU load, everything's good. The moment I try to change the color of this, we're done.
So if you only remember one thing from all of this, right? I spent a lot of time going through the code trying to understand what even matters, right? It's a four gig checkout.
The first thing you can do is, you know, don't just drop everything that you're doing and say, okay, I need to understand WebKit. You need to understand it piece by piece. So pick something that you're working on right now and just spend a couple of hours or an evening on it. And the best way to get started, as I found out, is not to do the checkout, but just to go to code.google.com and type in a query. It turns out that the code is very well laid out. Think of an element: for example, if you're doing a lot of CSS work, type in HTMLLinkElement and read through the source code. There's going to be the header file and the .cpp file, so take a look at those. And then the two things that everybody should read are the document parser and the preload scanner, just to understand how a document gets constructed. So with that, I'll take some questions. How do you like the Chromey? By the way, this is not an official logo.
So I don't get to play this much, but imagine I'm your boss, Larry Page. And I tell you, you've got unlimited funding for the next two years. What are the top three things you would work on? And don't worry about the past. You know, if you want to fix JavaScript, do it. What are the top three things?
You mean in the context of a browser?
Yeah, to make the web faster. In the context of a browser. And it can also be mobile if you want.
Personally, I wouldn't do three things. I would probably just take all that money, the infinite amount, and dedicate it towards education, which is very much the reason for this talk. I've had this conversation many times with many different people, and a reaction that I oftentimes get is, really, we should teach people about the browser? What's so new about the browser? We teach file systems, we teach compilers, we teach graphics; a browser is just an interesting implementation of all those things.
And I think what people don't realize is that when people come out of that, you know, software engineering or computer science degree, or what have you, we don't really build stuff for the Linux operating system. Most of the time we end up building for the web. And there's a lot of stuff that's hidden in the implementation. So frankly, I would love to see a lot of change in the educational system, for example, in the computer science departments. You can now actually take classes that will teach you MapReduce, right? That wasn't true three years ago. And I think we'll get there. We'll actually have these courses available. But in the meantime, I think it's up to us to fill in these gaps, right? I think that's the single best investment that many of us can make, just to understand it. So, you know, there are a lot of technical things that we could talk about investing in, but I think that's what will enable more innovation to come in the future.