He wrote the High Performance Browser Networking book for O'Reilly, which is also available for free; the link's on his website. If the internet is a series of tubes, then this is one of the world's greatest plumbers. Please put your hands together for Ilya Grigorik.

All right, thanks, Jake. So we're going to talk a little bit about optimizing network performance, and specifically some of the things that we've been doing on the Chrome team to help deliver better apps. And I guess the first thing that we should ask is, does it matter, right? What's the problem that we're trying to solve? Tony Gentilcore, who is actually somewhere here in the room, has run a number of different tests over the last couple of months, deep diving into where we spend our time: when we try to render a web page, what are the bottlenecks today? He has a series of these posts on blink-dev, which are worth reading if you're interested in the low-level guts of how Blink works and how Chrome works end-to-end. But one test, to me, stood out in particular. This is a test where we took the top 1 million Alexa sites, ran them through Chrome, and looked at where we spend our time on the actual main Blink thread. Where is the time going? And the big takeaway here is that approximately 70% of the time, we're basically just idling on the network. That's that big chunk right here in blue. And then after that, you have all the usual offenders: we've got to execute the JavaScript, paint pixels, do layout, and all the rest. So this should not be surprising. This is specifically for the first page load. There's a very different profile, of course, once the page is loaded and you're interacting with it; that's a different problem. But this, in part, is the big problem we're trying to solve: how do we make this blue part smaller, or just go faster?

So there are two takeaways you can take from this. One is that page loads and the network are a problem; that's 70% of loading a page today. But the good news is that anything we can do in the network stack to improve latency and performance will have a significant impact on how we experience the web. Even small fractional wins in this space will have a huge performance impact.

So with that in mind, what I want to do is take a look at some of the things we've been working on internally in Chrome. This is looking under the hood: not something where you, as a developer, would be looking at APIs or trying to figure out how to optimize. This is the kind of stuff Chrome does internally, and we have a very dedicated and awesome performance team working on it. I wanted to highlight some of the wins we've had over the last year, what we're working on, and the potential areas for improvement in the future. After that, we'll look at some of the new additions, specifically the low-level network plumbing we support in Chrome: things like SPDY, some notes about QUIC, and other things. And then finally, we'll talk about measurement. Of course, performance is a big theme throughout this entire event, and we want to make sure that we give you the tools to measure performance in the best way possible; you should be able to measure anything you need in the stack. So first, let's do a quick survey.
This is going to be kind of all over the map, but I want to highlight a few things. First, in Chrome 26, we landed the new asynchronous DNS resolver, which is low-level plumbing stuff: we're no longer relying on the operating system's DNS resolver, we actually have our own. It's available today on Windows, Mac, and Chrome OS. It's not yet on mobile, but hopefully it will be. So why do we want to do this? Well, first of all, it gives us a lot more control; we can use much smarter strategies for how we resolve names. And here are some performance numbers for what we've seen since we landed it in M26. It took us a couple of tries to get the performance numbers as good as they are, but you can see there are significant wins across the board. For things like Chrome OS, we've reduced DNS resolution time significantly, by 36%. And not only that, but we're also measuring resolve plus TCP connect time, and you can see there are wins across the board there as well. Of course, some of these are platform-specific; some platforms just do a better job of implementing their DNS resolvers in the first place. But the cool thing is that we can now take control. Now that we've got the basic plumbing working, we can do smarter things. For example, we can race resolutions for IPv6 and IPv4. We're now doing adaptive retry, and we remember which DNS servers we've used, so we can do a better job of making these resolutions faster in the future. This is definitely a space with a lot of room for improvement. There are also subtle things like providing better error pages to the user. Before, we would just get a failed timeout from the DNS resolution and give up; we had no idea what happened, and couldn't give any useful feedback to the user. Now we can go much, much further. That's pretty cool.

Moving on: in M27, we landed a big and important improvement, which is that we completely rewrote how we schedule resources. It's one thing for us to get the HTML bytes; we then discover the resources, and we need to figure out how to schedule them efficiently on the wire. We care about JavaScript before images, and so on. The big change in M27 is that we replaced that scheduler, and we also started focusing on perceived performance. So instead of just measuring page load time, we started measuring things like speed index, and asking what kinds of optimizations we can do in the resource scheduler to improve speed index. In fact, we've made decisions where we've intentionally chosen speed index over page load time, or onload. There are changes that have gone in where we've regressed, in some cases, on onload time, but improved speed index, because we think that perceived performance, getting useful pixels on the screen, is a win for the user. And one interesting takeaway from the M27 work was that we realized a lot of pages were competing for bandwidth unnecessarily: they were trying to download too many things at once. We've gotten so good at sharding our assets that it's actually backfiring on a lot of sites. In particular, one big interesting change that went in in that iteration was that the new scheduler will only download up to 10 images in parallel.
So for example, if you have a gallery of, let's say, 30 images on the page, and you've sharded them across 20 different hostnames, we will not open more than 10 connections at once, because we found that doing so actually hurts performance in most cases. So if you're developing your site today, Chrome will limit you to 10 parallel image downloads. In other browsers you'll still have that problem; I'm not sure what exact scheduling algorithms they're using, but it's perhaps something you should consider on your site. There is such a thing as over-sharding your site.

Later, in M28, speaking of perceived performance, we also improved SPDY performance quite a bit. The change here is actually pretty awesome and pretty trivial. Now that we have control of the resource scheduler, we said: look, if you're using SPDY, we have a much better way to schedule resources. Since we know the priority, we can send that priority to the server, and the server can do the right thing. So we won't delay any resource scheduling on the client, which is unnecessary latency we'd otherwise be introducing. If you're using SPDY, this is a nice performance win because it allows us, once again, to get those pixels visible earlier on the screen. So if you haven't already, I definitely encourage you to look into playing with SPDY. If you're using Apache, you can turn on mod_spdy, and nginx and other servers support it as well. And we'll come back to SPDY a little bit later.

In M30, there have been yet more improvements to the resource scheduler; we keep improving and iterating on all of these different strategies. One interesting takeaway from this iteration was that we started making a distinction between optimizing for popular sites versus sites in the tail. There are different ways that sites are constructed, in terms of the patterns they use, how they lay out their resources, and all the rest. This iteration in particular helped quite a bit in accelerating sites in the long tail. And if you think about it, a 10% improvement in firing onload, from just one Chrome release iteration, is huge. That's a 10% win in onload and a 9% improvement in speed index, so that's just faster pixels on the screen. These are impressive numbers. And what's most exciting for me is that if we look forward, based on the work we have in the pipeline now, and project that a little bit, we see significant improvements that we can still make to these algorithms. So at least based on the current code, you can expect more wins rolling out to our users. As far as I'm concerned, this is free performance: the same apps just render faster because we're doing a better job of scheduling those resources in Chrome. So that's pretty exciting.

Another huge, huge win that's coming, and that's available on Android today, is what we're calling the simple cache. One of the problems we realized we had on Android, and mobile phones in particular, is that in order to dispatch a network request, we actually have to do a number of different context switches: we'd go from the main thread to an IO thread, and then do another jump. And we would always do a check on the file system, which in itself can take quite a bit of time.
And the idea behind the simple cache is to simplify that, as the name implies, to the extent that we can, and ideally avoid any context switches when going to disk. That should help quite a bit in terms of actual cache performance. And here are some early numbers; these look very, very good. The blue line on the bottom is the original, and what you see here is the latency. You had this long-tail distribution where basically every request incurred a minimum of several milliseconds, but then there was a long tail where it wasn't atypical for a request to take 50 milliseconds before we could even dispatch it, because we had to do a couple of thread hops, check disk (or flash, in this case), and bubble that back up. With the new simple cache, we can basically complete most requests immediately. Every once in a while we still see some delays, but this is now the type of line you want to see on your performance charts. And this is quite amazing, because based on our measurements, the simple cache has improved the speed of all HTTP transfers, in terms of the time from the first request byte we want to send through to completion, by 10%, which, if you think about it, is massive. And not only that, but in M31 we're seeing a 7% page load time improvement, simply from eliminating that extra latency at the beginning of each and every request. Once again, there's more work going into M32, and we hope we can improve this even further. So this is huge, and it's an awesome win for mobile browsing.

And then finally, one of the last things we started iterating on toward the end of the year, and something I'm really, really excited about, is improving the speculative optimizations that we already do in Chrome. We do a lot of speculative optimization as it is today, but now we're also looking at how to refine it: how do we expose the right primitives, and how do we make better use of them? One example is prefetch. If you're familiar with link rel=prefetch, it allows you to say: hey, I will need this resource, perhaps on the next page. That could be an HTML page, a CSS file, an image, what have you. Please fetch it for me. That way, when the user initiates that load, I don't have to fetch it again; I can just serve it out of the cache. One of the gotchas there was that if the request didn't complete in time for the next navigation, it would get canceled. So you incurred a double download, and it just didn't make sense. So, for example, we have this new patch, not available in Canary yet but coming soon, called detachable prefetch, which will keep the prefetch alive even as you navigate away, such that you can still make use of that resource once you get to your destination. So that's pretty awesome. And this will also apply to other things, like pre-renders and other types of improvements. So this is pretty cool, and here's basically how it looks: Chrome allows you to dynamically create these hints.
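As a rough sketch of what that scripting can look like (the button id and checkout URL below are hypothetical, just for illustration):

```javascript
var hint = null;

// The user signals intent, e.g. clicks "add to cart", so we hint Chrome
// that the checkout page will likely be needed next.
document.getElementById('add-to-cart').addEventListener('click', function () {
  hint = document.createElement('link');
  hint.rel = 'prefetch';
  hint.href = '/checkout.html';
  document.head.appendChild(hint);
});

// If the user changes their mind, removing the element from the DOM
// tells Chrome to cancel the in-flight prefetch.
function cancelCheckoutPrefetch() {
  if (hint && hint.parentNode) {
    hint.parentNode.removeChild(hint);
    hint = null;
  }
}
```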
So for example, let's say the user initiates some sort of action, like clicking the checkout button or the add-to-cart button, and you know they're going to go to the checkout page. At that moment you can inject one of these link elements and say: hey, I would like you to prefetch that asset for me, because now I know I will need it. And vice versa: you can delete this element out of the DOM, and we will cancel the prefetch as well. So you can dynamically script these hints and basically drive Chrome to do these prefetches for you. This is pretty cool stuff, and I think this is a place where we can do a lot more in the future as well.

So that's a little bit about the low-level guts and improvements in Chrome. Now let's take a look at some of the protocols we've been working on. Back in 2009, roughly four years ago, almost on the dot, we announced our initial efforts around SPDY. And since then we've come, I think, quite a long way. We've gone through several iterations of the protocol itself: v2, v3, 3.1, and now we're working on version 4. And it became the foundation of HTTP/2, which is pretty exciting. The HTTP/2 work itself is progressing quite rapidly, and I'm really excited about that. So today we have both SPDY and HTTP/2 support in Chrome; HTTP/2 is under a flag, but it is there, and it's something we're iterating on. And this is a common question: once HTTP/2 is marked as ready as a standard, we will just switch over to HTTP/2. SPDY is kind of an experimental ground for us to try different ideas and feed them back into the HTTP/2 spec. It'd be great if we had this feature? Let's go implement it, try it, discover the rough edges, and then feed that back into HTTP/2.

Earlier in the year, we deployed SPDY 3.1 across all of our Google servers and, of course, rolled out support for it in Chrome. Firefox also supports SPDY 3.1. And here are some numbers. We've never released these before, but these are the performance numbers we see for SPDY across some of the major Google properties, and they're consistent across all the different Google sites, so you're looking at the right order of magnitude: anywhere between 20% and 40% to 50% improvement in latency as compared to HTTPS. And even despite the extra handshake round trips and all the rest in TLS, oftentimes we actually end up going faster than vanilla HTTP as well, which is, of course, the point of this whole exercise to begin with. So this is really exciting. And I guess the important bit here is that not only is it helping the median, which is of course what you'd like to see, but it's consistently helping all of our users: the ones on fast connections, and especially the ones on slow connections or with high RTTs, which is particularly relevant for mobile, where RTTs are definitely higher. This is very promising, and I hope it will help drive HTTP/2 adoption as well. So if you haven't looked at SPDY, I definitely encourage you to do so. There are modules for virtually every popular server out there today that you can enable and play with on your site, and there's commercial support for it as well.
F5, Akamai, and others support SPDY. So that's pretty cool. And as I mentioned, we also have HTTP/2. If you're curious and want to play with it, we have HTTP/2 support under a flag; you can enable it and run it against your local server. I think the only big public site that supports HTTP/2 today is twitter.com, so in theory you can test against that, but there are also open source servers that speak HTTP/2 today that you can play with. So SPDY is the production version you can use today, and HTTP/2 is coming soon, hopefully, fingers crossed, sometime in 2014.

So that's SPDY. You may have caught wind of another protocol we started working on earlier in the year, which is QUIC: Quick UDP Internet Connections. The idea here is to take what we've done with SPDY and go one step beyond. This was actually our intent right at the very beginning when we started thinking about SPDY, but it was just too much of a leap to change both the application protocol and the transport protocol, so we decoupled them. QUIC is basically that next step. Could we build a better transport for HTTP traffic, period, on top of UDP? Could we experiment with new ideas? The core premise is that it's all about latency; we're trying to eliminate latency everywhere we can. Can we eliminate extra round trips to establish the secure tunnel? Can we do better congestion control? What if we do packet pacing? What if we do forward error correction? What can we do to innovate in this space to help reduce page load times on the web? There are a lot of interesting ideas here. If you're curious about this kind of stuff, we've posted our design doc. It's a very long doc, and I encourage you to read it and give us feedback; we have a Google group for that.

And this question comes up quite frequently: what's the point? What are you trying to do here? The answer is very simple: we just want to make a faster internet for everybody. And there are two ways this can happen. One is that we end up building a really awesome protocol that everybody loves when we take it to the IETF, and, just like with HTTP/2 and SPDY, we work with the community and make it the standard. That's plausible, and maybe that will happen. The alternative route is that we just experiment with QUIC, we experiment with different ideas, and the good ones get adopted into existing protocol stacks, like TCP and TLS. And actually, we're already seeing some of that: based on our experience with the encryption work in QUIC, the TLS working group is looking at improvements around eliminating some extra round trips. In either case, no matter which one happens, the users win. We get a faster internet, and that's our intent with QUIC. So that's pretty awesome. We don't have any benchmarks for it as of today; we're still at the point where we want to make sure it works, and works correctly, before we start optimizing all the edges around it. But you can play with QUIC today. We have it deployed on Google servers, and if you go into Chrome flags, you can flip on QUIC support. Then you can, for example, access YouTube, and you'll be served YouTube.com, or other Google services, over QUIC, over UDP.
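If you want to check from a page which protocol it was actually fetched over, Chrome exposes a non-standard chrome.loadTimes() hook that reports the negotiated protocol. A minimal sketch, with the caveat that this is a Chrome-only, unofficial API and the exact values shown in the comments are illustrative:

```javascript
// Chrome-only, non-standard introspection of the page's transport.
if (window.chrome && chrome.loadTimes) {
  var lt = chrome.loadTimes();
  console.log('Fetched via SPDY:', lt.wasFetchedViaSpdy);
  console.log('Negotiated protocol:', lt.npnNegotiatedProtocol); // e.g. "spdy/3.1"
  console.log('Connection info:', lt.connectionInfo);            // reports QUIC when in use
}
```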
And if you're curious, you can dive into chrome://net-internals and look at the actual protocol and all the other details. So if you're into low-level networking protocols, this is definitely a thing you want to check out and play with. There are lots of interesting ideas in the protocol.

All right, shifting gears. Linus mentioned Chrome data compression. This is something we launched earlier in the year, and as you heard, it provides roughly 50% data savings; that's the average number across a lot of users. It turns out there's a lot of poorly compressed content on the web. People still forget to gzip their content, which is one of the optimizations we apply for text-like assets. We also convert all the images to WebP, which provides significant savings. So this is a big benefit to a lot of users. But one thing Linus didn't mention is that there are other, secondary benefits. Because it runs over SPDY, the connection between your phone and the Google server is an encrypted SPDY connection. So I actually use Chrome data compression partly for the data compression, but also partly to secure my browsing. When this is enabled, secure traffic, say to your bank or any HTTPS site, goes directly to the site, so that traffic is encrypted. But normally, if you connect to an unencrypted site, the traffic just flows as-is on the wire; with Chrome data compression, it goes through a secure tunnel. So even if you're on Starbucks Wi-Fi, or some other unencrypted Wi-Fi, and you're browsing around, all of your data is encrypted. That's really nice. And maybe one important thing to highlight with Chrome data compression is that it is still the full-fidelity HTML5 web experience. We are not doing anything to modify your site; we're not trying to render it on the server. You have all the flexibility of JavaScript, CSS, and all the rest on your phone; that's where the code gets executed. We're just optimizing some of the assets as they get delivered. Some common questions I get about Chrome data compression, things you should know: it does go through a proxy, so if you're developing a site that relies on GeoIP functionality to customize content for the user's location, or maybe serve relevant ads, you should be looking at the X-Forwarded-For header, which is the standard way the client's IP is forwarded by the Chrome data proxy. And similarly, if for whatever reason you absolutely want to make sure we don't touch your content, you can opt out on a per-resource basis: if you add a no-transform header, it tells the Chrome data proxy to be hands-off with that resource. We won't re-optimize that image, or recompress that text, or anything else. These are standard proxy directives, and the Chrome data compression proxy supports them. So, just an FYI.

Shifting gears again: WebSockets. This is really, really exciting. Do we have any WebSocket developers in the room? Yes? Awesome. So WebSocket compression is going to be live in M32, which is a long-overdue feature. One of the gotchas with WebSockets was that you could transfer binary and text, but text would always go uncompressed in both directions. Now that the spec is up to date, and we already have the code in Chrome, you can negotiate deflate compression in both directions, and the server can selectively compress any given frame.
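To make that concrete, here's a minimal server-side sketch using the popular ws module for Node.js, assuming a version of the library with permessage-deflate support; the option names below come from that library, not from Chrome itself:

```javascript
var WebSocketServer = require('ws').Server;

// Negotiate permessage-deflate with clients that offer it.
// The window-bits parameters trade memory on each side for compression ratio.
var wss = new WebSocketServer({
  port: 8080,
  perMessageDeflate: {
    serverMaxWindowBits: 10, // smaller sliding window, less server memory
    clientMaxWindowBits: 10
  }
});

wss.on('connection', function (socket) {
  // Text frames are now compressed in both directions when negotiated.
  socket.send('hello, compressed world');
});
```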
And on the client side, as of today, Chrome will compress every single frame going out from your mobile or desktop device. I'm not going to go into details here, but the spec also provides a number of parameters to customize how the compression is done, for example the size of the sliding window, so you can essentially control the resources used on your server and on your client, plus some other flags. This is really, really exciting, because compression has definitely been a sore point for WebSockets.

We heard about WebRTC and DataChannel. The way I think about DataChannel is basically WebSocket, but over UDP and peer-to-peer, so we can communicate directly between devices without going through an intermediary like a server. And DataChannel in M31 has now officially switched to the SCTP protocol. Previously, we were using RTP DataChannels, and that was the reason for some of the incompatibilities with other vendors. As of M31, SCTP is the default, and we will aggressively remove support for RTP DataChannels. So if you're using DataChannels today, this is something you want to revisit. And if you're not familiar with DataChannels, I encourage you to check out the links (I'll post the slides later) for how this works and why it's awesome: it allows you to define things like fire-and-forget semantics, don't retransmit, so it's a really nice transport for low-latency data exchange.

And then finally, let's talk about measurement. There are a lot of protocol improvements going on, but as we know, we need to be able to measure things in order to improve them. So of course, we're all familiar with Navigation Timing, or I hope we are; I expect most people here would be. You can get detailed low-level stats about how long each connection took in terms of DNS time, TCP time, and all the other phases, and you can throw that into your analytics solution. Here I'm showing Google Analytics, which allows you to segment this data: say, I want to look at my mobile users versus desktop. You can segment it by any other variable you define, like "the user clicked the checkout button," or "they registered," et cetera. This is all great. One gotcha is that this is only for the main page. What about the other 85 or 100 resources on your page? How are those performing? Well, in Chrome we have support for Resource Timing, which gives you that same level of access to all of the network metadata, or timestamps I should say, on a per-resource basis. You can query for a specific resource, like a JavaScript file you're loading. Maybe you're loading it from a CDN and you're wondering how well your CDN is performing. You can get real-user measurement data for that specific resource and look up the time for DNS, TCP connect time, total transfer time, et cetera. The only thing you need to be aware of is that the resource has to manually opt in and allow the data to be gathered in the first place. This is done for privacy reasons, to make sure that somebody can't just iterate over your cache and figure out where you've been in the past, or something like that. So for your own resources, you need to add this header, Timing-Allow-Origin, and for third-party resources, if that origin isn't already providing the header, you should ask them to do so.
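Putting the two pieces together, here's a rough sketch of the opt-in plus the per-resource lookup; the CDN URL is just a placeholder:

```javascript
// On the server, the resource's response must opt in to exposing timing data:
//   Timing-Allow-Origin: *

// On the client, pull the detailed timestamps for one specific resource.
var entries = performance.getEntriesByName('https://cdn.example.com/app.js');
if (entries.length > 0) {
  var r = entries[0];
  console.log('DNS:', r.domainLookupEnd - r.domainLookupStart);
  console.log('TCP connect:', r.connectEnd - r.connectStart);
  console.log('Total:', r.responseEnd - r.startTime);
}
```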
Here's one example: I have a web font on my site, and with web fonts there's a delay before the text gets painted. So the question is, how is the CDN, in this case the Google CDN, performing in terms of serving the actual font? Is it hurting my users? Well, now I can grab that data from Resource Timing, just as I showed you a few slides ago, and pump it into Google Analytics. You can see here that I'm tracking the DNS, TCP, and transfer times. And it turns out that the fonts coming from the Google CDN, at least for my site, are being loaded within 150 milliseconds, which to me was an acceptable time, so that was fine for me. But you can now think about using this sort of data to define third-party SLAs. You rely on third-party widgets; you can say, well, your widgets must load in X amount of time, et cetera, and actually track that with Resource Timing, which is pretty awesome.

So, as a quick recap, we covered a lot of ground. There's a new DNS resolver in Chrome, which gives a double-digit performance improvement in actual DNS resolutions. The new scheduler is definitely something we're really excited about; we've already seen huge improvements there, 10% and 20% improvements in actual speed index and page load times. The simple cache stuff is a huge win on mobile, and I'm really excited to have that out there. Moving forward, I'm hoping we can make the pre-resolve, prefetch, and pre-render stuff much, much smarter. And you saw the SPDY wins. All of these things are incremental: 10% here, 20% there, and before you know it, you're saving hundreds of milliseconds, and sometimes seconds, for the user, which is a huge win. Some of these things you need to optimize for yourself: you need to install SPDY, configure it, and make sure your stack is configured correctly. In other cases, it's just Chrome doing a better job of scheduling this kind of stuff. And finally, if you haven't already, I definitely encourage you to look at things like Navigation Timing, User Timing, and Resource Timing. I talked about Resource Timing; User Timing allows you to measure any chunk of code and get high-resolution timestamps, this is when I started, this is when I ended, and beacon that back to your server. All of these things are supported in Chrome, and what you can measure, you can optimize. So with that, I'll leave you with the link to the slides. Thank you.