I think it's a really exciting time to be a web developer, and to be in the web performance space in general, because a lot of new things are coming out. We have new capabilities in the platform and new capabilities in the transport layer, and HTTP/2 is one of those things. If you think about it, the last update to HTTP was about 16 years ago — HTTP/1.1 came out in 1999 — and a lot has happened since. Can you recall the kind of web pages we were shipping back then versus what we're delivering now? Things are dramatically different, and we've had to adjust to that.

So the story really begins about five or six years back, when one of the teams within Chrome started looking at the current limitations in the protocol, and we started a project called SPDY. Over time, as you can see on this graph, it gained pretty significant adoption, which is to say every modern web browser now supports SPDY. And I don't count Opera Mini as either modern or a browser, so: every modern browser supports SPDY. That's the state of the art.

Throughout this process, the HTTP working group also realized that this was in fact becoming a de facto standard, and that they needed to get a move on, improve the protocol, and standardize it. That became HTTP/2. HTTP/2 took its foundation from SPDY and has since improved it and rapidly evolved it into something that is now available as a new RFC. And as you can see, the support today is already very good. If you've been following the process, the RFC was approved in February and has since been published. And literally months after the protocol was standardized, we already have more than half of deployed modern browsers supporting HTTP/2. So this is a big thing.

In fact, if you look at some of the stats that have already been published — for example, stats from Mozilla's Patrick McManus, back from February, just as the RFC was being approved — they had enabled HTTP/2 by default in their stable branches, and approximately 9% of all Firefox transactions were already happening over HTTP/2. More importantly, that number was higher than for SPDY, which tells you something about the rate of adoption. The two protocols are actually very similar, so it was not hard to upgrade.

I was also curious to see what adoption looks like in our own Chrome telemetry. When we look at new TLS connections being made in Chrome — these are stats as of this week — roughly 30% of connections that negotiate TLS today end up using HTTP/1. That's not as silly as it sounds, because we actually require this negotiation to enable TLS False Start. Another 30% or so negotiate SPDY — these are the existing SPDY servers — and the remainder, the largest share, are negotiating HTTP/2.

All of this is to say that HTTP/2 is here — not in the sense that we've shipped the RFC and now have to go out and implement it. This stuff is already out there. It's working, it's been well tested, it's been in production for many years now, and it is, in fact, replacing SPDY. The Chrome team announced earlier this year that we will be deprecating both SPDY and NPN in early 2016, just to make sure we don't end up with two competing protocols that are effectively doing the same thing.
So long story short, HTTP/2 is the future, and this is how we want to move forward. We want to talk about performance patterns and anti-patterns, but first, a little bit of an introduction to HTTP/2.

The whole premise of the protocol was to figure out how to optimize for lower-latency delivery. It turns out that when we looked at the bottlenecks in the web applications we're delivering today, we realized that, for the most part — in many markets, not all — we were latency bound. What this graph shows — and you've probably seen some variant of it before — is what happens as we vary bandwidth versus latency. As you increase bandwidth, yes, you get fairly significant wins going from, say, a 1 megabit connection to 2 megabits. But by the time you're at 5 megabits and you upgrade to 6, you're only getting a single-digit performance improvement. That's not exciting: if we double our bandwidth from 5 to 10, we win a couple of percentage points. Whereas if we keep decreasing latency, we get a nice linear improvement. So the realization was that we really need to reconsider the transport bits and figure out how to remove some of the sources of latency in the current stack. That was the core premise behind SPDY, and everything was developed from that premise.

In the process of evaluating different ideas and approaches, we converged on the idea that we need one TCP connection — we'll talk about why one TCP connection is actually a good outcome — instead of opening many connections; a modern web page today opens 30 to 50 connections just to compose and fetch all of its assets from the various origins. We also need requests that can be multiplexed: it's kind of silly that we can only request one asset at a time per connection. These streams need to be prioritized, in the sense that some requests are simply less important than others — I want to fetch an image, but that image is not as important as the CSS file that may be blocking rendering, and we need to be able to communicate that to the server so it can do the smart thing and prioritize the CSS over delivering images. Out of all this came the new binary framing layer, which is what allows us to do all of the above, and finally header compression. In the HTTP protocol today, the headers are uncompressed — always transferred in plain text — and it turns out that on some sites this is a significant source of data use, because you have cookies and other things. We'll talk about that a little later as well.

So, the really high-level overview: HTTP/1 is a text-based protocol, which is what you see at the top — you have headers, and you have a request or response body. In HTTP/2 it's all the same stuff, except that we separate the headers and the body into different frames. Those frames can be sent independently, and because they can be sent independently, they can also be interleaved. It's now possible to say: here's a chunk of this response, here's a chunk of that response, and maybe I'm still working on a third response because I'm waiting on my database or something else.
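To make the framing concrete, here's a minimal sketch of the 9-byte frame header defined by RFC 7540, with two responses interleaved on a single connection. The payloads and stream IDs are purely illustrative:

```python
import struct

# Frame type codes from RFC 7540: DATA = 0x0, HEADERS = 0x1.
DATA, HEADERS = 0x0, 0x1

def frame(frame_type, flags, stream_id, payload):
    """Serialize one HTTP/2 frame: 9-byte header followed by payload."""
    header = struct.pack(">I", len(payload))[1:]          # 24-bit length
    header += struct.pack(">BB", frame_type, flags)       # 8-bit type, flags
    header += struct.pack(">I", stream_id & 0x7FFFFFFF)   # 31-bit stream ID
    return header + payload

# A chunk of stream 1, a chunk of stream 3, then more of stream 1 --
# three frames interleaved on one connection, impossible in HTTP/1.x.
wire = (
    frame(DATA, 0x0, 1, b"<!doctype html>...")
    + frame(DATA, 0x0, 3, b"body { color: #333; }")
    + frame(DATA, 0x0, 1, b"...more html...")
)
print(len(wire), "bytes on the wire")
```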
This allows us to utilize the connection much better. It also allows us to flow-control different streams and to apply prioritization heuristics on the server. Pretty basic stuff, and of course nothing new — we're not inventing anything here. Binary protocols have been around for a long time; if you're using TLS, that's a form of binary protocol to begin with. We're just adding that layer to HTTP to get some of the nice properties of this multiplexed approach.

Another improvement is HPACK header compression. This is a separate RFC, specifically aimed at optimizing the headers we send. The way this works: if you think about it, every request and response is basically a bag of keys and values annotating the message — here's my method, I want to GET this file, the file name is this, the user agent string is this, and all the rest; same thing for the response. But across all the different requests — and we make hundreds of requests per page — the user agent doesn't change. So why are we sending the same string over and over and over again? It's actually a fairly large string. And if you have cookies and they're not changing, same thing. So wouldn't it be nice if we could both compress these values — perhaps using well-established mechanisms like Huffman coding — and also build a smarter way to communicate the fact that I don't need to resend a header?

That's exactly what HPACK does. It has two mechanisms. One, it uses Huffman coding with a predefined code to compress any arbitrary value. Two, it allows us to reference previously sent values. Both the client and the server keep state and remember what was communicated previously. So, for example, on this request we record the fact that we sent the user-agent header and that it contained this Mozilla string. On a future request, instead of sending that exact key and value again, we can just send 62, which is a reference into this table that says: by the way, append that entry to your output. It seems like a very simple optimization — and it is — and it turns out to be very, very efficient and beneficial. So this is a big win as well.

That's as far as we'll go into the actual nuts and bolts of how HTTP/2 works; it lays the required foundation for us to discuss other things. And as Courtney mentioned, there's a free chapter you can read about HTTP/2 that covers all of this plus more, in much deeper detail.

So now we get to the actually interesting parts. Over the course of the last 16 years we've built many web applications — progressively more ambitious applications — and we've established certain patterns for how we optimize them. Now that we have HTTP/2, let's take a step back and take a critical look at which of those things are still relevant and which maybe are not. The way I like to think about this is to look at everything in layers and figure out where each particular optimization fits in the grand picture: which piece are we trying to optimize, and what can we do there?
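As an aside, here's a toy sketch of the HPACK indexing idea described above (RFC 7541), deliberately ignoring Huffman coding, table size limits, and the details of the wire encoding:

```python
# RFC 7541 defines a fixed static table of 61 entries; dynamic
# entries are numbered after it, so the first one is index 62.
STATIC_TABLE_SIZE = 61

class HpackTableSketch:
    def __init__(self):
        self.dynamic = []  # most recently inserted entry first

    def encode(self, name, value):
        """Return an index if this header was seen before; otherwise
        emit the literal once and remember it for future requests."""
        if (name, value) in self.dynamic:
            return STATIC_TABLE_SIZE + 1 + self.dynamic.index((name, value))
        self.dynamic.insert(0, (name, value))
        return (name, value)  # sent as a literal this one time

enc = HpackTableSketch()
ua = ("user-agent", "Mozilla/5.0 (X11; Linux x86_64) ...")
print(enc.encode(*ua))  # first request: the full literal goes on the wire
print(enc.encode(*ua))  # every request after: just the index, 62
```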
So at the very bottom layer, we can start with the link layer. Whether you're on Wi-Fi or a mobile connection or something else, there are different patterns we can use in how we fetch resources to optimize for it — though perhaps HTTP-layer optimizations are not the right tricks to use there. Then we have TCP and UDP, and things like DNS. There are many talks at this conference about optimizing DNS, and many vendors here who will be happy to help you with that sort of thing. With TCP, we have handshakes — optimizing handshakes, making sure we have good throughput, so optimizing for congestion, packet loss, and all the rest. These topics are actually reemerging now because of the large growth in emerging markets, where we're finding that we're not only latency bound but also, to a large degree, bandwidth bound, and this is increasingly becoming a problem. Then we have HTTP, which is where the HTTP/1-versus-HTTP/2 discussion lives: we know the earlier protocol has certain limitations — limited parallelism, no ability to specify priority, large overhead — and these are the things we've had to deal with. And finally there are application-layer optimizations, things like: did I even need to fetch that resource? That's the kind of decision you can make in your application.

So let's walk up and down the stack and figure out where some of these optimizations fit. I think what we'll find is that in certain cases we moved optimizations into the application layer because the lower layers were unable to meet the actual requirements, and that's where we've had to patch the system, if you will, on the fly.

I dusted off my trusty Steve Souders book on building faster websites, went through the list, and started knocking items off, trying to place each one on this diagram. The first one: reduce DNS lookups. We can't fetch a resource until we know where to fetch it from, so we need to resolve the name to an IP address. Pretty straightforward; clearly it applies to both HTTP/1 and HTTP/2. Still good, makes sense.

Reuse TCP connections? Yes, of course. TCP handshakes are expensive regardless of which protocol you're using, so you want persistent connections, and you want to minimize resource overhead. Each connection you make costs something on both the server and the client. It's not only latency on the client; it's also the fact that you have to maintain an extra socket, and that socket has to be maintained all the way through the stack — potentially by each and every proxy along the way, and all the rest. With HTTP/2 we actually have the concept of a single connection; we'll come back to that in a second. So this still obviously makes sense.

Using a content delivery network? Yes, of course, because it helps you reduce latency. If you have lower latency to your edge, you have faster handshakes, and all of that makes sense.

Minimizing the number of HTTP redirects — now we're moving up the stack into the HTTP layer. This is probably the most common offender on mobile websites, where in fact it hurts the most because the latencies are higher. It's still a big problem for many mobile sites. And once again, this is not an HTTP/1- or HTTP/2-specific thing.
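If you want to see how many round trips your users pay before any content arrives, a redirect chain is easy to audit. A quick sketch using the third-party `requests` library (the URL is just a placeholder):

```python
import requests

# Follow redirects and inspect the chain; every entry in r.history
# is a full round trip the client paid before getting any content.
r = requests.get("http://example.com/", allow_redirects=True)
for hop in r.history:
    print(hop.status_code, "->", hop.headers.get("Location"))
print("final:", r.status_code, r.url, "| redirects:", len(r.history))
```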
In fact, one of the core premises behind HTTP/2 was that we wanted to keep all the semantics of the protocol the same. Any application you have will work on HTTP/2; nothing is going to break. It's just that certain patterns in how you build your application may hurt performance on one versus the other, and that's what we're trying to figure out here.

Eliminating unnecessary request bytes — this is specifically about things like reducing the number of bytes in your HTTP headers. We had best practices like cookie-less domains and other things; those make sense. With HTTP/2 we have HPACK, which may address many of those concerns. That doesn't mean you should be sending excessive headers or values, but it will certainly help.

And then, compressing assets during transfer. Despite the fact that we've been running this conference for half a decade now, at every single one there is somebody saying we need to compress assets. Because despite the fact that we've been talking about it, and we all know it should be happening, a significant fraction of web content is not compressed. This is why products like Chrome Data Compression, Opera's proxy, and others are able to deliver the savings they do: they apply the obvious things we all know we should do but don't actually do.

Caching resources on the client — this is a really interesting one, because of course we all know we should be doing it, but once again, many sites are not. Further, with HTTP/1, as I'm going to argue later, I think we had very limited ability to do caching well. With HTTP/2 that changes, and we can do a much, much better job. And finally, eliminating unnecessary resources: did you really need that resource to begin with? Did you really need that high-res image at all? Sometimes aggressive prefetching can hurt performance; other times it can help.

So these are effectively the evergreen performance best practices. Nothing here should be new; this should all be familiar. It applies equally to HTTP/1 and HTTP/2 — and I'm guessing it will apply to HTTP/3 as well — because it's consistent across all the different layers of the stack.

Now we start to move into the more interesting bits: the HTTP/1 optimizations that we've developed as — or that we call — best practices today. Specifically, there are three limitations of HTTP/1 that drove us to implement these other optimizations. One is limited parallelism: with HTTP/1, a connection can only request one resource at a time. There was an attempt to mitigate some of that with HTTP pipelining, but in practice it never panned out — for various reasons, despite the good intentions, no browser has enabled pipelining, due to broken servers and other things. So that really is not an option; we can't use it. Head-of-line blocking is another problem, where a single slow response can delay the delivery of all other resources. And third, high protocol overhead, specifically the high cost of header bytes. Because of this last one, we've actually seen people develop their own protocols on top of things like WebSockets, where they have much lower overhead — which is, I guess, one way around the problem.
But really, this is unnecessary, and we should solve it at the protocol level.

So first of all, parallelism. In HTTP/1 we are limited by the number of connections: the number of resources we can fetch in parallel is effectively the number of connections we open to a particular origin. All the browsers have converged on defaults that differ a little, but more or less it's six connections per origin. And of course, we web developers are a smart bunch. We said: hey, six is just per origin, so we can fix that — we'll just open multiple origins, create some CNAMEs, and the problem is solved. And that's true, you can increase your parallelism that way, but in the process you're opening many more connections. That means TCP handshakes, and TLS overhead if you're running TLS — yes, some of those handshakes can be resumed, but it can still be quite costly. It genuinely does consume resources, as you'll hear from anyone trying to handle tens or hundreds of thousands of clients. And most importantly — and this is something we don't appreciate as much, because we don't have good visibility into it — it actually breaks TCP congestion control. The whole premise of congestion control is to observe what's happening within the connection and react to changes. But when you have multiple connections to the same origin, none of them talk to each other; they're competing with each other.

And you end up with cases like this. This was a case study from Etsy, which they have since fixed, but the finding was that the Etsy search page was sharding its images, because they wanted to fetch as many images in parallel as they could, on the premise that this would accelerate page load. And you know what — it did, when they tested it locally, because there you're constrained by latency. But when you took the same page and ran it through a throttled, mobile-like restricted network, you would see a huge number of retransmissions, which is what this graph shows at the bottom. The blue is the goodput — packets that were sent, successfully delivered, and acknowledged by the client — versus the retransmissions. Because we opened so many connections, the server was able to push a lot of data at the client; the client was effectively overwhelmed and did not acknowledge some of those packets; so the server pushed the packets again. In effect, for users on a bandwidth-constrained mobile network, we ended up sending extra data, which hurt them even more. And this is the kind of thing that's very hard to diagnose, because it's not what you see in your developer tools — but we see these kinds of traces all the time. Once again, this is increasingly relevant in emerging markets, where we are bandwidth constrained.

And I guess the other dirty secret is that we've always kind of swept this question under the rug: what's the optimal number of shards? Is it 10, is it 20, is it 30? And the answer is, there is no such thing — it really depends on the context. If you're on a very fast connection with plenty of bandwidth, you can open as many connections as you want, because that's not really an issue — modulo the cost of the actual sockets.
Whereas if you're on a constrained device, you want to throttle it down to a much, much lower number — perhaps even one. But in practice that's very hard to achieve, if you think about it: how do you know how to pick the right number, and all the rest?

So here's a really interesting quote from Patrick at Mozilla, once again, where he compares our ability to reuse a connection. Specifically, he highlights that with HTTP/1, 74% of active connections carry only a single request. Most of the time we're opening a connection, sending one request, and closing it — so keep-alive and persistent connections are actually not giving us much. Whereas with HTTP/2, that number plummets to 25%, which is to say we're able to reuse the same socket much more often, which significantly reduces the number of sockets we have to open to begin with. That's of course a nice win if you're on the operations side of things, because all of a sudden you have many fewer connections, fewer TLS handshakes, and all the rest.

All of this, of course, leads up to domain sharding. Why do we have domain sharding? It's a workaround — a hack — around the fact that HTTP/1 had limited parallelism, or a lack of parallelism to begin with. We deployed it because it allows us to increase that parallelism, but it comes at a pretty high cost. First, there's no best value for the number of shards; we don't know what it is. Each connection consumes resources, and each connection competes with the others, which leads to poor TCP performance. Frankly, it also complicates our code: it's yet another thing to think about — how many shards, rewriting asset names, and all the rest. And for HTTP/2 in particular, it has other negative effects. The whole premise of HTTP/2 is to allow effective prioritization: we want to prioritize resources and create dependencies, saying this resource depends on that one. That breaks down when you have multiple connections — we can't do it across connections — so prioritization becomes less effective. Similarly, HPACK header compression becomes much less effective, because we now have multiple compression contexts and can't reuse values across connections.

So overall, it basically leads to this conclusion: even today, with HTTP/1, we tend to abuse domain sharding. In practice, if six connections is not enough, at most you should have two shards — your primary, plus a backup to fetch from. And for HTTP/2, this is perhaps the single biggest problem that will hurt you in getting better performance out of it: you really want to think about removing domain sharding for HTTP/2.

One interesting way to do that, which is not very well documented, is that HTTP/2 can actually coalesce connections on your behalf. The way this works: HTTP/2 requires TLS in practice — the browsers require TLS — and the browser will coalesce connections, or reuse an existing connection, when it meets two criteria. One, the TLS certificate has to cover both names — so if you have a wildcard certificate that covers, say, yourdomain and cdn.yourdomain, that's condition number one. And two, the other host resolves to the same IP.
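Here's a rough sketch of checking those two conditions yourself with the Python standard library. Treat it as an approximation only — real browsers do full certificate validation and their coalescing logic varies:

```python
import socket
import ssl

def covers(pattern, host):
    """Very simplified SAN matching: exact name or one-level wildcard."""
    if pattern.startswith("*."):
        return "." in host and host.split(".", 1)[1] == pattern[2:]
    return host == pattern

def coalescable(origin, other, port=443):
    # Condition 2: both hostnames resolve to at least one shared IP.
    ips_a = {ai[4][0] for ai in socket.getaddrinfo(origin, port)}
    ips_b = {ai[4][0] for ai in socket.getaddrinfo(other, port)}
    if not ips_a & ips_b:
        return False
    # Condition 1: the origin's certificate also covers the other name.
    ctx = ssl.create_default_context()
    with ctx.wrap_socket(socket.create_connection((origin, port)),
                         server_hostname=origin) as s:
        cert = s.getpeercert()
    sans = [v for k, v in cert.get("subjectAltName", ()) if k == "DNS"]
    return any(covers(p, other) for p in sans)

# e.g. coalescable("www.example.com", "cdn.example.com")
```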
So when we query it, if those two conditions are met — even if you told us the image lives on cdn.yoursite.com and you're requesting it from yoursite.com — we will route the request over the same connection. You get this behavior for free, in effect; you don't have to do anything else. It's just handled by the browser.

And here you can see an example: you can query, say, google.com and grab the subject alternative names from the certificate. It says this certificate is valid for Google, Android, and App Engine — which tells me that if I already have a socket open to android.com and I then request a resource from google.com, we can reuse that socket. That logic exists in browsers today, and it works across all of them. The nice thing is that for HTTP/1 this coalescing does not come into effect, so the browser will in fact still open multiple connections. So this is one simple way to say: I still want some sharding logic on HTTP/1, because it benefits me there, but on HTTP/2 the browser will do the right thing for me automatically. And there are no conditional responses you have to provide at that layer, which is really nice.

Moving on: concatenation. Similar story — we figured out that requests are very expensive in HTTP/1, in the sense that we can only have as many requests in flight as we have connections, and head-of-line blocking is another issue. So instead of fetching many small resources, we fetch a smaller number of large ones containing the same content. If you have JavaScript files, you concatenate them all together; if you have images, you put them into a sprite, and all the rest. That helps with HTTP/1. The other benefit it can have is improved compression: if you have, say, two text files with a lot of redundancy between them, compressing them together will obviously do better than compressing them individually. That's a valid benefit regardless of HTTP/1 or HTTP/2.

But the downsides are quite numerous as well. For example, there are certain types of files the browser can't stream-process. CSS is one: if you give us a CSS file, we can't parse it incrementally — we have to wait until the entire file has arrived on the client, and then we parse it. So if you give us one large file, you delay when we can process it, whereas with small files we can process them incrementally and faster. For JavaScript, same thing — Chrome recently added streaming parsing, which helps with this, but other browsers may not have the same functionality. So delayed processing is one issue.

The second is expensive cache invalidations. This is a really tricky one, because of course we know we should cache things. But if you take all of your JavaScript — all of your content, your libraries, your code — and put it into one app.js, which is in fact a very popular thing to do now (every build pipeline will spit out something like this), then the moment you update a single byte in one of those files, you have to refetch everything. If you took jQuery plus your ten lines of JavaScript, put them together, and then updated one byte, you're fetching jQuery all over again — which is kind of silly. But that's kind of the best practice we have today.
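A tiny sketch of why that happens with content-hashed URLs, which is how most build pipelines version assets (the file contents here are stand-ins):

```python
import hashlib

def digest(data):
    """Short content hash, as a build pipeline would put in a file name."""
    return hashlib.sha256(data).hexdigest()[:8]

jquery = b"/* imagine ~90 KB of jQuery here; it rarely changes */"
app = b"console.log('app v1');"  # 10 lines of your code, changes weekly

# Bundled: one URL for everything. Touch one byte of app.js and
# bundle-<hash>.js gets a brand new name -- clients refetch it all.
print("bundle-%s.js" % digest(jquery + app))

# Granular: only the file that actually changed gets a new URL;
# jquery-<hash>.js stays cached across your deploys.
print("jquery-%s.js" % digest(jquery), "app-%s.js" % digest(app))
```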
So this brings me to a point about churn. With HTTP/1 we've had caching, and caching works, but we were never really able to optimize for churn. Small files were always so expensive to fetch that we erred on the side of putting everything into one blob — and then having to invalidate the entire blob and fetch the whole thing again. Now, because we can fetch granular resources with HTTP/2, we can actually ask: how do I optimize for churn, where churn is defined as the number of bytes I have to refetch whenever I push an update? Clearly jQuery is pretty stable — I'm probably not editing jQuery — so let me put it in a separate file, in a separate bucket, with a long cache lifetime; and for the other files that I'm updating frequently, I can specify a shorter cache lifetime, and they're small files even if I do have to refetch them. Now I can start optimizing on this particular metric. That's not something we could really do before. We can start thinking about optimizing cache lifetimes, file sizes, and all the rest, and I'm hoping we'll develop better metrics for this as well — certainly something that CDNs and others can help us with.

So the report card on concatenated assets is that, for the most part, it actually has negative implications for web performance, and we can do much, much better. It still makes sense for HTTP/1 — but even there, I think we should step back and, instead of concatenating everything into one large bundle, start thinking about the right mix: which blocks can I set aside as low churn versus high churn, and optimize for that? For HTTP/2, concatenation is safe to avoid. Even if you just remove all your concatenation logic, you'll probably be better off than you are today, because there is effectively no cost for making these requests — you can have hundreds or even thousands of requests in flight from the browser. And if anything, fetching smaller files automatically helps you with revalidation and invalidation, because any time you touch one file, you only have to refetch that file.

Next up: inlining. Inlining is actually very similar. Here, once again, we're saying requests are expensive, so instead of fetching these small files, we'll embed them inside another file. But this too comes at a pretty high cost. One: the resources can't be cached independently. If I have, say, a logo image that I want to reuse across all of my pages, I have to embed it on each and every page. Further, if I update that logo file, I need to invalidate all of those pages; and if anything changes on the parent page, I have to refetch the logo. We've coupled the two assets, and that's a bad outcome. Two: it breaks resource prioritization. If you embed, say, a big image into an HTML file, you've effectively upgraded it to the HTML's priority. As a browser, we try to be smart about this: HTML is the most important thing, and we want to fetch it as soon as possible; images come after that, or at least at lower priority. By inlining, you're upgrading the image's priority — which can have its benefits if that's what you intend, but mostly it's an anti-pattern, because it clogs the pipes, if you will.

So really, what we're trying to express here is: don't ask me for this resource, because I know you will need it. And that is, in fact, a valid thing to say.
That's the one genuine optimization in inlining: you're saying, I know you will come back to me for this, and that round trip will cost an RTT — so I'm giving you the file, because I know you're going to ask me for it. That's valid regardless of whether you're using HTTP/1 or HTTP/2, and it's what server push is all about.

HTTP/2 server push allows the server to send multiple responses to a single request. So instead of sending the product page with app.js and the product photo inlined into it, we can send them as separate responses. The way it works is that the server emits what we call a push promise — I promise to deliver this resource — followed by the resource itself. The reason the push promise is sent first is so that by the time the browser starts parsing the received response and detecting these resources, it already knows it doesn't need to send a request for them, because the server has promised to deliver them.

Server push has many, many nice properties that inlining does not. First of all, it's granular: we can push individual resources, and they can be multiplexed and prioritized just like any other response. The client can also control how server push is used — it can configure this when it first establishes the connection, disable push entirely, or control how much data the server is allowed to send.

This actually opens up entirely new optimizations that we haven't really seen on the web before. For example, with photos — images in general — we can implement strategies where the client requests part of a file. This is not an HTTP range request; this is me requesting a file, pausing the stream midstream, and then resuming it later. The way this could work: I'm trying to render a page, these images are expensive, and I need the CSS and JavaScript first — but I'd still like to know, say, the dimensions of the image, and maybe a preview, a thumbnail. So I'm willing to accept the first 20 kilobytes of this asset, because by reading the header of the image I can get the dimensions and perhaps render a meaningful preview. Then, later, the client can send what we call a WINDOW_UPDATE frame, which increments the counter that tells the server how much data it's allowed to send, and the server streams the remainder of the image to the client. So this opens up new opportunities for servers to do new and interesting optimizations.

And I guess the report card for resource inlining: it removes a full request round trip, which is a valid optimization in both HTTP/1 and HTTP/2. It works as a parallelism workaround for HTTP/1, but it comes at a pretty high cost, so in HTTP/1 you should use it carefully. In HTTP/2, you're better off replacing it with server push. Even the most naive push strategy — I'm going to push this resource every single time you visit my page — is equivalent to inlining. And if you can improve on that by saying, well, I already pushed this resource to you on the last page, so I know it's in your cache and I don't need to push it again — that's already a win. So just replacing inlining with server push gets you to the same result, and from there we can implement much, much smarter strategies.
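To make the push flow concrete, here's a sketch using the Python `h2` library. The socket handling is replaced by an in-memory client connection so the example runs standalone; the hostname and paths are made up:

```python
import h2.config
import h2.connection

# Fake client request, so the server connection has an open stream 1
# to respond on (normally these bytes arrive off a real socket).
client = h2.connection.H2Connection(
    config=h2.config.H2Configuration(client_side=True))
client.initiate_connection()
client.send_headers(1, [
    (":method", "GET"), (":scheme", "https"),
    (":authority", "shop.example.com"), (":path", "/product"),
], end_stream=True)

server = h2.connection.H2Connection(
    config=h2.config.H2Configuration(client_side=False))
server.initiate_connection()
server.receive_data(client.data_to_send())

# PUSH_PROMISE goes out before the response that references the asset,
# so the browser never issues its own request for /app.js. Promised
# streams are server-initiated, hence the even stream ID.
server.push_stream(stream_id=1, promised_stream_id=2, request_headers=[
    (":method", "GET"), (":scheme", "https"),
    (":authority", "shop.example.com"), (":path", "/app.js"),
])
server.send_headers(1, [(":status", "200"), ("content-type", "text/html")])
server.send_data(1, b"<html>...</html>", end_stream=True)
server.send_headers(2, [(":status", "200"),
                        ("content-type", "application/javascript")])
server.send_data(2, b"/* app.js */", end_stream=True)

wire_bytes = server.data_to_send()  # hand these to your socket
print(len(wire_bytes), "bytes, two responses for one request")
```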
In terms of the actual bits of how this is implemented, one thing you should be aware of is that push is restricted to same-origin resources: the server pushing the resource has to be authoritative for it. My server can't push on behalf of facebook.com — it can't push the widget; only facebook.com is allowed to push resources residing on that origin.

There's also a lot of opportunity for servers to become smarter, and the user agents are still figuring out which use cases to support. For example, if the server pushes a resource that is already in our cache, we will reset that stream automatically for you, which means we don't have to receive all of the bytes. That's a new capability in HTTP/2: we can reset a particular request without closing the entire connection, which is nice. It's also interesting to think about other use cases. Is it possible to push a revalidation — you have something in your cache, and I just want to refresh it and punt its lifetime forward? Turns out this almost works in some browsers, but perhaps as a community we need to rally around some of these use cases and say: yes, these are important, these are useful. Another is invalidations: if I pushed something to you and told you to keep it for a year, but then realized, oh my god, I made a huge mistake — would it be possible for me to push something that removes it from your cache? That's actually not that hard, because you can just push a resource that says "I've expired," which forces the client to refresh it on the next request. That makes sense. Another question: should there be a JavaScript API? Today there is none, but with things like Service Worker and other things that are coming, it would kind of make sense to have a callback, almost like a WebSocket message handler, that says: the server is pushing this thing, what do you want to do with it? Still to be decided, but interesting.

On the server side, this also starts to get really interesting. One example is Jetty, which has had a strategy for a long time that I think is really clever. It observes the traffic pattern as it arrives at the server, looking at the Referer headers. You request index.html; I send you index.html. You then come back and say, I also want the CSS file and the JavaScript file, and you tell me the Referer, which is the index file. After observing this pattern for a while, I can build a map that says: any time I send you the index file, you come back to me with these. So why don't I just start pushing them to you? And the beautiful part is that this moves all of that work from you, the developer, into the server. The server can automate this on our behalf; CDNs may automate it on our behalf. So this starts to get really interesting.
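A toy version of that Jetty-style learning strategy might look like this (the paths are illustrative):

```python
from collections import defaultdict

# parent page -> set of assets that tend to follow it
push_map = defaultdict(set)

def observe(request_path, referer_path):
    """Called for every request; learns which assets follow which pages."""
    if referer_path:
        push_map[referer_path].add(request_path)

def assets_to_push(request_path):
    """What to push next time this page is requested."""
    return push_map[request_path]

observe("/index.html", None)
observe("/style.css", "/index.html")
observe("/app.js", "/index.html")
print(assets_to_push("/index.html"))  # {'/style.css', '/app.js'}
```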
And then finally, you need to actually think about prioritization. This is something that we as developers — and as server developers — have mostly ignored, and I think it's critical: if you want good performance out of your HTTP/2 deployment, this is something you need to be very, very careful about.

So let's talk about prioritization. How does it work? In HTTP/1 we have a limited number of connections and many requests, so the way the client optimizes is by maintaining some notion of a priority queue, where, say, CSS and JavaScript are more important than images. Because we have a small, restricted number of connections, we try to push the critical resources ahead of the others. By the time the server gets the request, it doesn't have to think about anything — its entire purpose in life is to serve bytes back to the client as quickly as possible.

With HTTP/2, things change, because now the client defers all of that logic to the server. We're parsing bytes, detecting resources, and immediately dispatching all of the requests to the server — annotating each and every request with detailed metadata: what is the priority of this, are there any dependencies for this resource? The server gets this stream of requests, and if it's not careful and does not respect that prioritization, it can hurt performance quite badly, because it's very easy to push a lot of static bytes — like images — ahead of the CSS and JavaScript. And as you can imagine, if we do that, the page is going to take a very long time to render. So you need to test that your server actually supports this, and does so well.

How does stream prioritization work in HTTP/2? We have two concepts: weights and dependencies. Each stream in HTTP/2 has a unique ID, and you can create a dependency by referencing another stream — you can say stream C depends on stream D. One example: I'm shipping you video frames, and I want to say that frame two depends on frame one — please don't waste bandwidth shipping me frame two before I've seen frame one; that would be kind of silly. And then, similarly, you can assign weights between siblings at the same level of the tree: A and B are equally important in terms of the dependency, but given bandwidth constraints, you should assign more of your resources to A, because its weight is higher than B's. You can think about this in terms of CSS, JavaScript, and images.

One good example is the current implementation in Firefox, which uses this to decide where it places each type of resource. The way it's done today, there are five groups, with this notion of "leader", "unblocked", and "background" requests. If you look at the weights, leaders get a weight of 200, unblocked gets 100, and background gets a weight of one — which says: if you're under resource constraints, spend the least amount, 1% or less, on background requests, and assign most of your resources to leaders. Things like XHR and fetch get lower priority, and images are dependent on leaders — and you start to see this dependency graph emerge. That's what's implemented today — that's what Firefox will send to your server — and if you don't respect it, you will probably end up prioritizing things in the wrong order.
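As a rough illustration of what respecting those weights means on the server side, here's a sketch of splitting an available send window across contending groups in proportion to their weights (the numbers are the ones from the talk; the exact values and grouping vary by browser version):

```python
GROUP_WEIGHTS = {"leaders": 200, "unblocked": 100, "background": 1}

def allocate(bytes_available, active_groups):
    """Split available bytes across active groups proportionally to weight."""
    total = sum(GROUP_WEIGHTS[g] for g in active_groups)
    return {g: bytes_available * GROUP_WEIGHTS[g] // total
            for g in active_groups}

# With all three groups contending for a 300 KB window, background
# traffic gets well under 1% of the connection.
print(allocate(300_000, ["leaders", "unblocked", "background"]))
```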
So that's a long-winded way of saying that with HTTP/2, the browser is highly reliant on the server. The existing benchmarks we have for most servers — run ApacheBench and measure requests per second — are not sufficient in an HTTP/2 world, where we also need to test things like: I gave you this dependency tree; are you giving me the bytes in the right order? The ordering of the bytes is perhaps even more critical than raw request throughput — I'd rather have correct ordering than higher throughput, though higher throughput is also nice. So when you're deploying or picking a server, make sure it supports these things: dependencies and weights, and stream and connection flow control. Push is a newer thing, I would say — not a must, in the sense that it's new functionality, but it would certainly be nice.

So, to round this up: we have HTTP/1 and HTTP/2. At the top we have the evergreen best practices. Then there's domain sharding, which you should remove for HTTP/2. Concatenation — be very, very careful about it; the only reason you may want to use it with HTTP/2 is if it delivers significant compression gains, that is, if putting multiple files together gets you much higher savings. Otherwise I'd recommend just shipping granular resources: we all have beautiful code bases with modular code and all the rest — just take that and give it to the client instead of smashing it all into one file. For inlining, you want to remove it and default to server push. And finally, pick your HTTP/2 server very, very carefully, because if it does not respect things like prioritization, you may run into a lot of trouble. And as a community, I think we need to develop better benchmarks for this kind of stuff.

So with that — if you're interested to learn more about HTTP/2, as Courtney mentioned, I have the free chapter online, which does a deep dive, and the slides are here. Thank you. We're a little bit over, which means we're in between you and drinks, but I'm feeling courageous — if anybody has questions, I'll take a couple, and I'll also hang around here afterwards.

What's the state of HTTP/2 implementation in web servers — Apache, Nginx, et cetera?

There's a really good wiki page — just search for the HTTP/2 working group — with an implementations page listing all the current implementations; go there. In terms of popular servers: Nginx has announced they will support it by the end of the year, so that's a work in progress; there's nothing you can play with as of today. There are a couple of really good new open-source servers, like H2O and nghttp2, that I'd recommend you play with. For Node.js there's a module; there are Ruby implementations. Apache has a module that you can build in, which basically uses nghttp2 — performance-wise I'm not sure how well it behaves, but it's there. And Apache Traffic Server is probably the best implementation as of today. Anyone else? We have one in the corner here.

I saw a spec yesterday for doing compressed data frames, where you can basically run deflate on the frame itself, and obviously that would help with upload and things like that. Has Chrome or anybody committed to doing that?

For that particular extension, I don't know. To broaden the question: there is a mechanism for allowing extensions in HTTP/2 — Alternative Services, Alt-Svc, is one example — but for that particular one, I'm not sure.

You suggested we can still concatenate if it gives compression benefits — but if you use SDCH, does that still help, or is SDCH no longer necessary?
So SDCH, if you're not familiar, is shared dictionary compression: the idea is that you download a dictionary that's then used to decompress other assets. That's independent — that mechanism works at a higher level. So yes, that's one way of getting similar compression gains without concatenating files together. You could build a good SDCH dictionary, because you already know what your JavaScript and CSS look like, and if you ship that dictionary out, you probably don't need to concatenate. That would certainly work. And perhaps an even more interesting and exciting thing you can play with today: we have Service Worker, so you can implement your own delta encoding. I'll just leave that there. Scary. Scary and awesome. One more, if anyone wants. Here you go.

One of the slides there is kind of scary to me: the coalescing of connections by SAN name. Just because a server presents a certificate with that SAN, it doesn't mean it's ready to accept requests for that other server name.

Right — so there are two conditions there. One, the certificate covers both host names; and two, they resolve to the same IP address. Those are the two. But then we also have things like Alt-Svc, which we just mentioned — if you want to search for that, it's an extension we're implementing in Chrome that allows a server to hint that it's willing to accept this traffic, or have it rerouted, at a particular alternative host or IP address. All right, thank you.