All right, good afternoon, everybody. HTTP may be a bit of an odd subject at a real-time conf. HTTP seems like such old hat. We've all moved on. WebRTC is the new shiny. WebSockets are yesterday. So what are we doing with HTTP? Before I make the claim that HTTP2 will actually change the game in many respects, and that it has a lot of impact even on things like WebSockets and other protocols, it helps to step back and look at how we got where we are today. First, let's start at the beginning, right? HTTP 0.9. This is back in the early 1990s. Tim Berners-Lee has this idea for the World Wide Web, and this is the ultimate MVP. If you guys have not tried this, I encourage you to. In fact, if you don't know it, Nginx, and I believe even Apache, still to this day supports HTTP 0.9. Open a connection to a server, type in GET slash, whatever the URL is, that's all you need, and you will get your response back. There are no headers, there's no metadata about anything. This is the ultimate MVP, this is a one-line protocol, and this is the beginning of the modern web. Still works today, pretty amazing. A few years pass, right? Now we're looking at about 1996, and HTTP 1.0 comes out. And what I think a lot of people don't realize is that HTTP 1.0 is in fact an informational RFC; as such, it is not a standard. It simply documents all the crazy things that a lot of people had done with HTTP 0.9 since then. They basically picked up this idea of the World Wide Web and said, look, I'm going to build a server, and you know what, I also need to transfer some other file types, like images. So let's add headers, and we'll even add a version string to this thing, and we'll add some status codes, because sometimes I can't serve a response. So this is an organic process, right? This is a process that took on many shapes. 
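That one-line exchange is easy to try from code, too. Here's a small sketch of an HTTP 0.9 round trip; a toy local server stands in for a real one, and the port, body, and function names are all invented for illustration:

```python
import socket
import threading

def http09_server(listener):
    """Toy HTTP 0.9 server: read one request line, send body only, close."""
    conn, _ = listener.accept()
    request = conn.recv(1024).decode()          # e.g. "GET /\r\n"
    assert request.startswith("GET ")
    conn.sendall(b"<html>hello, 1993</html>")   # no status line, no headers
    conn.close()

def http09_get(host, port, path="/"):
    """HTTP 0.9 client: the entire protocol is a single request line."""
    with socket.create_connection((host, port)) as s:
        s.sendall(f"GET {path}\r\n".encode())
        chunks = []
        while data := s.recv(4096):             # server closing ends the response
            chunks.append(data)
    return b"".join(chunks)

listener = socket.socket()
listener.bind(("127.0.0.1", 0))                 # pick any free local port
listener.listen(1)
port = listener.getsockname()[1]
threading.Thread(target=http09_server, args=(listener,), daemon=True).start()

body = http09_get("127.0.0.1", port)
print(body)  # → b'<html>hello, 1993</html>'
```

Note there is no version string, no status code, and no headers in either direction: the connection close is the only framing.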
There's many different iterations, and the web is just exploding at this moment in time. So 1.0 was literally just us sitting down and saying, these are the most popular things that the current crop of browsers and servers are implementing. As you can see, the text is getting smaller, because by the time we get to 1.1, we keep adding new things. And in fact, 1.1 is the first implementation, or I guess standard I should say, which is an IETF standard. And what we've done there is we've basically taken 1.0. It took about a year and a half or two years, so this was 1999 when we published 1.1. And basically all it does is put some concrete spec language around some of the most popular features. We've added things like Gzip compression, or transfer encoding. We've added things like connection keep-alive. So prior to 1.1, every time you requested something, we would close the connection. A far, far cry from being real time. So these were kind of the first features, even things like chunked encoding. So lots of stuff happened. But nonetheless, this is fundamentally what we still build the web with today. This is what we operate on. And this was built in 1999. And quite a few things have happened since. This protocol, 1.1, was built in the age of GeoCities. This is when we went to read XKCD on GeoCities. This is even the pre-dawn of the Ajax era, before the Outlook guys and IE gave us this awesome thing known as Ajax. This is way before then, not to mention all the other crazy stuff that's currently in our browsers. We have WebGL. We have teapots in our browsers. We have WebRTC. We have P2P connections now in a browser. WebSockets. The mobile web has happened. None of this has affected, in a way, what HTTP is. So literally 15 years have passed. And it's about time that we made some modifications. But before we talk about what we're trying to solve, it also helps to understand what the actual problem is. We want to build a real-time web. 
We want to make sure that everything is delivered quickly. We can render all the pages, but we've got some problems. If you look at the median page on the web today, we don't need just one server to render a page, which was true when we started. We need 11 servers, because in fact we are pulling in components from all over the web. We need the social widgets. We need the real-time streams from other servers. We need images from another CDN. There's a median of 11 domains. There are sites that use up to 100 distinct hosts to compose a page. We're also downloading a heck of a lot of assets, things like images, JavaScript, and CSS, and transferring quite a bit of data, so about a meg of data for your median case. There are pages on the web where you can, in fact, download 100 megs of data. I encourage you to try the new Oakley experience, weighing in at 78 megabytes, which is quite beautiful. I will not show the demo here, because it would bring down the Wi-Fi. But fundamentally, the problem that we're up against is that HTTP was designed for an era when we had one TCP connection, and that TCP connection was occupied, so the concurrency is just one. You can send one request, and that's it. We came up with some hacks. We said, well, hey, parallelism is going to be six, in the sense that we will open up to six connections. That was a workaround. That was a hack. And we'll come back to that in a second. But the end result of all of this is the typical experience on the web today. This is for desktop sites and desktop browsers. It's actually much worse on mobile. We're looking at a 10-second-plus median for a lot of mobile sites, which, of course, is unacceptable, especially when we talk about real time. So despite all its awesomeness, the things that we've built on the web are quite amazing, but there are a lot of performance bottlenecks, and you guys know them all firsthand. 
This is why we've been working on WebSockets and all these other standards to help us, especially with latency. But fundamentally, some of the big things that I want to highlight here are: HTTP has limitations in the amount of parallelism. We occupy a connection, and everything must be in a request-response cycle. You send a request, you must wait for a response. Then you send the next request. Kind of sucks. We said, OK, well, we're going to come up with a workaround for that. We'll give you six connections. That's awesome. Six connections. You can have six things in parallel. Turns out that's not enough, right? Because we're now fetching 80 files to render our page. So we've come up with some more hacks for that. If you guys think the front-end experience of building a performant site today is complicated, I encourage you to look at some of the crazy stuff we have to do inside of the browser to try and figure out the crazy things that you guys have done on your sites to try to make them fast, right? We have our own heuristics that say things like, well, if you've put your files, your JavaScript and CSS, in the head of the document, we're going to hold that; but if we get an image, and it's over 1,000 bytes, we will release the request. It is crazy stuff. We have all of these heuristics, which shouldn't be there. Likewise, we keep putting more and more stuff into HTTP. An average request-response today carries over 800 bytes of metadata, which is quite a bit, plus cookies. And I've seen cookies that are up to 10K in size, which sometimes break on some routers, which is pretty awesome. And then there's stuff like competing TCP flows. So an interesting and popular technique today is domain sharding, right? You have six connections per host, per origin, I should say. We'll just have many origins. 
Well, it turns out that if you go too aggressive and you have too many origins, you actually hurt your mobile clients, because we keep doing these unnecessary retransmissions at the TCP layer. So there are a lot of problems, right? But we're an inventive bunch. Us web developers, we've had to suffer through a lot. We've built a lot of stuff. So where there's a will, there really is a way. And we've popularized a lot of these workarounds as optimizations. I like to think of them as hacks. In fact, I will claim that they are hacks. And hopefully in the future, we shouldn't have to do them. Here are some prime examples, right? These are ground-truth, accepted optimizations that we need to do. Domain sharding, right? Like, who said we should limit ourselves to six connections per origin? We'll just have many origins, right? Like, I should be able to download 60 kitten images in parallel. Nothing wrong with that. Turns out, if you look at some of the popular sites, you'll discover that for slow clients, especially mobile clients, that actually causes a lot of retransmissions at lower layers, at TCP. So we're, in fact, congesting these links. And while it doesn't show up on your desktop computer, which is where you're doing your testing in all likelihood, it is hurting the users that you're probably trying to optimize for. So that's a tough one. Concatenating files. This is a crowd favorite, right? Like, concatenate all the things. We've made all of our code beautiful, modular. We've split it into components. Everything is great. And then right before we ship it, we just glob it into this one blob, throw it up on our CDN, and say, here, have at it. And there are a lot of problems with that, as we've learned. For example, for something like Gmail, right? We have a lot of JavaScript that we try to ship. We rev that almost every day. If we're not careful, we run the risk of running a self-inflicted DoS attack. 
Because if we just blindly throw out a new version, every single user that's signed in is going to try and refresh. And that's going to drive a lot of traffic. But in reality, we're only updating a small portion of that data. So we would like to ship just a diff, right? We don't want to ship the entire concatenated bundle. It also slows down execution. If you have many different small files, we can execute them incrementally. And we have, in fact, shown that there are big benefits to having modular code. You guys have modular code on the server. We should be able to deliver it as modules to the client. So this is something that you shouldn't have to do. Spriting images: this is a disaster. Talk to any designer out there and try to explain to them why you need to do this. It makes no sense. Not only that, it doesn't even make sense in the browser. Because in order to show just one part of the sprite, we need to decode the entire sprite, which consumes a ton of memory. And it sucks from any direction that you look at it. Resource inlining, I'm not even going to go on about this one. There are just so many problems. For certain types of assets, it's OK. Text-based assets. But if you're base64-inlining your images and other things into your HTML doc, there are so many problems with it. We're inflating the document. We can't cache those resources individually. But all of these things are just something that we have to put up with today. So long story short, you look at that and you go, well, there's a lot that we can fix here. We can fix HTTP. We can fix it such that you guys don't have to worry about it. And in fact, that is what HTTP2 is all about. So if you look at the actual stated goals for the protocol: when you slap a 2.0 on something, there are a lot of expectations. So I think one of the big things that we got right when we started the HTTP2 project is we defined a clear charter of the things we are not going to do. 
We said we're going to deliver a certain type of performance benefit, but we're not going to touch the semantics of HTTP. So we're going to try and improve latency. We're going to address some of the transport concerns. But we will keep all of the great semantics of HTTP 1.1. Perhaps those could be changed. Perhaps they should be changed. But that is beyond the scope of what we're doing. That will be HTTP 2.1 or 3.0, or what have you. And if I had to explain HTTP 2 in one slide, if you guys are not familiar with it, basically the biggest thing that we're adding is this idea of a binary framing layer, which is to say: you had your plain-text protocol originally, which was just your headers at the top, and then you had the body. Now we're splitting it into these binary frames. So we're saying, here's the header data. We have a specific container for that data. We can send it independently of other messages, like the actual data. And this allows us to do many things. And I will give you some examples of these things in a few slides. But the language here is that it's all the same HTTP. We're splitting it into smaller frames. And each request now becomes a stream. Streams can be multiplexed, which is to say you can have multiple outstanding streams against the server. They can be prioritized. So you can actually say, hey, I would really like that kitten image, but that CSS file is kind of important, because before I have the CSS file, I can't even show you the kitten image. And we also optimize things like header compression. We've pushed a lot of stuff into HTTP headers. It'd be nice if we could reduce that. So as Mark likes to point out, we are not replacing HTTP. Despite its name, 2.0 sounds like a big revision. To some degree, it's not, because for all intents and purposes, we could switch to HTTP 2.0 tomorrow, and none of your apps would even see the difference, unless you care about the framing and unless you're writing either a client or a server. 
Everything else is the same. Your application will run on top of HTTP 2.0 just as it has before, except that you could optimize in certain ways to make it run even better. So here's the binary framing crash course in one slide. The idea is we are introducing a new binary framing layer, which is to say the semantics are the same, but the way the data is represented on the wire is different. You can't just open a telnet connection and type in GET. You will actually have a binary frame, which you can decode, and which in fact has many benefits. Because HTTP traffic is so prevalent on the web today, we want to make it efficient to parse for things like routers, proxies, and other things. So binary framing is important for that. For anybody that's ever written an HTTP parser, you'll be happy to know that we have length-prefixed frames. If you've ever written any sort of network protocol parser, length-prefixed frames are beautiful. They tell you exactly what you need to know. After the length, you know the type of the frame, which says, this is a data frame. I don't care about data frames. I only care about header frames. I'm going to skip it. We have frame-specific flags that you can define. And then there is a stream identifier. And that's basically all it comes down to. It's just eight bytes of common header space, which identifies this specific frame. And after that, it's payload-specific data, things like, here's my JSON data. That would be a data frame. Or maybe it's a headers frame, which describes the actual stream. So once you have all this data, you have your headers and data in different payloads. In fact, there are 10 different frame types in HTTP2. I'm going to highlight just two here. We can start to do interesting things, like interleave them. So in this example, imagine this is one TCP connection. Previously, with HTTP1, you would have to send a request and wait for the response. 
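To get a feel for how cheap that common header is to parse, here's a sketch of decoding the draft-era 8-byte layout described above (16-bit length, 8-bit type, 8-bit flags, 31-bit stream identifier with a reserved top bit). Treat the exact field widths and type numbers as illustrative of the draft, not a reference implementation:

```python
import struct

# Draft-era HTTP2 frame header: length, type, flags, stream id = 8 bytes.
def parse_frame_header(buf):
    length, ftype, flags, stream_id = struct.unpack("!HBBI", buf[:8])
    return length, ftype, flags, stream_id & 0x7FFFFFFF  # top bit is reserved

FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS"}  # two of the ten frame types

# Build and decode a DATA frame header for 5 payload bytes on stream 3.
header = struct.pack("!HBBI", 5, 0x0, 0x0, 3)
length, ftype, flags, stream_id = parse_frame_header(header)
print(FRAME_TYPES[ftype], "stream", stream_id, "length", length)
# → DATA stream 3 length 5
```

The point of the length prefix is exactly what the talk says: a proxy that only cares about HEADERS frames can read 8 bytes, see `ftype == 0x0`, and skip `length` bytes without inspecting the payload at all.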
Here, I'm saying, look, I'm sending you some data for stream one. And then another stream, stream three, comes along. That one happens to have higher priority. So the server is going to pump that onto the wire to the client. So we're going to pause the data for stream one and interleave stream three. And you can imagine that this scales to any number of streams in parallel. So this is a huge, huge win in terms of latency and efficiency in general. Likewise, this is bidirectional. We can use a single connection for client and server to interleave streams in both directions, which is quite nice. And this allows us to finally get the semantics that we need in the browser to say, look, I'm asking for this CSS file. I know it is critical because it is at the head of the document. And the priority on that is going to be one, because I need it to construct the page quickly. Whereas this kitten image is not critical. It is something below that. It's priority 10. So the one caveat here is that previously, we were doing this heuristic optimization in the browser. Now we're pushing all that logic to the server. So the servers that you guys build have to get smarter, because you can imagine how a naive server, if we were just to send it all of these frames, if it were to disregard these priorities, would in fact make the wrong decision. It would say, hey, the kitten image is a static file that is easily accessible. Let me just pump that onto the wire, whereas my dynamic HTML is maybe something that I need to spend cycles to generate. So you can get this wrong. So this is something that we'll have to work on as a community. Flow control is another benefit that we get with HTTP2, which is really, really awesome. The idea here is that the client can control how much data of which stream it receives, and when that stream is advanced. 
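One naive way to picture that server-side decision is a priority queue of pending frames sharing one connection. This is purely a toy model of the interleaving idea (the priority numbers, stream ids, and chunk names are invented), not how any real server is built:

```python
import heapq

# Toy scheduler: pending frames from multiple streams share one connection,
# and the server drains them in priority order (lower number = more urgent).
class Scheduler:
    def __init__(self):
        self.queue, self.seq = [], 0

    def enqueue(self, priority, stream_id, chunk):
        # seq keeps frames of equal priority in FIFO order.
        heapq.heappush(self.queue, (priority, self.seq, stream_id, chunk))
        self.seq += 1

    def drain(self):
        while self.queue:
            _, _, stream_id, chunk = heapq.heappop(self.queue)
            yield (stream_id, chunk)

sched = Scheduler()
sched.enqueue(10, 1, b"kitten-part-1")   # stream 1: low-priority image data
sched.enqueue(10, 1, b"kitten-part-2")
sched.enqueue(1, 3, b"critical-css")     # stream 3 arrives, much higher priority

wire = list(sched.drain())
print([stream_id for stream_id, _ in wire])  # → [3, 1, 1]
```

Stream 3 jumps ahead of stream 1's remaining data, which is exactly the interleaving shown on the slide; a server that ignored the priority column would emit `[1, 1, 3]` instead.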
So for example, at time zero, I can say, look, I know that I need both the critical JavaScript and the kitten image, but I'm only willing to accept the first 64 kilobytes of that image. So maybe I can render a low-res preview if it's a progressive image. Then I can process all my JavaScript and CSS. That allows me to get useful pixels on the screen, both in terms of the page structure and some visible content. And then I will advance; I will re-open my window for the media stream and get the rest of that content. And you can start to think about how this affects how we deliver media. This allows us to have fine-grained control. The client has fine-grained control over how and when this data is being delivered. So this, in fact, is very, very powerful. And I think it's going to change how we think about delivery of images, video, and other media assets, or, for that matter, any long-lived stream. Because you can basically partition your connection, or your bandwidth, into these chunks, and you have full control over it. So that's awesome. And then another one that's new and exciting is server push. The idea here is that you guys are already using server push. In fact, it's called inlining. When you inline something into the document, you're saying, I know you're going to need it. Don't ask me for it. Just have it. So what we're doing with server push is we're just making that a first-class citizen in the browser, such that when we send you a request, you're going to tell us, hey, sure, here's the response, and by the way, I know you're going to need these other three things, so just have them. And those can now be cached in the browser. And that has all the nice benefits of being able to reuse the cache and other things. So this is kind of an unexplored frontier. And I think a lot of people need to wrap their heads around what this enables. Some of the other crazy things that you can think about: it's not just about pushing resources. I can actually push a redirect. 
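The window bookkeeping behind that 64-kilobyte example is simple to sketch. This is a hypothetical model of the client-side accounting only; in the real protocol, `grant` would correspond to sending a window-update frame to the peer, which this toy omits:

```python
# Hypothetical per-stream flow-control accounting, as a client might track it.
class StreamWindow:
    def __init__(self, initial):
        self.window = initial            # bytes the peer may still send us

    def on_data(self, nbytes):
        if nbytes > self.window:
            raise ValueError("flow-control violation: peer overran the window")
        self.window -= nbytes

    def grant(self, nbytes):
        # In the real protocol this is where a window-update frame goes out.
        self.window += nbytes

image = StreamWindow(initial=64 * 1024)  # only accept a 64 KB preview at first
image.on_data(64 * 1024)                 # server sends exactly the window
assert image.window == 0                 # server must now pause this stream
# ...client renders the low-res preview, runs the critical JS and CSS...
image.grant(1_000_000)                   # then re-open the window for the rest
```

The key property is that the sender stalls on its own once the window hits zero, so the client decides when the rest of the image is worth its bandwidth.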
What does it mean to push a 301? I can store it in the cache. Well, I can push a cache invalidation. Let's say you told me to cache an image for a year, and then you realize that that was a mistake. You can actually push an invalidation and erase that out of the cache, or basically put a dummy record in there that says, I'm expired. And that will force a revalidation in the browser. So lots of cool opportunities there. Header compression. So what we're doing is we're finally cleaning up how we transmit headers. Everything is a key-value pair, as it should be. We have things like method, scheme, host, and other things. Some of these have these colon headers. That's just to make sure that we don't interfere with any other headers in HTTP1. And on the wire, as you can imagine, this is just a list of key-value pairs. But here's the cool part. In the future, when you make a second request, we don't have to send the same repeated fields. Basically, the client and the server maintain some state to track what data has been sent. So the next time you send a request, let's say we're asking, sorry, this should read "new resource", we're asking for a new resource file, the only thing we need to send is this new value, which is the new resource. So you can imagine how that significantly reduces the amount of data that we need to transfer. And in fact, if you're just doing a polling request, if you're asking for the exact same data, there are no headers, because all of that is just implicitly on the client. And that means that the overhead of starting a new stream for something like this is 8 bytes, which is basically as low as the overhead of a WebSocket frame. So we're going from 800 bytes, on average, to 8 bytes, which is a huge win. So that's the short story of HTTP2. There's a lot of stuff coming. There's framing. There's multiplexing. There's prioritization. There's flow control. 
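The intuition can be sketched as a simple diff of key-value maps. The real header-compression scheme is considerably more involved (indexing tables, wire encoding), so treat this purely as a model of the "send only what changed" idea, with made-up header values:

```python
# Simplified model of delta-based header compression: both sides remember the
# previous header set, so only the changed fields cross the wire.
def encode_delta(prev, headers):
    return {k: v for k, v in headers.items() if prev.get(k) != v}

def apply_delta(prev, delta):
    merged = dict(prev)
    merged.update(delta)
    return merged

req1 = {":method": "GET", ":scheme": "https", ":host": "example.com",
        ":path": "/resource", "user-agent": "Browser/1.0"}
req2 = dict(req1, **{":path": "/new_resource"})  # same headers, new path

delta = encode_delta(req1, req2)
print(delta)  # → {':path': '/new_resource'}

assert apply_delta(req1, delta) == req2          # receiver reconstructs the set
assert encode_delta(req2, req2) == {}            # identical poll: zero headers
```

The last assertion is the polling case from the talk: a repeat of the exact same request contributes no header bytes at all, leaving just the fixed frame overhead.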
And this has big implications in terms of the actual performance of the protocol itself. Here's a quick recap of everything we've talked about. The overhead is significantly lower; we're talking a factor of 100 for an HTTP request. The parallelism is effectively unlimited. You can set the limits when you negotiate the connection between client and server, but we're talking tens to hundreds to thousands of streams that can be multiplexed over a single connection. And finally, we're eliminating all of the client queuing latency. So I know that some of us here have tried, or thought about, building an HTTP replacement on top of WebSockets. You don't need to do any of that, because HTTP2 gives you all of that, and much more. In fact, I want to highlight this: we're doing all this, and we're still inheriting all of the benefits of HTTP. We have transfer encoding. We have encryption. This runs on top of SSL. We don't have to reinvent these things. We get them for free, which is exactly the point. And then finally, the opportunities. There's a lot of work to be done here still. We certainly need to do a lot of work to get the best performance out of HTTP2, not the least of which is that we need to undo all of the performance optimizations that we've gotten all of the developers to do. So now that you've concatenated all your files, please undo all of that, because that actually hurts performance in HTTP2. We want to be able to deliver modular assets, which means we can invalidate modular assets, and there is no longer any cost to having 50 JavaScript files on the page. We will just send those immediately. There is no waiting. There is no queuing. And life is simpler. Life is more beautiful. And that will also lead to improved page load times. Things like XHR and SSE work transparently on top of HTTP2. And this is just HTTP. So now you can have a dozen connections, a dozen SSE connections, open. There are no limits. You're no longer hung up on that connection limit, right? 
So that's beautiful. For WebSockets, there's more work to be done there. There's some experimentation with SPDY, to layer WebSockets over SPDY. But once we do this, and I haven't seen the WebSockets guys actually seriously take a look at this yet, we get free multiplexing. We get free flow control, and all the other benefits that you get out of HTTP2, right? I know that there are extensions being developed specifically for WebSocket multiplexing. Technically, that is unnecessary in HTTP2. So we can do a much better job. A call to action here is smarter servers. We definitely need a lot smarter servers. We're going to be sending a lot more metadata to the server, saying, I need all these requests; by the way, my flow-control window is this; my priority for these requests is that. You guys need to figure out how to deliver those assets in the optimal way back to the client, right? You have a finite amount of CPU and bandwidth. How do you balance those things, right? Should you stuff the pipe with some JPEG data and then add some dynamic content, or vice versa? And then finally, I think one really exciting place is that we can finally drop all the other RPC layers, right? The Thrifts, the protobufs, and other things. This is a stack that is standard, right? It's going to be supported by hardware vendors and other things. It has all of the benefits that the other protocols provide, and much more. We're building HTTP2 based on the experience of all of these other RPC stacks. So this is all of the best things based on the last 15 years of learning. So I do think that if I were starting to build a new backend infrastructure today, I would start it with HTTP2. It is basically there. You get all the benefits. And I'll leave this thought with you: what about HTTP2 over UDP, right? So we've finally removed HTTP itself as the bottleneck. Now we have TCP. We have head-of-line blocking. What about shifting this stuff to UDP? We have a crazy project. 
I call it crazy. To me, it's in the crazy department at Google. It's called QUIC, and we're doing exactly that. We're saying, well, what if we layer HTTP over UDP? But we'll leave that for the next Realtime Conf. And finally, this is not all as crazy as it sounds. We are, in fact, making rapid progress on this. Firefox and Chrome have branches. They're not available yet in stable or even canary builds. But we have branches that have HTTP2 enabled. IE has SPDY support, which is a small delta away from HTTP2. So basically, we're all headed in the right direction. And there are plenty of server and client implementations of HTTP2 already. In fact, believe it or not, Twitter.com already works with HTTP2. If you use one of these clients, you can actually connect to Twitter and speak to it in HTTP2, in this case, draft 6. So all of this stuff is, in fact, coming. If all goes well, we should actually have the official spec in 2014, which may mean, as crazy as it sounds, that we may standardize HTTP2 before we close all the outstanding bugs in HTTP1, which has been a work in progress for 15 years. So that is, in fact, one of our unofficial challenges on the project: to get ahead of HTTP1, because that would be just funny. So with that, I encourage you to learn more about HTTP2. It is coming. It is coming quick. And hopefully, it'll make the web more real time.