Everyone hear me? Okay. So today we're going to talk about HTTP/2 and asynchronous APIs. Thank you, Michael, for having me, and thank you, everybody, for being here. My name is Davey Shafik. I am a developer and an author (I have three and a half books out), an open source contributor, and the release manager for PHP 7.1. You'll learn more about that tomorrow if you're going to be at the conference. Most importantly, I'm @dshafik on Twitter, so if you want to reach out to me, that is the best way. If you don't want me to respond, send me an email. I work for Akamai. We're the world's largest CDN, and we do some really, really cool things, so check out developer.akamai.com if you get a chance. And every time I speak I always mention Prompt, which is an initiative to get people talking about mental health in technology, which is really, really important for breaking the stigma around mental health. So check out mhprompt.org.

Alright, so HTTP/2. I need to change my layout here; I've got it set up for tomorrow and it's completely wrong. Sorry. Actually, it's mirrored, that's why. I don't know why the Mac always decides that every time I put it on a new projector it's going to mirror, and that that's useful to me, but... Command-F1, is that a thing? Oh, hey. Alright, there we go. That's way more useful to me.

So, we're going to talk about HTTP/2. Now, HTTP/2 is a real thing, and we know this because it has a logo. But what actually is it? HTTP/2 is made up of two different RFCs: RFC 7540, which is the HTTP/2 spec itself, and a secondary spec, RFC 7541, which covers HPACK, the header compression; we'll look at that. It was created by the IETF HTTP Working Group, which is chaired by Mark Nottingham, who is a co-worker of mine, so it's really interesting to get into fights about H2 with one of the people who created it. But let's take a look at a little bit of history first. So HTTP/0.9 came out in 1991.
Now, for those who aren't familiar with HTTP/0.9, it didn't have anything except GET, basically. It didn't have most of what we consider to be HTTP today, but it worked. Then in 1996 they released HTTP/1.0, which is really the first incarnation of what we think of as HTTP today. In 1999, three years later, they released HTTP/1.1, which is basically where we've been for the last, what, 15 years? 20? That's a lot of years, whatever. So 1999, they came out with HTTP/1.1, and that's pretty much where we were until Google said: that is not good enough, we need something better. So they came up with an alternative protocol called SPDY, and they implemented it in Chrome, and then it got implemented in Firefox. It has since been deprecated in favor of HTTP/2: it has been removed from Chrome, is about to be removed from Firefox, and just got added to Safari (only in Yosemite). Anyway, in 2015, so last year, they came out with HTTP/2. HTTP/2 is sort of a fork of SPDY. It took a lot of the lessons learned from SPDY, and some of its ideas, improved upon them, and made an actual standard. So SPDY is now going away.

As far as browser support: we all like new things, but we can never use them, right? Because it takes 20 years to get new CSS stuff in, et cetera. That is not the case with HTTP/2. It's actually supported in 60% of the browsers currently in users' hands. That includes Chrome and Chrome Mobile. It includes Firefox. It includes IE 11 on Windows 10, and Microsoft Edge. It includes Safari if you're on El Capitan or iOS 9. And it even includes Opera, but nobody uses it and nobody cares.

So the big change is that HTTP/2 is binary instead of text, and this is a big deal. This is actually what allows it to be forwards-compatible: it encodes HTTP/1 semantics in a much more efficient way. And because it is binary, it's much more concise and precise to parse and generate.
What this means is that on both the server side and the client side, it uses fewer resources to do the same things, so we get a performance win. It's fully multiplexed instead of ordered and blocking, and we'll get into that, but essentially it means you can use one connection for parallel requests: one TCP connection, multiple requests. It has header compression, which is that HPACK stuff I talked about earlier, which reduces overhead. And it also has my favorite feature, which is server push. What server push allows the server to do is proactively say: you've requested this document, and I know you're also going to need this and this and this, so here, I'll send them to you. The browser will accept those and cache them, and then when it finally gets to the point of wanting to request them, because it's seen the link tag or the script tag or whatever, it will actually pull them out of the cache, because it already has them. So that's pretty cool.

So what does HTTP/2 mean for you as application developers? Well, for the most part, it's completely transparent. It's handled in the HTTP server layer, so Apache or Nginx. Turn it on and we're good, right? Just walk away, you get the performance benefits, it's all great. So what's the whole point? Well, the point is that HTTP/1 sucks, frankly. We have come up with numerous techniques to make HTTP/1 performant: things like minifying and concatenating JavaScript and CSS; inlining the small bits of JavaScript and CSS known as the critical path, to render what's visible in the viewport (which has its own problems as well as its benefits); image sprites; data URIs; and domain sharding. All of these are things we've come up with to make things better, but they're also hacks. There are a lot of Grunt tasks and workflows built around building all of these things.
We've got a lot of toolchains and everything to make these things work. HTTP/2 actually allows us to get rid of a lot of that stuff. Let's take a look at a couple of ideas and use cases.

A very simple one: uploading multiple images. Just imagine you have a web form with multiple file inputs, and the user has chosen to upload three images. By default, that would be a serial upload. We have client and server; we open up a TCP connection and we shove all three images down as one request. The server receives those, and then say it wants to store them in some sort of backend storage. That might be S3, it might be Akamai NetStorage, maybe it's something you built yourself. It will send those serially: three separate requests. This is not a great way to build it, but it's simple and it's easy, and it's what we've been doing for 10, 15 years or more. So we try to improve on this: we make it concurrent. Client, server, backend. We open up three requests, so maybe we're using Ajax to send this. We send the three files concurrently, they arrive in whatever order they arrive at the server, and then it makes three connections and concurrently sends everything to the backend store. This is an improvement, but it's not the best we can make it.

Another use case. This is HTTP/2 and asynchronous APIs, after all, so what does this mean when we look at APIs? Let's take a look at an example: a terrible API, but something that's reasonably realistic, fetching a blog post and its comments. Client, server: open up a TCP connection, make a request, GET post 1. We get a response back, 200 OK, with some JSON, and maybe it looks something like this: type is post, ID, title, et cetera. And then down here we have two URLs that point to sub-resources: the author of the post, and the comments that were made on it.
So we go ahead and we make another request, to the comments collection: open up a TCP connection, GET post 1's comments, get another JSON response. Now, in a perfect world, if you're building a perfect REST API, everything that is a resource has its own URL, and everything's all separate. So here we have a collection with links to all of the comments, and now we have to go get those. Open up a TCP connection, make a request, get a response; do it three more times. This is kind of ridiculous. And here's an example of a comment, and here's the comment author; that's another set of requests we have to make. So we make at least six requests, possibly six connections. More likely it's something like 14 if we're doing it the pure way. We'll get into some of that a little bit later, but let's be academic for now.

So, enter multiplexing: this idea of using one connection for multiple requests. What we can do now is open up our TCP pipe and just fire off all of those requests within that one pipe, and we'll get our responses asynchronously and concurrently. Great. So that's an improvement. We'll get into more of that later.

So what does this look like programmatically? Let's look at some code. We're going to start off with this: $numRequests = 378. What that number represents is this demo, which is stolen from the Go team's gophertiles example. Basically, on the left-hand side we have 378 little tiles being loaded over HTTP/1, and then you'll see in a moment it'll start doing the same thing over HTTP/2, using multiplexing. When the HTTP/1 side finished, it took 16.95 seconds. And here is H2: 3.15 seconds. Now, I did this on 3G on a plane or something, so it's quite a dramatic difference, but it's nice for a slide.
But even still, even if you did it on a reasonably high-bandwidth connection, it's going to be dramatically faster with multiplexing. So here's some code. As you can see, I'm using curl. Most of you are going, 'ugh'. I was that way too, until I started working with it, and I've actually fallen in love with curl. It is super simple and it just works. It's rock solid, it's great, and it supports the HTTP/2 stuff really well. So here we're defining a URL; you'll notice up here we've got a little %d to represent the integer that is the tile number. We do a simple for loop: $i = 0; $i <= $numRequests; $i++. Nothing exciting. We then do a curl_init() to create our client. We set some options: here we're setting the URL, sprintf-ing in the current $i, and we pass those in with curl_setopt_array(). We then execute it, and we close the connection. That's the worst way you could possibly do it, but it's one way to do it. And it takes 47.67 seconds. Now, this wasn't on a crappy connection; this was a reasonably good one. So that's not great.

So let's take a look at H2, right? H2 is faster. All we have to do is add this line: we set another option, CURLOPT_HTTP_VERSION, to CURL_HTTP_VERSION_2_0. Otherwise this is absolutely identical; we pass that in here, that's where it gets put in. The same code. And it takes 62.19 seconds. This is not the result we were expecting, right? I show you this because I want you to understand that HTTP/2 is not a magic bullet. We'll get into why. But just turning on H2 for making requests is not going to solve your problems; it mostly wins on the front end, like serving pages over H2.

All right, so now let's look at concurrency, starting with HTTP/1 concurrency. The way we do that is using curl_multi. So we have $mh = curl_multi_init(). Inside our loop now, we assign the curl_init() handles to an array.
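Pieced together from the fragments on the slides, the curl_multi flow might look something like this. This is a sketch, not the slide code: the tile URL is a placeholder, and it only makes a handful of requests instead of 378.

```php
<?php
// Sketch of the curl_multi pattern described in the talk.
// The URL is a placeholder; point it at a real tile endpoint to try it.
$numRequests = 3; // the demo used 378
$url = 'https://example.com/tiles?tile=%d';

$mh = curl_multi_init();

$handles = [];
for ($i = 0; $i < $numRequests; $i++) {
    $handles[$i] = curl_init();
    curl_setopt_array($handles[$i], [
        CURLOPT_URL            => sprintf($url, $i),
        CURLOPT_RETURNTRANSFER => true,
    ]);
    // Add each handle to the multi handle instead of executing it.
    curl_multi_add_handle($mh, $handles[$i]);
}

// The do-while from the manual: drive all transfers until none remain active.
do {
    $status = curl_multi_exec($mh, $active);
    if ($active) {
        curl_multi_select($mh); // wait for socket activity instead of spinning
    }
} while ($active && $status === CURLM_OK);

foreach ($handles as $handle) {
    curl_multi_remove_handle($mh, $handle);
    curl_close($handle);
}
curl_multi_close($mh);
```

Adding CURLOPT_HTTP_VERSION on each handle and CURLMOPT_PIPELINING on the multi handle, as described next, is all it takes to make this multiplex.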
And we add them to the multi handle instead of executing them. And then we have this piece of code, which looks exactly the same in Python and Ruby and PHP, and I hate it: a do-while loop that essentially runs until we get a response, and then drops down to this inner one where it starts handling responses. With server push support, which is something I added to PHP's curl extension in 7.1, we can actually do this a little bit better, and I need to look at whether I can improve this part as well. But this is the example from the manual; this is what you will see. And I hate it. So we do this, and with H1 we get down to 8.66 seconds, which is a dramatic improvement. This is the way you should be writing this stuff today. If we then add in multiplexing, we again set our HTTP version, but we also set pipelining to multiplexing: another option, this one set via curl_multi_setopt(). With those changes (this part is identical, this doesn't change, and I still hate it), we now get to 2.14 seconds. So now we're finally starting to see the dramatic improvement in speed that we would expect.

So I want to go back and explain why the first H2 attempt was slow, because I think it's important, and it basically comes down to this: HTTP/2 negotiation. In an effort to make HTTP/2 backwards compatible, all connections start out as basically HTTP/1 connections. Now, there are actually two protocol identifiers in HTTP/2. There's h2, which is HTTP/2 over TLS; it negotiates whether or not it's going to use HTTP/2 during the TLS handshake, using ALPN. And then you have h2c, which is HTTP/2 over plain text, more similar to what we have now with plain, non-TLS connections. ALPN, as I mentioned, is the way you negotiate through TLS. It's RFC 7301; it came out in 2014. During the TLS handshake, while confirming identity and all that sort of stuff, that is when they negotiate H2. Now, if you recall, a couple of months ago there was an article: Google broke all of HTTP/2 for everybody.
And basically, it's useless now. The reason for that is that they deprecated, I believe it was NPN, which was the predecessor to ALPN, and support for ALPN was way, way less at the time. But now we're pretty much back up to where we were, ALPN is great, and everybody's happy.

All right, so what does this look like? Obviously I'm not going to sit here and decrypt TLS, so we're going to look at h2c, just to give you an idea of what it might look like. Essentially, we do an HTTP/1 request. We set the host. We send this Connection header here saying we want to upgrade, and that we're sending HTTP2-Settings. We say we want to upgrade to h2c; if you try to upgrade to h2 it should fail at this point, because that's invalid, h2 has to be over TLS. And then we send the settings, which is actually a base64-encoded chunk of the HTTP/2 stream that would normally be binary. Then, if the server supports HTTP/2, you get an "HTTP/1.1 101 Switching Protocols" response: it tells you it can upgrade, and that it's going to go to h2c. And from that point, the actual TCP connection is running H2; you just continue to use it as an H2 connection. Now, it doesn't have to be a GET request. You can also send, say, OPTIONS or HEAD, and you can use POST, DELETE, or PUT, with the caveat that if you are sending a request body, the upgrade cannot finish until the body has been sent. So for our three image uploads earlier, you wouldn't want to send the first upload as your upgrade request. You would send a HEAD first, and then once that comes back, really quickly, you would send all three concurrently using multiplexing.

All browsers require TLS for HTTP/2. This is really, really important. Curl supports HTTP/2 without TLS, and you can do it programmatically and everything, but every single browser on the market has decided that you must have TLS for HTTP/2. And of course, typically we would say, well, TLS is slow; that's why we don't want to have it on everything.
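In plain text, the h2c upgrade exchange just described looks roughly like this (the HTTP2-Settings value is shown as a placeholder, not a real base64url payload):

```
GET / HTTP/1.1
Host: example.com
Connection: Upgrade, HTTP2-Settings
Upgrade: h2c
HTTP2-Settings: <base64url-encoded SETTINGS payload>

HTTP/1.1 101 Switching Protocols
Connection: Upgrade
Upgrade: h2c

[connection continues as binary HTTP/2 frames]
```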
But because we have this idea of one TCP connection, you only have one TLS negotiation, so you effectively remove all of that extra overhead, and the fact that TLS is maybe a little bit slower than plain text doesn't matter so much anymore. Now, you can do direct connections with HTTP/2, so it's possible to skip that upgrade step, if you have what's known as prior knowledge: you know that both sides are running HTTP/2. They send a very simple connection preface, hey, and it either works or it bails out. My slide here says that's not supported in curl, but it was actually added, like, two point releases ago. 7.49, I think, or whatever it is.

All right, so: server push. As I said, this is my favorite thing. If you remember, we have multiplexing. This is good. The problem is we're still making sub-requests, right? So what we do as developers, and I mentioned I would get to this, is take shortcuts for performance. Rather than making 16 requests, we just take all of the data and munge it into one large JSON response. This is the trade-off that we make: minimize the number of requests, but maximize the size of the payload. Even if they didn't want the comments, we're sending them anyway. That's the trade-off. So, everyone can read that, right? All right, here's an example. I've embedded my author information; here's my comments array, and it's just got the comments right in there. Notice that the author information is in there also. Here is me, who authored the post; my information is duplicated. Or Jill, who has two comments on here; her information is duplicated for every single one. But this is the performance trade-off that we make.

So we can improve on this with server push. Client, server, TCP connection, GET post 1. This is all the same. Get a JSON response.
Now, on the server, we can say: we think most people who make this request probably want the comments, so we're just going to send you those. Here, have the comments. It sends that array of URLs, and then, because it sent you that, it starts sending you the comments: one, two, three, four. Great. And it can also send you the author information, et cetera, all without the client having to make a request. And as a client, we actually have complete control over whether or not we want to accept that push; they can't force it on us. What happens is they send what's known as a PUSH_PROMISE frame, and we can say, ah, we don't really want that, and reject it. So you can try to push; it's optimistic, and if they want it, they take it.

What this does for us is it means that minifying and concatenating CSS and JavaScript becomes largely unnecessary. With gzip compression, multiplexing, and server push, we can get almost identical performance without having to do all of that work to compress and minify everything. If you look, even before all of this, minification on top of gzipping only gets you something like an extra 1% savings in bandwidth. It's just not worth it when you consider all of the work that has to go into compressing and minifying.

All right, so HTTP/2 is made up of streams. Each request and response is a stream, and streams are comprised of frames. Each stream can have a weight, which is to do with priorities, and it can also have dependencies. The way that weighting works is you give a stream a weight between 1 and 256, and the client and the server figure out what that means in terms of the proportions of the bandwidth, or the resources, that they have. So we can give something a weight of 1, and give something else a weight of 2, and the second one gets two times the resources of the one with a weight of 1.
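That proportional arithmetic can be sketched with a tiny helper. This is a simplified model (each stream's share is its weight divided by the sum of the weights), not the full priority tree from the spec:

```php
<?php
// Split a resource budget across streams in proportion to their weights.
function apportion(array $weights, float $budget): array
{
    $total = array_sum($weights);
    $shares = [];
    foreach ($weights as $id => $weight) {
        $shares[$id] = $budget * $weight / $total;
    }
    return $shares;
}

// Streams weighted 1, 2, and 3 splitting 60 units of bandwidth:
// the weight-2 stream gets twice the weight-1 stream's share,
// and the weight-3 stream gets three times it.
print_r(apportion(['a' => 1, 'b' => 2, 'c' => 3], 60.0));
// a => 10, b => 20, c => 30
```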
Or we give it a weight of 3, and it has one and a half times the resources of the one with weight 2, and three times that of the one with weight 1. So we can start to be really, really clever about the way we apportion resources, saying this is more important than that. We can also have dependencies. So we have stream A; we have stream B, which depends on A (there's no point delivering B until they've already got A, so the peers can apportion bandwidth and resources based on that); and stream C, which depends on B. An example of this might be a style sheet with embedded fonts: don't send the fonts until you've got the style sheet. Or a web page, with a style sheet, with embedded fonts. You can start to set these things up. It's pretty cool.

So, as I mentioned, it's all made up of streams, and frames are small binary messages; you have multiple frames per request. You have headers frames, data frames, settings frames, push frames, all these sorts of things. Every frame has a common header, which is very consistent: it's nine bytes, it's got the length in there to say how long the frame is, all that kind of stuff. The point is that it's easy and efficient to parse. And you can interleave them: you can send frame A from stream one, then frame C from stream two. That is how you get multiplexing; you just interleave the frames from all the different streams. It's pretty cool. So here's an example of one request on the slide: we have all of our headers, and those get sent in a headers frame. The spec says that has to come at the beginning of the request, so you know you're going to get those first. And then we have a JSON body that was sent in the POST request; that goes in a data frame.

You also have header compression, with HPACK. This is actually something I don't think is appreciated as much as it should be: it uses a table as a sort of index.
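To make the table idea concrete, here are a few real entries from the HPACK static table in RFC 7541, with a toy lookup. The real wire format encodes the index in binary rather than as a PHP array, of course:

```php
<?php
// A slice of the HPACK static table (RFC 7541, Appendix A).
// A peer can send just the index instead of the full header text.
$staticTable = [
    2 => [':method', 'GET'],
    3 => [':method', 'POST'],
    4 => [':path', '/'],
    5 => [':path', '/index.html'],
    6 => [':scheme', 'http'],
    7 => [':scheme', 'https'],
    8 => [':status', '200'],
];

// Resolve an index back to the header it stands for.
function lookup(array $table, int $index): string
{
    [$name, $value] = $table[$index];
    return "$name: $value";
}

echo lookup($staticTable, 5), "\n"; // :path: /index.html
```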
So there's a predefined table in the spec. It's 61 entries of default values, and they all have an ID. Rather than send those values, you just send the ID. An entry can represent a header name and a value, or just a header name where you then supply the value. And what's really cool is that you can actually append your own items to the table. So here's an example, the first 15 entries or so. You'll notice there are things in here that are not traditionally headers: we have the method, we have the scheme, we have status codes. But look here: we have :path: /index.html. Rather than send that, we can just send a 5. So that's where you get some of the compression. As I mentioned, you can add your own stuff. Think about cookies: say you have a session cookie with four kilobytes of data in it, and you're sending it back and forth on every single request. What if, instead, you send it once, and from that point on you just exchange the ID of the item in the table? You're saving four kilobytes per request. This has the potential for massive bandwidth savings.

So with all of this, we're going to have to come up with new architectures and techniques. When we start thinking about delivering a page, we have some opportunities here to explore new ways of doing it. So: web browser, open up a TCP connection to the server, request our page. Now, I'm doing GET /index.html; I could actually probably just send that HPACK index instead. Get a response back. And now, instead of using the critical path technique and embedding that little bit of JavaScript and CSS to render what's on screen, we can send those as discrete, separate, cacheable resources. And this is important. The problem with critical path is that you have to put it in the document regardless of whether or not they have your main CSS file cached, because you don't know.
So that's potentially overhead they don't need, carried for the benefit of improving performance when they don't have it cached. But because these are separate resources, they can be cached separately, and when you do try to push them, the browser will just go: already got it, don't worry about it. So here we send the critical-path CSS and the critical-path JavaScript, and, because we're vain, we're going to send our logo. And then we can have a dependency, as I mentioned. We send the critical-path CSS to render what's on screen. Maybe we have our name in a custom font, so we want to make sure that gets there as quickly as possible, but it's pointless without the critical-path CSS, so we make it a dependency. We send the rest of our JavaScript, because we want things to actually work, not just render what's visible. Maybe we send a splash image; I think that's a terrible idea, but hey. And then maybe you finally send the rest of your styles, to render everything that's off-screen.

Then we look at APIs. I don't know why that slide still says Chrome; maybe you're using Postman. Open up a TCP connection, do our GET request, post 1, and get JSON back. We already looked at doing some pushes, but now we can start to bring in dependencies and things like weight. So we send back the comments, send back the author information, and we have dependencies: we're sending you the comments collection, so we have all of the individual comments be dependencies of that collection resource. Similarly, we have the author's avatar be a dependency of the author resource. And then finally we have the author information for the comments, and all that kind of stuff.

So, in summary: no more hacks. All of the workflows we had to build, all of that stuff we need to make HTTP/1 performant, at some point in the future we can throw it all away. Now, obviously, we have a transition period. I think we're going to be doing multiple things.
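As a concrete way to hint pushes from application code today: many HTTP/2-capable servers and CDNs will turn a Link: rel=preload response header into a server push. This is a sketch; whether anything actually gets pushed depends entirely on the server or CDN sitting in front of PHP, and the asset paths are made up for the page-delivery example above:

```php
<?php
// Build Link header values for the assets we want pushed alongside the page:
// critical CSS, critical JS, and the logo.
$pushAssets = [
    '/css/critical.css' => 'style',
    '/js/critical.js'   => 'script',
    '/img/logo.svg'     => 'image',
];

$links = [];
foreach ($pushAssets as $path => $type) {
    $links[] = sprintf('<%s>; rel=preload; as=%s', $path, $type);
}

$headerValue = implode(', ', $links);
// header('Link: ' . $headerValue); // uncomment inside a real web request
echo $headerValue, "\n";
```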
What I would love to see happen with APIs, for example, is: if the client is requesting over H2, send back the discrete resources using push; otherwise, munge it all together. We can programmatically make those kinds of choices, and that's where H2 starts to impact us as application developers. There's the potential for huge performance wins in the right scenarios, which is any time you're doing concurrency. And it already has 60% market share; this is something that's out there that we can use today. So: HTTP/2 is awesome, and I'm really freaking excited about it, and I hope you are too. Yeah, that's it. Thank you.