OK, let's welcome the next speaker, Tom, who's going to present his project, HTTPX. Thank you.

OK, so I'll just start by checking the mic volume. Am I OK at the back? Yes? Cool. OK, good.

So, my name's Tom Christie. I'm the author of a whole bunch of different open source projects, including an API framework called Django REST framework and a tool for building documentation from Markdown called MkDocs. More recently I've been spending a lot of time in the asyncio landscape, building an asyncio web server called Uvicorn and an asyncio micro web framework called Starlette. Most recently of all, I've been spending a whole bunch of time working on HTTPX, which has just about now got to the stage where it's ready for everybody to start using it.

So what is HTTPX? HTTPX is an HTTP client, just like requests. You can use it for downloading web pages, interacting with APIs, or anywhere else you're issuing HTTP requests over the network. It should look pretty familiar to anybody here who's used the requests library, which I guess would be most of you.

We've got to the point in the development now where we've pretty much got feature parity with requests, and that's a lot of ground to cover. For instance, if you want to seamlessly present the user with the unicode text of a response, there's a lot you have to do under the hood. The bytes that came over the wire might be compressed, because HTTP has a mechanism for content compression, so first you've got to figure out whether they're compressed or not, and if they are, decompress them. At that point you've still just got a load of bytes, so you've got to work out which character set to decode the text with. That might be given in one of the HTTP headers; if it's not, you need some way of figuring out what the character set probably is. Then there's handling redirects gracefully: different HTTP redirect status codes have different behaviours, some of them change the method, some of them preserve the body when you follow the redirect and some don't. There's handling basic and digest authentication, and the URL itself can contain basic authentication credentials. And if you get redirected away from the origin that you first made the request to, then you need to strip the authentication. So there's a lot of ground to cover.

But there wouldn't really be much point to this if all we were doing was building a new package that matched requests like for like. The big motivation for HTTPX has been to do all of that, but also to introduce a bunch of new functionality. The first big piece of that is providing both an async-capable HTTP client and a standard synchronous API, all in the same package, with a coherent API across those two cases. HTTPX includes support for both HTTP/1.1 and HTTP/2, and I'm pretty sure it's the first Python client library to really do that. It's got the ability to make requests directly to a web framework rather than actually sending them out over the wire, so you can plug it directly into a Flask application or something like that. We've got some sensible behaviour around strict timeouts, which we'll come to, and the whole thing is fully type annotated all the way through.
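To give a feel for how familiar it should look, here's a minimal sketch of basic usage (the URL is just a placeholder):

```python
import httpx

# The top-level API mirrors requests: decompression, charset detection
# and redirect handling all happen under the hood.
response = httpx.get("https://www.example.org/")

print(response.status_code)              # e.g. 200
print(response.headers["content-type"])  # e.g. "text/html; charset=UTF-8"
print(response.text)                     # the decoded unicode text
```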
All of that comes together with reams of documentation and almost 100% test coverage. And there's more to come as well.

So HTTPX, as I've said, should look pretty familiar if you've used requests. We've gone to great lengths to retain API compatibility with requests wherever possible. There's a documentation page specifically covering the differences between the two, for the places where we've chosen to make a change. You ought to be able to migrate from one to the other just by having a quick scan through that guide and switching over anything that's apparent there, and mostly there won't be much.

A couple of examples of places where we do differ. We've got a slightly more constrained streaming API: with requests it's actually quite easy to inadvertently use a streaming response and not realize that the way you've done it has left the connection hanging open at the end. So we're a little more constrained there; there's a sketch of that API below. We also use different naming in places: instead of requests.Session, we've preferred httpx.Client. And there are some differences at the lower levels of the API as well, once you start getting into things that are really implementation details, like methods on the client class and so on.

Fully type annotated: that's both at the layer of the public API and all the way through the code base. The big advantage is that if you're building up your service and you're using a type checker like mypy, it will automatically ensure that you're calling into HTTPX correctly, with the right types everywhere. You get type hinting in some IDEs, and all of the APIs are nice and clear and explicit; there's none of this going and having a look at something and wondering, does that take a string, or does it take something string-like? It's also given us a really high degree of confidence in our own code base. I think HTTPX would have taken a lot longer to build if we hadn't had strict type checking all the way through, and some of the big refactorings we made earlier in the life of the project would have been really, really difficult without it.

Timeouts by default: the timeout policy in HTTPX is to always have timeouts enabled, so that if you inadvertently make a request without setting any timeouts and the network hangs, your script isn't just left hanging indefinitely. You've got a five-second timeout by default, and if you want to raise it or remove it, you can. There are lots of fine-grained controls too: you can allow long connection times but keep a reasonable timeout if things hang while you're downloading a response, you can control the timeouts on the connection pool, and things like this. We try to provide a really simple API at the basic level and then let you expand into the more complex options if you need to.
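As a sketch of what those timeout controls look like, assuming the current httpx API:

```python
import httpx

# Everything gets a 5-second timeout unless you say otherwise.
client = httpx.Client()

# Fine-grained control: allow slow connection establishment, but stay
# strict on reads, writes, and acquiring a connection from the pool.
patient_client = httpx.Client(timeout=httpx.Timeout(5.0, connect=60.0))

# Or switch timeouts off entirely, if you really want that behaviour.
unbounded_client = httpx.Client(timeout=None)
```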
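And going back to the more constrained streaming API I mentioned: streamed responses are scoped to a `with` block, so the connection is always released when the block exits. Roughly:

```python
import httpx

# The streamed response only exists inside the block, so it can't be
# inadvertently left holding the connection open.
with httpx.stream("GET", "https://www.example.org/") as response:
    for chunk in response.iter_bytes():
        ...  # process each chunk without loading the whole body into memory
```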
One of the absolutely biggest things is the async support. Threaded concurrency is not very good at performing lots and lots of network operations at the same time: you're bounded by maybe having 10 or 20 threads running on the machine, and that's as many concurrent requests as you'll be able to make. With asyncio, the concurrency model is much less resource-intensive, so you can easily have thousands of concurrent tasks running, and that allows you to do something Python hasn't traditionally been very good at: making large numbers of concurrent HTTP requests. We've been having to catch up with other environments such as Go and Node, which have performed much better in this regard, but we're actually coming quite a long way along now.

So, asyncio is one of the async frameworks for Python. It's the one that's in the standard library, but it's not the only one now; there are at least a couple of others. There's one called Curio, and there's one getting a lot of development called Trio. Trio is exceptionally well designed. It's motivated by a particular design principle that its author has been one of the architects of, called structured concurrency. One thing that's a bit difficult is that asyncio, Trio, and Curio are all completely incompatible, so you have to pick one or the other. At the moment asyncio, because it's in the standard library, has the widest ecosystem support of any of the options, but Trio is really worth looking at, and a lot of the design work that's gone into Trio is being pulled back into asyncio as well.

The second of the really big features is the HTTP/2 support. HTTP/2 is a big update to the HTTP protocol. Whereas HTTP/1.1 has always been text over the wire, so if you go and inspect the raw bytes you can just read the headers and see what's happening, HTTP/2 is a binary protocol, and that's allowed some changes that improve performance, such as header compression, which minimizes the size of requests and responses. But probably bigger than that is what's called stream multiplexing. With HTTP/1.1, when you're making concurrent requests to one particular server, you have to use one TCP connection for each in-flight request. HTTP/2 has a mechanism whereby you open up a single TCP connection and can send as many different requests and responses along that one connection as you want, more or less. Those two things mean HTTP/2 sometimes has lower latency, latency being the time from when you make the request until you get the response back, because you're less frequently having to establish a whole new connection, and it also means you can get much higher throughput, making more requests at the same time.

Here's a bit of an example of the stream multiplexing. In this example it's a web browser making the requests: it goes off and fetches a web page, and the page has a bunch of JavaScript and CSS associated with it. In the HTTP/1.1 case, each of the two other resources is then downloaded sequentially. With HTTP/2, we can download them in parallel over the same connection. Now, that's a slight simplification, because in HTTP/1.1 you'll actually tend to open up more than a single connection. But even so, if you're downloading lots of resources at the same time, this is roughly what it ends up looking like.

HTTP/2 is significantly more complex than HTTP/1.1. So, for the moment, we've decided that the best user experience is to not enable it by default.
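So enabling HTTP/2 is an explicit opt-in on the client. A minimal sketch, assuming the current API (the HTTP/2 support also needs the optional `h2` dependency, installed via `httpx[http2]`):

```python
import asyncio
import httpx

async def main() -> None:
    # http2=True opts in; the protocol is still negotiated, so the server
    # may answer over HTTP/1.1 if it doesn't support HTTP/2.
    async with httpx.AsyncClient(http2=True) as client:
        response = await client.get("https://www.example.org/")
        print(response.http_version)  # "HTTP/2" or "HTTP/1.1"

asyncio.run(main())
```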
We don't have any big outstanding bugs for it that I can see at the moment, but I think there may still be some cases where servers are less robust, and it gives us a bit of a chance to start battle-testing it before we flip the switch and say, yes, we're happy enough that the user experience is always going to be good enough that we'll keep it on by default unless you want to switch it off. Presumably at some point in the future we'll end up doing that.

So, some use cases for HTTPX. The first one's really simple: everywhere that you're using requests at the moment. It's a fully-featured alternative. If there's something you can do in requests that you're not currently able to do in HTTPX, raise that as an issue; those things are going to be pretty limited, around the edges. I think event hooks aren't supported yet, for instance, I can't remember exactly. And of course requests has got a bit better ecosystem support. So, for example, custom authentication classes: more people have written those for requests so far than they have for HTTPX. And SOCKS proxy support is available in requests, which we haven't got yet.

Next use case: making parallel requests, using the async client variant to make multiple HTTP requests at the same time. In this example I'm using Trio as the async framework, and starting up around 100 requests to Wikipedia to download the Wikipedia pages on the years from 1920 through to today. Don't expect this to be completely magic. Although I say they run in parallel, it won't actually be 100 times faster than running them sequentially. There are other factors at play: you may be constrained by the network bandwidth that's available full stop, or by the server resources on the other side. So it's not going to be magically faster by exactly the number of parallel requests you issue, but it does make a big improvement. That's really something I ought to investigate properly and write up in some blog posts and case studies, and I'm hoping to do that in the future.

What have we got next? Oh yeah, same thing, but with HTTP/2 added into the mix. Why you might do that: as we've talked about, there are performance benefits, so it's a sensible thing to do. Say you're writing a web spidering tool and you're using the async client to get really good performance out of it: try it with just HTTP/1.1 to start with, and then add HTTP/2 into the mix. You'll also want to check whether the HTTP version the server responded with really is HTTP/2, because it might not support it. You can look at response.http_version, and that will give you the actual version that was used in the response.
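Here's a reconstruction of roughly what that parallel-requests example looks like, using Trio (this isn't the exact code from the slide):

```python
import httpx
import trio

async def fetch(client: httpx.AsyncClient, year: int) -> None:
    # Each task awaits its own request; the nursery runs them concurrently.
    response = await client.get(f"https://en.wikipedia.org/wiki/{year}")
    print(year, response.status_code, response.http_version)

async def main() -> None:
    # One shared client, so all the tasks share a single connection pool.
    async with httpx.AsyncClient(http2=True) as client:
        async with trio.open_nursery() as nursery:
            for year in range(1920, 2020):
                nursery.start_soon(fetch, client, year)

trio.run(main)
```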
Next use case. OK, so we've talked about making requests in parallel. This one doesn't look, within the particular code block, like you're making requests in parallel. It's using HTTPX with one of the newer async web frameworks that are out there: maybe Sanic, FastAPI, Starlette, or Quart. (There's also aiohttp, although that's got its own built-in client.) The endpoint code that you're looking at will generally just be a single sequential flow of code. However, your web server is handling multiple requests at the same time, and because you're using an async framework with an async HTTP client inside it, you're not blocking an entire thread every time you make an outgoing request. So your framework will be able to support far higher throughput than if you were using a WSGI framework such as Flask and using requests to issue those outgoing requests. There's a sketch of this a little further down.

Another use case that I think is going to start to be really interesting, and don't expect to read the code on the slide, it's just to give you a feel for it: on the right-hand side, that's an example of a proxy server written using HTTPX and ASGI. What I think is really exciting about being able to start building things like gateway services is that Python really starts to hit a sweet spot between productivity and performance. Typically, for this sort of thing in the past, you'd have wanted to use Node, or maybe Go.

What else have we got? Plugging directly into web apps. So yes, you can direct your HTTP requests to a WSGI or ASGI app rather than actually sending them out over the network. That's useful as a test client. It's also useful for mocking: sometimes you don't actually want to be sending out requests, you just want to simulate them, say in your staging environment or when you're testing locally, and it's really useful to be able to stub things out there and switch depending on the environment.
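In the current API that's spelled as a transport. A minimal sketch against a trivial Flask app (the app itself is just illustrative):

```python
import httpx
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello, World!"

# Requests get dispatched straight into the WSGI app; nothing touches the network.
transport = httpx.WSGITransport(app=app)
with httpx.Client(transport=transport, base_url="http://testserver") as client:
    response = client.get("/")
    assert response.status_code == 200
    assert response.text == "Hello, World!"
```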
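And going back to the async-endpoint use case from a moment ago, here's roughly what that sequential-looking endpoint code might look like in Starlette (the route and upstream URL are made up for illustration):

```python
import httpx
from starlette.applications import Starlette
from starlette.responses import JSONResponse
from starlette.routing import Route

# One shared client, reused across requests, so connections get pooled.
client = httpx.AsyncClient()

async def upstream_status(request):
    # Reads as plain sequential code, but the await yields control, so the
    # server keeps handling other requests while this one is in flight.
    response = await client.get("https://www.example.org/")
    return JSONResponse({"upstream_status": response.status_code})

app = Starlette(routes=[Route("/status", upstream_status)])
```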
One of the technical challenges in all this has been: how do you build an API that supports both the synchronous variant and the async variant without just copying everything into two separate code bases? We initially started out with a bridging approach, whereby everything under the hood was actually running async, and for the synchronous variant you'd run a little shim layer over the top and run an event loop behind the scenes underneath it. It turns out there were some issues with that which were pretty intractable, so we're in the process of switching over to a slightly different approach, and we've got something in the interim that's working really well.

I'm in a bit of a rush now, but some of the components that we've used to build this up: there are two packages in particular that are absolutely outstanding pieces of work, h11 and h2, which deal with the core HTTP parsing for us. Requests, under the hood, uses the urllib3 package to actually send the network requests. At the moment our synchronous client also uses urllib3 when it's sending requests, and the async client uses our own engine that we've developed from scratch. The biggest thing we're changing at the moment is getting rid of that split and making sure we're using the same engine in both places.

So I've ended up rushing a little towards the end, but here's a quick look at some of the other stuff that's on the horizon. Streaming multipart uploads: one of the things that's difficult with multipart uploads is that, with requests, if you're doing a file upload using multipart, it will pull the whole file into memory and then send the request. It needs to do that because it might get a redirect response and need to resend the body. We've worked out how we're going to be able to provide streaming but rewindable multipart uploads. HTTP/3 support: Jeremy Lainé is doing stacks of work on this in Python, and that's something we're almost certainly going to be looking at in the future. One of the other big things we're looking at is taking our core networking component and isolating it into a really, really tightly scoped independent package that just makes network requests: it doesn't do any of the client smarts, doesn't have any request or response models over it, it's just the data primitives. Here's how you make a request, here's how you get a response. And we want to do that both in the sync case and in the async case, with the async case supporting asyncio, supporting Trio, hopefully in the future supporting Curio, maybe supporting Twisted as well. So then we've got this one really tightly scoped base package that the client can sit on top of. All right, yes, I think that's probably it.
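That tightly scoped package later shipped as httpcore. As a rough sketch of what that primitives-only layer looks like, assuming httpcore's current API (which post-dates this talk):

```python
import httpcore

# Just the data primitives: send a request, get a response. No redirect
# handling, no cookies, no auth, no high-level request/response models.
with httpcore.ConnectionPool() as pool:
    response = pool.request("GET", "https://www.example.org/")
    print(response.status)        # a bare integer status code
    print(response.content[:60])  # the raw response body bytes
```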