Lee Symes, who is a developer at Catalyst in Auckland, has kindly flown all this way to come and talk to us about kittens. Lee Symes. A couple of things. Woo! Yeah, so today I'm going to be talking about some amazing stuff, and a lot about kittens. So, first kitten shot of the talk, right? A little bit about me. I've worked for Catalyst for a year and a half; I basically went straight there out of uni. And yeah, I write a lot of Python, JavaScript, and PHP, with a little bit of Perl here and there. I've also touched Go. It's a really cool language, but I just haven't had the chance to actually dive in. Also, I have depression and anxiety, and to be quite honest, neither of them is very useful. Yeah, it was great fun. So, yeah, this talk was finished last night at 11 o'clock, just because I'd been putting it off for so long. So, without further ado. First of all, we're using the Flickr API, because it seemed a good way to get a lot of kitten photos. There are a couple of quirks with the Flickr API that I'm parking off to one side; if you want to have a look at how they're handled, you can grab the code. But basically what we're going to do is use a list of predetermined kittens, and for each picture we're going to get the metadata, so that we can say this kitten photograph was taken by this person, and it's copyright CC BY-SA. We also need to get the download link, and then we actually need to download the picture. And finally, we can store that metadata and generate some HTML for it. Like I mentioned, code-wise I've extracted a whole heap of stuff out of the code that we're going to see, just because there are quite a few quirks with the Flickr API. It seems to be based around XML rather than JSON, which means you get some oddities. It's quite fun to work with. And we're using requests, because that's just kind of obligatory with any HTTP stuff. So... wow, that font is really bad. Oops.
Sorry if you can't see that. I don't know how to fix that. Tech guys? Hold on. There we go. I have a cursor. Ah, isn't that better? Okay, sorry about that. I'm not going to live-update my code. So basically what we're doing here is preparing the request to go out to Flickr. Because of the way it does JSON callbacks and all this stuff, you have to add a whole heap of parameters to make it work. And then we're just going to check the status code, get the JSON out, and return it. So this is basically just a way of compacting that code into a single function call. So when we actually want to get the image, we just make a single API call to get the info, and then another API call to get the sizes, which are the download links. Then we call out to some extra functions that basically parse that response, throw it all together into a nice dictionary, and we can now update that data. So now we've got all that information stored. And then we basically can just download the image and write it out to a file, and then we return the photo data so that we can store it. So, throwing that all together: a quick main method. The writer just writes out an HTML file. Make sure everything's tidied away. And it also times it, because that's important. And basically we're downloading four of them, one after the other. So here are your four kittens. They're kind of cute. By the way, I was expecting a couple more, like, "aww"s from the audience. Like, aww. I'll do it for you. That took 12 and a half seconds. On my home network, because it's really bad, it takes about 45. So, we can go faster. Because really what we're doing is making a request, then waiting for Flickr to do its thing and come back to us, then making another request and waiting for Flickr again until it comes back. So we're going to be using asyncio, and aiohttp. aiohttp is basically requests, but for asyncio.
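[The slide code isn't captured in the transcript. A rough sketch of the synchronous version being described, using requests as the talk does: the Flickr method names and the format=json/nojsoncallback parameters are real Flickr API details, but API_KEY, the helper names, and the exact dictionary fields are stand-ins of mine, not the speaker's actual code.]

```python
import requests

API_KEY = "your-flickr-api-key"          # assumption: substitute a real key
FLICKR_API = "https://api.flickr.com/services/rest/"

def api_call(method, **params):
    # Flickr's JSON endpoint needs format=json&nojsoncallback=1,
    # otherwise you get a JavaScript callback wrapper instead of JSON.
    params.update(method=method, api_key=API_KEY,
                  format="json", nojsoncallback="1")
    response = requests.get(FLICKR_API, params=params)
    response.raise_for_status()
    return response.json()

def extract_metadata(info, sizes):
    # Pull out the fields the talk mentions: title, photographer, and
    # the download link for the largest size (Flickr lists smallest first).
    photo = info["photo"]
    largest = sizes["sizes"]["size"][-1]
    return {
        "title": photo["title"]["_content"],
        "owner": photo["owner"]["realname"],
        "url": largest["source"],
        "filename": photo["id"] + ".jpg",
    }

def get_kitten(photo_id):
    info = api_call("flickr.photos.getInfo", photo_id=photo_id)
    sizes = api_call("flickr.photos.getSizes", photo_id=photo_id)
    photo_data = extract_metadata(info, sizes)
    image = requests.get(photo_data["url"])
    image.raise_for_status()
    with open(photo_data["filename"], "wb") as f:
        f.write(image.content)
    return photo_data
```

[Each get_kitten here makes its calls strictly one after the other, which is where the four-kittens-in-12.5-seconds timing comes from.]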
It's got much the same API. It also has server-side stuff built in as well. So it's kind of like Flask and requests combined with asyncio, everything just in one library. Which is cool. So, a very quick async primer. That actually reads all right. Cool. So I'm not going to be talking about async from Python 3.3 or 3.4. It's possible to do asyncio stuff in those; it's just that Python 3.5 added new language constructs, specifically async and await, and they just make the code look nicer and cleaner and beautiful. Pythonic. So, async def basically creates an asynchronous function. It allows you to use other async keywords like async for and async with. It also allows you to do await. And it prevents you from doing yield and yield from, and the reason we can't do yield and yield from is that the await keyword effectively does a yield from. Await pauses that function until the awaited call has completed. So if we were to await a GET request to Google, the event loop would still be able to process other things. Say we're doing a whole heap of requests off to different sites: those would all still be processed, but the function that's doing the request off to Google would pause until the request comes back and it's got the data. And then the function can continue. And that's all managed by the event loop. So, and this is going to be awkward for a while: we're going to import asyncio, and we're just going to define a print_hello_world function to say hello to Kiwi PyCon. We're going to print "preparing to say hello", then we're going to sleep for five seconds. We're not using time.sleep, because time.sleep would effectively block the entire event loop, which means nothing can happen for five seconds. Whereas this asyncio.sleep is basically saying to the event loop: wake me back up in five seconds, but go and do other stuff if you've got other stuff to do. And then we obviously print hello.
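[The hello-world coroutine described here would look roughly like this; the exact strings are paraphrased from the talk:]

```python
import asyncio

async def print_hello_world():
    print("Preparing to say hello...")
    # time.sleep here would block the whole event loop for five seconds;
    # asyncio.sleep instead tells the loop "wake me up in five seconds,
    # but feel free to run other tasks in the meantime".
    await asyncio.sleep(5)
    print("Hello, Kiwi PyCon!")
```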
So, when we call it down here, what happens? Well, nothing. And the reason is that you actually need a bit of infrastructure around this asynchronous function, the event loop, to actually make it run. So, basically the same sort of code up here, except with shorter sleeps, because I'm tired. And we're now going to get an event loop, and we're going to say: run this function until it's complete. That basically starts the event loop, and the event loop does its thing, and once that function completes, it will return control back to here, and we can print bye. So, I'm obviously going to be very polite. And then one second later, I'm going to prepare to say hello, and then: hello, Kiwi PyCon. And then one second after that, the print_hello_world function will exit, which means that the code will now continue down here, and we'll say bye. Right, I'm out of here. So, that's your very first async function. Woo-hoo! Yay! Okay. So, now we're going to make your very first async get_kitten, because we've only got a sync one at the moment. It's a bit sad. So, we're going to be using aiohttp. Let me just find my cursor again. So, that's kind of the first bit there. We're obviously going to import the library. We're going to async def the function, just so that we can use those await keywords. And then we're going to use this async with. It's basically like a with block, like the context managers that you've all used with files and that sort of thing, except that the enter method can do stuff asynchronously, and the exit method can do stuff asynchronously. So, in this case, what's happening is that the enter method is actually doing the request and waiting until the response, or at least the headers for the response, comes back before allowing us to carry on. So, once we reach this image line here, it's already gone off to the server and come back and said: we've got headers. Fantastic.
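[A sketch of the bootstrapping being described. The talk, being Python 3.5 era, uses asyncio.get_event_loop(); on Python 3.7+ asyncio.run() wraps the same loop handling, and new_event_loop() is used below to keep the sketch explicit:]

```python
import asyncio

async def print_hello_world():
    print("Preparing to say hello...")
    await asyncio.sleep(1)
    print("Hello, Kiwi PyCon!")

# Calling print_hello_world() on its own does nothing useful: it just
# creates a coroutine object. The event loop is the infrastructure
# that actually drives it to completion.
loop = asyncio.new_event_loop()
loop.run_until_complete(print_hello_world())
loop.close()
print("Bye!")
```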
Then, obviously, we raise_for_status, just to make sure we haven't misplaced an image or something. And then we have to await reading the whole image as well. We can't just assume that the whole thing has been read right now. Especially if it's a large image, it will take some time, and we don't want to block for that either, because something else might want to happen. And then we basically store that in content. And then when we exit that with block, aiohttp will basically do a whole heap of cleanup, like closing connections and doing all these other things that we don't care about, but it does it all asynchronously. So we're not blocking the event loop or anything like that. And I think that's really cool. So, up next, we basically just write out that content. The reason we're not doing an asynchronous write, from my understanding, is that Linux doesn't really support the notion of an asynchronous write to disk, and thus the Python async library doesn't support the notion of an asynchronous write to disk. So it's all synchronous writing. And then another thing is just this asynchronous main method. It's basically your main method, except I tend to write it with a synchronous main method that basically just bootstraps the asynchronous main method. So, this is your main function. Oh, questions? Oh, okay. And we're just going to download that. And then we're just going to run that and then exit. Which gets us that. Oh! This talk was kind of hard to write. Can you see why? And that took ten and a half seconds on my home network. I've run this in various places, and I haven't updated that time, but that was at home, and it was all right. It's surprising: you can still watch Netflix, but the Google homepage is a bit touchy. I don't know. New Zealand networking. Yeah, so why don't we download two kitten pictures at once, though? Right?
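[A sketch of the async download being described, using aiohttp; the URL and file name are placeholders of mine, not the slide code:]

```python
import asyncio
import aiohttp

async def get_kitten(url, filename):
    async with aiohttp.ClientSession() as session:
        # Entering the request's async with returns once the response
        # *headers* have arrived; the body may still be on the wire.
        async with session.get(url) as response:
            response.raise_for_status()
            # Reading the body is awaited too: a large image takes a
            # while, and we must not block the loop while it streams in.
            content = await response.read()
        # Leaving the with blocks closes the connection and session,
        # all asynchronously, without blocking the event loop.
    # File writes stay synchronous: asyncio has no real async disk I/O.
    with open(filename, "wb") as f:
        f.write(content)

async def async_main():
    await get_kitten("https://example.com/kitten.jpg", "kitten.jpg")

def main():
    # The synchronous entry point just bootstraps the async one.
    asyncio.run(async_main())
```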
Well, we could download one, and then we could download the next one, but we're asynchronous. So, we could download them both at the same time, wait until they're both done, and then say we're all done. So, we're trying to run it in parallel. Basically, we've got exactly that same get_kitten method; nothing's changed up there. And then down here, what we're doing is changing the main method to wait on the results of both of these get_kitten calls. So, the gather will basically wait until all of those results have come back and return them together. Hence the sort of tired-face syntax there. We obviously don't need the results here; it's not returning anything useful, but it's good to know that it's actually returning something. And the same thing: we're just going to run that. Which gives us those two kittens. Oh. This talk's going to run long. I'm sorry. Oh, dear. So, okay. How can we apply this back to what we were doing just a few minutes ago, with the downloading of four kittens? Oh. Oh, no. I've completely lost track of time. So, sorry. Time-wise, it's faster, right, than downloading one and then the next one? So that took 40 seconds. I think that's just because my network doesn't really like handling a lot of requests. It seems to be able to download things quickly enough, but yeah, it's weird. But basically, the whole thing is that that downloaded faster. Five seconds faster. Five seconds, and I have two kittens now. In case of emergency, five seconds could be, you know, an important five seconds. So, anyway, back to the kitten at hand. So, what were we doing previously? We were getting the info, and then we were getting the sizes, and then we were downloading a kitten picture. And then rinse and repeat: get the info, get the sizes, get a kitten picture.
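[A stdlib-only sketch of the gather pattern, with the network swapped for a sleep so it runs anywhere:]

```python
import asyncio

async def get_kitten(photo_id):
    # Stand-in for the real download: a sleep instead of the network.
    await asyncio.sleep(0.1)
    return photo_id

async def async_main():
    # gather starts both coroutines concurrently, waits until *both*
    # have finished, and hands back their results in call order.
    results = await asyncio.gather(get_kitten("a"), get_kitten("b"))
    return results

print(asyncio.run(async_main()))  # prints ['a', 'b']
```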
We could do even better, because if we get the info for two kittens at the same time, or all the kittens at the same time, and get the sizes for all of them, and then download the kitten pictures for all of them, we can go even faster, because we're no longer waiting for one slow request; we're waiting for a couple of slow requests to come back and be done. Sorry, I've lost my train of thought again. I'm distracted by a kitten. Oh, dear. So, it's going to run faster because we're not actually waiting for every single request; they're all running in parallel. We're only waiting for the slowest ones, because the fastest ones will just finish and be waiting on the side, and the slowest one will be whatever holds us up. So how do we implement this? Well, first of all, we need to update our API call, because previously it was synchronous, and we can't really call synchronous code, because otherwise it blocks the event loop. It's a very bad idea to block the event loop, because then nothing else can happen. So, basically, we update the api_call method. That update block is exactly the same. We're just going to do another GET request, this time using a session that gets passed in, which basically allows us to do client connection pooling. Then we await the JSON. Again, because we don't know whether all the data has come through, we have to wait for the data; the only guarantee when you enter that with block is that you've got headers. And then we just do some quick checks and return that data. So how are we actually going to... I need to change that color scheme. I'm really sorry, I didn't realise it was going to be quite that bad. So, that actually works. Okay, cool. So what we're going to do, first of all, is await the first API call and then await the second API call.
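[A sketch of the session-based api_call being described; the endpoint constant and parameter handling are my reconstruction, not the slide code:]

```python
import asyncio
import aiohttp

FLICKR_API = "https://api.flickr.com/services/rest/"  # assumption

async def api_call(session, method, **params):
    params.update(method=method, format="json", nojsoncallback="1")
    # The session is passed in and shared between calls, which is
    # what gives us client-side connection pooling.
    async with session.get(FLICKR_API, params=params) as response:
        response.raise_for_status()
        # Entering the block only guarantees the headers have arrived;
        # the body must be awaited before it can be parsed as JSON.
        data = await response.json()
    return data
```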
But they're not actually going to block any other API calls from starting. So what's actually going to happen is that when the main method starts, it's going to run four of these in parallel, and then each one is going to start off its own API call, independent of the others. And then when the first one of those comes back, it's going to start off the next API call. And that means that we no longer have a lot of idle time in between; we're actually able to do stuff. The same block of code there. And then downloading the kitten is basically just... okay, the same thing as before. It's very repetitive, isn't it? Open up a connection, download the content, write it out to a file. Fantastic. And return the photo data. And then we throw it all together again. Because I picked a bad color scheme, we're going to do that, while I try not to kill myself. Here we're using gather to get all of the kitten data, because it actually returns stuff and we want to hold all of it. And then we're going to add it all to the writer, so that the writer can output it nicely, and so that my really, really hacky JavaScript can display it on my slideshow like this. Really cute kitten. Another really cute kitten. Downloading synchronously took 12 and a half seconds. Downloading asynchronously took four. And this difference is even bigger on my home network: it's like 45 down to 20. It's actually quite a lot faster, because we're not just spinning our wheels; we're able to do stuff in parallel and make it faster. So, can we go even faster? Well, yes. Because the way I designed this, intentionally, was so that we don't actually need the getInfo call in order to download the kitten picture and write it out to a file. So we could run the getInfo call, and the getSizes and download-kitten calls, on different lines of execution, and make it go even faster. So, basically, running through... how am I doing for time? Oh wow, that's 20 minutes.
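[The shape of this speed-up can be demonstrated with stdlib-only stand-ins, fake 0.1-second "requests" in place of Flickr:]

```python
import asyncio
import time

async def api_call(name):
    await asyncio.sleep(0.1)   # stand-in for one slow Flickr round trip
    return name

async def get_kitten(photo_id):
    # Within one kitten the calls are still sequential: info, then
    # sizes (the download would follow the same way)...
    info = await api_call(f"info-{photo_id}")
    sizes = await api_call(f"sizes-{photo_id}")
    return info, sizes

async def async_main():
    # ...but all four kittens run concurrently via gather.
    return await asyncio.gather(*(get_kitten(i) for i in range(4)))

start = time.perf_counter()
results = asyncio.run(async_main())
elapsed = time.perf_counter() - start
# Two 0.1 s calls per kitten, four kittens in parallel: about 0.2 s
# total, instead of the 0.8 s a fully sequential version would take.
```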
So basically we're going to make that API call, and then we're going to call out to another method that basically does the download stuff, but these are going to be running in parallel. And then we just return that. So... there we go. I'm really sorry. There we go. Please work. So... great, I've lost my train of thought again. Can I borrow my kittens again? Is that... yeah? Cool. So this is actually doing the download. We're just going to await that API call, and then we're going to get the... basically, we've split those two blocks out. Yeah. So, does that run even faster? I don't remember. Interesting story: it runs slower on my home network, the more efficient one. Basically, yeah. When I ran it last night off the hotel Wi-Fi, it seemed to run a bit faster. There you go. One second. Yay. So why didn't it run faster at home? I guess the thing is that now we're doing eight requests. There are a lot of requests starting to happen, connections opening, all this sort of thing. My home network obviously has very high latency but good throughput, so opening lots of connections is kind of bad. But it's something you've got to experiment with, and if you're on a fast network at work, this is the faster way to go. So, basically, what I wanted to show was that it's actually quite easy to move through and basically make your code ridiculously fast, with asynchronous API calls all over the place. The whole thing is that it's a relatively small amount of code change for what I've done. I mean, it's a relatively small amount of code, but the whole point is to allow your code to run in parallel, so that you can actually achieve things faster without waiting for some server in the middle of nowhere, or in Sydney, to finish or get rained out. That was great fun; they should really invest in umbrellas down in Sydney, after that AWS outage. So, now what? It's actually really easy to do a couple of extra things. Rate limiting API calls is ridiculously easy: I wrote five lines of code.
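[A stdlib-only sketch of that split, with sleeps standing in for the network:]

```python
import asyncio

async def api_call(name):
    await asyncio.sleep(0.1)   # stand-in for a network round trip
    return name

async def download_kitten(photo_id):
    # The getSizes -> download chain doesn't need getInfo's result...
    sizes = await api_call(f"sizes-{photo_id}")
    image = await api_call(f"image-{photo_id}")
    return image

async def get_kitten(photo_id):
    # ...so run the metadata call and the download chain as two
    # separate lines of execution instead of one after the other.
    info, image = await asyncio.gather(
        api_call(f"info-{photo_id}"),
        download_kitten(photo_id),
    )
    return info, image
```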
If you want to do stuff like that, you basically use a bounded semaphore. A bounded semaphore limits the number of tasks that will enter this block here. Nice and easy. Other things: say you've got a list of issues on GitHub that you want to download. At the moment you're probably going through the first page, second page, third page, fourth page, and you think, okay, well, I can't really speed that up with async, because I need to get the first page before I can get the second page, and the second page before I can get the third page. There's a trick that I learned when I was doing some stuff with JavaScript, which is that they give you the last page as well. So you can work from the top going down and, once you've made that first request, from the bottom going up, and halve the time that it takes to get the whole listing. That was quite interesting. I didn't have time to actually hack something up to do that, but maybe by the end of the conference I will. Yeah, so... what was the difference between those two things? Oh, well, okay. Also, proper rate limiting of, say, only five requests every second: I kind of hacked something together. It's a little bit bad. It's up in the code. So, that's the end of my talk. I just want to thank a couple of people. Basically, every single one of these people has kept me alive. Literally, like, literally kept me alive for the past two or three months. I've been through really tough times, and without them, I wouldn't be here. So, thank you. Yeah, so, any questions before I completely tear up? Hello? Yes, this appears to be on. When you tag a function with the async keyword, can you still apply Python decorators to such a function? Yes, it's still a function, so decorators apply just as with a plain def, but you can also decorate an async function with an async function, if that makes sense. Oh, let me grab some code.
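[The five-line rate limiter being described is presumably built on a semaphore; a stdlib-only sketch of that trick:]

```python
import asyncio

async def limited_fetch(semaphore, photo_id):
    # At most four tasks can be inside this block at once; the rest
    # wait at the semaphore until a slot frees up.
    async with semaphore:
        await asyncio.sleep(0.05)   # stand-in for the real request
        return photo_id

async def main():
    semaphore = asyncio.BoundedSemaphore(4)
    return await asyncio.gather(
        *(limited_fetch(semaphore, i) for i in range(10)))

results = asyncio.run(main())
```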
Let me just change my theme so that people can see it. Oh, yeah. Sorry, I didn't mean to blind everyone. Basically, you're able to decorate async functions with async functions. I'll get out of that here. There we go. Next question. Cool. Sorry, let me just go back here. Thanks for the talk. A complete sort of newbie question: how does this stuff play inside a web framework where you've got a process handling requests? Does it kind of just work, or do frameworks need to be changed to handle asynchronous code? So, if we're talking about things like Django, I'm not sure. aiohttp has its own server framework built in, which basically handles everything asynchronously. So if you're writing something new and you use the aiohttp server framework, you can basically get the request and do a whole heap of async stuff whilst the request is being handled. And you won't be blocking other requests from happening, because they're also async. But as for adding it to something like Django or an existing project, it's probably a bit harder. You might be able to do something with creating a brand new event loop just for that request, if you wanted to. Well, you could use the existing event loop; you'd just have to manage it a bit more carefully. Let me just go back to this slide here. You could, in theory, have that be your request handler, and basically have your async main function do the asynchronous handling of your request, like making seven API calls all at once. But it's a bit more infrastructure to manage, I guess. Yeah. Any other questions? Oh, hey, a couple. There's one at the back. Hi. Thank you. A quick question about the async main loop: does it have any relationship to the number of threads or cores on the machine? No, it's all single-threaded. It's one Python process, one thread. But the whole thing is that it won't...
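[The code shown on screen isn't in the transcript; a sketch of decorating an async function, here with a plain decorator whose wrapper is itself async (names are mine):]

```python
import asyncio
import functools

def log_calls(func):
    # Decorating an async function: the wrapper must itself be async
    # so that it can await the function it wraps.
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        result = await func(*args, **kwargs)
        return result
    return wrapper

@log_calls
async def double(x):
    await asyncio.sleep(0)
    return x * 2
```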
If you're doing, for example, an HTTP request, the whole thing is that it won't block the thread for that HTTP request; other things can happen in the meantime. And that's kind of what the point of this is. You are able to spawn threads, and you're able to call between asyncio's event loop and a thread and back again, which might be the thing to do for long-running tasks, possibly like HTML parsing, if it's getting a bit too much for the one thread. You might want to run that in another thread. But no, it's all single-threaded. We'll make this the last question. Can we make it the second-to-last and have this gentleman? This may be a silly question, because I'm stuck in the Python 2 world, but does this supersede concurrent.futures? No. concurrent.futures works with the multiprocessing library, I believe, and that still works in Python 3. And there's an asyncio future, which kind of mirrors that of concurrent.futures, but it's asyncio-based. It's kind of a bit separate from concurrent.futures and multiprocessing. One last question. Thank you. So, the example you gave there is mostly blocking on I/O requests, and traditionally for these sorts of things, I guess, a multi-threaded approach would work as well. So where would you see... how would you decide whether or not you would do something like this with multiple threads, or using something like this, which has rather different-looking code that one has to get used to first? Yeah. So, for me, the benefit of this is that I know exactly when my code is going to go off somewhere else, which means that I can use mutable structures and all these sorts of things. So, for example, I had some sample code that I'd written for the rate-limiting thing. Really? Well, okay. Let's see if we can find a different theme. Okay, hold on. Hey! Is that better? Okay, fantastic.
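[A sketch of handing such a long-running synchronous task to a thread from the event loop, using run_in_executor; the parse function is a stand-in of mine:]

```python
import asyncio

def parse_html(markup):
    # Pretend this is an expensive, purely synchronous parse.
    return markup.upper()

async def main():
    loop = asyncio.get_running_loop()
    # Hand the blocking work to a thread pool; the single event-loop
    # thread stays free to service other coroutines in the meantime.
    return await loop.run_in_executor(None, parse_html, "<h1>kittens</h1>")

result = asyncio.run(main())
```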
Well, whoever's going to see this talk next is going to have a great experience. Thank you all for playtesting. So here, I'm basically mutating this global list, but the thing is that I always know that nothing else is going to be mutating that list. That list is not going to change, has no chance of changing, until we hit one of these await keywords, or an async with keyword, something like that. So the whole thing is a bit easier to reason about, in my opinion. So I know that this function, once it reaches, say, here, is always going to run through to here without any other code touching anything. I have control, and this is what the entire Python thread is doing. Yeah, that's one of the benefits I see. Plus, you don't have to manage threads, if that's something that is hard for you. I haven't done a lot of threading in Python. Yeah, cool. All right, great. Thanks, everybody. The next talk will be in about 20 minutes. Whoo! I made it!
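[A stdlib-only sketch of that final mutation-safety point: between awaits, a coroutine's reads and writes to shared state can't be interleaved with any other task's, so no lock is needed:]

```python
import asyncio

log = []   # shared state, mutated without any lock

async def worker(name):
    for _ in range(3):
        # Between awaits this coroutine runs uninterrupted: no other
        # task can touch `log` until we explicitly yield control...
        count = len(log)
        log.append((name, count))
        await asyncio.sleep(0)   # ...which only happens here

async def main():
    await asyncio.gather(worker("a"), worker("b"))

asyncio.run(main())
# Each read-then-append pair is atomic with respect to the event loop,
# so the counts come out strictly increasing despite two tasks sharing
# the list.
```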