Good morning, everybody. I hope you enjoyed your afternoon tea. Our next speaker is Grant Paton-Simpson, who is joining us all the way from sunny Auckland. Probably not as sunny as here, though, right? No, it's not. Anyway, without further ado, Grant.

Right. At the last Kiwi PyCon, which was in Christchurch last year (who was there? Yeah, quite a few), there was a talk about asynchronous coding and how important it was going to be for the Python community. Async is much faster. It's the future. We need to embrace it as a community. But there are some challenges. You've probably seen this for other uses, but there is quite a learning curve, there are lots of different approaches you can take, and it can be quite hard to know how to get started. And if your code's not working, you're not sure where it's gone wrong or how. It can be a bit confusing sometimes.

Some people like async because they like the cutting edge. They want to blow their minds. For those in the back, the slide says: what if we used to be able to make wishes, but then someone wished we couldn't? So they want to blow their minds with async, and it certainly can fit the bill there. The basic attitude is, you know, "Async, therefore I am". Apologies for the bad pun, but I had to stick that in somewhere. Others are just doing async for the speed. They want it nice and fast. For those sorts of people, the priority is a shallow learning curve, minimal boilerplate, and they just want it to go fast. If that's you, this talk is for you. By the way, that's Postman Pat's van, as you've never seen it before, ripping along.

Okay, so let's have a quick refresher on how async is faster. The problem is: why are computers so slow? I'm running something over there, and it's just excruciatingly slow as far as I'm concerned. I've been with computers a long time, and they really, really, really ought to be a lot faster by now, given the CPUs.
They're extremely fast, science fiction as far as I'm concerned. But lots of programs still run really slowly. You know, what's that about? Why so slow? Because they're waiting for much, much, much slower processes to complete. And they really are like that as far as the CPU is concerned. For example, getting some data back from an API call is excruciatingly slow from the CPU's point of view.

So here's something synchronous. You do step one, then you wait, then you do step two. The CPU is sort of snoozing in the middle, waiting for the call to come back, and only then does the CPU get to process it. So synchronous means doing one thing after the other, no matter how much spare capacity you've got. With async, and here we've just got three workers, you can do some things at the same time and then pull it all together at the end, so it can finish much quicker. So here, all going well, is some sort of demonstration of that. You can see the async bit has already finished, and the synchronous one is still plodding along. And you can imagine that if we had more than three workers, we might get a greater speed-up, and so on. It all depends on the nature of the job.

Remember, this is not an advanced async talk, so just some basics. Is it time to do a total code rewrite? Sadly, there's no make-everything-async decorator; it's not available. But it's not too hard to follow some simple patterns. Now, these patterns have been tried and tested as far as I've been able to, but there may still be some issues. If there are, please let me know. But certainly some of them I've used quite a lot.

Okay, these patterns are based on futures. Python 3 has introduced a whole lot of goodness that's not available in Python 2, and futures are a nice way of making async programming simpler. So what is a future? It's sort of like an IOU. You'll get the result sometime later.
So it's something you can hold on to, and there'll be a result. A future might be a promise to return a translation once it's complete: you hand over a job, and when it's ready, you get the translation. The data, the results, the values will be known later.

We can hand these jobs, these futures, to a function which will iterate through them as and when they get a result. So we could have something like this: somehow we've got some jobs (jobs is just an arbitrary name, but it seems quite sensible), we feed them all into futures.as_completed, and we get the result for each job as and when it's done. And we often call futures "jobs" to avoid confusion, so we don't go "from concurrent import futures" and then also talk about futures and adding jobs to them and so on. So I just call them jobs, and the examples often do too, but futures is what they are. What's happening there is a whole lot of jobs where the results will be known later. They end up as done jobs as and when they get a result, and then we can process those. So over time, the jobs all get done and end up in the done jobs. Although that depends on the criteria you've set: you may have said stop when the first one succeeds. But if you want to see all of them, they'll eventually end up there on the right-hand side.

Some assumptions for the different patterns, some important requirements. All the following patterns assume independent functions. So what do we mean by that? They can be run in any order. If the order matters, then async may not be what you want to do, or at least not a simple form of async. Also, the functions don't stomp on each other's toes. So what does that mean? It's very easy for functions to stomp on each other's toes in very subtle ways.
That means things like manipulating the same variable, altering the same mutable object (dictionaries or lists, for example), writing to the same file, or using a library which does something under the hood that you don't know about, where things may be colliding, and so on. But if those assumptions are met, if you're not having those bad things happen, then it can be quite simple.

So let's look at some useful patterns. One pattern I call results at the end. Here, we start off at the top and fire off three separate jobs in this case. Those are cogs, for those at the back. Then at the end, we gather all the results together once we've got all of them, and then we process them. So that's how it works. The assumptions there are that the functions can be run in any order, can be run simultaneously, all those things. Pure functions are the safest bet. Pure means no side effects at all, not even to the inputs. That's the absolutely safest way, but it may not matter. And for this particular pattern, you can wait till all the calls are complete before looking at the results.

So let's look at the parts. I'll hide bits of the code and show bits of the code. What it says there at the top is pretty straightforward: from concurrent import futures. Then down here there's just a small amount of boilerplate with futures.ThreadPoolExecutor. There's also a ProcessPoolExecutor that's relevant. 12 is the number of workers. If you're using the latest version, Python 3.5, then you don't need to specify the workers if you don't want to, which is one less thing to know about. And then what we're doing here is getting the results back using executor.map. A function that takes one input goes there, and an iterable of single inputs goes there. What happens is we don't proceed beyond that point until we've got the results, and at that point we can actually look at the results.
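Pulled together, the boilerplate being described might look something like the sketch below. Here get_data is a made-up stand-in for the speaker's real function (which did an API call with requests); the sleep and the squaring are just placeholders for slow, independent work.

```python
from concurrent import futures
import time

def get_data(num):
    # Stand-in for a slow, independent call such as requests.get(...).
    time.sleep(0.01)
    return num * num

nums = range(1, 11)  # an iterable of single inputs

with futures.ThreadPoolExecutor(max_workers=12) as executor:
    # executor.map takes a one-argument function and an iterable of
    # single inputs; iterating the result waits as needed and yields
    # the results in the same order as the inputs were supplied.
    results = list(executor.map(get_data, nums))

print(results)  # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```

Because executor.map hands results back in input order, the list above comes back sorted by input even though the individual calls finish in whatever order they like.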
So that's a very simple bit of boilerplate, very easy to memorize. I first used it when I had 5,000 API calls to make, and I got them all done in five, I think eight, seconds. Then I was able to just crunch through the results, and I thought: this is worth doing. Just a couple of lines of code and it was really fast.

Okay, so what are the parts? Again, we've got an iterable... pardon me, got a cold... we've got an iterable of single inputs there at the top. I've made it very simple; it's just 1 to 10. And that's what goes into the map function over here as your iterable. So that's pretty straightforward. In this example, my function is getData. It takes one input, an ID, does some bogus this or that, and does an API call using requests. And that's what we feed into this part. I usually write this bit first and then backfill what I need; it depends how you're doing it.

An interesting thing about doing it with executor.map is that it actually returns things in the same order that the iterable supplies them. So if your numbers are 1, 2, 3, 4, whatever, that's the order you get the results in. That's not the order the results were actually completed in, but that's how they're returned to you at the end. And that's showing the whole thing all at once. This is just to demonstrate the results coming back as they actually did in a run I did; it'd be a different order if I ran it again. But that's how the results get returned once the map has finished. Which could be useful: you might want them in a particular order.

Another pattern might be results as we go. What's happening here is we fire all these jobs off, and as and when they come in, we want to know about them. So once again, you've got some assumptions: you want to run an independent function multiple times, and you want to process the results as they come back. They could be in any order.
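A sketch of this results-as-we-go pattern, with fetch as a made-up stand-in for the independent function being run multiple times:

```python
from concurrent import futures

def fetch(num):
    # Stand-in for an independent function we want to run many times.
    return num + 100

nums = [1, 2, 3, 4, 5]

with futures.ThreadPoolExecutor(max_workers=4) as executor:
    # A list comprehension is a common way to submit all the jobs.
    jobs = [executor.submit(fetch, num) for num in nums]
    # Each job is handed back as and when it completes, in any order.
    for comp_job in futures.as_completed(jobs):
        print(comp_job.result())
# Exiting the context manager waits for all pending futures to finish.
```

Unlike executor.map, the as_completed loop sees results in completion order, not input order, which is the point of this pattern.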
Okay, I'd better talk about submitting jobs and as_completed. What I've got there I've called comp_job: same thing as done job, completed job, call them what you like. But once again, they're iterated through as and when they complete. Another thing you might do, up here with jobs, is use a list comprehension to populate it, where you submit the function and the individual num, for num in nums. It's reasonably common to see people doing that as well; it just saves a few lines.

Yeah, and you only exit the context manager... You don't have to use a context manager, but the guarantee you get with one is that you only exit it when all the pending futures are completed. Which is quite useful sometimes; in your code you can rely on things.

Okay, prerequisites. It might be that you want these things to all happen first, in whatever order, and once they're all done, then this next thing can happen, and then you might process some results. So here's an example from real life, sort of: you have to empty the bin, do the recycling, and do your homework before you play on the tablet. Yeah, easier said than done; reality may vary. But that's the basic idea. You have some tasks. I'm firing off things in parallel all the time, but they just hardly ever get done. They're all failed, cancelled. Oh, there's four. Four workers. Yeah, that's right.

So with the chained steps approach, you have a number of different jobs which must be completed before another job can occur. That's quite a common thing. But the prerequisite jobs can happen in any order, it doesn't matter when, or simultaneously, etc., and they're independent of each other. What we've got here: you'll notice that boilerplate at the top, you see it again and again, and you sort of memorize it. There's not that much to it. For some reason, I've put in two workers, but it could be something else.
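A sketch of this prerequisites pattern; get_info, get_details, process and cleanup are hypothetical stand-ins for the real tasks, not functions from the talk's actual code.

```python
from concurrent import futures

def get_info(user_id):
    return f"info({user_id})"     # hypothetical prerequisite task 1

def get_details(user_id):
    return f"details({user_id})"  # hypothetical prerequisite task 2

def process(info, details):
    # Can only run once both prerequisites have produced results.
    return f"processed {info} and {details}"

def cleanup(job):
    # add_done_callback hands the finished future to this function.
    print("cleanup after:", job.result())

with futures.ThreadPoolExecutor(max_workers=2) as executor:
    info_job = executor.submit(get_info, 1)
    details_job = executor.submit(get_details, 1)
    # Block here until BOTH prerequisite jobs are done, in any order.
    futures.wait([info_job, details_job],
                 return_when=futures.ALL_COMPLETED)
    # Past wait() we can rely on the results being available.
    process_job = executor.submit(process,
                                  info_job.result(),
                                  details_job.result())
    process_job.add_done_callback(cleanup)
```

As the talk notes, the cleanup step could equally sit after the context manager exits, or wait on process_job directly; add_done_callback is just one of a small number of ways to do it.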
So, info_job equals executor.submit and so on. Sometimes people call the executor e instead of executor, which may seem a bit funny if you're used to using that for exceptions, but I'm just pointing out sometimes people do that. Anyway, in this example here, we submit a job, we've got another job there, and down here we call futures.wait and tell it which jobs we're actually waiting for, in this case both of them, with ALL_COMPLETED. So what we're saying is: those two jobs there, we only move past this line once both of them are done. It's pretty straightforward. And once we get down beyond wait, we can guarantee there will be results, because the jobs have actually completed. That's what we specified, so we can actually get the results back.

Then what we might do next is fire off another job, a third job, where we're going to process those details, and we feed those completed results in as arguments. And then we might have something at the very end where we do some cleanup or whatever it is we want to do. add_done_callback is quite simple; that's one way of doing it: you just say run this function when it's all done. But you could just sit outside the context manager and run it then, because everything will be done by then. Or you could wait on the process job, I guess, for that to be completed, and then do the cleanup. So there are a number of different ways of doing it, but not an infinite number, and that's one thing I like about this. You can learn a few things and do quite a lot asynchronously with them.

Another approach might be that you have multiple chained steps. Each of these is sort of in its own world, but that one chugs along, and when it's finished it can do the next step, and then it can do the final step. So the example I've got here, you may recognize this person, but I made these slides quite a long time ago.
"Futures, tell me more." We have two functions. One function is where we're getting an image, and the second function sort of memifies it: we stick some text on it. Obviously you can't modify an image until you've gotten the image, but that's the only order restriction that really applies. So it doesn't matter if I'm getting this picture, then I get that picture, then I finish that picture, then I finish this picture. It doesn't matter, as long as internally there's a sequence.

So there are some assumptions there. Each call to get an image is independent of any other call to get an image. The same is true of calls to modify images. The only guarantee of completion order is for the second part of a chain relative to its first part; otherwise the composite jobs can happen in any order. So the second part of one job could theoretically happen, well, will often happen, before the first part of another. The approach taken lets us chain together as many steps as we like, and we could have used the add_done_callback approach instead in this particular case.

Okay, so what we've got here: I've made it sleep for a random amount of time so that I could demonstrate things coming back in whatever order. That's just a verbose no-op in the middle saying it's got the image, and then it returns the image. So that's that step. And here we're adding a meme to the image, and once again we're not really actually doing anything; we're just pretending we're doing it. And down here, we've gotten quite familiar with this already: futures.ThreadPoolExecutor, 12 workers this time. I think I've had 128 and all sorts of things. Once again we've got a list comprehension here to populate jobs, just using executor.submit, which is pretty straightforward. Once again you've got a function, and you can have any number of arguments you supply; they just come afterwards.
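The two-step chain could be sketched like this; get_image and add_meme_text only pretend to do the work (as in the talk's slides), with a random sleep so completion order varies from run to run.

```python
import random
import time
from concurrent import futures

def get_image(image_id):
    # Pretend download: sleep a random amount so the order varies.
    time.sleep(random.uniform(0, 0.02))
    return f"image-{image_id}"

def add_meme_text(image):
    # Pretend to memify the image: just stick some text on it.
    return f"{image} with caption"

with futures.ThreadPoolExecutor(max_workers=12) as executor:
    get_jobs = [executor.submit(get_image, n) for n in range(5)]
    meme_jobs = []
    # As each download completes, immediately chain the second step;
    # step two of one image may well run before step one of another.
    for comp_job in futures.as_completed(get_jobs):
        meme_jobs.append(executor.submit(add_meme_text,
                                         comp_job.result()))
    for meme_job in futures.as_completed(meme_jobs):
        print(meme_job.result())
```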
So: for completed job in futures.as_completed, I grab the result and then submit it to the next round, where it's actually adding the meme to the image. And obviously, because the first round for that particular job has been completed, I have the image, so I can now do the second stage.

Another approach we might take is where we want to do lots of tasks until one of them succeeds. Almost like a life philosophy. We go through here, we do this one, and we want to cancel these ones, because there's no point; you might have 500 or 600 things and you don't want to go through all of them. You've found something, it's all you needed, so let's knock the other ones off. Sorry for the grey writing, but it's just a repeat. You want to stop as soon as one of the functions finishes, and you don't care which one. That's the new bit.

So what have we got here? We're submitting the jobs, that's straightforward. We're getting back two things, which are sets: one of them is called done and one of them is called not_done. In this case I've said futures.FIRST_COMPLETED instead of ALL_COMPLETED, so what will happen when we actually look at those two sets is that in done there will be one thing, and in not_done there will be all the others. If you change that to ALL_COMPLETED, you get done all full and not_done empty, so it sort of makes sense. The other option is you can also exit on FIRST_EXCEPTION.

So here we go: we'll have a done job there with something in it, because we exited on the first success, and these ones are still incomplete. How do we get them out? Because it's a set, we just pop them out. Well, in this case we pop out the only one which is in there, and we can get its result. Yeah, another thing: cancel. You might want to cancel all the jobs in the not_done group, because you don't want them to complete. On that earlier slide, those little black bars are where we were not proceeding with them.
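A sketch of this stop-on-first-success pattern, with attempt as a made-up stand-in for one of many tasks where any single success will do. Note that done and not_done come back as sets, which is why pop works.

```python
import time
from concurrent import futures

def attempt(num):
    # Stand-in for one of many attempts where any success will do.
    time.sleep(0.05)
    return f"succeeded on attempt {num}"

with futures.ThreadPoolExecutor(max_workers=4) as executor:
    jobs = [executor.submit(attempt, num) for num in range(20)]
    # Come back as soon as ANY one job has finished.
    done, not_done = futures.wait(jobs,
                                  return_when=futures.FIRST_COMPLETED)
    first = done.pop().result()  # done is a set, so pop a job out
    # Cancel what we can, i.e. jobs not already picked up by a worker.
    for job in not_done:
        job.cancel()
    # Exiting the context manager still waits for any running jobs.
print(first)
```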
Now, there are some quirks there. You cancel what you can, i.e. what's not already with a worker. If you've fired something off to a worker and it's in process, it's going to do its thing; cancel won't stop it. And you don't exit, you don't get down to here, until no jobs are still running; that's what the context manager guarantees for you. So the program doesn't finish just because a single job is finished. Even though you're only waiting until FIRST_COMPLETED, you have to wait until all uncancelled jobs are complete to exit that block.

Okay, adding more async to your code. Those are some basic, simple patterns; use them wisely. Obviously async is not always worth it. If your script only takes 11 seconds to run and you're not repeatedly running it all the time, then maybe it's not a candidate for asynchronizing, or whatever. Another part of using it wisely is making it easy to reason about your async code. So favour clean syntax and clear naming over abstruse awesomeness. Sometimes there are things where, if you understand how it works, you can assume this will be done by then; but there's another way of doing it where you make it explicit, you know, like as_completed or whatever.
I prefer to make it so that you don't have to be at your mental best to be confident you know what your code is doing, or that someone else in your team understands what it's doing, because async can go a bit strange. This is a joke; you know the variants of this joke: "Some people, when confronted with a problem, think: I know, I'll use threads. And then two they hav erpoblesms." That's Ned Batchelder, so he's good value. He's actually got a collection of lots of those "two problems" variants.

But do use it wisely. So don't hesitate to use async in Python if it speeds your code up significantly; give it a go, have a try. And you can cheat, using pre-existing patterns that just work. I hope these all just work, but if anything's wrong with any of them, it's not far off. And put async into the standard Python toolkit: it shouldn't just be for elite programmers or those who want to blow their minds. The goal is to make it safe to put async code into production with teams of mixed ability, in other words, the real world, and get it used a lot more. Pardon me. So: async all the things, where it makes sense, that is.

A few key people have helped, and this is actually working code using async to thank these people randomly: Venom and Lee Symes from Catalyst, and concurrent.futures, whoever that represents, for putting this into the standard library. So if you've been doing async using other things, it may be that this is something you can take for a spin. There are lots of other approaches; I don't know much about them. But yeah, okay, so that's it.

All right, I believe we have time for some questions, if anybody has any. Yes, but not all at the same time.

Oh, thanks for the talk. I was wondering, because you can set the number of workers, would it make any sense, for debugging reasons or in any scenario, to just set it to one? Yeah, I think the answer is probably yes. I think it's good to do that sort of thing anyway, just when you're playing with things, to sort of get more of a
feel of what's actually going on. It improves your reasoning about how it works. One of the problems I have, running over on that laptop, is an asynchronous process that takes a long time to conclude, so by the time it's finished you've forgotten what it had started. So it's good to create some toy projects, just a few lines of code, where you can experiment with it and make sure it's working like you expect, rather than writing the serious code with a huge number of workers, which can get really hard to reason about.

Does async use multiple cores if your machine has them, and if not, can you make it do that? I think the answer is yes. I talked about the ThreadPoolExecutor versus the ProcessPoolExecutor; I think the answer lies with them there. I'm sort of reaching the edges of what I understand myself; that's why I felt qualified to make it a talk, you know, practical async for dummies. Yep. And ProcessPoolExecutor fails miserably in IDLE, as I discovered when I thought I'd quickly demonstrate that as well, and it just sent standard out into who knows where, another process.

Uh, oh, yep. You mentioned in Python 3, I believe, not setting the workers; could you elaborate on that? In 3.5 you don't have to explicitly set it. I don't know how it works it out, but it works out something sane and just uses that. So it's one less thing to have to decide on if you're just getting something going. But I was doing experiments where I was changing 128 and 12 and seeing whether things were faster or slower for one of the things I did, so I don't know if there's one true way. But yeah, I hope there'll be more advanced people coming along giving talks on top of this sort of thing, and we'll hear a lot more about async at conferences to come.

Yeah, the writing of async code looks quite straightforward; how is the debugging, do
you have any tips there? Haha, not sure yet. I've got an analysis I'm doing for a lightning talk, which I don't think is going to be finished in time, and I'm thinking about that exact question. Um, what's that? Yeah, this question is related to the previous question, and perhaps the answer is another talk: I've got 10 jobs and 5 of them died, what do I do? Right, you mean like, do you spin them up again? Right. I'm just thinking, I'm not sure you can tell if they're cancelled. If they've had an exception, I'm not sure; I haven't actually worked with that. I'm not sure whether they'd end up in done or not done. I presume they wouldn't end up as completed, but I'm not sure, to be honest. I'd almost be tempted to quickly whip up something where I throw an exception and then see where it ends up. I mean, I don't have an actual answer to that, because I haven't had it happen. But yeah.

All right, I think that's everyone. Cool. So next up, starting in about 10 minutes, we'll have the poster session. That is up in the Chester lounge, I think it is; that's up one flight of stairs, not two, so it's on that middle floor. That is an opportunity to look at all the posters up there, talk to the people who've made those posters, and just have a chat amongst yourselves for some time. And then we'll be starting lightning talks at 4:30, so if you could go to the main auditorium for that around then. Thank you very much, and thank you to our speaker, Grant.