Hi everyone. I'm Jonas. I'm originally from Switzerland, but for the last four and a half years I've been living in Tokyo, Japan, where I work for a company called HDE. We are hiring, like everybody else, so if you're looking for a job, or if you're a university student, we also have a fantastic internship program where we will pay your flights to and from Japan and you will have a paid internship with us for six to eight weeks. If you're interested in any of this, please feel free to come talk to me or check out our website. Some quick links for where to find me if you have any complaints or feedback after the talk: I'm ojiidotch on Twitter, I'm ojii on GitHub, and the company website is hde.co.jp/en. So, my talk is called "Why you might want to go async", but thinking about it, that's maybe not a great title, because before I can explain why you should go async, I should probably explain what I mean when I say async. So, who here in the room thinks they understand what I mean when I say async, or who has used async? Okay, most people, so you can probably leave and go to another talk. No, okay. So, async is short for asynchronous, and more specifically, when I say async I mean asynchronous I/O, and in this talk I will be focusing especially on asynchronous networking: reading and writing to a socket, making HTTP requests, responding to HTTP requests, doing database queries, that kind of stuff. It's the opposite of synchronous, or blocking, code.
Probably the best way to illustrate synchronous versus asynchronous is to give some examples of libraries on either side. The most famous synchronous library in the Python world right now is probably requests, an HTTP client library that most of you are familiar with. It's synchronous, as is Django, which is a web framework, and Flask, also a web framework. All of these are blocking and synchronous, and there are Postgres client libraries and libraries to talk to AWS in the same style; in fact, most libraries you will find on PyPI will usually, by default, operate in a synchronous, blocking mode. So what are some asynchronous libraries? There's Twisted, which has been around since about the Bronze Age. It's been around forever, and it's a very, very good library if you want to do asynchronous networking in Python; it has support for every protocol under the sun. There's also Tornado, which is similar to Twisted, an asynchronous networking library, but more focused on HTTP servers and HTTP clients. And since Python 3.4 we have asyncio in the standard library, which lets you do asynchronous networking right out of the standard library. There's also Sanic, which is a Flask-like web framework (there was a talk about that yesterday as well), but asynchronous. There's Curio, which is an alternative to asyncio. There's aiopg, which is an extremely fast Postgres client, and aiobotocore if you talk to AWS services. And since I'm giving a talk, I might as well shamelessly plug some stuff that I wrote: there's a thing called aapns, which is an asynchronous Apple Push Notification service client library, and arsenic, which is an asynchronous WebDriver client. If you're not familiar with WebDriver, you might be familiar with Selenium; it's like Selenium, but async. And in asynchronous I/O
there are a couple of fundamental core concepts. At the very, very core there's this thing called the event loop. The event loop is the thing that runs all your other code: it schedules your functions to run at the correct time, ideally, and it handles the I/O that is coming in and that needs to go out, making sure your stuff gets called whenever possible. The other fundamental building block of asynchronous I/O code these days is a thing called coroutines. Previously, when we did asynchronous code in Python, we had callback hell: you wrote callbacks for everything. These days we can use coroutines. A coroutine is a function which allows you to do cooperative multitasking, and that's where the name comes from: a cooperative routine. A coroutine is basically a Python function, but one that is aware of asynchronous networking. So, in general, in asyncio what we try to do is have the I/O, anything that reads and writes to sockets or the network, be non-blocking; your logic, however (by logic I basically mean things like if statements), is still blocking in Python. So, after this short crash course on what async is, or what I mean when I say async, why might you want to go async? The answer is actually simple and I will just give it to you: it's money. You can save a lot of money by moving from synchronous network servers or clients to asynchronous ones. But again, "Why you might want to go async" is probably a really bad title, and a better title might be "Why does going async save you money?". A very common mantra you hear people complain about is that Python is slow. And one of the reasons people say that, especially when they compare it to something like Node.js or Go, et cetera, is that in our default world of synchronous web servers such as Django, a single process or thread can only ever handle one request at a time. That's obviously not enough.
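To make the event loop and coroutine ideas concrete, here is a minimal sketch using only the standard library's asyncio. The names worker and main are invented for illustration, and asyncio.run is the modern (3.7+) entry point rather than anything from the talk's slides:

```python
import asyncio

async def worker(name, delay):
    # "await" hands control back to the event loop while we wait,
    # so the loop can run other coroutines in the meantime.
    await asyncio.sleep(delay)
    return name

async def main():
    # The event loop interleaves both coroutines; neither blocks the other.
    return await asyncio.gather(worker("first", 0.02), worker("second", 0.01))

print(asyncio.run(main()))  # -> ['first', 'second']
```

Even though the second worker finishes first, gather returns the results in the order the coroutines were passed in.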
We cannot just have our app handle one request at a time. So what we generally did in the past, and even in the present, is spawn more threads, spawn more processes, spawn more servers, to handle more than one request at a time. And doing this, we waste a lot of resources, primarily CPU, but also a lot of memory, because all these extra processes need extra memory overhead to run. A thing to realize about why you might want to go async is that most of the time, when you write a web server, or a web crawler, or something like that, you're not actually running Python code; you're just waiting for somebody else to give you data. This might be your database, a third-party service, the website you're trying to download, or the client trying to upload a GIF of a cat. Surprisingly little time is actually spent in your Python code; your template rendering engine is usually not the bottleneck or where you waste most of your time. So this whole mantra of "Python is slow" is probably not entirely true, or at least not very honest. I would rather say Python is inefficient. But that's also not true, because Python isn't inefficient; synchronous Python is. If you're writing synchronous web servers, synchronous web clients, synchronous database clients, you're wasting a lot of time just waiting for nothing. So let's have asynchronous Python come to our rescue. I will give an example here. I've been talking a lot about web servers and I will continue to, because I believe everybody understands roughly how the web works, but all of this also applies to databases, database clients, servers, whatnot. In this example I will implement a little handler that handles a POST request.
It will read the POST data the client sends us, save it to a database, and then return some identifier as JSON to the client. So let's implement this in a synchronous web framework. I hope you can sort of read it; I don't know if it's big enough. This is not a real web framework, it's just how you might do it. You have a handler, it takes a request, and it returns a response. On the first line, it reads the POST data from the client. On the second line, it saves it to the database. And on the last line, it just returns the response. So what happens if we have two requests coming in? They're on the right side, in the comments. The first request starts being handled: it reads the POST data, it saves it to the database. The second request is still waiting at the top and cannot do anything. The first request starts writing the response; the second request is still waiting patiently at the beginning. Finally the first request is done, and the second request can start being handled, and it will also go step by step through this. This is a very simple example, but I hope you can sort of see where the fundamental problem of synchronous web servers lies. So let's rewrite this amazing web handler in an asynchronous fashion. It might look something like this. The keen-eyed among you will have noticed this crazy thing: I no longer define my function with just def handler, I give it this crazy keyword, async. And we also don't return a response anymore. Now, the async keyword is what defines a coroutine function in Python. Prior to Python 3.5 there was a decorator to do that, and now we have this nice keyword to make it easier. And an important thing to understand is that the async keyword doesn't make your function asynchronous. The async keyword allows your function to potentially be asynchronous.
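The synchronous handler stepped through above might be sketched like this. This is not any real framework; the request dict, save_to_database, and the sleep that simulates a blocking database call are all invented stand-ins:

```python
import json
import time

def save_to_database(data):
    # Stand-in for a blocking database call; while we sleep here,
    # this process can do nothing else, including handling requests.
    time.sleep(0.01)
    return {"id": 1}

def handler(request):
    data = request["post_data"]      # the body was already read, blocking
    record = save_to_database(data)  # blocks until the database answers
    return json.dumps(record)        # finally, return the response

print(handler({"post_data": b"a cat gif"}))  # -> {"id": 1}
```

A second request has to wait for every one of these steps to finish before its own handler even starts.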
Because if we look at the second line, you see this crazy new keyword that you might not yet be familiar with: await. The await keyword is actually what makes your function asynchronous. In the previous, synchronous example, when we got the request, we chose to access the POST data as an attribute on the request object. But here we call this get_post_data function. The reason is that in a synchronous web server, traditionally, when a request comes in, we read the whole request, pack it into a Python object, and hand it over to a request handler. But an HTTP request is usually comprised of a request line, which indicates the HTTP method and the path, followed by a bunch of headers, and then, potentially, a request body. And the request body might be fairly large. It might be a video; it might, again, be a GIF of a dog or a cat. So in the asynchronous world, we usually read only the request line and the headers. At that point the request handler can already do a lot of checking. It can check whether the path we're trying to access even exists; if not, it can just stop the request and not waste any time reading all that data from the client. Or it might check cookies or authorization headers to see if the client is even allowed to do what they want to do. What that means is that now, instead of just accessing the POST data, we have to await the POST data. And this is where the async magic comes in, because while we await this data to arrive, our event loop, our asynchronous framework, can go handle other requests and call other Python code. When we eventually get all the data from the client, execution returns to our function, the data is assigned to our variable, and we continue. On the second line we do basically the same thing: we write to the database, and then we have to wait for the database to respond with a "yes, I saved this item".
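Putting those pieces together, the asynchronous version described above might look like this sketch. Again, no real framework is used: get_post_data and save_to_database are invented coroutines that simulate I/O with asyncio.sleep:

```python
import asyncio
import json

async def get_post_data(request):
    # Only the request line and headers were read up front; awaiting
    # here lets the loop serve other requests while the body arrives.
    await asyncio.sleep(0.01)
    return request["body"]

async def save_to_database(data):
    # Stand-in for an asynchronous database driver.
    await asyncio.sleep(0.01)
    return {"id": 1}

async def handler(request):
    data = await get_post_data(request)    # non-blocking body read
    record = await save_to_database(data)  # non-blocking database write
    return json.dumps(record)

print(asyncio.run(handler({"body": b"a cat gif"})))  # -> {"id": 1}
```

At each await, the event loop is free to step other handlers forward, which is exactly what lets one process serve many requests.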
And again, because of the await keyword, while we wait for the database to respond, or to finish responding, we can go handle other requests and do other things. Lastly, rather than just returning a response object, what usually happens in async is that you create a response object somehow from the request, with the HTTP status and the headers, and then you write to it. You can write to it multiple times, and then you eventually finish it. Now, in our simple example we don't really need that; we could just return a response object, because it's so simple and small. But imagine you're generating a CSV file on the fly and you query a database multiple times to fill in the data. In that case, you could do one query, write that to the CSV, send it directly to the client, asynchronously keep filling data from your data store into the CSV, and then finish it. This way, the response will feel a lot more responsive on the client side. So again, let's see what happens when two requests come in. The first request gets to the read of the POST data, and then, while it's waiting, the second request also starts awaiting its POST data. Maybe the second request is faster because it has a better internet connection, or the first request is uploading a huge amount of data while the second uploads just a tiny bit. So they can actually get out of sync, but they can step through kind of in parallel, together, and be done. So why does this work? It works despite us still having only one process and only one thread, because while one request is waiting for data to arrive or to be written, the other requests can be handled by our system. And if all that code was confusing, let me give an analogy of a restaurant. Imagine you open a restaurant here on the beach in Rimini, and a customer comes in and they want a five-course meal.
And you say, no problem, sir, sit down, and you start serving them the meal. Now a second customer comes in and they want a coffee, and you say, no, sir, or no, ma'am, you have to wait until this other person finishes their five-course meal before we will give you your coffee. This is a problem, because now you have all these people waiting outside your restaurant to be served. So obviously what you do is open more restaurants, so you can still serve one customer at a time. That's what we do with Django and Flask. An alternative, and I know this is crazy, would be to have more than one table: when a customer comes in, you seat them at a table, and then you have your event loop, your waiters, go to them and ask them what they want, send the order to your database, I'm sorry, your kitchen, and serve them data, I mean food. That way your kitchen still doesn't have to be able to handle everything at once, but your customers gradually get their stuff earlier. So let's bring all this theory a bit closer to reality with a little story time from where I work. My day-to-day job is building a single sign-on and access control system where people can log in to various services. We need to handle a lot of login requests, primarily on Monday morning when everybody comes to the office and starts their computer. We'd been using Tornado for a while, but we hadn't really been using it correctly, for various reasons, and it wasn't really async. So what we did is convert our main handlers to async. We were still using Python 2.7 at this point, still using the normal Tornado system, and we were actually making heavy use of threads at this stage, because we have a lot of legacy code that relies on legacy APIs, et cetera, that would be hard to upgrade to async in one go.
So we basically moved all our I/O out into something called a ThreadPoolExecutor, and just by doing this we were able to reduce the number of servers we need on a Monday morning by 25%. Now, some people might say 25% is nothing, it's not even an order of magnitude, who cares, but in our case it was actually several thousand dollars a month. And the very, very interesting thing here is that our business logic, the thing we actually care about, the part of our application that does interesting things, or what we think are interesting things, stayed the same. All we had to change were those little bits of code that do I/O. Now, this is because we were already using Tornado; if you're on Django this will be harder. But the interesting thing is that these days, if you're writing asynchronous Python code, your code will look more or less the same, except every time you do I/O you might have an await statement or something like that there. So let's get back to my talk title, why you might want to go async. Maybe a better question is: why go async now? Because async has been around in Python forever. I've been doing Twisted since basically when I started learning Python, I've dabbled with it a little bit, and they've been doing async since forever. So why now? The very big reason is basically Python 3.6.1. asyncio was added to the standard library in 3.4, but you still needed decorators to define your coroutines; you had to use yield and yield from and all these kinds of things that just didn't really feel right. Then 3.5 added the async and await keywords, which meant you could make your code look beautiful, and 3.6 added a whole slate of great new tools: async iterators, async generators, async for, even async list and dictionary comprehensions, and async context managers.
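A toy example of those 3.6-era tools combined: an async context manager, an async generator, and an async comprehension. ClientSession and pages are invented names, not from the talk, and the zero-length sleeps just mark the points where real I/O would happen:

```python
import asyncio

class ClientSession:
    # A toy async context manager, standing in for something like a
    # database or HTTP client session.
    async def __aenter__(self):
        await asyncio.sleep(0)   # pretend to open a connection
        return self

    async def __aexit__(self, *exc_info):
        await asyncio.sleep(0)   # pretend to close it cleanly

    async def pages(self):
        # An async generator: producing each item may involve I/O.
        for n in range(3):
            await asyncio.sleep(0)
            yield n

async def main():
    async with ClientSession() as session:             # async with
        return [n * n async for n in session.pages()]  # async comprehension

print(asyncio.run(main()))  # -> [0, 1, 4]
```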
So, just like in your synchronous code, where you would use a with statement for a file, you can use a with statement for a client session or a database session or something like that, but in an asynchronous context. And because of this whole thing with Python 3.4, 3.5, and 3.6 adding all this async stuff to the standard library, what actually happened in the wider ecosystem is that we had a little asynchronous revolution in Python. All these crazy projects started popping up left, right, and center, making the asynchronous environment in Python a lot stronger. There's a project called uvloop, which takes the same event loop that Node.js is built on, which is very, very efficient, and brings it to Python. It's compatible with asyncio, so if you're using asyncio you can just swap it in, and according to their benchmarks it's a lot faster. There's aiopg, which I mentioned before, a Postgres client library which, again according to their benchmarks, is faster than, for example, the Go Postgres client. I don't know how they do it, but apparently they do, and it's amazing. And there's Sanic, which some of you might have seen the talk on yesterday, a Flask-like web framework that is also incredibly fast. And there are all these other aio libraries out there these days that let you build really good stuff. And probably the greatest thing about asyncio being in the standard library is that we now have a common way to do async in Python, which means that even though we are using Tornado, thanks to asyncio I'm now able to use asyncio libraries, Tornado libraries, all these libraries together, thanks to the standardization that asyncio gave us. So, to continue the little success story with our login handlers from before: what I did a few months ago is upgrade our app from 2.7 to 3.6.1.
The main thing I did for that was replace the Tornado event loop with the asyncio event loop. It's still Tornado, we still use the framework, it just uses a different loop. I was able to remove about 20 dependencies because they were no longer required on 3.6; they were things that were fixed in Python 3 and had been backported to 2. And I only had to rewrite two dependencies: our APNs client library, to talk to the Apple Push Notification service, and some Google APIs. Now, the Google APIs do work on 3.6, but the upgrade path wasn't clear, and obviously the Google client library is synchronous, so it wasn't really interesting for us, so I rewrote it. And because of the switch to the asyncio main loop, we were able to use some of the asyncio libraries that are out there, get rid of some of the threading we had used, and use actual, real async. The result: a three-times speedup of our request handler. And I'm going to be completely honest, I'm not entirely sure why it is three times faster, but I'll take it. So, to recap why you might want to go async: you want to go async to save money, primarily, and you want to go async now because it's ready and mature in Python, and it's really nice and awesome to use. And please use Python 3.6; it's not just great for async, it's great for basically everything. And with that, thank you very much for your attention. If you have any questions, please feel free to ask, or hit me up on Twitter.

Q: Thanks for your talk, very interesting. In the current Ubuntu long-term support version, 16.04, the Python version is 3.5. Would you run that, or would you install 3.6 anyway and run that?

A: Assuming you're using that, I would install 3.6; that's what I do. We don't use Ubuntu, we use Docker with a different base image, and I just install Python 3.6. You can definitely use 3.5 to do async; it's just that your life as a developer will be a lot happier on 3.6.
Q: And of course with Homebrew on the Mac you get 3.6 as well, so the environments are more similar.

A: Yes.

Q: Hello, thank you for the talk, it was very interesting. One thing I noticed myself when developing async apps in Python is that, generally, in the async world, file system support is not really good. You have to code workarounds to make file system access work in async apps, maybe with dedicated thread pools for file system storage and retrieval and things like that. How do you manage that?

A: Yes. So, it's called asyncio, but one big part of the I/O spectrum, which is files, is usually not actually included. That is because, from what I understand, and I'm no expert on this part, Linux file systems don't really do async I/O at all, so you have to find a workaround anyway, and the general suggestion seems to be using thread pools. What I found is that, because we run a web server, during a request we never have to hit the file system. We need to hit the file system a lot when we start the app, to load certificates, settings files, templates, et cetera. So we just do that at startup and take a bit of a performance hit there, have the startup be a little slower, and then try to avoid hitting the file system afterwards. If you do need to hit the file system, threads are basically the best option you have.

Q: Thank you. Thank you, Jonas. It doesn't always seem to be easy to tell in advance what speed gains, or other resource gains, you might get by putting in the effort to go asynchronous. In fact, you said you weren't even sure why you were getting some of the gains you did. What strategy would you use to decide whether or not to put in that effort?

A: So, with the speed gains, an important thing to realize is that asyncio will not necessarily make your app faster.
What it will do is make it more efficient. A single request might be handled more slowly than in a synchronous world, because it has to share some of its time and resources with other requests. But if you have a hundred requests, in general those hundred requests will finish faster in aggregate than they would in a synchronous world. That's one of the reasons we were surprised to see such a speedup: we were expecting to see a lot less CPU usage and a lot less memory usage, but we didn't expect that to translate directly into a speed gain. As for how you should choose whether to go async or not: I would say always go async. These days it's not that expensive, not that taxing mentally, because it's getting a lot more convenient and easy to write and read async code. So I would actually say you probably always want to go async, simply because of all the benefits you get from it. The only reasons you might not want to are if you are strongly bound to an existing framework or code base, such as Django, Flask, whatever, or if there's really no way to talk to the third-party services you need in an asynchronous fashion.

Q: I was just wondering, how badly are you able to shoot yourself in the foot if you, for instance, await a function that does logical computations instead of doing I/O?

A: Sorry, if you're awaiting a function that doesn't do I/O?

Q: That doesn't do I/O, yes.

A: So, if you await a function that is not a coroutine, it will just give you an error. If you await a function that is a coroutine but is actually doing heavy, blocking computation, that is really bad, because it will block your whole event loop. There's an environment variable you can set, primarily for development, called PYTHONASYNCIODEBUG.
If you set that to 1, it will actually warn you in the console if a single call to a coroutine takes too long. And if you're doing computationally heavy stuff, you still have to use the same tools and strategies you would always use in Python, which is sending it to a different process or a different thread, so you don't block your whole application.

Q: Okay, thanks.

Any more questions? Okay, yeah, thank you.
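As a supplement to that last answer: offloading a blocking call to a thread, so it doesn't stall the event loop, might look like this minimal sketch. The names legacy_lookup and login_handler are invented stand-ins, not code from the talk; it assumes Python 3.7+ for asyncio.get_running_loop (on 3.6 you would use asyncio.get_event_loop):

```python
import asyncio
import time

def legacy_lookup(user):
    # Legacy, blocking code we cannot easily rewrite (simulated here
    # with a plain time.sleep, which would otherwise block the loop).
    time.sleep(0.01)
    return "token-for-" + user

async def login_handler(user):
    loop = asyncio.get_running_loop()
    # Run the blocking call in the loop's default thread pool; the
    # event loop stays free to handle other requests while it runs.
    return await loop.run_in_executor(None, legacy_lookup, user)

print(asyncio.run(login_handler("alice")))  # -> token-for-alice
```

This is the same pattern the talk describes for the Tornado migration: the business logic stays the same, and only the I/O call sites change.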