 fond of Python internals, storage, and file systems. Please welcome Mr. Chetan and Vishal. Thanks, Shiva. I think we'll have to bear with this kind of distraction in between. So let me start this talk by introducing ourselves. Here we are once again. The topic of our discussion today is async programming in Python. Based on our experience, we have an understanding that async programming requires a slightly different programming mindset. The main objective of this discussion is to understand what's at the crux of async programming, what the different async programming constructs are, and how they relate to Python. We are going to talk about a couple of Python modules that support async programming. And since it's a fairly vast and exhaustive topic, we would prefer to keep this an open discussion. So let's start with some basics. Any programming task is either I/O bound or CPU bound. When I say I/O bound, it can be waiting for a raw_input, a file I/O operation, a network operation, or an HTTP call. That's pretty simple and clear. A CPU-bound task can be a humongous in-memory calculation, like computing the factorial of 1 million; or, for anybody familiar with video processing, video transcoding is a very CPU-intensive task. But what's up with these I/O tasks? Am I really sure, when I start an I/O task, that it will be completed in X amount of time? We are never sure about that. What if that particular raw_input we're waiting for never comes in? What if my bandwidth is so low that the HTTP call takes too long? So we are never quite sure. Let's understand this concept slightly better with an example. Say I import the very famous module called requests, and I make a get call on google.com. And say, hypothetically, the bandwidth available to me is one byte per second. Won't that take too long a time? Yes.
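To put some numbers on that hypothetical: at one byte per second, even a modest page takes hours. A back-of-the-envelope sketch (the 20 KB response size is an assumption for illustration):

```python
# How long does one GET take at 1 byte/second?
page_size_bytes = 20 * 1024   # assume a ~20 KB response body
bandwidth = 1                 # bytes per second (the hypothetical)

seconds = page_size_bytes / bandwidth
hours = seconds / 3600        # well over five hours for one response
```

A single blocked call at that rate would monopolise the program for the better part of a working day.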
And if that's the only task in the web app I have built, God forbid, I don't know what will happen to it. The only way of looking at it is that I can go on holiday, come back, and then the task will be done. But can we think of a model where we do certain things concurrently? While one I/O task is in progress, can I go ahead and do another I/O task? Obviously, your web application, or any other program you build, will not have only one blocking I/O task; it will have multiple I/O tasks. So while one is happening, can we do something better in that time frame, or run another I/O task concurrently? That's the core idea, and this is what is called non-blocking, or async, programming. As per theory, non-blocking is essentially the ability to make continuous progress. Let's not stall, let's not get blocked, let's make progress continuously: that kind of attitude. With a blocking task, like the example on the previous slide, all my resources, CPU and memory, get blocked on one particular I/O task, so a kind of monopoly is created. The non-blocking paradigm forces me not to do that. When we take care of these things, it gives us lower latency or higher throughput; or, if it's a web application I'm developing, it helps me build more responsive web apps. Now is a good time to look at the three different programming models we are targeting here. First, the synchronous model. Very easy to understand: I have three tasks. Only when task one is done do I move to task two, and only when task two is done do I move to task three. It happens serially, and the total time taken is the sum of the individual times taken by each of these tasks.
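The synchronous model can be sketched with simulated I/O waits. Here `time.sleep` stands in for a blocking network call, and the task names and delays are made up for illustration:

```python
import time

def blocking_task(name, delay):
    # time.sleep stands in for a blocking I/O wait (network, disk, ...)
    time.sleep(delay)
    return name

start = time.time()
# synchronous model: task two starts only after task one finishes
results = [blocking_task(n, 0.05) for n in ("task1", "task2", "task3")]
elapsed = time.time() - start
# elapsed is roughly the SUM of the individual waits (~0.15 s here)
```

The whole point of the threaded and async models that follow is to bring that total down from the sum toward the longest single wait.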
In the threaded model, I create multiple threads and try to get these things done in parallel. Of course, in Python we have certain issues there, but this is more of a general programming concept than something specific to Python. And the asynchronous model: why not interleave concurrent tasks and get something close to the capability of executing tasks in parallel? To take a mathematical example, I'm making an HTTP call to this domain, and say it's the same task repeated: task one, task two, task three. In the synchronous model, the total time taken would be 1.2 seconds. In the threaded and the asynchronous models, the total time taken would be somewhat lower. So what's the magic here? What's actually happening? Let's go into a bit more detail. Any async web framework you have worked on, maybe Twisted or Tornado, essentially works on this fundamental paradigm. Here, the event-driven web server is a Tornado or Twisted web server. Whenever there is a request from a client, this event-driven web server is responsible for handling the request and passing it to the I/O loop. The I/O loop is responsible for handling the event and dispatching it to the event handler it has been registered for. Going one level deeper, the I/O loop is a design pattern in itself: it's based on the reactor pattern. The reactor pattern is used in every non-blocking framework and works on the philosophy of a single-threaded loop. It's nothing but a single-threaded loop waiting on events, and as soon as events come up, we handle them with the registered event handlers, as we discussed on the previous slide. But who is monitoring for events, and where does that happen?
Say I develop a web application using the Tornado or Twisted web framework, and in my application I am making four or five I/O calls. For every I/O call I make, I create a socket, and a file descriptor is associated with each socket. When I develop my application, I tell the framework: these are the file descriptors I have, so why don't you monitor these file descriptors for me? Tornado will essentially maintain a list of file descriptors, the events to monitor, and the associated event handler for each event. But who is actually monitoring the FDs for events? That is a kernel-level construct. There are different mechanisms on different operating system flavors: kqueue is used on macOS and the BSDs, epoll on Linux, and select as a portable fallback. These are kernel-level facilities that provide event notification in a non-blocking way. So epoll, for instance, watches file descriptors and returns the events that are ready, which are read, write, and error. Now that we have a good understanding of the async programming style and what lies at its crux, let us associate this understanding with some constructs that are available in Python, with Vishal. Thank you, Chetan. So now we know the basics of async I/O. We can summarize the async way like this: it is good for applications where you have a lot of I/O tasks, and those I/O tasks run in an interleaved manner. All of those tasks are supposed to be independent; if they are not independent, we cannot run them concurrently. We essentially have a single thread of execution where all the tasks are run one by one, suspended and resumed periodically by the event loop, as we learned in the reactor pattern. The benefits of async I/O are that you don't need more resources, and the program stays quite simple.
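Python exposes those kernel facilities through the standard-library `selectors` module, which picks epoll, kqueue, or select for you. A minimal reactor in that style, using a `socketpair` as a stand-in for client sockets:

```python
import selectors
import socket

# A minimal reactor: register file descriptors with the kernel-backed
# selector (epoll on Linux, kqueue on macOS/BSD, select as a fallback)
# together with a handler, then loop waiting for readiness events.
sel = selectors.DefaultSelector()
received = []

def on_readable(conn):
    data = conn.recv(1024)       # guaranteed not to block here
    received.append(data)

r, w = socket.socketpair()       # stand-ins for real client sockets
sel.register(r, selectors.EVENT_READ, on_readable)

w.send(b"hello")                 # make the read end ready
for key, events in sel.select(timeout=1):
    key.data(key.fileobj)        # dispatch to the registered handler

sel.close()
r.close()
w.close()
```

This is exactly the "list of file descriptors, events to monitor, and associated handlers" arrangement described above, in about twenty lines.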
Your I/O tasks won't block the whole application; even if one is blocked, the application doesn't stall, so things keep progressing all the time. Food for thought for all of you: with async I/O, can we work around one of the perennial problems in Python, the GIL? Let's keep thinking about it. Now let's understand how async I/O programming is available in Python. Python has web frameworks as well as a programming module for applications, so we will start the discussion with the asyncio module. asyncio is part of the standard Python distribution now; it was released, I think, early this year. Applications using asyncio are backward incompatible, because it's available only from Python 3.4 onwards. asyncio is a completely new way to write your Python applications: you need to think about things from scratch, learn what constructs are available in the asyncio module, and write your application using those constructs. asyncio is essentially the same as what we learned in the async paradigm. It gives you a single thread of execution, and that thread handles multiple events; each event is actually a task. To run those tasks, it uses cooperative scheduling, which is a fancy term for a task voluntarily suspending itself. If a task finds that it would stall the application, it suspends itself, and when the event it is waiting for is done, it is resumed; in the meanwhile, we can run another task. Calls like select and epoll are wrapped behind a common interface, so the same API works on Windows and Linux even though the underlying implementation differs for each operating system. Now, asyncio internals: these are the major components of the asyncio module, and we will understand them one by one. Let's start with the event loop. As we know, it's a single thread, and this thread is responsible for running your application. The major responsibility of the event loop in asyncio is to schedule tasks.
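That cooperative suspend-and-resume is easy to see in a small sketch. The example below uses the modern `async`/`await` syntax (Python 3.5+); the talk's era used the equivalent `@asyncio.coroutine` and `yield from` spelling, which later versions of Python removed. The worker names and delays are made up:

```python
import asyncio

order = []

async def worker(name, delay):
    order.append(name + ":start")
    await asyncio.sleep(delay)   # suspend here; the loop runs others
    order.append(name + ":done")

async def main():
    # schedule both workers on the single-threaded event loop
    await asyncio.gather(worker("a", 0.02), worker("b", 0.01))

asyncio.run(main())
# both coroutines start before either finishes, and "b" (the shorter
# wait) resumes first even though "a" was scheduled first
```

The recorded order shows the interleaving: neither task blocks the other while it waits.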
You can register callbacks; it can handle signals; it allows creation of transports and connections; and it provides an interface to a thread pool, in case you know a task is going to take a lot of time and you don't want to run it on your event loop: you can offload that task to a thread pool. Next is the coroutine. We all know a normal function in Python: a function is just a bunch of lines of code that run serially, one by one. Coroutines are special functions wherein you can suspend execution of your function at will and resume it as soon as a particular event is done. Say, as in the example at the beginning, we are reading data from a socket and that call takes a lot of time; in that case we can suspend execution, allow some other tasks to run, and when the data is available we resume execution of our function. Coroutines are decorated with asyncio.coroutine, which is a decorator, and in asyncio, coroutines essentially contain a yield from statement. We'll see it in an example. The third important construct is the task. Coroutines are run by a task: they are wrapped in a task and then put in the event loop. The event loop schedules the task, and if a coroutine suspends, the task allows another task to run; as soon as the coroutine resumes, the task is put back in the event loop. Actually, in asyncio, the event loop employs two queues, corresponding to two categories of tasks. At any point during your application's execution, you have tasks that are either ready to run or blocked on some event. Tasks that can run immediately are kept in the ready queue, and if we find that a task is about to block, it is moved to the suspended-task queue. If you want to create a task in asyncio, you just call either the asyncio.async function or the create_task function, which was added in Python 3.4.
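Those two event-loop facilities, task creation and thread-pool offloading, look like this with the modern API (`asyncio.create_task` and `get_running_loop` are the current names for what the talk calls `asyncio.async()` and `get_event_loop()`; the workload below is made up):

```python
import asyncio

def cpu_heavy(n):
    # a blocking, CPU-bound function we don't want on the event loop
    return sum(i * i for i in range(n))

async def main():
    loop = asyncio.get_running_loop()
    # offload the CPU work to the default thread pool so the
    # single-threaded loop stays responsive
    heavy = loop.run_in_executor(None, cpu_heavy, 10_000)
    # meanwhile an I/O-style task keeps running on the loop itself
    io = asyncio.create_task(asyncio.sleep(0.01, result="io done"))
    return await heavy, await io

result = asyncio.run(main())
```

Passing `None` as the executor uses the loop's default `ThreadPoolExecutor`, which is the "interface to a thread pool" mentioned above.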
Now, I/O is essentially an operation whose result is available in the future. Futures are unresolved references, where we don't know when the result will be available; it could be a value, it could be an exception, anything. Here we have an example. We create an event loop with the get_event_loop call, and we create a future object. You can see asyncio.async here: asyncio.async creates a task, and that task is actually running a coroutine called slow_operation, which has a yield from statement, and the future object is passed as an argument to the coroutine. And we see future.add_done_callback: as soon as the future is resolved, it will call a callback called got_result. Now let's go to the coroutine slow_operation. As soon as execution of the coroutine starts, we sleep. The moment we sleep, the coroutine is suspended; after a second it is resumed, and the future is set. When the future is set, we call the callback got_result, which prints the result, and we stop the loop. This is a very basic example of asyncio; it uses almost all the constructs we have learned so far. There are more constructs, called transports and protocols, and the majority of your work will be done with these. Transports are a way to connect: you have sockets, you have pipes, you have SSL connections; and protocols hold your application logic. asyncio gives you non-blocking wrappers for various types of calls, like socket creation and reading from and writing to a socket. Let's understand asyncio in detail with one more example. Here we create one event loop and two tasks, and these two tasks use the same coroutine. Inside the coroutine there are three blocking calls, which are supposed to run serially, one by one. But as soon as we create task one with channel one and execution of the coroutine starts, the yield from statement at the beginning of the coroutine suspends it.
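The slow_operation/got_result example described above can be reconstructed as follows; this uses today's `async`/`await` spelling and `loop.create_future()` in place of the original `@asyncio.coroutine`/`yield from` code, and it collects results in a list instead of printing, but the shape is the same:

```python
import asyncio

results = []

async def slow_operation(future):
    await asyncio.sleep(0.01)             # coroutine suspends here
    future.set_result("Future is done!")  # resolve the future on resume

def got_result(future):
    results.append(future.result())       # callback fires once resolved

async def main():
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    future.add_done_callback(got_result)  # register the callback
    await slow_operation(future)
    await asyncio.sleep(0)                # one loop tick: callback runs

asyncio.run(main())
```

Note that `add_done_callback` does not run the callback immediately on `set_result`; it schedules it on the loop, which is why the example yields once more before finishing.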
While it is suspended, the other task can begin and start its execution. Whenever the connection is created, the coroutine is resumed and moves to the next statement. The best way to understand yield from is to just ignore it; that's what Guido has said. You don't have to look at it; just believe the code runs synchronously, one statement after another. One more thing to add here: even though these two tasks are asynchronous, the operations inside my subscriber are performed in a synchronous order, because the first one should complete before the second one makes sense. But each of these operations, in itself, is again async. That is one important thing to note in this case. We have one more example. We create an event loop; an event loop is a must for any asyncio application. In this loop we create three tasks, a combination of I/O and CPU tasks, and all three tasks run concurrently. A good thing with asyncio is that you don't need a high-end machine to run your application; even with a single core and not much memory, you can run your application efficiently. We wait till all the tasks are complete, and once they are complete, we close the loop. That's an overview of the asyncio module in Python. Now I'll hand over to Chetan to help us understand a web framework which is async, called Tornado. So, Chetan. Thanks, Vishal. All the concepts that we have learned till now, in the async paradigm and with asyncio, are the same ones at the heart of any asynchronous library. If you learn the basics, you can apply them anywhere; that's the philosophy we have. A similar mapping can be done for Tornado as well. The event loop that we started with is implemented in Tornado by the IOLoop class.
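The create-tasks, wait-for-all, close-the-loop pattern just described looks like this with the explicit-loop API of that era (task names and delays are made up; newer code would usually reach for `asyncio.run` instead):

```python
import asyncio

done_order = []

async def io_task(name, delay):
    await asyncio.sleep(delay)   # stands in for an I/O wait
    done_order.append(name)

# the pattern from the talk: build a loop, submit a batch of tasks,
# wait until all of them complete, then close the loop
loop = asyncio.new_event_loop()
tasks = [loop.create_task(io_task(n, d))
         for n, d in [("t1", 0.03), ("t2", 0.01), ("t3", 0.02)]]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
```

All three tasks overlap on the single thread, so they finish in order of their delays, not in the order they were created.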
Coroutines are implemented with the gen.coroutine decorator, and futures are implemented in tornado.concurrent. And again, as Vishal mentioned the transports and protocols we have in asyncio, similar constructs exist in Tornado as well, implemented in tornado.iostream. Let us take a very basic example of Tornado async. Here we create an object of AsyncHTTPClient, which is Tornado's async HTTP client, we make a call to this domain, and we print something. Whenever the event loop starts, it first sees that there is an async call happening, so it hands it to the event handler, the callback called handle_request, and prints "before event loop starts". The output here is interesting: it prints "before event loop starts" first. That's because the I/O task is made asynchronous, and only when the result is available will the callback take the event and print the success and the response body, or an error, whatever the case may be. Coroutines are also implemented in Tornado. In asyncio, we use the asyncio.coroutine decorator; the corresponding decorator in Tornado is tornado.gen.coroutine. One thing to note here is that instead of yield from, we are using the yield keyword. In asyncio we had yield from, which delegates to a generator; here, yield produces one value of the generator at a time. The response variable here is a future: only when the response from google.com is available will that value be filled into the response variable. So again, it's similar to what we have in asyncio. And just as Vishal gave an example of creating our own task, in Tornado also we can create our own task. It's the same paradigm; we just wrap it under another decorator, gen.engine, and we create an asynchronous task with gen.Task.
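The yield-versus-yield-from distinction mentioned here can be shown with plain generators, no framework needed. This is a toy sketch: the string "pending-http-call" plays the role of a future, and the driver code at the bottom plays the role of the event loop or gen.coroutine scheduler:

```python
def fetch():
    # stands in for an async HTTP call: yields a "future" placeholder,
    # and the scheduler sends the resolved value back in
    response = yield "pending-http-call"
    return "parsed:" + response

def handler():
    # `yield from` delegates: fetch's yields pass straight through,
    # and the expression evaluates to fetch's return value
    result = yield from fetch()
    return result

# a toy scheduler playing the role of the event loop
gen = handler()
placeholder = next(gen)          # runs until fetch yields its "future"
try:
    gen.send("google-response")  # resolve the "future" with a value
except StopIteration as stop:
    final = stop.value           # the handler's return value
```

Tornado's gen.coroutine does essentially this driving for you: it catches each yielded future, waits for it to resolve, and sends the result back into the generator.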
We can also create our own future, though it's slightly philosophical how one creates one's own future. The server-side code is similar to what we saw two slides back: we again take an AsyncHTTPClient-style object and make a fetch call to google.com. But in this case the async HTTP class is written by us, in our own module, myfuture.py, and we wrap it with the return_future decorator. It essentially prints "in myfetch" and then invokes your callback. The callback here is a function, test, which prints "in test" and its argument; the argument here is the timestamp that gets printed. There's nothing difficult to understand here: it's the same philosophy, just that asyncio does it with one syntax and Tornado does it with another. The underlying philosophy is the same, and that's the most important thing to understand. We did some performance tests as well. There is no coroutine or anything here; it's a simple program. The blocking class makes a synchronous HTTP library call to the /work URL, and the async class creates an AsyncHTTPClient object and makes a call to /work. But what is /work? /work is nothing but a handler added to the IOLoop that waits for 0.5 seconds. Here are some of the results we got with the Apache Bench benchmark: we found that the async paradigm helps you serve more requests per second, and the time taken per request is also lower. Let's conclude this session with our learnings. We both believe, Vishal and I, that async programming is easy to understand, though the mindset you have to build is slightly different; it's not quite the same as the conventional programming approach. Python has a new module, asyncio, which is quite extensive and can be easily utilized.
And if you use async, you get good performance improvements, and responsive web applications can be built easily. The recommendation we have is, of course, that async is not a silver bullet; it's not a solution to all problems. You need to understand these philosophies and apply them where they fit. You can also look at other paradigms such as multi-threading or eventlet, or at newer languages like Go and Scala that provide concurrency constructs. So, with this, we'd like to conclude the talk. These are some references that we used, and you can contact us here. Thanks. Okay, so let's have some discussion if you have questions. The first question: you've spoken about a couple of frameworks, Tornado and Twisted, both used extensively; what's the difference between them? So, just to repeat the question: what's the difference between Tornado and Twisted in an async context? As we said, we haven't used Twisted extensively; we have used Tornado, and somebody else might have used Twisted extensively. But the philosophy of this talk is that if you understand the basics, the different paradigms we are working with, it's not difficult to learn the other one as well. And async programming, as we said, requires a different programming mindset. That's true for Node.js as well as for Python. If you go callback inside callback inside callback, you get tangled in your own code. The thinking should be clear, and the design should be clear; I think these problems are best solved in the design phase itself, or else you end up reaching for promises in Node.js and the like. Great. So, to repeat, we have two questions here: chaining of coroutines, as I understand it, and the second, about calls from a thread, related to CPU operations.
asyncio programming does let you chain coroutines, a chain of callbacks, you could say. Take the publisher/subscriber example that we showed: as I said, even though the tasks are asynchronous, those three operations are performed in a synchronous order, while individually each operation is asynchronous. So we can go deeper and deeper and create a chain of callbacks as well; that is available. And in relation to CPU, how is CPU work handled in the event loop? You can use a thread pool for that. See, when you are running a CPU task, why would you want to yield? asyncio is primarily, yes, for I/O-driven applications. Any other question? Oh yes, please. The question is about provisions based on Python version, so that we can avoid this problem on older versions. Yes, for earlier Python versions you can use asyncore, which is another module. Actually, asyncio is a refinement of older modules and libraries like Twisted, eventlet, asyncore, and asynchat. So if you want to do async operations on old Python versions, you can try any of them. Did I answer your question? Also, take the question of databases: in a framework like Tornado, we have certain async libraries for database operations that can be leveraged. If you want to interact with MongoDB from Tornado, MongoDB has a module called Motor; Motor can be integrated with Tornado, and then you make everything async. That's another solution available, so you need not go to asyncio or Python 3 for that; you can still do all these things in Python 2.7. Okay. Thank you. Thanks. Thanks a lot, guys. Thank you very much for your time.
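The coroutine chaining discussed in that answer looks like this in modern asyncio; the function names and the dependency between them are invented for illustration:

```python
import asyncio

async def fetch_user(uid):
    await asyncio.sleep(0.01)    # stands in for one async call
    return {"id": uid}

async def fetch_orders(user):
    await asyncio.sleep(0.01)    # depends on the previous result
    return [user["id"] * 10]

async def handler(uid):
    # chained coroutines: each await depends on the previous result,
    # so the steps run in order, yet the whole chain never blocks
    # the event loop while it waits
    user = await fetch_user(uid)
    return await fetch_orders(user)

orders = asyncio.run(handler(7))
```

This is the async analogue of the publisher/subscriber point above: the steps are ordered with respect to each other, but other tasks on the loop keep running in the gaps.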