Let's welcome Maurice Julien. He works as a back-end software engineer at Yelp in Hamburg and will present a talk about asynchronous network requests in web applications. Please give him a warm welcome.

Thanks a lot to all of you for being here. Asynchronous network requests in web applications. But first: I work for Yelp. It's an awesome company. We're connecting people with great local businesses, which lets you find what's interesting around you: restaurants, hotels, cinemas, everything. We have 90 million monthly active users just on mobile, over 102 million reviews in total, and 70% of our search traffic comes from mobile, so we're a very mobile-oriented company, and we've launched in 32 countries including Spain.

What is this talk about? First I'm going to introduce why I'm giving this talk, and what's actually not so easy about doing asynchronous networking in a web application. We'll do a quick reminder about what a deployment server is and what the pre-fork model is, and then we'll dig into the meat of the talk: some coding examples, some different ways to do this. I might skim over many details; the goal is to get an overview of different ideas on how you can do asynchronous network requests in web applications.

That's the business app; that's what I'm working on. We have a service which is publicly available, and from a very high level, people make requests to your services. But that's a very, very high-level view, and that's not really what it looks like. You don't serve from only one server. What many people do is have an SOA, and at first it's fine: you call your session service, you call your business service, you call your user service. And it's horribly slow, and it gets slower and slower the more services you have, and the users are just waiting in front of their phones, which you really don't want. You want answers.
So what you might want to do, if you can, is call all of the services at the same time. That way the total time spent in your public service endpoint is not the sum of all the services you have to call, but just the slowest one; that's going to be your timing.

We're all doing Python, and the standard library contains most of the tools you need for everything. There is a very nice thing in there, which is ThreadPoolExecutor. I've copy-pasted from the documentation, and the example there is exactly about doing asynchronous network requests. It's very nice, it's what you should be doing. You could be using processes, but threads are very nice here, and as I'm reading from the documentation, ThreadPoolExecutor is often used to overlap I/O, which is exactly what we want to do. There is also another mention, which is that it will default to the number of processors on the machine multiplied by five, and that's where some problems start to arise. We're not running just one Python program, we're running a web application, and our goal for a web application is to use as many cores and as much power as possible to serve as many requests as it can. The ThreadPoolExecutor is going to compete with that, and it also means there is a limit on how much it can serve: the number of cores on the machine multiplied by five. If you want to do one more request, it's not going to be executed completely in parallel, so that ain't great.

Let me take a moment just to define what we call a web app here. If you're running Tornado, Twisted, this kind of evented app, or even something plugged directly into gevent, that's not exactly the position I'm going to take. You'll still learn a lot of things, don't leave the room, but I'm going to focus on WSGI apps, because that's what Django, Pyramid, and Flask apps fundamentally are.
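The fan-out idea above can be sketched with the standard library alone. This is a minimal, self-contained illustration (the function and delay values are made up, and `time.sleep` stands in for a blocking network request): the total wall-clock time is roughly the slowest call, not the sum of all calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_call(delay):
    """Stand-in for a blocking network request to a backend service."""
    time.sleep(delay)
    return delay

def call_services(delays):
    # One worker per call so everything overlaps; note that the
    # default in Python 3.5 is os.cpu_count() * 5 workers.
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        return list(pool.map(slow_call, delays))

start = time.time()
results = call_services([0.2, 0.3, 0.5])
elapsed = time.time() - start
# elapsed is close to the slowest call (0.5s), not the sum (1.0s)
```

The catch the talk goes on to describe: inside a pre-forked web server, every worker that does this spawns its own pool, competing with the server's own use of cores and threads.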
It's fairly unlikely that you're deploying your WSGI app with wsgiref's simple server or http.server, so we also need to choose a deployment option, just for the sake of talking about asynchronous network requests today. I'm going to be using uWSGI, because it's very widely used in production, it has a lot of options, you can do almost everything with it, and it uses the pre-fork model, which is the most efficient way to run web servers.

If you don't know what... oh, that didn't start at the right place. Cool. So you have the master process, which is going to load your app, then fork to create the workers, and then run your application in several threads. That's really, really nice, because fork has the property of being copy-on-write, which means you save all the memory for these new processes you've spawned, because they are all the same. We'll come back to that. Cool, and when you want to serve requests later, you have a proxy or HTTP frontend in front, and your master process distributes requests to the workers and their threads.

Cool, no one looks too badly injured so far. Here's a simple solution that does what we wanted. It's completely synchronous; I'm not even using the thread pool. It's going to be our baseline for everything that follows. You have the application, which takes environ and start_response, which is the basis of the WSGI standard. I'm using requests for the network calls, because it's way easier that way. I have another small server running on my machine: you give it a time, it waits for that time and responds, and that's a very nice simulation of long network calls. Let's run this. Great, so uWSGI. That's the basic version, we're just going to run one worker, nothing too complicated. Let's attach an htop to it and curl our web server. This is supposed to last 1.25 seconds.
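The synchronous baseline just described can be sketched roughly like this. All names and delay values are illustrative, not the talk's actual code, and the `requests` call to the local sleep server is simulated with `time.sleep` so the example stays self-contained:

```python
import time

def long_network_call(delay):
    # In the talk this is a `requests.get` to a local server that
    # waits `delay` seconds before responding.
    time.sleep(delay)
    return str(delay)

def application(environ, start_response):
    # Three sequential backend calls: total latency is their sum,
    # 1.25 seconds, plus overhead.
    results = [long_network_call(d) for d in (0.25, 0.5, 0.5)]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return ["\n".join(results).encode("utf-8")]

body = application({}, lambda status, headers: None)
```

Because the calls run one after another, the endpoint's latency is the sum of all backend calls; this is the behavior the rest of the talk works to eliminate.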
It lasts a bit more because of overhead, and that's long. Very, very long. Now, I have another little script which just lets me hammer the server, and you will see that it only serves one request at a time, even though all of these requests arrive in parallel. If we ask for three requests, it's going to wait for 4.5 seconds: the exact sum of the calls, which is what we don't want.

What I'll show quickly is running the server in another version. Oops, sorry. I had prepared many, but as I'm talking way too much already, we're just going to run what's called the mix, with both thread and process parallelization. Basically, every process you add should not add too much memory thanks to copy-on-write, but it still does, and you're using more cores. Thread-wise, you add a bit more memory because of thread overhead, and with the GIL there is a limit to how many threads you can usefully run in parallel. Right now I'm using a VM to control this, and two processes with two threads each is the maximum I can run, because I made a very, very small VM. Let's run this other version with the mix, and you will see we can handle... yeah, let's throw four requests at it, and let's change the port. And yeah, we waited a bit more than expected because of overhead, but still, it's good. And if we throw a fifth one at it, we're going to wait an awful lot longer, because one of the requests waits until the others are done.

Cool, so that's uWSGI in two minutes and something. Great, you can see that basically, whether you add processes or threads, it parallelizes in a very similar manner. A note I completely forgot, it's so obvious: all the code I'm going to present is Python 3.5. Everything here is made with Python 3.5 in mind. I might tell you whether things are available for all Python versions, but use the new Python, it's nice, it's good. Great, so that's very simple. Let's try to do something a bit nicer.
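The "mix" setup just demoed corresponds to an invocation along these lines. The flags are standard uWSGI options, but the exact values and file names used in the talk are assumptions:

```shell
# 2 worker processes x 2 threads each = up to 4 requests served
# fully in parallel; a 5th request queues until a slot frees up.
uwsgi --http :8000 --master --wsgi-file app.py --processes 2 --threads 2
```

Scaling past that capacity means adding processes (more memory, more cores used) or threads (less memory, but bounded by the GIL).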
Still, let's look into the standard library, where it's available. Hey, I think I have... nice. So that's, again, a very simple solution. I'm always going to run all my examples with two processes, and there is a very good reason for that: as I said in the beginning, it's a fork-based server, so many things can be a bit different once the fork happens and some things are already loaded. If you just run one process, you cannot see everything; some bugs appear, some strange things happen. So just to be really clear that it all works, two processes is going to be the rule here.

The code for the network calls is taken directly from the aiohttp documentation. If you do not know how this works, there were a bunch of talks at this conference about it, and there are more again today; go and see them. Let's run it. Cool. Let's kill that one. uwsgi... uwsgi.ini... no, yes. Let's attach an htop to it, so we see how many threads there are and what's happening. Sorry for muttering in French. And let's curl it, just to prove that I actually did my job and prepared my slides, and the code is actually working as advertised. Yes, this only lasted half a second, plus overhead, and that's the length of the longest call in the pile of calls we wanted to make. Victory. And we can see something a bit different as well: all the calls started at the same time, and they all ended at the same time. They really did happen in parallel. We are thrilled. If we try to hammer it again, and this time I will remember to change the port, and throw two requests at it: yeah, everything is happening fine. If we throw four, we're waiting. Parallelization at the uWSGI server level is completely transparent; there is not really a difference from what we were doing before, so this is fairly nice.
You might have noticed, looking at htop, that an awful number of new threads appeared, which we did not really expect; it's a bit different than before in any case. Let's look at what they are doing. Okay, I'm out of time, so believe me: they are doing DNS resolution. It's a little problem with aiohttp. It spawns a ThreadPoolExecutor to actually do the DNS resolution, and you need to be aware of it, because it can create a lot more threads than you expect.

Just looking at the code again, there is one part which is more important than it seems: asyncio.get_event_loop(). It's not as transparent as it looks. What we did is change the way our program runs, and there was a very nice schematic of this in Twisted Network Programming Essentials, so I just took it. Single-threaded is what we used to do in the synchronous app, and what we're trying to do now is the event-driven column. We are not waiting anymore; we are just piling things up, and everything happens inside the loop. We changed the way our program runs. The schematic is nice, but actually there are still a lot of holes in it. So far my presentation hasn't crashed, so I don't need this.

A little point: if you try to use the threaded model, you will encounter problems, because get_event_loop() only works out of the box in the main Python thread. You can get around this with a little trick: if you fail to get the event loop, you just create and set a new one, and that way you can run with threads, which we can prove right now. So this time we are back in the situation we had initially: two processes, two threads per process. And this time, if I throw four requests at it, on the right port: yeah, it works. Awesome.
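The per-thread event loop trick just mentioned can be written as a small stdlib-only sketch. The function name and the dummy coroutine are illustrative; the point is only the get-or-create pattern:

```python
import asyncio
import threading

def get_or_create_event_loop():
    # get_event_loop() only creates a loop automatically in the main
    # thread; in a worker thread it raises RuntimeError, so we make
    # and install our own loop there.
    try:
        return asyncio.get_event_loop()
    except RuntimeError:
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        return loop

results = []

def worker():
    loop = get_or_create_event_loop()

    async def network_call():
        await asyncio.sleep(0.01)  # stands in for an aiohttp request
        return "ok"

    results.append(loop.run_until_complete(network_call()))
    loop.close()

t = threading.Thread(target=worker)
t.start()
t.join()
```

With this in place, each of the server's worker threads gets its own loop, which is what makes the two-processes-times-two-threads configuration work.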
We still see our extra threads from the DNS resolution. This is actually going to be a solved problem fairly soon, as aiodns is coming in, and rather than a ThreadPoolExecutor it will use the event loop like the rest. But right now it's not available in a stable version; you need to use the master version of aiohttp for that.

Cool. We did a lot of things, and as I started mentioning, we changed the way our app runs. But maybe we could go a bit further than that and also change the way the whole server runs, and have everything be coroutines. Maybe that would be really nice, and it continues my bingo card of asynchronous Python libraries with gevent, which together with Tornado, Twisted, and asyncio should actually make a bingo.

The gevent app. There are two main things to notice here. We monkey-patch the standard library, so it does not come for free out of the box; it's a commitment, we are changing things further and further. And we pass uWSGI the option to spawn gevent micro-threads. gevent is not exactly an event loop, it's based on a green-thread implementation, but for all our purposes it behaves in the same way asyncio would. You might also notice that I didn't change anything in the network call; it's exactly the same as the one in the synchronous application, which is nice, since we patched the standard library for it. Again, two processes. Let's run it. Let's kill this one. Great. Let's curl it. Yeah, it works. I forgot to attach myself to a worker; you will see why I do that in a second. When you see the number of threads, again, yeah: gevent also spawns threads by default to do the DNS resolution. It happens. If we try to hammer the server again, let's put a lot more. I had configured 50 gevent micro-threads per process, so 50 times two processes, 100, should be the maximum number we would expect.
Hey, but that's actually a bit more. That's not 1.5, that's a bit more. But let's do fewer to see how it goes. No, that's not fewer. Okay, let's do something in between. Yeah, so with gevent the scaling happens in a bit of a different manner, because you don't get flat performance and then a jump. You have a slow rise in latency as concurrent requests increase, a little extra cost per request, and if you go above what you specified, then it jumps again. There is no flat "I take requests, I take requests, I take requests" and then a cliff; there is some sort of ramp-up. So in this case, we gained a lot of concurrency, but we lost some of the gain in timings that we wanted. And we completely rewrote our application; we made everything revolve around gevent.

And, oh yeah, about the DNS: if you strace one of the processes, I can prove to you that it's doing DNS resolution. You can see localhost getting resolved here. Well, it depends on how good you are at reading strace.

But you are changing everything, and you might not want to do that. What you might want to do is keep your good old application. You don't want to deal with asyncio because you believe you're going to mess it up. You want the simplest solution possible: just the network calls asynchronous, and all the rest untouched. You really want a minimal option. There is a last technique to do that, and the last idea I will present: offloading the event loop into a separate thread and then interacting with that thread.
That way, we're a bit back to the ThreadPoolExecutor from the beginning, but rather than having many, many threads, you have a single extra thread running an event loop that does everything asynchronously, and you interface your code with it using the future abstraction, which was made exactly for this. The code is a bit more involved here. I'm using Tornado because it makes for nice presentation code; I will mention the other options in a bit. Just wait. For the event loop, we have a function that makes the loop current in its thread and starts it, and we spawn that in a separate thread. I do the cleanup in a terrible manner with atexit; don't do that. It just keeps the code short and doesn't make the presentation crash, but there are better ways to do this. The long network call becomes a bit more complicated, because we need to pass the result through a future. You might note I'm using a concurrent.futures future and not the Tornado future; if you're familiar with Tornado, that's because we are not running things in the same thread, so we need the thread-safe version.

Great. The application is very, very nice, and we get an extra thing from being able to do this: we can do something else in between. In all the examples I presented so far, you prepare all of your network calls, throw them at the distant servers, and then wait for the results. By doing this kind of thing, you can instead say: hey, I want this call, then that one, and then you block for each result at the time you actually need it. If you have calls which depend on one another, say you need to call the business service to get the ID of a business and then make a second call with it, that's what we're using at Yelp, for obvious reasons. Right now I'm just sleeping one second. Let's run it. Oh, no, one little thing.
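The offloaded-loop pattern can be sketched with the standard library alone. The talk's version uses a Tornado IOLoop; this substitute uses an asyncio loop for the same pattern, and the coroutine and delay values are made up. The key property is the same: the loop lives in one background thread, and synchronous code talks to it only through thread-safe `concurrent.futures.Future` objects.

```python
import asyncio
import threading

# One event loop running forever in a dedicated background thread.
# The rest of the (synchronous) application never touches it directly.
loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()

async def long_network_call(delay):
    # Stand-in for a real asynchronous network request.
    await asyncio.sleep(delay)
    return delay

# Fire off several calls; each returns a thread-safe
# concurrent.futures.Future we can block on whenever we need it.
futures = [asyncio.run_coroutine_threadsafe(long_network_call(d), loop)
           for d in (0.1, 0.2, 0.2)]
results = [f.result() for f in futures]  # waits ~0.2s, not 0.5s

loop.call_soon_threadsafe(loop.stop)
```

Because each call hands back a future immediately, dependent calls (fetch a business ID, then fetch its details) can be interleaved with other work, blocking only at the point where each result is actually needed.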
You might notice an extra option here called --lazy-apps. I'm forced to do this because I'm cheating a bit. If you remember, the application is loaded and then forked, and the problem with that is the IOLoop is created right away when the code is imported, which wouldn't work properly with the pre-fork model. So I'm using --lazy-apps to avoid this behavior: only the Python stack is loaded, then the fork happens, and then the application is loaded in each worker.

Cool, it's running. Let's curl it. Yes, one second. It's not 0.5 like usual, because we wait one second here, but we only wait one second in total: everything happened in parallel behind the scenes. We are happy. If we try to hammer it, we get the same kind of performance, with the right port. Thank you in the front. Yeah. And if I put three, we should wait for three seconds, which gives me time to run htop on it. And we see only two threads, which is what we expected: Tornado does it right, there is no extra thread spawned for DNS. Victory. Again, our server scales in exactly the same way as the original uWSGI server did. And that's very nice, because the extra cost we have is just one extra thread, and it's not a hard-working thread: what it does is mostly wait, so it does not impact performance much, does not impact memory much, and it's very, very practical, especially if you don't want to get too involved in asynchronous things. You just want to hide the fact that you're doing asynchronous things.

What I showed you here, you should really not be using in production, but fortunately there are a few libraries you could be using. If you are really interested in offloading the event loop, Crochet is a very, very nice library. It uses a Twisted event loop, and you can basically run anything in the offloaded Twisted reactor. It's very, very nice.
And if you just want the networking part, Yelp maintains a library called Fido, which you might have heard of if you went to Stefan's talk yesterday. Use it.

Final note. This was very, very short; I skimmed over many, many things. But the takeaway is: use what fits your needs. If you need to handle many, many asynchronous connections, build an evented app: go for gevent, go for whatever helps your app. But if you do not want to, don't say "I can't"; there are many, many solutions to keep your app the way it is and still get the performance. Remember that you're making trade-offs between the speed of your endpoints and concurrency, and beware of DNS resolution: it can really be a problem and a pain from time to time. We are available everywhere. I'd really like to recommend the Yelp engineering blog; it's really nice, we publish lots of interesting articles. And we have a booth, where I will be later if you have questions I cannot answer right now.

Thank you very much, Loris, for this talk. We have a few minutes left for a couple of questions. Please raise your hands.

I have a question: why do you need to avoid those threads for DNS?

You don't need to, but it's an extra thing that you don't really need. You could also avoid DNS resolution altogether: if I had written 127.0.0.1 instead of localhost directly, I would not need it, obviously. It's not necessary; it's more that they are full threads taking CPU time that could be integrated more seamlessly into the rest. Tornado does it, Twisted does it, and not all libraries do. It was more surprising, in a way, than really bad or terrible. It's just something you can avoid. Thank you. Any more questions? Yes.
You were offloading to another thread inside of uWSGI, but one of the things the uWSGI documentation talks about is that when you activate threads, it actually has to set up the GIL, and otherwise it doesn't even allow threads, which is strange, but it's how it works. Have you thought about offloading to another process instead and getting the results back, so you don't have that? I don't know if the trade-off would be good or terrible, but yeah, maybe.

We thought about it, but actually you have several options. You could also tie yourself closer to uWSGI: right now I'm spawning the thread manually, but you could ask uWSGI to spawn the thread for you, and that way it would really be shared across all the uWSGI workers. It's an optimization, but it ties you to uWSGI for your deployment. You can do it if you want, but if you want to switch to Gunicorn or something else, it means you need to rewrite part of your application code. As for putting it in a process: you don't need to, because this offloaded thread actually doesn't do much. What it does most of the time is wait, so it doesn't hold the GIL that much. The GIL is not a problem with a thread that does so little, because it spends most of its time waiting. If that were to change, if you were to offload a lot more work into it, maybe it would be worth it. But as you're I/O-bound most of the time, a thread is more efficient, just memory-wise. Yeah. Thank you. Any other questions?

Hi, thanks for the talk. My question is about gevent. Basically, it monkey-patches the standard library, so it messes with internals. How do you feel about that? Will it survive the next Python releases?
To be perfectly honest, I'm always a bit wary when you have to patch libraries, especially because it means you can have problems with all third-party libraries; sorry, I'm speaking way too fast. Once you go gevent, you need everything to be gevented. And now that there is a standard in Python, which is asyncio, I think that as people migrate to Python 3, things like gevent will disappear. But for people using Python 2, gevent, Twisted, and Tornado are really, really good options. So I would think it's going to go away. My personal opinion is that it's not going to survive asyncio. They don't schedule things in exactly the same way, since gevent is green threads, so unless there is some huge improvement in that regard, things like uvloop for asyncio should be more performant. Any more? There's one at the back.

Thank you for the great talk. Just a comment rather than a question: gevent Python 3 support is being worked on, so they have plans. You're right, maybe asyncio will win, or maybe we'll have a wonderfully vibrant ecosystem of different approaches.

Cool. Thanks. No more questions? Thank you very much, Loris. Thank you.