Okay, welcome to the developer track at SCALE 14x. This is sponsored by Percona, which means that I have to read you their marketing blurb. With more than 3,000 customers worldwide, Percona is the only company that delivers enterprise-class solutions for both MySQL and MongoDB across traditional and cloud-based platforms. And with that, I'm just gonna turn it over to Jesse and let him do his presentation. Thanks, everybody. So, I'm Jesse Davis. Is your mic on? Is it? Check, check. I think it's on. I work for MongoDB, and MongoDB's pretty cool. Naturally, we are hiring. And if you wanna find me on Twitter, I'm over here. And I'm gonna be tweeting a link to more information about coroutines at the end of this talk, so you can find that link there. What I wanna show you is how a non-blocking framework that uses callbacks and an event loop would be implemented in Python 3. And we're gonna go in three stages, so you'll know where you are. First of all, we're just gonna do a basic blocking HTTP client, and that'll be like our base case, so that we understand the simplest possible implementation of a solution to this problem. But we'll see that that's not very efficient, and so we'll replace it with an async framework that uses callbacks. And that'll be very efficient, but it will also be just incredibly ugly and awful. And so, we'll replace that with something that's quite beautiful called coroutines. So, a little bit of setup. I've got a web server running on port 5000, and if we see what that web server is serving, we'll see that it says, hello, scale. And we'll also see it's not very fast. I've deliberately coded it to take about a second to answer each request. And we'll see in a moment why it's so important that this is slow; it's actually talking to slow servers that an async framework is best for. So, let's do the first bit here. Let's write a basic blocking HTTP client.
So, I think the simple way to do this is just write a function that will fetch a URL at a path. And it's gonna need a socket, so we'll import that from the socket module. And let's connect it to localhost port 5000. And we'll need to send an HTTP request to the server. So, it'll be like GET some path, and we'll tell it that the protocol is HTTP/1.0. I'll substitute the path in there. And since I'm using Python 3, we'll need to encode it before we can send it over the wire; we're transforming from a string into bytes that way. Oh, and an HTTP request ends for some reason with a double carriage-return line-feed. I have never known or cared why. So, once we send this GET request, then we need to read the server response. And reading a response has this sort of annoying API with sockets, where you just kinda keep reading and you never know how much you're gonna get until something tells you that you're done. And with HTTP/1.0, the way you know you're done is that the server closes its side of the connection, and so you get an empty read. So, the way we're gonna do that is we're gonna make an empty buffer to store the chunks in. And then we'll go in a loop and we'll just do: a chunk is, pardon? Yeah. Good. A chunk is just whatever we get when we ask for up to a thousand bytes. We might get one byte, we might get all thousand bytes; we just say what our maximum buffer size is. And if we get something, then we'll just add it to the buffer and keep asking for the next chunk. Or if we get an empty chunk, that's the socket module's way of telling us that the server's closed its connection, and in that case, we're done. We've read the full HTTP response. So, the response is equal to, this is like a Python idiom, this means the empty byte string, and we use it to join up all the chunks in the buffer into one big byte string. And then we'll decode that, so now we have a text string. And let's print it, and let's remember to return, cause we're in a while-true loop.
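Assembled, the blocking client described above looks roughly like this. Since the talk's slow server on port 5000 isn't available here, a tiny threaded server stands in for it; the port number and the response body in this sketch are stand-ins, not the talk's exact code.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# A tiny local server stands in for the slow "hello, scale" server.
class Hello(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello scale"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Hello)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

def get(path):
    s = socket.socket()
    s.connect(("127.0.0.1", port))
    request = "GET {} HTTP/1.0\r\n\r\n".format(path)
    s.sendall(request.encode())  # encode: strings must become bytes

    # Keep reading until the server closes its side: an empty chunk is
    # the socket module's way of saying the response is complete.
    chunks = []
    while True:
        chunk = s.recv(1000)  # up to 1000 bytes, maybe fewer
        if chunk:
            chunks.append(chunk)
        else:
            return b"".join(chunks).decode()

response = get("/foo")
print(response.split("\r\n")[0])  # -> HTTP/1.0 200 OK
```

The `b"".join(chunks)` at the end is the idiom from the talk: join all the chunks on the empty byte string to get one big byte string, then decode it into text.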
So, let's see if I got this right. We'll get the foo URL and run that. And it says hello scale a couple hundred times. It doesn't seem very fast. So, let's see how long this takes. We write down the start time, and we'll print how long this took: print time.time() minus the start time. Run that. All right, so it takes a second, and it's not very surprising, because I very carefully coded the server side so that this would take just about exactly a second. And you can maybe see this a little more clearly if I just print the first line of the response, just the HTTP status header. So, let's split that by newline and print the first of them. There, we see HTTP/1.0 200 OK, and that it took one second. Now, the problem here is, if I get two URLs, how long is this gonna take? It'll take two seconds, because I'm doing them serially. Each get must read the full response before the next get sends the next request. So, there's a couple of ways of solving this problem. Obviously, what we wanna do is do these two things concurrently. And in Python, you can do things concurrently with threads. There's the infamous global interpreter lock, but that doesn't actually get in our way very much in this case. The global interpreter lock means that only one Python thread can execute Python code at a time, and so it means parallel computation is not possible in Python using multi-threading. But that's not the problem that we need to solve here. We're not actually doing significant computation. All we're doing is just dumping bytes into a buffer and then printing them out. We're not actually using the CPU very much. And the nice thing about multi-threading in Python is that Python threads drop the global interpreter lock while they're waiting for socket IO. And so you can actually use Python threads to do concurrent network operations, as long as you don't need to use the CPU very much.
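As an aside, that threading point can be sketched in a few lines. Here time.sleep stands in for waiting on a socket, since, like socket IO, it releases the global interpreter lock while it waits; the 0.2-second delay and pool size are arbitrary choices for the demo.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_fetch(i):
    time.sleep(0.2)  # stands in for a slow socket read; the GIL is released
    return i

start = time.time()
with ThreadPoolExecutor(max_workers=10) as pool:
    # Ten "fetches" run on ten threads concurrently.
    results = list(pool.map(slow_fetch, range(10)))
elapsed = time.time() - start

# The ten 0.2-second waits overlap instead of taking 2 seconds serially.
print(results, round(elapsed, 1))
```

The same ten calls done serially would take about two seconds; with threads they finish in roughly the time of one wait, because no thread holds the lock while it sleeps.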
So that's a perfectly reasonable way to solve this problem. But another great way to solve this problem is async. We could talk about the pros and cons in a little bit, but let's just say for now that we've decided that we're not gonna do multi-threading and that we want to do concurrent operations on a single thread. And it sounds impossible, so let's figure out how to do it. We're gonna write an async framework. And an async framework has three kinds of components. One of them is non-blocking sockets. The second component is selectors. And the third component is an event loop. So, a non-blocking socket, it's just a socket where you've called setblocking(False). Easy enough. Now I've made my thing async, let's run it. So that didn't work very well. We get this BlockingIOError, and the line that threw it is the connect line. It threw an exception as soon as I called connect. Well, I sort of have a rule of thumb for Python exceptions, which is if I don't care about it, I should just ignore it. So let's, I don't know, let's see if that works. Well, we got a different exception, so that's progress. And we got an exception on a later line. The exception was "socket is not connected". So this sort of makes sense, right? We told the socket, don't block. And the contract of a non-blocking socket is that every operation either succeeds or fails immediately; it never waits to complete. Since connecting can't succeed immediately, because it takes some time to set up the TCP channel, it throws an exception. And this isn't actually an error. There's nothing going wrong here. It's just saying, I couldn't do this immediately, so I'm gonna tell you that by throwing an exception. It's obnoxious, but this is how they work. So I'm just going to ignore that BlockingIOError. But that means that I immediately get here and we're not yet connected. So I want some way now to wait for the connection to complete. And this is where the second part comes in: selectors.
So since the dawn of ages, operating systems have had ways to say that we're interested in events on non-blocking sockets, and some way to wait for those events to occur. And those are functions with names like select or poll. On Linux, the most scalable version of that is called epoll. On a Mac like this, it's called kqueue. Depending on the operating system, you might want to use a different method to wait for an event. But the nice thing is that in Python 3, we've got this selectors module, which means we don't have to worry about where we're running. We can just say from selectors import DefaultSelector. And we make one of those, and that's whatever is best. On my Mac, it'll choose kqueue. On a Linux box, it would choose epoll. I don't have to worry about it. The way I use the selector is down here: I register the file number of my socket. And this is just a number, the file descriptor of the socket. It's four or five or something like that. And I write the list of events I'm interested in. In this case, I'm waiting for the socket to become writable, because that's the next thing that I want to do on it: I want to write to the socket. And I'll import that from selectors. And then I call selector.select. And here's now where I block. Select waits for the socket to become writable. Once it is, I'll clean up after myself and then I'll be able to call send. So let's see how this does. Throws another error. Well, but that's good. We threw an error, another BlockingIOError, but this time it's down here in receive. So I managed to get past this part and I got down here. And I think that means that I just need to do the same stuff that I did before, but this time I'm not waiting for the socket to be writable, I'm waiting for it to be readable. So I'll import that, run this. It's not that great, is it? Like, it still took two seconds, and the code is worse. So what's so great about async?
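Those first two pieces can be sketched together in a few lines: a non-blocking connect that throws BlockingIOError by design, and a selector that waits for the socket to become writable. The local listener here is a stand-in for the talk's web server on port 5000.

```python
import socket
from selectors import DefaultSelector, EVENT_WRITE

# A listening socket stands in for the web server.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)

s = socket.socket()
s.setblocking(False)
try:
    s.connect(listener.getsockname())
except BlockingIOError:
    # Not an error: a non-blocking connect can't complete immediately,
    # and raising this exception is just how the socket tells us that.
    pass

selector = DefaultSelector()  # kqueue on a Mac, epoll on Linux, etc.
selector.register(s.fileno(), EVENT_WRITE)
events = selector.select(timeout=5)  # the one place we block
selector.unregister(s.fileno())
print(len(events))  # -> 1: the socket became writable, connect completed
```

Once select returns, the TCP channel is set up and a send on the socket would succeed.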
This is gonna be a theme of the talk: things have to get worse before they get better. So, how could we make this better? What we wanna do is, once we've registered this socket and we're waiting for writability, we wanna somehow do other work until that's ready, so that we can do these two fetches concurrently. So in order to be able to do other work, this get function needs to return, so that we can then begin the next call to the next get. So what if I do that? That's not gonna work, right? We need to somehow get here once that select finishes. So let's start to kind of sketch out how we would do that. What we wanna do is write a function, it's called writable, that will be executed once this socket becomes writable, once we're ready to do the next thing with it. And PyCharm is underlining s and path because those are no longer available in here; those were local variables of get, and we're not in the get function anymore. So let's say that those somehow get passed in so that we have them again. And we're not gonna need this select call anymore, because we're assuming that somehow, by the time we get in here, the socket's already writable. So now we need to figure out some way to schedule this writable function to be executed as soon as it's ready to run. And the way we're gonna do that is, here's where all the magic happens. This register function takes a third optional argument called data. And data can be anything we want. We're in Python, it's dynamically typed; we could stick in a number or a string, or we could actually stick in a function there as the data. So let's do that. Let's make a closure using the lambda keyword. And what the closure is gonna do is execute writable with s and path captured from these local variables. This is gonna be a little function that doesn't actually run right now, but when you do call it, it'll call into writable. So that stuff's spurious now.
So now we have an async framework, so let's run it. So it was very fast, but it didn't do any work. Why is that? Well, down here we called get foo and get bar, and so we registered the two sockets with their callbacks, we're waiting for them to become writable, but then we never call select; I deleted that line. So let's do that, selector.select, and that returns something sort of complicated. So let's print that out and see what the return value of select is. So we run this and there's its return value. And it's sort of this big mess, but we can see it's got two SelectorKeys. What these SelectorKeys are is a bunch of information about the two sockets that we registered for notifications about. And they've got a bunch of junk that I don't care about. We have the file descriptor, we have the file descriptor again for some reason. We have the list of events that occurred. This is what I really care about: here's that lambda that I registered earlier as the data argument. This is the callback that we wanna execute as soon as the socket is writable. So we can pull that out. First we need to iterate over all of the things that the select call returns. So that's gonna be for key. And it also returns an event mask, which I don't care about; I only care about the key. I really don't care about anything in the key except for the data, because that's that lambda that I passed in. So we'll call that: that's equal to key.data, and we'll execute it. So if I run this, now we have an async framework, right? Okay, so it threw an exception here. Well, okay, so far we're making progress, right? Because writable got executed. So that means that we have successfully waited for the socket to become writable and we wrote to it. But what went wrong was, receive threw BlockingIOError, and that indicates that the socket's not actually ready to be read yet.
And I think that the reason that happened is we called select here, and we're not supposed to be calling select here anymore, not inside this callback. What actually happened is that the first socket became writable, so we entered its callback, and then we called select again, and the event we got was something completely irrelevant: we got the event from the other socket becoming writable. And so we thought, okay, great, so now it's time to read my socket, right? But that wasn't at all what we were supposed to find out. So what we need to do is delete this select call. So now it's gonna get a little complicated. We've entered writable, we wrote our request, and now we're waiting to read the response. So we register: we wanna know when the socket becomes readable, and then we somehow wanna do something once that's ready. So let's call that readable. So this is gonna be the callback, and s and buf are underlined in red; they're not available here. So again, we'll pass those in. And the way that we'll get those local variables into a closure and pass them in when we're ready to use them is, once again, we'll make a lambda that'll call readable with s and buf. Bug; there was a Freudian slip. All right, and we don't wanna do that forever, we just wanna do that once. And I'll fix the indentation here. Okay, and now there's one more thing that we need to do, which is, if we read a chunk and we add it to the buffer, that means that we are not yet finished reading the response. So we need to re-register to know when the socket is readable again, meaning that more server response is available for us. And the way to do that is just to copy and paste, which is all I've ever done as a professional. All right, and I think we might be ready to go. So what's going wrong? Anybody? Why isn't this working yet? Yeah, we call select, but we only call it once, right?
So we called it, we waited for an event, we handled that event, we called the related callback, and then we're done. And we say, oh, that took zero seconds; async really is great. So what we need to do here is: while true, process the events. And whenever an event happens, we call the associated callback, and then we wait for the next event. This is great. How long did that take? It's still running. I forgot to ever leave the while-true loop. That's correct, I got rid of that while-true loop. Yeah. Yeah, I got rid of that. So we've just got one while-true loop here, right? There's no exit condition for this loop whatsoever. So I think we want to somehow know when we're finished, right? Once we've returned from here twice, that means we're done. So I think what we want to do, let's say up top that we've got like n_jobs equals zero. This is the number of things that we're working on. And then here, global n_jobs. So in Python, you need to declare something as global in order to modify it from an inner scope. So we do that, and then we increment it. So now we're working on one URL, and we work on it and we work on it. And then when we're finally done here, we don't need this return anymore, but we do need a global n_jobs, and then here we need to decrement it. And then, so these two gets will both increment, and then eventually they'll be decremented again. And then we'll only loop until n_jobs goes back to zero. Well, there we go. So that's pretty cool, right? And what's really cool is, we got two URLs in one second instead of two seconds. And what if we get like 20 URLs? That also takes one second. So that's pretty neat. We're very efficiently doing concurrent IO on one thread. And that's possible because most of the work, basically, there is no work. It's spending all of its time waiting around, so it's hardly ever using the CPU. All it's doing is adding events to the list of things it's waiting for.
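Put together, the callback machinery described above, callbacks in the selector's data slot, an n_jobs counter, and an event loop, can be sketched as follows. socketpair() stands in for real connections so the demo needs no external server, and names like start_job are my own for the sketch, not the talk's.

```python
import socket
from selectors import DefaultSelector, EVENT_READ

selector = DefaultSelector()
n_jobs = 0       # how many fetches are still in flight
results = []

def start_job(name, sock):
    global n_jobs
    n_jobs += 1

    def readable():
        # Run by the event loop once the socket has data for us.
        global n_jobs
        selector.unregister(sock.fileno())
        results.append((name, sock.recv(1000)))
        n_jobs -= 1  # this job is done

    # Stash the callback in the selector's "data" slot.
    selector.register(sock.fileno(), EVENT_READ, readable)

# Each socketpair's far end plays the server.
left1, right1 = socket.socketpair()
left2, right2 = socket.socketpair()
start_job("foo", right1)
start_job("bar", right2)
left1.sendall(b"hello")  # the "servers" respond,
left2.sendall(b"scale")  # making both sockets readable

# The event loop: block in select, run each event's callback,
# and stop once every job has decremented n_jobs back to zero.
while n_jobs:
    for key, mask in selector.select():
        callback = key.data
        callback()

print(sorted(results))
```

Both jobs make progress on one thread; select is the only place the loop blocks, so it is not a busy loop.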
That's cheap. And compared to the multi-threaded version of this, the overhead for waiting for each of these events is minimal. All you've got is a file descriptor in a list of things that you're waiting for, and a pointer to a closure, right? It's tens of bytes. Compare that to the overhead of a thread, which has a stack. It has entries in scheduler data structures deep within the kernel, I assume; I have no idea how that works. There are hard limits on the number of threads that you can run, which are typically much lower than the limits on the number of sockets that you can open. If for every socket that you're working on you also have a thread, you will run out of threads first. You run out of threads before you run out of sockets. Allocating a thread per socket artificially limits the number of sockets that you can work on concurrently. So that's the problem that async solves. If you have lots of sockets, but you're not doing much work on each, async allows you to scale to a larger number of sockets concurrently. So that's cool, right? Yeah, so the question was, isn't this while-n_jobs loop a busy loop? Are we spinning a lot of CPU, is that the concern? So no, because the select call, this is the one place that the framework blocks. This select call will wait quietly until some event occurs. And this is the only place in the entire framework that we're allowed to actually pause and not do anything. We've essentially said that this thing here, which is the event loop, is the sole part of the code that's responsible for blocking. Nothing else can wait. So what have we seen so far? We are, what time is it? We are 32 minutes into the talk. We've written an async framework with callbacks and an event loop. And we've shown that this is very efficient, and also that it's very ugly. We bloated this thing out from like 20 nice lines in one function to about, what, 100 lines, 70 lines, and it's disgusting.
How can we get back to the, like, Edenic beauty that we began with, without sacrificing the efficiency that we've gained? Coroutines. So, what I'm gonna show you is how coroutines are implemented in the Python 3 standard library module called asyncio, which was introduced by Guido van Rossum in Python 3.4. And they're built with a Future class; generators, which are a Python built-in feature that's been around for many years; and the last bit is a Task class. So we're gonna go through these one by one. First of all, the future. A future wraps a callback, and a future represents some value which is not ready yet. So, I can make an extremely dumb one that has a callback, which begins as None. And it has this idea that it will resolve, which is some event that we're waiting for. And when the future is resolved, it executes the callback that was waiting for it to be resolved. And everywhere that we use callbacks now, I'll replace those callbacks with futures, and we'll see how that goes here. So, here's a callback. Let's make a future. Let's set the future's callback to this lambda, and we'll pass in the future instead of the callback. And here's another callback, so we'll move that out, and we have one more callback, and that's here. So, once again, we'll wrap that in a future. And now, instead of registering callbacks, we register futures that wrap callbacks. So that means that down here, this key.data, this isn't the callback anymore, this is a future. And so we wanna execute the future's callback; well, we wanna resolve the future, and that'll make it execute the callback. So now we run this. Once again, it takes one second and we haven't gained anything, right? I only made the code even worse. But I warned you that this was gonna be a theme, that things get worse before they get better in this talk. So, I'm fulfilling that anti-promise. So we wrapped all our callbacks in futures. The next portion of this is generators. So let's engage in a digression.
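That extremely dumb future can be sketched in a few lines. This is a minimal sketch, assuming the callback takes the future itself as its argument; real asyncio futures carry a result and a list of callbacks, which this deliberately omits.

```python
class Future:
    """Represents a value that's not ready yet: it wraps a callback,
    and resolving the future runs the callback that was waiting on it."""

    def __init__(self):
        self.callback = None  # begins as None; set by whoever waits

    def resolve(self):
        # The event we were waiting for has happened: run the callback.
        self.callback(self)

log = []
f = Future()
f.callback = lambda future: log.append("socket ready")
# ... later, the event loop sees the event and resolves the future:
f.resolve()
print(log)  # -> ['socket ready']
```

At this stage the future adds nothing over a plain callback; its payoff comes once generators can yield futures and be resumed when they resolve.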
I've got myself a Python console here, and the way generators work in Python, it's a little bit bizarre. So, generators come from generator functions. And a generator function is a function with a yield statement in it. So let's say that our generator function prints start, and it's got this statement, yield 1. And because it has a yield in it, now it's a generator function; we'll see what that means in a second. So maybe next it prints middle, and, I don't know, it assigns a local, then it yields another number, and then it's done. So that's the contents of our generator function. And if we execute it, it doesn't actually run. It doesn't print start, it doesn't print middle. It just returns this generator object. So what's a generator object? Well, it's got two things. It's got some code. So here's its code; it's named after the generator function, because that's where it came from. And you can actually see the code. There it is, very enlightening: that's some compiled Python bytecode. We can ask how much is there? There's 40 bytes of it. And besides code, the generator object also has a stack frame. This stack frame hasn't done anything yet, so its instruction pointer is negative one; it has not executed any bytecodes yet. And like any stack frame, it's also got the local variables. And there aren't any yet, because this hasn't had a chance to create any. So the way you make a generator run is you call next on it: next(g). So now it's started to run, right? We saw it print start. And the return value of next was 1; that was the number that it yielded. So it printed start and it yielded 1. And now it's actually stopped at the 13th of those 40 bytecodes. So it's in suspended animation here, waiting for me to call next on it again. So if I do that, now it prints out middle. So now it's reached the next point in its execution. The return value was 2; that was the next number that it yielded. We can see that its last instruction pointer is now 34.
So it's advanced farther through its code. And it's also had a chance to execute x equals 7, and so now it has locals. If I call next(g) again, it tells me rather bluntly that it's finished, by raising a StopIteration exception. So that was our digression. That's how Python generators work. They're incredibly weird and interesting. And they're also very useful, because it's a function that you can start and stop at will, that you can cooperatively schedule. So this seems like a pretty cool thing to use to create coroutines: things that can be scheduled asynchronously to cooperate with each other. So how would that work? I think the idea would be like this: this get function would become a generator function by putting some yield statements into it. So instead of, like, declaring a callback here and then having the callback run down here, what if we could somehow just say, like, yield f, and then somehow we know the socket is writable by the time this generator is resumed? If that could somehow work that way, then we wouldn't need this readable callback either, right? We could just say, like, yield this future, and then somehow the socket is readable by the time we get here. If that worked that way, then we could get rid of this callback as well. So every time we get a chunk, we would just wanna go back up here. We would wanna create another future, then wait for it again, until we had read all of the chunks. So we wouldn't need to keep re-registering the socket down here. Right, if that worked that way, we could just get our while-true loop back. Then we would kind of be back to where we were, where we wait for the socket to become readable, and then somehow, by yielding f, we wait until that event occurs. And then we receive a chunk, we append it to our buffer, and then we just do that again until we're done. And this loop has the same kind of structure as we originally had in our blocking example. Question? Well, it's about to be.
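The console digression above can be reproduced roughly like this; the exact bytecode offsets (13, 34, 40 bytes) vary by Python version, so the comments here hedge on those numbers.

```python
def gen_fn():
    print("start")
    yield 1
    print("middle")
    x = 7
    yield 2

g = gen_fn()          # nothing runs yet: we just get a generator object
print(g.gi_frame.f_lasti)         # negative before it starts: no bytecode run
print(next(g))                    # runs to the first yield: prints "start", returns 1
print(dict(g.gi_frame.f_locals))  # {}: x hasn't been assigned yet
print(next(g))                    # resumes: prints "middle", assigns x, returns 2
print(dict(g.gi_frame.f_locals))  # {'x': 7}
try:
    next(g)           # resuming a finished generator...
except StopIteration:
    print("finished") # ...raises StopIteration, rather bluntly
```

The frame object is what makes the "suspended animation" visible: between calls to next, the generator's instruction pointer and locals sit frozen, waiting to be resumed.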
For the moment, all we know is that yield f is gonna pause this generator. And then somehow we need to get this generator resumed once that future is resolved. So how's that gonna work? Well, let's run this and see. Well, okay, so far so bad. The reason is that, what I've got right now is, I called get, but now that get is a generator function, it just returns a generator when I execute it, right? It doesn't execute its code; we saw that earlier. So we need to somehow call next on this generator until it runs to completion and raises a StopIteration exception. So the way we do that is this third thing, this Task class. So I'll write that down here. Task is the name for the class that the asyncio standard library module uses to call next on the generator. And Task is essentially the thing that turns a generator into a coroutine, by scheduling it, by running it to completion and resuming it whenever whatever it's waiting for is ready. So it's gonna take in a generator as its argument and store it away. And then let's say that it has a method called step, and step is gonna call next on self.gen to run it to the next yield statement. And whenever one of these things yields, what's the return value of next going to be? It's gonna be the future that it's yielding. So that'll pop out here; some future object will be f. So now we wanna kind of wait for f to finish. And the way we wait for f to finish is that we assign something to its callback. We say, this is what you should do when the future resolves. So what do we wanna do when this future resolves? Call next again, right? Yeah. So the way we do that is we say step: step is the callback that we wanna execute. Right, so it's like funny recursion. So let's run that. So it still doesn't work. Can you see why not? What am I missing? We do resolve the future. Oh, we never created a task. Okay, that's a good start. Task, right? So we need to create the tasks and wrap up these generators in tasks.
Let's see if that was it. No. Oh, still need to call step. That's right. So let's do that here in the constructor. So as soon as you create a task, it gets a generator and starts running it. And then whenever the generator pauses, we say, okay, resume it whenever the thing it's waiting for is ready. Let's run that. This is looking good. I feel good. Yeah, right? And what if we did this, like, a bunch of times? Too much output to process. Okay, what if we do this somewhat fewer times? Feels like we've got an infinite loop here again somewhere. Where is it? Ah, there it is, right? We have a while-true loop. So now we need to return again. All right, StopIteration, and that's getting thrown from step. Oh, so that's good, right? It's good that we're seeing a StopIteration, because that means that we've started to be able to run generators to completion. We've run the get generator all the way to the point where it returns. So you know my philosophy about exceptions in Python, which is to ignore them completely. So what are we gonna do here? I think we just wanna return, right? Cause this task is done. So if we do that, woo, right? So we're back to where we were. And for once, I've actually made things better. At least, thank you. At least that's my opinion, right? It's not as good as it was when we started. It's a hundred lines of code now, whereas before it was like 20. But it's really efficient; it's just as efficient as it was with callbacks. And we're back down to a single get function. It's a little more complicated than it once was, and that's partly because we haven't implemented the sort of rich, convenient methods of a real async framework. If this were asyncio, it would be just about as short as the blocking example, cause we wouldn't have to do quite so much manually, wouldn't have to, like, make our non-blocking sockets and stuff. And by substituting yield statements for callbacks, we've managed to get all the way back to a single function again.
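Pulled together, the Future, the Task, and a generator-based get look roughly like this. It's a sketch of the scheduling mechanism only: futures are resolved by hand in place of the selector and event loop, and the socket work inside get is replaced by recording the path, so none of the names below are the talk's exact code.

```python
class Future:
    def __init__(self):
        self.callback = None

    def resolve(self):
        self.callback(self)

class Task:
    """Turns a generator into a coroutine: runs it to its next yield,
    and resumes it whenever the future it yielded is resolved."""

    def __init__(self, gen):
        self.gen = gen
        self.step()  # start running as soon as the task is created

    def step(self):
        try:
            f = next(self.gen)  # run to the next yield; f is a Future
        except StopIteration:
            return              # the coroutine ran to completion
        # The funny recursion: when f resolves, step again.
        f.callback = lambda future: self.step()

pending = []
results = []

def get(path):
    # Stands in for the real fetch: instead of waiting on a socket,
    # it yields a future that the loop below resolves by hand.
    f = Future()
    pending.append(f)
    yield f          # pause here until the future resolves
    results.append(path)

Task(get("/foo"))
Task(get("/bar"))

# A stand-in event loop: resolve whatever the coroutines are waiting on.
for f in pending:
    f.resolve()

print(results)  # -> ['/foo', '/bar']
```

Both gets pause at their yield, and resolving each future resumes the right generator exactly where it left off: the same interleaving the callbacks did, but written as one straight-line function.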
We have normal exception handling. We have normal local variables. We don't have to pass them around in closures anymore. It's beautiful again, but it's got all the efficiency of single-threaded concurrency. So let me see, I wonder if I have time to blather on with the conclusion. Okay, a couple of minutes. So let me blather on with the conclusion now. We learned two cool things. One of them was technical. Coroutines are a beautiful way to do asynchronous concurrent processing in a single thread. It means you can do lots of concurrent IO without allocating a separate thread for each socket. So you can scale out highly concurrent network applications much, much more efficiently with this style than you could with threads. But they're almost as pretty as threads. And if you want to learn more about that, I've set up a page for you at bit.ly slash co-routines, and that's got a bunch of links out to a lot of much deeper information. It's got both the code that I wrote just now, and it's also got a link to a chapter that I wrote with Guido van Rossum, which has a much deeper and broader coroutine example than this one. So that's the sort of technical hooray. But the non-technical aspect of this is: look at everything that we were able to understand in just one session. We went from absolutely nothing all the way to a coroutine-based asynchronous framework. This thing that looks like it's totally magical is actually completely within your grasp. You can learn it. You can learn not just how to use coroutines, but how they're implemented from the bottom up. And it really isn't that hard. It really doesn't take that long. So if you follow that link and you read a little deeper, I think that really grasping this awesome new idiom is completely within your reach. Thank you very much. So we've got a few minutes for questions if you want. Go ahead. Sorry, I didn't get that. Take the microphone.
So if you want to write an app using an event loop, and I try to take as many of the server's resources as possible for my application, that can be bad, right? Because I block other processes from the system's resources. So how can you prevent it? Like, I grab as many sockets as I can for my application, and that just prevents another application from using sockets on the server. So do you know a way to prevent that, from another developer who just tries to make their program as fast as possible and takes sockets away from other applications? You get the question, or? No, I'm sorry, I'm not totally following. So we have a limited number of sockets we can use on the server, and I try to grab as many of the sockets as possible for my application. Right, right, I've got the idea. Yeah, so do you have any suggestions to prevent that happening? So the question is, once you remove threads as a bottleneck, your next bottleneck is likely the various caps on the number of sockets that you can open, and how to remove that. On Linux there are various ways. I don't know all of the specific commands, but I have heard of Linux boxes running a million concurrent connections. It requires a bunch of kernel tuning; you probably have to run as root. But it is very much possible to end up with only memory as a bottleneck, which is actually pretty exciting, because you've reached a point where hardware itself is your bottleneck, and that's the bottleneck that you wanna have. That means that you're using your hardware to its fullest. Another question? All right, great. Thank you so much, everybody.