Ah, we're back. So you're still with us here in the Optiver room, and the next talk we get is marked as advanced. But don't let that scare you, because we also have a very advanced Python trainer who's going to help you get the best out of this talk. So let's welcome to the stage Reuven Lerner from Israel. Hey. Hey Martin. Hi there, how are you doing? Ah, I'm fine. It was a fun day today. We had all kinds of problems in the morning, but things are now running smoothly, so I hope we won't have any problems. I see that your screen share is ready. If anybody has questions about this, you can type them in the main chat and we'll do a Q&A later and talk about that afterwards. So the stage is yours.

Excellent. Thanks so much. Hey everyone. Welcome to my talk. This is called Generators, Coroutines and Nanoservices, and I will describe all of these in just a little bit. So just a few words about myself. I am a full-time Python trainer. So just about every day I'm in a different city, different country, working with a different company. Okay, in the last year and a half, I've not really traveled that much, but I've been doing a lot on Zoom and WebEx, a lot of corporate training. I have individual courses, including something called Weekly Python Exercise. I published my first book on Python last year, Python Workout, with exercises to improve your Python fluency. And I'm currently working on a new book, which will very, very soon be released, at least in initial form, called Pandas Workout, with exercises to improve your fluency with the pandas library. And I've got a free weekly newsletter about Python called Better Developers that currently has about 20,000 subscribers. And I'm on YouTube and I'm on Twitter, but let's concentrate on our real topic.

And the real topic is, this is not an asyncio talk. I know whenever someone talks about coroutines nowadays, you assume it's about asyncio. Now, I will tie this together with asyncio later on.
But for now, we're not going to be talking about that, which raises the question, what am I talking about? I'm glad you asked.

So let's start off with the dumbest function in the world. Here we go. I say def myfunc, and then it has three lines in it: return 1, return 2, and return 3. And if you're thinking, wait a second, that's a pretty dumb function, well, I did warn you, right? So what's going to happen when I call this function and then print its output? Not surprisingly, when we run this, we're going to get 1. And that is it. That is the end. Of course, that's because return in a Python function says, that's it, we're ending the function now, and we are returning this value. Every function has a return value. And what happens to those return statements after the first one? Well, we're never really going to get to them, right? It makes sense. Return means stop the function, return a value. And even if I run pylint on this function, it says, hey, those final lines are unreachable.

And here's the most impressive thing to me. Python is, of course, a byte-compiled language, meaning that whatever code we have is turned into bytecodes, and then those bytecodes are interpreted, very similar to what happens in .NET or in Java with the JVM, just that the Python compiler tends to be pretty dumb. And this is on purpose. It's supposed to be more or less a one-to-one translation between what we've written and what happens. So you would think, in some ways, that all three lines in the function body would actually be translated to bytecodes. But that turns out not to be the case. If we use the dis module, for disassembly, we can actually look at the bytecodes that are generated from a function. And if I say dis.dis, so that's the dis function in the dis module, and I run it on myfunc, this is what we get. These are the bytecodes that Python is actually running. It's going to load the constant 1, and then it's going to return the value.
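The function and the disassembly step just described can be reproduced with a few lines. This is a minimal sketch; the name myfunc is a guess at the spelling used in the talk, and the exact opcode names that dis prints differ between CPython versions, so no output is shown for that part.

```python
import dis

def myfunc():
    return 1
    return 2  # never reached: the first return already ended the function
    return 3  # never reached

print(myfunc())  # prints 1

# Print the bytecode CPython generated for the function body.
# Opcode names vary by CPython version.
dis.dis(myfunc)
```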
That's all there is. It's going to return 1, and that's it. So even those second two return statements are ignored completely. They're really ignored.

Now let's take this to the next level. What about this function? This function is different. I call it mygen, so the name is different. But the real difference is that I'm no longer using return. I'm now using a keyword called yield. And so I say yield 1, yield 2, and yield 3. So what's going to happen now when I call mygen and I run print on it? What are we going to get back? And here's the amazing thing. We don't get 1. We don't get 2. We don't get 3. Rather, we get this generator object. So yield makes the whole difference here. The fact that we use yield in our function body turns our function from a regular old function into what's called a generator function. And I'll add, by the way, that many, many people call these functions generators. They're not really generators, but a lot of people call them that. We really should call them generator functions. When Python compiles your function, it notices the yield in there and says, wait, we're going to tag this function as a generator function. And so when you run the function, it no longer runs the code. Rather, it returns a generator object. And we're going to come back to what those are and why they're important in just a moment.

So let's try now disassembling our generator function. Well, first of all, I'm going to use dis.show_code. show_code allows us to get a high-level view of our function object. It shows us the name. It shows us where it was defined. It shows us how many arguments it takes, and so on and so forth. And it also has a bunch of flags. Those flags are used by Python to understand what's going on with our function. You can see here that there's a flag in here called GENERATOR, meaning, hey, this is a generator function.
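As a rough illustration of the point about flags, the sketch below defines both kinds of function and checks the generator flag directly. The inspect calls are an addition for illustration (the talk only shows dis.show_code), and the names myfunc and mygen are assumed spellings.

```python
import dis
import inspect

def myfunc():
    return 1

def mygen():
    yield 1
    yield 2
    yield 3

g = mygen()   # calling a generator function does not run its body
print(g)      # a generator object, not 1, 2 or 3

# The compiler tags the code object; inspect exposes the same flag bit.
print(inspect.isgeneratorfunction(mygen))    # True
print(inspect.isgeneratorfunction(myfunc))   # False
print(bool(mygen.__code__.co_flags & inspect.CO_GENERATOR))   # True

# High-level summary of the code object, including its flags.
dis.show_code(mygen)
```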
If you don't have yield in a function, then it will not be tagged as a generator. By the way, these flags are also there if you use *args or **kwargs; that's where Python keeps track of that being the case.

Okay. And if I actually disassemble my generator function, what do I get? Well, you see it's a whole different kettle of fish here. Now what's happening is it's going to load the constant 1 and yield it and then pop the top, and then load the constant 2 and yield it, load the constant 3 and yield it. And then finally it loads None and returns that. So what's going on here? What is actually happening? And how does this help? What is yield doing? We see that yield, then, is a low-level bytecode thing inside of Python. Right? Bytecodes aren't really that low level in Python, but this is as low as it gets. So yield is really important to the language.

So what is a generator after all? What is this object that we got back, that our generator function is creating? The answer is that a generator is an object that implements Python's iterator protocol, meaning we can get the iterator for an object with the iter function. And in this case, the generator's iterator is itself. We're going to talk about this more in just a moment. You get each next value by calling next. And when it gets to the end, it raises the StopIteration exception.

So let's see how this protocol works in general, and then we'll come back to our generator. Let's say I have a simple for loop. I say for one_item in 'abcd', print one_item. So what's happening here? Well, the first thing that happens in a for loop is Python says, is this object iterable? It asks that by calling iter on it. The second thing it does is say, OK, if it's iterable, then we're going to ask it for the next thing, the next thing, the next thing. And when we get StopIteration raised, when that exception is raised, then we exit the loop. This is how literally every for loop in Python works.
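The three steps of the protocol described above can be written out by hand. The helper name manual_for is hypothetical and exists only for this sketch; it does with iter, next and StopIteration roughly what a for loop does behind the scenes.

```python
def manual_for(iterable):
    """Collect items the way a for loop would, using iter() and next()."""
    results = []
    iterator = iter(iterable)      # step 1: raises TypeError if not iterable
    while True:
        try:
            item = next(iterator)  # step 2: ask for the next thing
        except StopIteration:      # step 3: the iterator says it is done
            break
        results.append(item)
    return results

print(manual_for('abcd'))  # ['a', 'b', 'c', 'd']
```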
And so sure enough, if we run this for loop, we're going to get a, b, c, and d. That's probably not much of a surprise. But what happens if I put my generator in the same for loop? The question is, is what we got back from calling mygen iterable? The answer is yes. We get back a generator, and a generator implements the protocol. So the for loop says, oh, you're iterable. I'm going to ask you for your next thing, your next thing, your next thing. And then when we get StopIteration, it exits. So what happens when I run this for loop? I'm going to get 1, and then I'm going to get 2, and then I'm going to get 3. What the heck? What is happening here?

Well, let's do this manually. Maybe we'll understand it a little better. So I'm going to say g = mygen(). I'm going to call my generator function and stick the resulting generator object in the g variable. So g is now a generator object. You can see by these ID numbers that we get, which are really memory locations, truth be told, that g and iter(g) are exactly the same thing. So a generator is its own iterator. When we call iter on a generator and ask, are you iterable, and if so, who's your iterator? It says, I'm iterable, and I am my own iterator, right? Same address, same ID, same object. That's great.

So what happens with next? Well, on the right here is my generator function body: yield 1, yield 2, yield 3. So what happens if I say g = mygen() and then next on g? It returns 1. Then we run next on g again and we get 2. Then we run next on g and we get 3. And then we run next on g and we get the StopIteration exception. So what happens when we use next? Next, in general, tells an iterator, give me your next thing. But when it comes to a generator, next says: run onward through the function body.
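The manual walk-through just described looks roughly like this when typed out. It is a sketch of the same steps, with mygen as the assumed name of the three-yield generator function.

```python
def mygen():
    yield 1
    yield 2
    yield 3

g = mygen()
print(iter(g) is g)   # True: a generator is its own iterator

print(next(g))        # 1
print(next(g))        # 2
print(next(g))        # 3

try:
    next(g)           # nothing left to yield
except StopIteration:
    print('StopIteration raised')
```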
And here's the key thing: run through the next yield that you see, up to and including the next yield, and then go to sleep. Go to sleep and wait for further instructions. Well, how much code is between now and the next yield? It could be a tiny bit of code. It could be a lot of code. We don't know in advance. What's basically happening is that Python runs up to and including the next yield, and then the generator goes to sleep. Now the cool thing here is that the function's state remains across those calls. Each time I call next, it's just advancing our function a little bit, a little bit, a little bit. But all the local variables in the function stick around, and our stack frame sticks around.

Okay, let's see some examples of this. The first example is fib, right? I'm sure all of you, in your day jobs, are constantly calculating the Fibonacci sequence. Here is how we can use a generator function to do that. I'm going to define fib. We're going to say first = 0, second = 1. And then I'm going to have a while loop here: while True, yield first, and then I'm going to use tuple unpacking. I'm going to say first, second = second, first + second. Meaning each time I ask for the next value back from this, I'm going to get the next number in the Fibonacci sequence: 0, 1, 1, 2, and so on and so forth. By the way, if you call fib and you stick the result inside of list or something else, make sure you do it on someone else's computer, because this will take a very, very long time to run. Why? Well, it's never going to raise the StopIteration exception, right? So it's just going to run and run and run until you basically run out of memory. This is great, by the way, if you charge by the hour, but don't do this. All right, let's try one other example.
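The fib generator described above is short enough to write out in full. The use of itertools.islice is an addition for this sketch, as one safe way to take a finite slice of the infinite sequence rather than passing it to list.

```python
from itertools import islice

def fib():
    first, second = 0, 1
    while True:            # never ends, so StopIteration is never raised
        yield first
        first, second = second, first + second

# Take only the first ten values; list(fib()) would never finish.
print(list(islice(fib(), 10)))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```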
I should add, by the way, that the whole business with Fibonacci shows that a generator function is a great way to describe an infinitely long sequence, something that we know will take forever, but that we only want in little bits and little bits and little bits.

Next example: read_n. This is a favorite example of mine. We know that if we read from a file in Python, in general, we're going to get one line, then the next line, then the next line. But a lot of times, files are not in one-line records. They're in two-line records, five-line records, 10-line records. So read_n is a generator function such that when I read from a file, I'm going to get n lines back at a time. Okay, so what's going on here? Well, we're going to run f.readline. That reads from the file. How many lines does it read? Well, one, up to and including the next newline. How many lines are we going to read? Well, I'm going to read n. I'm calling the function with the name of a file and n, the chunk size that I want. And then I'm going to read that many lines, and I'm going to join those together into a string and assign that to output. And then I'm going to say if output, meaning if I have a non-empty string, I'm going to yield that output. So if we have something in the string, yield it. If not, then we're going to break. What happens when we break out of our while loop? The function ends. What happens when the function ends? It raises StopIteration. So this is a great way for us to get n lines back. We're going to get a string, another string, another string, each of them being n lines long except for the final one, which could be a little shorter. Okay.

Here's a third example: a get_vowels generator. Let's say I want to open a file, and I only want to get the vowels. Why would I want to do that? I don't know, but let's say I want to. So what are we going to do here? Well, we're going to open the file.
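The two file-based generators, the n-lines reader described above and the vowel reader being introduced here, might be written as follows. This is a reconstruction from the spoken description, not the code from the slides; the spellings read_n and get_vowels and the variable names are assumptions.

```python
def read_n(filename, n):
    """Yield the file's contents in strings of up to n lines each."""
    with open(filename) as f:
        while True:
            # readline returns '' at end of file, so the join shrinks to ''
            output = ''.join(f.readline() for _ in range(n))
            if output:
                yield output
            else:
                break   # leaving the loop ends the function: StopIteration

def get_vowels(filename):
    """Yield each lowercase vowel in the file, one character at a time."""
    with open(filename) as f:
        for one_line in f:
            for one_character in one_line:
                if one_character in 'aeiou':
                    yield one_character
```

Either one can be dropped into a for loop, for example for chunk in read_n(path, 5), where path is whatever file name you have at hand.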
We're going to iterate over it one line at a time. We're going to iterate over the current line one character at a time. And then the key thing: if the character is in 'aeiou', then we're going to yield that one character. We're going to yield, and what happens after yield? The generator goes to sleep and waits for the next time that we call next on it. Okay. And what happens when we get to the end of that line? We go back up to the outer for loop. We're basically going to go through every character in the file, one at a time, only returning the vowels.

So when are we going to use generators? Well, when we have a potentially large or infinite set of values to return. Or when it's easier to express the idea as a function, right? Instead of expressing something as a bunch of classes, or even as one class, which we can do, and then using the iterator protocol to implement it, it's easier to express it as a function. Or when we have to set things up. Let's say I want to set up a network connection or a database connection, something like that. Then inside of my generator, I can take advantage of the fact that a local variable has been set with that database or network connection. And I can also keep state across runs in local variables, because it's just going to sleep. It's not actually exiting. In technical terms, you can think of it this way: when I define a function and then run it, every time I run the function, I get a new stack frame. And in that stack frame, I have my local variables. But when it comes to a generator function, or a generator object, I should say, the stack frame sticks around across the calls to it, across the uses of it.

What if I do this, though? This is a little different from what we've seen so far. Now I'm going to say def mygen, x = None, while True. And then I do something really weird. I say x = yield x. Wait a second. What's going on here? Yield returns the current value of x and the function goes to sleep, right?
Yes, that's true. But here, in the case of yield x, it's on the right side of an assignment, meaning that it's going to get a value from somewhere. Where the heck is it going to get a value from? Well, from us, of course. When we use yield as an expression, the same rules apply. Each time we call next, the generator runs to the next yield, and after the yield it goes to sleep. But yield is now for two-way communication. Our function, or our generator, I should say, sticking around in memory across the calls to next, can now receive messages from the outside world. We can send something and another something and another something. And whatever we send to it replaces that yield in the expression, typically in an assignment. And the way we send something? Well, we send something with the send method. So we can advance a generator, next, next, next, next, with the next function. We can send a value to a generator with send. And the relationship between them is that when I say next, it's just like saying g.send(None). It's basically sending a None.

Now, let's take a look at this. So I now say def mygen, x = None, while True, x = yield x, then x *= 5. I'm going to create my generator. I'm going to now say next(g). I have to do what's known as priming it. I have to get it up to that first yield, so I get a value back. What is that value? Well, it's going to be None. Who cares? Then it went to sleep just at that yield. We're still stuck on the right side of the assignment there. And so now I can say g.send(10). What happens? 10 replaces the right side of that assignment. So now it's x = 10. Then we say x *= 5. And then we come back to the yield. We get back 50. And then it goes to sleep again. Now I send 23. We get back 115. And it goes to sleep again. It's always going to continue up to and including the next yield. The difference now is that yield is both receiving and sending.
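The priming and sending sequence above can be replayed directly. This is a minimal sketch of the generator as described, with mygen as the assumed name.

```python
def mygen():
    x = None
    while True:
        x = yield x    # hand x out, sleep, then receive whatever is sent
        x *= 5

g = mygen()
print(next(g))         # None: priming runs up to the first yield
print(g.send(10))      # 50
print(g.send(23))      # 115
print(g.send('abc'))   # abcabcabcabcabc
```

One thing to watch for with this version: after priming, a plain next(g) sends None, and None cannot be multiplied by 5, so the generator would raise TypeError.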
And what if I say g.send('abc')? Well, as always in Python, variables don't have types. So as long as I send something that can be multiplied by an integer, we're doing fine. I get 'abc' five times over. I can even send a list, right? [1, 2, 3], and I get 1, 2, 3, 1, 2, 3, 1, 2, 3, and so on and so forth.

This is what's known as a coroutine. A coroutine is a generator. It waits to get input from elsewhere using send. The data is received with yield as an expression, used on the right side of an assignment. And our local state remains across calls.

So here's another way that I can do exactly the same thing, just in a cooler, more modern, and more controversial way. I can use the walrus operator, the assignment expression. What's going to happen is that if and when I send a None value, or something else that's false in a boolean context, it will be seen as false and we'll exit from the loop. So what's happening here? I'm saying def mygen, x = None, and then while x := (yield x). So we're going to yield, we're going to wait, and whatever is sent to our generator is going to be assigned to x, and if while sees that it was false in a boolean context, it says, I'm going to exit now. And this yield here, you have to put it in parentheses in order for it to work, but it works just fine.

By the way, David Beazley gave a great talk more than 10 years ago about generators and so forth, and he suggested that if you don't want to prime a generator by hand, or I should say a coroutine, you can use a decorator to do that: it will automatically run next, and then it moves ahead.

So what? This is all fascinating technically, but I've often said that these sorts of coroutines kind of seem like a solution looking for a problem. Like, okay, it's technically really cool, but where can I use it? And that was sort of the genesis of this talk.
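Both ideas from this passage, the walrus-operator loop and a priming decorator, fit in one short sketch. The decorator name primed is hypothetical and is not taken from the talk or from David Beazley's material; it only illustrates the idea of calling next automatically. The assignment expression requires Python 3.8 or later.

```python
import functools

def primed(func):
    """Hypothetical helper: create the generator and run it to its first yield."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        g = func(*args, **kwargs)
        next(g)        # prime it so the caller can use send right away
        return g
    return wrapper

@primed
def mygen():
    x = None
    while x := (yield x):   # the parentheses around yield are required
        x *= 5

g = mygen()
print(g.send(10))   # 50
print(g.send(23))   # 115

try:
    g.send(None)    # falsy value: the while loop ends, the generator finishes
except StopIteration:
    print('coroutine exited')
```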
People have talked about coroutines for a while, but it's not necessarily clear where you can use them or how you can use them. So what I want to do is give you some ideas of how you can take advantage of this technology inside of your programs. Food for thought, and I think it can help with certain kinds of architectures. My sort of proposal is that you think about a coroutine as a nanoservice, right? Really tiny, tinier than a microservice. So think of it this way. Many applications use what's called a microservice architecture. You divide your application into many, many different parts, and each part is on a different server. It has its own state, it's separate, but it keeps that state across calls and so forth. And then you access each microservice via a distinct API. So what you can do is think of your coroutines as nanoservices, as services that are sitting inside of your program. They're not on external servers, right? So there's no network overhead. There's no object, thread, or process overhead. It's just sitting around. It's always running. It keeps its state. And you can just contact it via an API. What is that API? Send. You say, here's some new data, what do you think? And it comes back with an answer. Now, you will need to create your own protocol. This is very low level, right? But you can use Python's data structures in order to do this.

So let's use an example here: MD5, right? MD5 is a popular hash function. It's in hashlib. It's a little bit of a pain to use in Python. So it'd be nice if I had a little service running in memory, where I just hand it a string and I get back the MD5 signature for it. So let's create a coroutine that does that. I'm going to say import hashlib, def md5_gen. And here I'm going to say output = None, because I have to yield something at first. And then while s := (yield output). So what's going to happen?
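The MD5 coroutine being described could look like the sketch below. It is reconstructed from the spoken description; the name md5_gen and the local variable names are assumed spellings.

```python
import hashlib

def md5_gen():
    output = None                 # nothing to report on the priming call
    while s := (yield output):    # wait for a string; None ends the service
        m = hashlib.md5()         # fresh MD5 object for each request
        m.update(s.encode())      # hashlib works on bytes, not str
        output = m.hexdigest()

g = md5_gen()
g.send(None)                      # prime it (same effect as next(g))
print(g.send('hello'))
print(g.send('goodbye'))

try:
    g.send(None)                  # stop the service
except StopIteration:
    print('service stopped')
```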
I'm going to get something from the user, right? That's going to be s here. I'm going to do MD5 on it. Well, I'm going to initialize the MD5 engine. I'm going to encode whatever I got into bytes. I'm going to pass that to m.update, which you have to do in order to get the MD5 signature. I'm going to get the hexdigest from it. And then I'm going to output that. And then we wait around for the next one, and the next one. So if I say g = md5_gen(), I'm going to send None, because I have to prime it. And then I send 'hello', and I get back the MD5 of that. I send 'goodbye', and I get back the MD5 of that. And here's the cool thing also. I can say g.send(None), and that stops the service. What happens then? Well, it's false in the while, and so it exits from that. Fantastic.

Here's another example. Let's get weather forecasts. So I'm going to have here a service, def get_forecasts(city). I'm going to pass it the number of a city. You can see the URL, worldweather.wmo.int. This is the World Meteorological Organization. So you can basically pass the number for any city you want, and you'll get back JSON indicating what the weather forecast is. You can see it's a little deep. I had to, you know, poke around, but we're getting the weather for that city, for each forecast day. And I'm going to get each of the forecasts for the next two days. So if I say g = get_forecasts(44), I can get a next and get a next and get a next, and I get each of those in turn. Right. Looks pretty great, right? Sort of. That's not a coroutine, because our yield is only yielding. It's not actually getting anything from us.

So let's rewrite this a little bit. I'm now going to rewrite the same function. It's still going to be a generator function, but no longer am I going to pass the city ID as an argument. It's not going to be captured by a parameter. Rather, look at that first line there.
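The coroutine version of the forecast service can be sketched without touching the network. Everything about the data here is an assumption: FAKE_FORECASTS and fetch_forecasts are hypothetical stand-ins for the HTTP request and the JSON digging shown in the talk, and the forecast values are made up. Only the shape of the generator follows the description.

```python
# Hypothetical stand-in data: city number -> forecasts, in day order.
FAKE_FORECASTS = {
    44: [('day 1', 'sunny'), ('day 2', 'cloudy')],
    45: [('day 1', 'rain'), ('day 2', 'windy')],
}

PROMPT = 'Send a city number or None'

def fetch_forecasts(city_id):
    """Stand-in for downloading and unpacking the real forecast JSON."""
    return FAKE_FORECASTS.get(city_id, [])

def get_forecasts():
    while city_id := (yield PROMPT):          # None ends the coroutine
        for one_forecast in fetch_forecasts(city_id):
            yield one_forecast                # one forecast per request

g = get_forecasts()
print(next(g))       # the prompt: priming
print(g.send(44))    # first forecast for city 44
print(next(g))       # second forecast for city 44
print(next(g))       # forecasts exhausted: back to the prompt
print(g.send(45))    # first forecast for city 45
```

In this sketch, whatever is sent while a city's forecasts are still being handed out is ignored; only the value sent at the prompt picks the city.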
I'm going to say while city_id := (yield 'Send a city number or None'). So if the person sends None, then, well, the coroutine exits. But if someone sends a city number, based on the numbers from the World Meteorological Organization, then we go and get the forecasts for that city. And so now I can say g = get_forecasts(). We start it up with next(g), to prime it. Tell me about city 44, and 44, and 44. Right. We're going to get it again and again and again. I can say send 44 and get to the next one there, and then I'm going to send 45. Give me the next city, and the next one, and the next one. I'm interested in each of these one at a time, and it's going to sort of move forward. You can see for city 45, we're getting the next one here. We're getting the next one here. And so basically, when we get to the end of the forecasts for a particular city, then it comes back to this and says, okay, I'm done with that. Send me a city number, or None to exit.

Here's a third example, and then we're going to pull this together. Then I'm going to knock your socks off, even if you're not wearing any. So here I have a database connection, right? I connect using psycopg2, which is a pretty standard PostgreSQL connector in Python. And what am I going to do? I'm going to just select from people. I created a tiny, tiny database for this example that has first names, last names, and birth dates for people. And what is it going to do? Well, it's basically going to have a really primitive sort of API. It's like an object-relational mapper, an ORM, where this is going to get a dictionary from the caller, and that dictionary is going to allow us to filter the results, so that we only get a few results at a time. And then we're going to iterate over all of them and return them. Let me give you an example of how this will work. I'm going to say g = people_api(). So I'm starting up my coroutine, and I say next to prime it, and it says, send a query, or None to quit.
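A runnable approximation of the people service is sketched below. To keep it self-contained it swaps in the standard library's sqlite3 with an in-memory table in place of psycopg2 and PostgreSQL; the table name, column names and sample rows are all made up for illustration, and the query handling is a guess at the kind of dictionary-based filtering described.

```python
import sqlite3

PROMPT = 'Send a query, or None to quit'

def people_api():
    # Assumption: in-memory SQLite stands in for the PostgreSQL database.
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE people '
                 '(first_name TEXT, last_name TEXT, birthdate TEXT)')
    conn.executemany('INSERT INTO people VALUES (?, ?, ?)',
                     [('Ada', 'Example', '1990-01-01'),
                      ('Bob', 'Example', '1985-06-15'),
                      ('Cy', 'Sample', '2000-12-31')])
    allowed = ('first_name', 'last_name', 'birthdate')
    try:
        while query := (yield PROMPT):    # a non-empty dict, or None to quit
            # Only known column names reach the SQL text; values are bound.
            columns = [c for c in allowed if c in query]
            sql = 'SELECT first_name, last_name, birthdate FROM people'
            if columns:
                sql += ' WHERE ' + ' AND '.join(f'{c} = ?' for c in columns)
            sql += ' ORDER BY first_name'
            for row in conn.execute(sql, [query[c] for c in columns]):
                yield row                 # one matching record per request
    finally:
        conn.close()   # runs on normal exit and when the caller uses close()

g = people_api()
print(next(g))                            # the prompt
print(g.send({'last_name': 'Example'}))   # first matching row
print(next(g))                            # second matching row
print(next(g))                            # no more rows: back to the prompt
g.close()
```

The connection lives in a local variable for as long as the generator is alive, which is the setup-once benefit mentioned earlier in the talk.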
So I'm going to send a query, and the query is going to be in the form of a dictionary. I'm going to say here last_name, Lerner. Send whatever. Okay. And then send again. And now you can see that I'm getting different things, and I get a next one and a next one and a next one. Really, we're doing okay on this front. We have this nice little object-relational mapper thing going on, and I can query my database through this service. I don't have to worry about connecting to the database.

Now, how can I tell the generator that I'm completely done? I'm going to use the close method. That tells the generator: end now. But what if I just want to exit from the current query? Wouldn't it be nice if I could send a signal, tell it, hey, I want to finish? Well, guess what? We have another method. We have the throw method. So I can exit early from the generator with close, and I can send it an exception with throw. So look what happens here. I've now got this try/except block here. So if someone on the outside wants to get out of the current list of weather forecasts, they can do that just by throwing an exception into the generator. So we've now got this whole API going. Watch this. I can get the forecasts. I do the same priming as before. Send 44. Send 44 for the next thing. Then we raise the exception, DifferentCityException, and it gets out of that loop and allows me now to get a different city altogether. Our nanoservice is available through this simple request-response interface. And I can cache data. I can keep state across calls. I can do whatever I want.

Now, what if I want to have lots of functionality in our code? I know it's Q&A time, but I will be in the breakout room afterwards. I just want to get to this last point or two. Watch this. I'm going to combine MD5 and weather into the same coroutine. So I'm now going to have my combined generator, and it's going to say, hey, send 1 for weather, send 2 for MD5, or None to exit.
So if the person selects one, if they send 1 to the generator, they'll get this. And if they send 2 to the generator, they'll get that. But this is ugly as sin, right? This is terrible. The whole point of having these nanoservices, the whole point of coroutines, is to break things up. We want things to be small. So I know what I'm going to do. I'm going to refactor it so it's much smaller. I'm going to have a combined generator here, and it's going to say, if s == 1, we'll call the city generator. If s == 2, we're going to call the MD5 generator. Guess what? This will not work. This will not work because, how will throw go from our outside generator to this inside generator? How will close go from the outside one to the inside one? It will not work.

What we need to do is yield from. Yield from is this piece of syntax that no one seems to understand. Everyone's like, oh, that's instead of doing a for loop on something internal. No, no, no. It's so that you can have a generator, a coroutine, that in effect contains other generators and coroutines, and it will pass through the calls to send, and it will pass through the calls to throw and close. It allows this to be a conduit from the outside world to the inside. So what does yield from do? Bidirectional communication. With this in place, I can create my combined generator. I can choose 2 to get MD5s. I can then send None to back up to the top. I can then choose 1 to get weather. I can do that. I can even raise an exception. I can throw the exception in there. So basically, yield from allows me to write these coroutines in a decomposed way, breaking them apart so that I can get them into a more serious kind of place and take them seriously, and not either have these gargantuan generators that have lots of functionality in them or just ignore the functionality altogether.
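The decomposed design with yield from, including throw and close passing through to the inner coroutine, can be sketched as follows. The forecast data is a made-up stand-in for the real web request, and the function names, prompts and DifferentCityException spelling are assumptions based on the spoken description.

```python
import hashlib

class DifferentCityException(Exception):
    """Thrown into the coroutine to abandon the current city."""

# Hypothetical stand-in for the real forecast lookup.
FAKE_FORECASTS = {44: ['sunny', 'cloudy', 'rain'], 45: ['windy', 'snow']}

CITY_PROMPT = 'Send a city number or None'
MD5_PROMPT = 'Send text to hash, or None'
TOP_PROMPT = 'Send 1 for weather, 2 for MD5, or None to exit'

def city_generator():
    while city_id := (yield CITY_PROMPT):
        try:
            for one_forecast in FAKE_FORECASTS.get(city_id, []):
                yield one_forecast
        except DifferentCityException:
            pass   # drop the rest of this city and go back to the prompt

def md5_generator():
    output = MD5_PROMPT
    while s := (yield output):
        output = hashlib.md5(s.encode()).hexdigest()

def combined_generator():
    while s := (yield TOP_PROMPT):
        if s == 1:
            yield from city_generator()   # send/throw/close pass through
        elif s == 2:
            yield from md5_generator()

g = combined_generator()
print(next(g))                             # top-level prompt
print(g.send(2))                           # now talking to md5_generator
print(g.send('hello'))                     # MD5 of 'hello'
print(g.send(None))                        # inner coroutine ends: top prompt
print(g.send(1))                           # now talking to city_generator
print(g.send(44))                          # sunny
print(g.throw(DifferentCityException()))   # handled inside: city prompt
print(g.send(45))                          # windy
g.close()                                  # closes inner and outer together
```

Replacing each yield from line with a plain for loop over the inner generator would still relay yielded values outward, but send, throw and close would stop at the outer generator, which is the failure described above.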
Now, I have to say a few words about asyncio, because everyone in the asyncio world says, oh, those are coroutines. What do you mean, these are coroutines? Well, these coroutines came first. The generator-based coroutines came first, and this is how asyncio was originally implemented. Nowadays we have async def, we have await, we have a whole new set of keywords that we can use. But the basic idea of asyncio comes from generators, comes from coroutines. And this is what's known as cooperative multitasking. This is what operating systems used to do back when your grandparents were using computers. Okay, even when I was using computers, like in college. But basically, it means that a task will run up to and including the yield that it has there, and then give up control.

Now, all this is very nice, but should you use coroutines? On the one hand, they're really useful. They give you a speedy, in-memory, arbitrary API. They allow us to work with large data one chunk at a time, and we can divide them into smaller pieces using yield from. But they're not so well understood. It definitely seems weird to many people to use send in this way. And when things go wrong, debugging can be kind of hard. But I think this is an interesting way of understanding and working with certain types of data and certain types of functionality in Python. And I hope that this opened your eyes to how these things work. We still have a few minutes left for questions. I'll be very happy to take them either now or later. And contact me afterwards if you want. I will be in the breakout room as well.

Hey, the camera is on, and I think you should then head over to the breakout room, because there was a lot of activity in the chat, and we will not have time here for any complicated question where anybody can learn something. So just a short reminder that the slides for this talk are linked in the schedule. So you can just get them and look at them again.
And there's also information about how to get your newsletter, which I really recommend getting. Okay, so let's get you over to the breakout room and after a short break we'll continue here with the next talk.