Hello, everyone. I'm Rajat. I'm a student at IIT Madras, and I'm going to talk to you today about generators. So to start off with, we all know what iteration is, right? We just want to repeat a bunch of statements, and the way you do that in code is you use loops. As an example, let's say we want to iterate over the squares of natural numbers. One very naive way to do this would be to write a function like this. It's a simple while loop; everybody knows how to do this. You run it, you get your list of squares. Sure, you could do it using a for loop or a list comprehension, but in the end, you'll get a list of squares. And if you think about it, this has some problems. First of all, suppose we don't know how many squares we want; suppose there's no limit on the number of squares, and we don't know that beforehand. In that case, we can't really use this loop. You could say, well, I'm probably not going to need more than a billion squares, so I'll just call squares of a billion every time. But that's inefficient in terms of memory, because I'm going to store a whole list while doing this iteration, and I probably just want to work on one element at a time. The way we solve these issues is using what's called lazy iteration; most of you would be familiar with this. Before we get into that, let's just talk about how Python actually does iteration. When you write a for loop like this, what exactly is happening? What Python does is it looks at what you're for-looping over, in this case squares of five, and it calls the iter function on that, which pops out an iterator. Then it takes this iterator and keeps calling next on it, and when the iterator runs out of elements, next raises a StopIteration exception; that's when it exits the loop. So if you do for i in squares of five, you get your five squares. Now, going back to lazy iterables.
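The iteration protocol described above can be sketched like this; `squares` here is a hypothetical eager implementation of the talk's squares function.

```python
# A sketch of what a `for` loop does under the hood.
def squares(n):
    return [i * i for i in range(1, n + 1)]

iterator = iter(squares(5))   # the for loop calls iter() on the iterable
result = []
while True:
    try:
        result.append(next(iterator))  # keep calling next() on the iterator...
    except StopIteration:              # ...until StopIteration signals the end
        break

print(result)  # [1, 4, 9, 16, 25]
```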
So using this scheme, we can easily write our own version of a squares lazy iterator by just implementing iter and next. In your iter, you just set your n to 0, and every time you call next, you return n squared. And you can get your infinite squares, one square at a time, so no memory problems. That's why I had to put in this if condition in there to make sure I could break. Now, where generators come in is that they're basically an easy way to make lazy iterators. The way you make a generator is you use a special keyword called yield. Whenever you stick a yield inside a function, that function is no longer a normal function anymore; it's a generator function. Now, you look at this function initially: it has an infinite loop in it with a yield of i squared. So to see what happens, let's just run it; that's the best way to get to know what happens. Well, you run it, and you get something like a generator object. First of all, you're not getting any squares, and there's probably no infinite loop running either. It turns out that if you want to get the stuff out of these generators, to get what they yield, you need to use next. If you call next on this generator, you get 1. If you call next again, you get 4. You can keep calling next, and it keeps giving you squares. So let's break this down line by line to see what's happening. The first time you call next is when the body of the function actually starts executing. In this case, it would set i to 1 and enter the while loop, until it hits that yield statement. Once it hits the yield, the function stops executing, or to be precise, it pauses executing, and it returns whatever it was supposed to yield. In this case, 1 squared, so that's 1. If you call next again, the function continues executing from where it left off.
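A minimal sketch of both versions described above: first the class implementing the iterator protocol by hand, then the generator-function version. The class name `Squares` is my own; the talk doesn't name it.

```python
# Lazy squares via the iterator protocol (__iter__ / __next__).
class Squares:
    def __iter__(self):
        self.n = 0
        return self

    def __next__(self):
        self.n += 1
        return self.n * self.n

# The same thing as a generator function: any function with a yield.
def squares_gen():
    i = 1
    while True:       # infinite loop, but only one square per next() call
        yield i * i
        i += 1

it = iter(Squares())
gen = squares_gen()
print(next(it), next(it))    # 1 4
print(next(gen), next(gen))  # 1 4
```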
And it retains all the local variables in the function. So in this case, the next statement it would see is i plus equals 1, but it remembers that i was 1 before, so i becomes 2, then it loops back and hits the same yield again. That's why you get 2 squared, which equals 4. Now it's easy to see why you can keep calling next and getting your infinite squares. The main takeaway from this is that these generators are stateful objects, which retain their local namespace. Well, this is all fine and dandy with infinite yields, but I could just as easily write a generator with finite yields, like this one, which just counts down from some n. And then there are no more yields. What happens then? Let's just run it and see. The first three nexts give 3 down to 1. If you call next again, it raises a StopIteration. So what this means is that if there are no more yield statements, or in simple terms, there's nothing left to generate, it raises a StopIteration. Now, if you think about it, this fits perfectly into the iteration protocol we already had. You could stick a generator in a for loop and it'll work beautifully: no errors thrown, you didn't need to bother calling next every time, and you get your squares or whatever numbers pretty easily. Now, a really nifty trick is that you can actually make generators using one-line generator expressions, so you don't actually have to use yield. They're basically the same as list comprehensions, with the only difference that instead of using square brackets, you use round brackets. So if you look over here, we have this generator expression, i squared for i in range 10, and it's a generator object, and it works exactly like the same squares generator object we had before. Now you might think, oh well, I can't make my infinite generators with these, but these things actually have a really nifty use case in terms of pipelining data.
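The finite countdown generator and the generator-expression trick can be sketched as follows (the name `countdown` is assumed; the talk just describes a generator that counts down from n):

```python
# A generator with finitely many yields: once the body runs out,
# the next call to next() raises StopIteration.
def countdown(n):
    while n > 0:
        yield n
        n -= 1

c = countdown(3)
print(next(c), next(c), next(c))  # 3 2 1
try:
    next(c)
except StopIteration:
    print("nothing left to generate")

# In a for loop, StopIteration is handled for you:
for i in countdown(3):
    print(i)

# Generator expression: a list comprehension with round brackets.
lazy_squares = (i * i for i in range(10))
print(lazy_squares)        # <generator object ...>
print(list(lazy_squares))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```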
And they let you create a declarative style of working with data pipelines. So let's see an example. Suppose you have a text file with a bunch of records, each with a bunch of columns of numbers. Let's say you're training some machine learning model and the second column holds losses, and what you want to do is get the minimum loss in the second column. You can do this with generators pretty easily. First we'll open this file and get this source object. Now I'm going to make a records generator expression which takes each line in my source and removes any whitespace at the end. Then I'm going to pass this records generator into a new one called losses, and all it does is extract the second column and convert it to a float. And then I'm going to pump this final losses generator into the min function, which consumes a generator, and we get our result. Now you can do some more fancy stuff with this. Let's say, for some weird reason, I wanted to sum up all the numbers in all the columns after the first one. You can do that using the same sort of paradigm. Notice that what I'm doing with my records is, for each record, I am looking at each column after splitting it, and the columns generator basically gives me the value of every single number in every column. So this is sort of like a stacked for-loop list comprehension, but now it's a generator expression. Once you feed it into the sum function, you get the result you want. Now that seemed like a kind of cooked-up example, and maybe most people wouldn't do that with their machine learning models, but one example which is really real-world is an image processing pipeline. Let's say you have some dataset of images and labels.
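A sketch of the pipeline just described. The talk reads from a text file on disk; here an in-memory `StringIO` stands in for the open file so the example is self-contained, and the space-separated columns are made up for illustration.

```python
import io

# Stand-in for open("records.txt"): three records, three columns each.
source = io.StringIO("1 0.50 7\n2 0.25 8\n3 0.75 9\n")

records = (line.rstrip() for line in source)     # strip trailing whitespace
losses = (float(r.split()[1]) for r in records)  # extract the second column
print(min(losses))                               # 0.25

# Summing every number after the first column, with a stacked genexp:
source = io.StringIO("1 0.50 7\n2 0.25 8\n3 0.75 9\n")
records = (line.rstrip() for line in source)
columns = (float(col) for r in records for col in r.split()[1:])
print(sum(columns))                              # 25.5
```

Nothing is read or computed until `min` or `sum` starts pulling items through the chain, one line at a time.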
Well, a pretty valid pipeline to apply to them would be: let's say you want to whiten the images, normalize them, then maybe you want to augment them. When you augment an image you rotate it, translate it, and so on, so one image gives out many more; that's why you would use something like that stacked for-loop generator expression. Then maybe you want to play around with the labels and stuff. So you get these three lines of code, but if you look closely, when you run these three lines of code, nothing actually happens. This is as lazy as it could get. It's only when you push that last generator into some processing function that any execution of code actually happens. And the main point which you should take away from this is that nowhere in this whole pipeline did we actually store a big list of items. Each item was passed through the pipeline one at a time. It's also much more readable: because it's a declarative style, it's pretty easy to see what's going on, and in fact, if you test it out, it's actually a bit faster than if you were to write it using traditional for loops. Now, so far we've basically seen generators generating stuff, but you can actually send stuff back into generators using that same yield keyword. Basically, you can put it on the right-hand side of an assignment: you could say something like a equals yield. At first sight that looks a bit weird, but to see how this works, let's just run it. So we get our receiver generator. You need to call next on it first so that you move to that yield statement. And the way you send something into this generator is you use generator dot send. Let's say you send something like 24: it prints "I got 24", because whatever you send is put into the variable a. And if you notice, incidentally it also raised StopIteration, because there were no more yields after that.
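The receiver generator can be sketched like this (the exact message string is an assumption; the point is that the sent value lands on the left of `a = yield`):

```python
# Values go *into* the generator through .send().
def receiver():
    a = yield            # pause here; .send() resumes and assigns to a
    print("I got", a)

r = receiver()
next(r)                  # advance to the yield so the generator can receive
try:
    r.send(24)           # prints "I got 24"
except StopIteration:
    pass                 # no more yields after the assignment, so it finishes
```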
Now, because of this, you have to call next on a generator first before you can send anything into it, and if you try to send something before calling next on it, Python will complain: it says you can't send a non-None value to a just-started generator. Incidentally, you can also figure out from this that you can send None instead of calling next anytime. Now, you can use one yield statement to do both things: to both receive stuff and return stuff. Here's a simple example. All it does is return whatever you send into it; it's an echo generator. And if you run it, that's exactly how it works, and it has this weird a equals yield a statement in it. You look at it and it's kind of hard to make out what exactly is going on. So the way you understand a "variable equals yield expression" statement is you split it up into two different statements: the first one is just "yield expression", and the next statement after that is "variable equals whatever you send". What this means is that when you call next and you end at this yield statement, all it does is return whatever you're going to yield; and when you send something and start off from this yield statement, whatever's on the left-hand side gets assigned whatever you send. Or to put it concisely, what you yield doesn't affect what is assigned to the left-hand side, despite what the syntax may suggest. Now, a pretty cool example of where you could use this is, let's say you want to build a running average. It's pretty easy to do this using our good old object-oriented programming principles: make a running average class with a send method which accumulates whatever you send into a running sum, increments a count, and returns the average. Pretty simple. You can test it out and it works as you'd expect, but you had to write a whole class to do this. With generators you can do this using just one simple function, and that looks like this.
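The echo generator and the two-statement reading of `a = yield a` can be sketched as:

```python
# One yield statement both returns the current value and receives
# the next one.
def echo():
    a = None
    while True:
        a = yield a   # first: yield a (return it); then: a = <sent value>

e = echo()
next(e)                  # prime: run up to the first yield (yields None)
print(e.send("hello"))   # 'hello': sent value assigned to a, then yielded back
print(e.send(42))        # 42
```

Note that what is yielded has no effect on what gets assigned to `a`; the assignment only happens when the next `send` resumes the generator.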
So let's look at that yield statement in the while loop. It says running sum plus equals yield. Let's not look at what we're yielding, because when you send something in, that doesn't matter. So whatever you send is accumulated into running sum. Now that's clear. Okay, what does it return? All it yields is running sum divided by count, which is the average, and if you test this out, it works exactly the same. Not only is this code much shorter and cleaner, it's actually a bit faster: if you time it, it comes out to be approximately 10% faster. And you can look at the code side by side to see how much shorter it is. Now, what you can also do with generators is stack them. We already saw a bit of this with the multiple fors in one-line generator expressions, so let's look at a simple example to return to that topic. Let's say we have one simple generator that just yields three things, and another one which says: well, first I'm going to yield everything in the first generator, and then yield my own stuff. When you run this, you get what you'd expect: gen one's stuff and then its own stuff. Now, there's a nice shorthand in Python to do this called yield from. Instead of writing that for loop, you can say yield from gen one. What this means is it'll delegate all next calls into gen one; then once gen one finishes, you go back to your own yield statements. So it can roughly be translated into that for-loop expansion, but it actually has a lot more subtleties, which we'll get into in the later part of the talk. Before we go into that, I want to show you a really nifty example of where you could use stacked generators to do some crazy stuff. We all know what binary trees are, right? You can traverse them in-order, pre-order, post-order and so on, and basically I want to write some code to do that. Before that, let's just set up our binary tree data structure. Very simple: left and right subtrees with some data.
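A sketch of the running-average generator and of the `yield from` shorthand. This version updates the count before dividing, which may differ slightly from the slide's exact code, but the send/yield behavior is the one described above.

```python
# Running average as a single generator function instead of a class.
def averager():
    running_sum, count = 0.0, 0
    average = None
    while True:
        value = yield average           # receive the next number...
        running_sum += value
        count += 1
        average = running_sum / count   # ...and yield the running average

avg = averager()
next(avg)              # prime it to the first yield
print(avg.send(10))    # 10.0
print(avg.send(20))    # 15.0

# Stacked generators, with and without the yield from shorthand:
def gen1():
    yield from (1, 2, 3)   # delegate all next() calls to the inner iterable

def gen2():
    yield from gen1()      # first everything from gen1...
    yield 4                # ...then our own stuff

print(list(gen2()))        # [1, 2, 3, 4]
```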
And this is the example binary tree I'm going to work with: six elements, and if you traverse it in-order, you get one, two, three, four, five, six. Now, if you want to write a function to do that, it's pretty easy, thanks to the good old recursive nature of binary trees. You get the in-order list of the left subtree, then your own data, then the right subtree, and you return them, literally, in order. You run this on your root, and you get what you want. If you want to change it to pre-order, it's very simple: just change what you return. But then suppose your tree is very huge and you want to do this lazily: you still want it in order, but one element at a time. You can do that with stacked generators. Now, this function at first may seem a bit crazy, so let's dig into it. First, it's definitely a generator, because it has yields in it. And the first yield from is from itself: it's yielding from the in-order generator of the left subtree. So not only is it a stacked generator, it's a recursively stacked generator. You yield from the left subtree in order, then you yield your own data, and then you yield from the right subtree in order. And when you run it, it works, because we have that base case up there, so that this doesn't run into some infinite yield-from recursion. If you want to change it to pre-order, again, it's a pretty simple change: you change the order in which you do the yielding. So this version is way more sleek and memory efficient. Now let's get back into how yield from actually works; it's more than just a for-loop expansion. First of all, remember how you could send stuff into generators? If you just do that for-loop expansion and you send something, it is not going to be sent into the generator you delegated to, but yield from takes care of that. Another difference is when you put yield from on the right-hand side of an assignment.
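The lazy in-order traversal can be sketched like this; the `Node` class and the particular six-element tree are my stand-ins for the slide's data structure.

```python
# Minimal binary tree node: left and right subtrees with some data.
class Node:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def in_order(node):
    if node is None:                  # base case: stops the recursion
        return
    yield from in_order(node.left)    # everything in the left subtree...
    yield node.data                   # ...then this node's data...
    yield from in_order(node.right)   # ...then the right subtree

# A six-element tree whose in-order traversal is 1..6:
root = Node(4,
            Node(2, Node(1), Node(3)),
            Node(6, Node(5)))
print(list(in_order(root)))   # [1, 2, 3, 4, 5, 6]
```

For pre-order, you would just move `yield node.data` above the first `yield from`; the rest stays the same.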
That functionality of yield from is actually what's central to asynchronous concurrency in Python. Now, let's go over some definitions which will help us understand how concurrency works. The first thing you want to know is what coroutines are. Coroutines are basically the building blocks of concurrency; in short, they're functions which bounce control between each other. One can execute a bit, then pause and say, okay, I'm going to stop for a while, why don't you execute, and then I'll continue. So the main thing they need to be able to do is pause and then retain their state. Now, that's exactly what a generator does when it hits a yield statement, and that makes generators prime candidates for implementing coroutines in Python. Why would you want to do concurrency in the first place? Well, let's say your function has some blocking operation, like IO or an HTTP call, and all you're doing is waiting for that thing to happen. What you could instead do is go to another coroutine and execute some stuff, and once the blocking operation is done, come back and continue executing, thereby minimizing how long your CPU is idle. So let's look at an example coroutine using our yield statement to pause. We have a simple one here: all it does is sum the natural numbers up to n, but in reverse order, and there's a yield at each iteration. I'm only doing that to pause it at each iteration. But what's different about this generator from the ones we saw before is that it has a return statement. So now these generators can yield and return. The way return works in generators is: remember how, when it hit the end of the function, it raised the StopIteration exception?
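The summing coroutine can be sketched as follows (the name `sum_to` is assumed):

```python
# Sums the natural numbers up to n in reverse order, pausing at
# each iteration; the return value rides on StopIteration.
def sum_to(n):
    total = 0
    while n > 0:
        yield          # pause point: lets something else run
        total += n
        n -= 1
    return total       # becomes StopIteration.value

s = sum_to(3)
while True:
    try:
        next(s)
    except StopIteration as e:
        print("returned:", e.value)   # returned: 6
        break
```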
Well, all that's going to change now is that it's still going to raise the StopIteration exception, but the value of that exception is going to be whatever was returned. Now I'm going to write another coroutine, a very simple one, and this is where I'm going to show you how yield from works when it's on the right-hand side. It's a pretty simple one: all it does is print the result from some other coroutine. The way this works is that all next calls to print result are delegated to this other coroutine because of the yield from. But once that coroutine is done, it doesn't actually raise the StopIteration exception to us, because yield from takes care of that: it takes whatever that return value was and puts it into result. So basically what this says is: run this coroutine, and once it's finished, tell me what it returned, and put it into result. Now, these coroutines by themselves, sure, you can use yield to pause them and so on, but by themselves they really don't know how to do this fancy bouncing of control between each other. To do that you need some sort of overarching thing, which is called an event loop. I'm going to write a very, very simple example of an event loop. It takes a list of tasks; tasks are synonymous with coroutines in this case. While I have some tasks to run, I'm going to look at the first task and try to run it, by calling next on it. If I get a StopIteration, that means it's done: I'm going to say "completed task" and remove it from my list of tasks. If I didn't get a StopIteration, that means it stopped at some yield and it still has stuff to do, so I'm going to push it to the end of this task list. Basically I'm implementing a queue here. So effectively, what this does is a round-robin schedule of coroutines. And we can see this in action.
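The print-result coroutine and the round-robin event loop can be sketched like this (names `print_result` and `event_loop` follow the talk's descriptions; `sum_to` is the summing coroutine from before):

```python
def sum_to(n):
    total = 0
    while n > 0:
        yield
        total += n
        n -= 1
    return total

def print_result(coro):
    result = yield from coro      # run coro; its return value lands in result
    print("result =", result)

def event_loop(tasks):
    while tasks:
        task = tasks.pop(0)            # take the first task in the queue
        try:
            next(task)                 # run it until its next yield
        except StopIteration:
            print("completed task")    # done: drop it from the queue
        else:
            tasks.append(task)         # not done: push it to the back

event_loop([print_result(sum_to(3)), print_result(sum_to(5))])
# The two tasks alternate; prints "result = 6" when the first finishes,
# then "result = 15" when the second does.
```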
Let's say we have two of those print result coroutines, one wrapping a coroutine that sums up to three and one that sums up to five. If you run this, you can see the prints alternating: a three, then a five, then a three, then a five, then a three, then a five, and once the three one is done, it says result equals six. That's the print statement from our print result coroutine: whatever result was, yield from put six into it and printed it. Then it says "completed task"; that's from our event loop. And then the five coroutine completes. The main point here is that these two functions were executed alternately, and that's basically what concurrency is. Now, the way you do this in Python today is you would use the keywords async and await. They're built on more or less the same principles as yield from, but the asyncio library also offers some really nifty tools, one of them being an event loop, and also async versions of blocking operations like sleep and IO. Let's go back to our summing example, but this time let's write it using asyncio, and let's also put in a delay at each iteration step. The way you write a coroutine using these async and await keywords is: first, you just write async def. That basically says, okay, this is a coroutine. You don't need yields and such; everything else is the same. But now at each iteration I have two choices: I could use the normal time.sleep function, or I could use the one provided by asyncio. The normal time.sleep is not a coroutine, per se, so I can't really control when I exit it and enter it, but asyncio.sleep is a coroutine, and the way you execute a coroutine is you await it. You can read await basically the same way you would read yield from. So in this case I have the option of using either form of sleep.
Now, in Python 3.7, if you want to use the event loop, you can run multiple coroutines concurrently by gathering them into one single coroutine using asyncio.gather, and just await that single main coroutine. Let's compare how using the two different types of sleep affects the result. Because I'm running this code in a notebook, I'm just going to write await main directly; I also have two statements to time it. But if you were to write this in a script, you would use asyncio.run on the coroutine. This first example doesn't use the async sleep; it's just normal time.sleep. You can see that, first of all, they're not executing alternately: three executes first and finishes, then five executes and finishes. The whole thing takes 11 seconds, because the two coroutines I gave had two different delays: the three one had a delay of two seconds per iteration and the five one had a delay of one. So that sums to 3 times 2 plus 5 times 1, which is 11 seconds. But if you use the asyncio.sleep coroutine, they run alternately (not exactly, because of the difference in delays), and the main thing is that the total time taken is just six seconds. If you think about it, once the three coroutine starts and hits that sleep, because it's an asyncio.sleep, the event loop knows that it doesn't actually have to wait there, and it bounces to the five coroutine. And once that finishes its delay of one, the three coroutine is still waiting, because it has a delay of two; that's why it runs two five-iterations in that time. The main thing is that the three sleep and the five sleep are happening concurrently. We still have only one CPU executing, but the blocking operations are happening concurrently, and that's where you get the time gains. Now, the async and await I've shown you so far is a pretty simple example, and I'm not going to go deeper into this.
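The asyncio version can be sketched as below. The delays are scaled down from the talk's 2 and 1 seconds to 0.2 and 0.1 so the sketch runs quickly; with `asyncio.sleep` the total is roughly max(3 × 0.2, 5 × 0.1) = 0.6s rather than the sequential 3 × 0.2 + 5 × 0.1 = 1.1s.

```python
import asyncio
import time

# async def marks a coroutine; awaiting asyncio.sleep yields control
# back to the event loop instead of blocking like time.sleep would.
async def sum_to(n, delay):
    total = 0
    while n > 0:
        await asyncio.sleep(delay)  # non-blocking: the other coroutine runs
        total += n
        n -= 1
    print("result =", total)

async def main():
    # gather runs both coroutines concurrently under one awaitable
    await asyncio.gather(sum_to(3, 0.2), sum_to(5, 0.1))

start = time.perf_counter()
asyncio.run(main())   # in a notebook you would `await main()` instead
print(f"took {time.perf_counter() - start:.2f}s")  # ~0.6s, not ~1.1s
```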
And if you want a really good understanding of what's going on there, you should probably attend the async workshop tomorrow. So, to summarize my talk: we started off by looking at lazy iteration, a very simple thing, and we saw how generators could help us do that. Then we saw that you could make generators using neat one-line expressions, and how that could help us write a more declarative style of data pipelining. Then we saw the cool ability to send stuff back into generators and made some fun stuff with that. Then we saw that yield from was more than just some fancy way to do lazy iteration, but was actually central to how you would implement coroutines in Python and how they form the foundation of async. So yeah, that's the end of my talk. Thanks for listening, and here are my links. Okay, thank you Rajat for the talk, and we have time for one question. Any takers? Only one question? No? Okay. Hello. Hi Rajat. Yeah, hi. Great talk, thanks for this introduction to generators. So my question is: how would I visualize these yield statements and the control flow, because I have a little difficulty understanding how the total flow works. Okay, that's an interesting question. What exactly do you mean by visualize? Any profiler tools or anything? Okay, profiler tools. I'm not aware of any tools which would show you exactly how control is moving between these yield statements. I guess one way you could think about it is to split out the function manually and see how it works. But I'm sorry, I don't have any tools at hand per se to do that. Just a follow-up question. Yeah. If you have used a debugger, how would that work in this context? Does it have the same thread issues? Sorry, I didn't get that last part. So if I use a debugger, like pdb or some other debugger, there are issues when I use threads, so does that connect?
Yeah, I think there are no problems when you use debuggers, because if you were to stop in the middle of a generator, it has a local namespace and so on, and the debugger can access that, so there's not really going to be any difference. Yeah.