So, this talk is motivated by a use case we had in scikit-learn. We needed to do something, and we were like, how do we do that? It turned out to be a very cool thing, and then I thought, why not talk about it? You can find the talk on the GitHub repo, so the slides are there already.
A bit about me: I did my PhD in cancer diagnostics, working on algorithms and machine learning there. Then there was some machine learning consulting, and these days I mostly work on open source projects like scikit-learn, skops and fairlearn.
There are some prerequisites that make the talk easier to follow. Knowing that functions can take a variable number of positional and keyword arguments. setattr, getattr, those kinds of things. The fact that in Python, if you have specific dunder methods in certain places, they are called under certain conditions, and either something is done with their output or they have a side effect. Inheritance. Type annotations, not strictly necessary, but we use them. And that when you call help on an object, it tries to read certain information about that object and show it to you.
So, what is the motivation for the thing we're going to talk about? In scikit-learn, we have estimators, and we have meta-estimators. You can put estimators inside meta-estimators, and these meta-estimators route certain things to your estimator. For example, you put a transformer and a classifier in a pipeline and fit it, and it passes X and y to your transformer, then passes them on to your predictor. But you can also have metadata: sample_weight would be metadata, and in this case my custom transformer takes sample_weight and another piece of metadata. What I want is to be able to say, hey, my fit is requesting sample_weight, it's not requesting this other metadata, so don't use that.
And then I put that in a pipeline and call fit on the pipeline, and the pipeline should know how to route things. Right now you can't do that; you almost can if you use the nightly build. But the idea is for us to have that, and we want to do it through this method.
So, there are certain requirements on this method. One is that we don't want to change our estimators. We want the method to be dynamically generated, specifically because we also want this to work on third-party estimators: if you write your own estimator and inherit from BaseEstimator, these methods should just exist. We also didn't want it to have a generic signature, like, we accept everything. No, if your fit accepts only sample_weight and this other metadata, then you should only be able to pass those here, not any other random metadata. And we wanted a good docstring. Seems very basic, turned out quite interesting.
In short, we're going to use different pieces to build this puzzle. We're going to use inspect to introspect functions. We're going to use the Signature object, both to read a signature and to create one. Then we're going to use a descriptor returning a function to add these methods to the estimators. And then we use __init_subclass__ to create the right descriptors and attach them to the estimators when needed. Now we talk about these steps.
So, inspect and signature. Let's say I have a function f. It takes a, which is type hinted, then a bunch of positional arguments, a keyword argument, and variable keyword arguments. Then I have a class A with a method g that has a very similar kind of signature. Now I can check if f is a function. Yes, it's a function. Is g, accessed on an instance of A, a function? No, it's not. It's a method. We'll see immediately what the difference really is. But is A.g, on the class rather than on an instance, a function? Yes, that one is a function. We can do other things. When you get the signature, it returns an object which has information about whatever input your function or method takes.
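The checks described here can be sketched roughly as follows. The exact slide code is not in the transcript, so the names f, A, g and the helper describe are stand-ins chosen for illustration; only the inspect calls themselves are standard library.

```python
import inspect


def f(a: int, *args, b: float = 1.0, **kwargs):
    """Example function with the common kinds of parameters."""


class A:
    def g(self, a: int, *args, b: float = 1.0, **kwargs):
        """Method with a very similar signature."""


obj = A()

# A plain function is a function.
print(inspect.isfunction(f))      # True
# Accessed on an instance, g is a bound method, not a function.
print(inspect.isfunction(obj.g))  # False
print(inspect.ismethod(obj.g))    # True
# Accessed on the class, g is just a function.
print(inspect.isfunction(A.g))    # True


def describe(func):
    """Hypothetical helper: (name, kind, default, annotation) per parameter."""
    return [
        (p.name, p.kind.name, p.default, p.annotation)
        for p in inspect.signature(func).parameters.values()
    ]


for row in describe(f):
    print(row)

# On the class, the signature includes self; on an instance it is dropped.
print([name for name, *_ in describe(A.g)])
print([name for name, *_ in describe(obj.g)])
```

Missing defaults and annotations show up as inspect.Parameter.empty, which is how you can tell "no default" apart from "default is None".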
I can go through them. It has this nice dictionary of parameters, and I can go through that dictionary. It gives me the name and different things: what is the kind of this argument? What's the default value? Are there any type annotations on it? If I run that on f, I get what you expect: a, args, b, kwargs, and you see they have different kinds, like positional-or-keyword, or this one, which is keyword-only. And you see whether there is a type hint or a default. If I do that on A.g, I get everything, including self and all the other arguments. But if I do that on g of an instance of A, I get the same thing except self. And that's pretty much the difference between bound and unbound methods. The terminology has changed, but think of it as bound methods and not-bound methods. There's a lot more to inspect, and you can read about it there. But this is what we're gonna need for the rest of the talk.
Next thing: let's say I want a function that adds five to my input. Easy, this is how you do it. Now imagine I want to create a bunch of these functions that add different numbers to my input. I can write a function that creates a function inside it and then returns that function, and then I can use that. So here I have create_adder. It creates a function, and then here I return f. I don't return f of something, I return f itself. And then here the output is that function, which I can actually call. I could do the same thing with a lambda expression. We're not gonna use that because our functions are complicated, but you could.
Oh, there are a lot of you. Any major questions up until this point? I don't want to lose too many people. Every now and then I'm gonna ask if you have questions; if you do, move to the microphone. We can make it a little bit more interactive.
So now we have a function, and we want to change its outfit. Let's say I have a very simple function. It accepts a bunch of positional arguments and calculates the sum of the inputs. And here I get help on f.
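The function factory mentioned a moment ago could look something like this. It is a minimal sketch; create_adder is the name used in the talk, while add_five and the lambda variant are illustrative.

```python
def create_adder(n):
    """Return a new function that adds n to its input."""

    def f(x):
        return x + n

    # Return the function object itself, not the result of calling it.
    return f


add_five = create_adder(5)
print(add_five(10))  # 15

# The same idea with lambda expressions, fine for very small functions.
create_adder_lambda = lambda n: (lambda x: x + n)
print(create_adder_lambda(2)(3))  # 5
```

Each call to create_adder produces a separate function that remembers its own n, which is what makes it possible to generate many tailored functions from one template.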
When I run that it says, well, it's f, it takes a bunch of arguments, and this text is what you have here as your docstring. Where are these things stored? Well, first of all, I can call the function, and it does what it's supposed to do. __doc__ holds the docstring. __name__ holds the function's name. And when I call inspect.signature, it gives me the signature of the function.
Now what I want to do is change all of these. I'm gonna change the name: f.__name__ is now adder. I'm going to create a Signature object and assign it to __signature__. Basically, if a function or a method does not have this __signature__ attribute, then inspect.signature tries to read the actual signature. But if it exists, inspect.signature returns it. This is also what most IDEs and other development tools use, which means the hints you get become more meaningful. What I'm doing here is saying I have one parameter called a. It's positional-or-keyword, the default is zero, and it's an optional float. I have another parameter b, same thing. And then here in my signature, I say I have these parameters and I'm returning a float. I also change the docstring of f to whatever string you want to put here. So I do that, and then I get help on f. Now I get a much nicer thing.
You might ask, why bother? I could write these things on the function itself. But I'm going to be generating my methods dynamically, so I need to set these dynamically as well. And now if I get the signature of f, it says that I can only pass a and b. But is that really true? What happens if I call f(1, 2, 3)? It works. You're not changing the actual function. You're just changing things about the function. That's why I say that I'm changing the outfit of a function, not the function itself.
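Here is a small sketch of changing a function's outfit as described above. The docstring text is made up for illustration; Parameter, Signature and the __signature__ attribute are part of the standard inspect machinery.

```python
import inspect
from inspect import Parameter, Signature
from typing import Optional


def f(*args):
    """Return the sum of the inputs."""
    return sum(args)


# Change what the function reports about itself.
f.__name__ = "adder"
f.__doc__ = "Add a and b and return the result."
params = [
    Parameter("a", Parameter.POSITIONAL_OR_KEYWORD, default=0,
              annotation=Optional[float]),
    Parameter("b", Parameter.POSITIONAL_OR_KEYWORD, default=0,
              annotation=Optional[float]),
]
f.__signature__ = Signature(params, return_annotation=float)

# inspect.signature (and therefore help) now reports the custom signature.
print(inspect.signature(f))

# The behavior is unchanged: the real function still accepts *args.
print(f(1, 2, 3))  # 6
```

Nothing here enforces the advertised signature; the function body would have to do its own checking if a mismatch should raise an error.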
Inside your function, you could try to simulate what Python would do if this were actually the signature of the function. We'll show somewhat how to do that later. Questions up to now? Easy. So the Python Signature object, the way we're using it, was introduced in this PEP, PEP 362, in Python 3.3, I think. You can read more about it there. At this point, for me, the best documentation in Python is these PEPs. They're very extensive, really nice.
So now, descriptors. What's a descriptor? A descriptor, by definition, is a class that implements __get__. And it acts as a descriptor when it's assigned as a class attribute. Not as an instance attribute, but a class attribute.
Wait. I feel like something's not working. Demo gods. I'm going to disconnect and connect again. Let's see what happens. I hope I don't have to reboot. Best thing that can happen during a talk. Everything collapsed, so I'm trying again now. Yep, it should work. We did it. Okay, just to be sure, I'm going to run from the top.
So, descriptors. A descriptor is a class that implements __get__, and it behaves as a descriptor when it's assigned as a class attribute, not an instance attribute. What happens is that if you do that and then access it on an instance, Python calls this __get__ and gives you whatever __get__ returns. I've put in a few print statements so we can see the sequence of things happening. When we run this, first we pass here, then we go to the descriptor's __init__, here, then we access the attribute here, and when we access it, we go inside __get__. So while the class is being defined, this line runs and creates an instance of the descriptor, and then you go down here. Is this kind of clear?
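The sequence of events described here can be reproduced with a sketch like the one below. The class names Descriptor and Owner and the events list are assumptions made for illustration, since the slide code is not part of the transcript; an events list stands in for the print statements so the order is easy to inspect.

```python
events = []


class Descriptor:
    def __init__(self):
        events.append("descriptor __init__")

    def __get__(self, instance, owner):
        # Called when the attribute is looked up through the owning class.
        events.append("descriptor __get__")
        return 42


events.append("before class body")


class Owner:
    # The descriptor instance is created while the class body executes.
    attr = Descriptor()


events.append("after class body")

obj = Owner()
value = obj.attr  # triggers Descriptor.__get__
print(events)
print(value)  # 42

# Assigned on an instance instead of the class, the same object is an
# ordinary attribute and __get__ is not called.
obj.other = Descriptor()
print(type(obj.other).__name__)  # Descriptor
```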
What's important here is that your __get__ also has access to the instance that owns you and the type of the object that owns you. So we can use that. Here I'm changing it a bit: in this string I say, well, I'm myself and my owner is this. And here in my instance I add a name. I set the instance's name here, in the constructor, and then it comes to the __get__ of the descriptor, where I have access to instance.name. It means that with access to the instance, I can not only read it, I can also change it, and the same goes for the type itself.
Now we're getting closer to what we do for the request methods that we had. This is much closer to the actual code we are using now in scikit-learn. I'm gonna get the keys; these are the arguments that my set_fit_request method accepts. This is a descriptor, and as we talked about, a descriptor can return a function, among other things. So we're going to create this function and return it. In this function, I check if the given keyword arguments contain anything that I do not accept. If that happens, I raise a TypeError. A TypeError also happens to be what Python raises if you call a function with arguments that it doesn't accept, so here we're trying to simulate the same behavior. For everything else, we have a side effect on the instance: we just set some attribute. So for example, if you say sample_weight=True, we set the request for sample_weight to True on the instance. That's what this setattr does. And then I return the instance, which means I can chain methods on the instance. And then here I return f. Now I can assign this descriptor to an estimator class.
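A descriptor along the lines described above might be sketched like this. The class name RequestMethod, the `_request_<key>` attribute naming and the error message are assumptions for illustration, not scikit-learn's actual code.

```python
class RequestMethod:
    """Descriptor that returns a function accepting only the given keys."""

    def __init__(self, keys):
        self.keys = keys

    def __get__(self, instance, owner):
        def func(**kwargs):
            # Mimic Python's own behavior for unexpected arguments.
            unexpected = set(kwargs) - set(self.keys)
            if unexpected:
                raise TypeError(f"Unexpected args: {sorted(unexpected)}")
            # Side effect on the owning instance (attribute name is assumed).
            for key, value in kwargs.items():
                setattr(instance, f"_request_{key}", value)
            # Returning the instance allows method chaining.
            return instance

        return func


class Estimator:
    set_fit_request = RequestMethod(keys=["sample_weight"])


est = Estimator().set_fit_request(sample_weight=True)
print(est._request_sample_weight)  # True

try:
    est.set_fit_request(other_metadata=True)
except TypeError as exc:
    print("TypeError:", exc)
```

At this stage help(est.set_fit_request) only shows a generic `func(**kwargs)`, which is the gap the next step of the talk addresses.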
So if I have an estimator, I say set_fit_request is this request method descriptor, and it only accepts sample_weight. Then I can create an instance of this estimator and access set_fit_request, which returns the function that I defined. I can call that function and say sample_weight=True. When I do that, as a side effect, this estimator gets its sample_weight request attribute set by this line. If I change the value to, like, blah, it would be blah. If I do that with a parameter that is not expected, I get a TypeError. If I get the help of that method, it's not very helpful. So let's try to fix that now.
I'm going to have a helper function that returns a signature, which will be the desired signature of this thing. It requires the owner, that is, the type of the object owning it, and the keys, what it accepts. This is an instance method, so the first parameter is self. It's a positional-or-keyword parameter and its type is the owner. Then I extend that: for every key in these keys, I create a parameter. It's a keyword-only parameter, the default is None, and it's an optional bool. Then I create the Signature object with these parameters, and the return type is also owner. That's because I was returning the instance itself.
And now I'm going to change my descriptor slightly. Everything else stays the same. I'm going to add the method name here, which is the name that I'm using in the owner. So for set_fit_request, this method name would be set_fit_request. And here I set the name of the function and some constant docstring; in reality we write a much nicer one. Then I set the signature using the helper function that I had. Now when I create it, it's the same except that I also say, hey, I'm actually set_fit_request. And when I get the help, I get this nice thing. It is set_fit_request. It takes sample_weight as an optional bool.
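The helper and the revised descriptor could be sketched as follows. build_signature, RequestMethod, the docstring text and the `_request_<key>` attribute are illustrative names, assumed here rather than taken from scikit-learn.

```python
import inspect
from inspect import Parameter, Signature
from typing import Optional


def build_signature(owner, keys):
    """Hypothetical helper: the signature we want help() to display."""
    params = [Parameter("self", Parameter.POSITIONAL_OR_KEYWORD,
                        annotation=owner)]
    params.extend(
        Parameter(key, Parameter.KEYWORD_ONLY, default=None,
                  annotation=Optional[bool])
        for key in keys
    )
    return Signature(params, return_annotation=owner)


class RequestMethod:
    def __init__(self, name, keys):
        self.name = name
        self.keys = keys

    def __get__(self, instance, owner):
        def func(**kwargs):
            unexpected = set(kwargs) - set(self.keys)
            if unexpected:
                raise TypeError(f"Unexpected args: {sorted(unexpected)}")
            for key, value in kwargs.items():
                setattr(instance, f"_request_{key}", value)
            return instance

        # Dress the generated function up with a name, docstring, signature.
        func.__name__ = self.name
        func.__doc__ = "Request metadata to be passed to this estimator."
        func.__signature__ = build_signature(owner, self.keys)
        return func


class Estimator:
    set_fit_request = RequestMethod("set_fit_request", ["sample_weight"])


est = Estimator()
print(est.set_fit_request.__name__)
print(inspect.signature(est.set_fit_request))
```

Calling help(est.set_fit_request) now reports the method name, the keyword-only sample_weight parameter and the estimator as the return type.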
It returns the estimator. We're almost there. There is a really nice how-to on descriptors. There's a lot more to descriptors: you can also handle what happens when the user tries to set a value or delete something. It's all there.
The next thing is __init_subclass__. We're not finished yet: we need to generate these descriptors. Right now this is hard coded. We need to make it dynamic, and we need to do that in a parent class. Again, as a reminder, we had an estimator. It has fit and we want it to have this set_fit_request, but we want set_fit_request to be generated automatically. We create a parent class with this __init_subclass__ method in it. What this does is that for every class that inherits from you, Python calls this method. It has access to the class, not the instance, and you can do things to it. So here, for example, I set an attribute to five and then I call super().__init_subclass__. And then I create a child class with some __init__. That's all.
Let's see the sequence of things here. This one is also very interesting. Here I'm printing the class that I get. Here I have a print statement in __init__. Here I have another print statement. And then I access this attribute. The way it works is that first we define this parent class, but then, before the child class is fully defined, before we get here, we get this print statement. So as you define your class, this __init_subclass__ runs. Once the class is defined, you can create your instance and access your attribute, which in this case would be five.
Now let's try to get closer to what we actually wanna do. Here in my __init_subclass__, I check if my class, which is the child class, has a fit, and if this fit is a function. If yes, I print the signature of that function. That's all. And then I have an estimator inheriting from the parent, and it has a fit method.
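A minimal sketch of the __init_subclass__ hook described here, combining the attribute example and the fit inspection. Parent, Estimator and the events list are illustrative names; the list replaces the print statements so the ordering can be checked.

```python
import inspect

events = []


class Parent:
    def __init_subclass__(cls, **kwargs):
        # Runs once for every class that inherits from Parent,
        # at the moment that class is being defined.
        events.append(f"__init_subclass__ for {cls.__name__}")
        cls.attribute = 5
        fit = getattr(cls, "fit", None)
        if inspect.isfunction(fit):
            events.append(f"fit signature: {inspect.signature(fit)}")
        super().__init_subclass__(**kwargs)


events.append("Parent defined")


class Estimator(Parent):
    def fit(self, X, y, sample_weight=None):
        return self


events.append("Estimator defined")

print(events)
print(Estimator().attribute)  # 5
```

The hook receives the class, not an instance, so anything it sets is shared by all instances of that child class.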
When I run that, I get the printed version of the signature. So it's kind of working. Now, building on that: yes, we have fit, and it's a function. Now I get the signature of that fit. I'm only interested in parameters whose name is not self, X or y, because in the scikit-learn API, self is self, and X and y are not metadata; I have very explicit routing for them. We say everything else is some metadata that I need to route. And then for that, I set an attribute on the class. The name of that attribute is set_fit_request, and its value is this request method, the descriptor that we just defined. I say the parameters that you should accept are these parameters, and this is your name. Then my estimator inherits from that parent, and now estimator.set_fit_request just exists. And if I get the help, it's what I wanted. In scikit-learn we do the same thing, except that we also loop over all the methods that we care about, like fit, transform, predict, all of those. And this __init_subclass__ is a much easier way to do things that you could also do with metaclasses, like ABCMeta. There's this PEP for it, which you can read. It's really nice.
So in summary, we used inspect and Signature. Then we wrote a descriptor returning a method with a custom signature and docstring. And then we used __init_subclass__ to customize all the child classes. Thank you. Can I take questions? I mean, don't be shy. Last time I gave this talk it took an hour and a half because people were confused.
Okay, so thank you for your talk. I did have one question, though I'm afraid I already know the answer. I really like static autocompletion, especially when you start using type hints in Python as you are, using either Jedi or Pyright or any other sort of language server.
Do I have any options to translate all this to static autocompletion?
I don't know about Jedi, but one of the reasons we wanted a non-generic signature was exactly that. If I try to get my hint here, I get what I set. So my IDE knows exactly what the signature of the method is. If your autocomplete does the same thing that IPython or VS Code or any other tool does here, it should work.
Okay, thank you.
Is there a way to actually create functions dynamically, without defining the name in the __init_subclass__ method?
Yes, you can make things a lot more dynamic. When you talk about creating functions dynamically, I think you mean attaching them to a class.
Right, okay.
In Python there are two things you can override. One is, I think, __getattribute__, which is called in certain ways, and the other one is __getattr__. If you customize this one, or both of them, you can return anything you want when a user tries to access anything on your class. So if you have implemented those and I do estimator.blah, this blah comes to your __getattr__, and you can say, okay, now I want to return this thing, and that thing could be a function. That would be actual runtime, because what I'm doing here happens when the class is defined, and then your class is set. But if you want it done at runtime, you can do that through __getattr__.
Hi, thanks. I have a question about the descriptor protocol. Is there a reason why you're not making use of __set_name__? Or is it that, because you're doing a setattr on the class, __set_name__ is not being called?
Yeah, so. Love this question. Somebody asked it. As I said, descriptors also expose other methods. One of them is __set_name__.
__set_name__ wouldn't work in this case, because what happens is that first you have this class body, then type's __new__ is called, __set_name__ is called in there, and after that __init_subclass__ is called. And because in this case I am assigning the descriptors to my class in __init_subclass__, which happens after that, __set_name__ wouldn't be called.
Right, okay, thanks.
Hi, so you're using a descriptor, but am I correct in assuming that you could do everything that you've done without the descriptor, just creating the function directly there? Or is there anything that does require a descriptor, and are there any other pros and cons in using a descriptor instead of going for it directly?
Yeah, so I could have just implemented these methods. It would have been tricky to change their signature, because the signature of these functions depends on the method that you have in the estimator, and I don't know what my child estimator will have as the signature of fit. So I need some place to introspect the signature of that fit method and then change the signature of this one. So there is something that I would need to change. Also, inside the function, I would have to inspect the signature of the other method to make sure that the parameters I'm receiving are exactly the ones that fit accepts. So in effect, it would be the same as using the descriptor, because you need to modify the method anyway based on the child class, and it's easier to do that in a descriptor. But there are at least three different ways to do the same thing. This is only one of them. To me, it was the easiest one.
Is there a way to access the parameters of a function in a way that can't be faked, like you could with __signature__?
Ooh, not to my knowledge.
All right, thanks.
Actually, there might be. When I was reading the PEP on signature, my understanding was that there is a way, but we shouldn't do that.
If we want people to perceive a function with a different signature than the actual signature, we should be setting __signature__. But there is another, ugly way. I think my time is over. Thank you very much.
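To tie the pieces of the talk together, here is a compact end-to-end sketch: a parent class whose __init_subclass__ inspects fit and attaches a generated set_fit_request descriptor. All names other than set_fit_request, sample_weight, X and y (Base, RequestMethod, MyTransformer, other_metadata, the `_request_<key>` attributes, the set_name_called flag) are assumptions for illustration, and this is a simplification rather than scikit-learn's implementation.

```python
import inspect
from inspect import Parameter, Signature
from typing import Optional


class RequestMethod:
    def __init__(self, name, keys):
        self.name, self.keys = name, keys
        self.set_name_called = False  # only here to show the Q&A point

    def __set_name__(self, owner, name):
        self.set_name_called = True

    def __get__(self, instance, owner):
        def func(**kwargs):
            unexpected = set(kwargs) - set(self.keys)
            if unexpected:
                raise TypeError(f"Unexpected args: {sorted(unexpected)}")
            for key, value in kwargs.items():
                setattr(instance, f"_request_{key}", value)
            return instance

        params = [Parameter("self", Parameter.POSITIONAL_OR_KEYWORD,
                            annotation=owner)]
        params += [Parameter(k, Parameter.KEYWORD_ONLY, default=None,
                             annotation=Optional[bool]) for k in self.keys]
        func.__name__ = self.name
        func.__doc__ = "Request metadata passed to fit."
        func.__signature__ = Signature(params, return_annotation=owner)
        return func


class Base:
    def __init_subclass__(cls, **kwargs):
        fit = getattr(cls, "fit", None)
        if inspect.isfunction(fit):
            # Everything except self, X and y is treated as metadata.
            keys = [name for name in inspect.signature(fit).parameters
                    if name not in {"self", "X", "y"}]
            setattr(cls, "set_fit_request",
                    RequestMethod("set_fit_request", keys))
        super().__init_subclass__(**kwargs)


class MyTransformer(Base):
    def fit(self, X, y=None, sample_weight=None, other_metadata=None):
        return self


est = MyTransformer().set_fit_request(sample_weight=True)
print(inspect.signature(est.set_fit_request))
# Assigned after class creation, so __set_name__ was never invoked.
print(MyTransformer.__dict__["set_fit_request"].set_name_called)  # False
```

A fuller version would loop over several methods (fit, transform, predict and so on) and skip `*args` and `**kwargs` parameters by checking each parameter's kind.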