 Obrigado muito. Eu acho que este é o auditorium de fãs que eu nunca falava. Eu espero que esteja em cima. Então, tipo protocolo, tipo hints, como o Guido entende. O título do talk é uma referência a este t-shirt, que eu espero que eu tenha, mas eu não. Programando o way Guido indentou. Então, é meu livro. Quase tudo que vou mostrar para vocês hoje, veio de um monte de pesquisa que eu fiz enquanto escrevi o livro. A segunda edição tem mais de 100 pages sobre tipo hints e outras coisas. Então, vamos falar de quatro coisas. O que é um tipo? Os quatro modes de tipo. Algunas exemplos de tipo protocolo e a conclusão. Então, o que é um tipo? Então, este é um quotes direto da PEP 483, a teoria de tipo hints, que foi escrito por Guido Fernossum e Ivan Levinsky. Então, há muitas definições do conceito de tipo em literatura. Aqui, nós assumimos que o tipo é um sete de valores e um sete de funções que um pode aplicar a esses valores. Para mim, o meu primeiro entendimento de tipo foi muito muito fócuo sobre o sete de valores, porque você pensa sobre, ok, você tem o sete de números complexos e então você tem os números reales e então você tem os íntegros e etc. Mas isso é no mundo pura da matemática. No mundo real, há algumas pequenas questões, a realidade é complicada. A realidade é mais complicada do que a matéria. Então, tenho um íntegro e converto em uma flor. Parece tudo bem, excepto que um não é igual ao outro. Na verdade, no sete de valores, não há valores que representam exatamente esse número. E se você importar o módulo decimal, quando você cria um instante decimal, ele mostra você, então, uma representação de estrangeiro com prédio completo. E, como você pode ver, eles não são realmente o mesmo número, ainda que eles são bem próximos, né? Então, um pouco de módulo do lado, é interessante que o tipo de flor que tem sido, tem sido por um tempo, é basicamente embedido em todas as cpus modernas, né? Todas as operações básicas com flor estão embedidas em cpus, mas é um tipo que é mais útil para cientistas e engenheiros que para pessoas, por exemplo, que vivem dinheiro, né? Que é, eu acho, a maioria dos nós, ou muitos dos nós, eu não sei. Então, eu sempre pensei que isso era meio estranho que a maioria das línguas mainstream não tem uma maneira muito boa de representar dinheiro, e as pessoas tentam usar flor para isso, mas isso não é ótimo. Então, o décimo é uma alternativa em Python para fazer isso. Mas o sete de definição de valores não é útil para outras razões também, porque Python não dá práticas formas para especificar tipos como setes de valores, excepto por um ano. Em prática, no Stander Library, nós temos muito pequenos setes, como o sete de non-types, que tem apenas um, um estresse, um bul, que tem dois estresse, e ஆ que, no outro lado, temos extremamente grandes estresse um com milhões dos estresse. Os int e os string types, por exemplo, os valores possíveis são apenas limitados pela metade da **** na máquina. Então, e também, nós não temos maneira de dizer, Por exemplo, um tipo de quantidade que eu gostaria de usar, por exemplo, em uma aplicação de e-commerce para validar que os ordens não são colocados com zero de um item ou mais de 1.000, isso poderia ser uma medida de segurança, né? Normalmente, é raro que você ordem 1.000 ou mais de algo em um store de retail. Então você não pode dizer isso. Não há um jeito, em Python, de dizer isso em um tipo de sistema para expressar isso, né? Por exemplo, também não há um jeito de dizer que o clube do aeroporto é o sete de todas as 17.576 combinações de 3 letters de asquias de uppercase. Nenhum desses setes podem ser expressados em um tipo de sistemas. At least not in the type systems of mainstream languages and certainly not in the type system of Python. So in practice it's more useful to think that int is a subtype of float because it implements the same interface. Basically the interface of float has to do with arithmetic and all the operations also apply to integers, right? But ints also have a few other methods, like for instance those that do bitwise manipulation, shift right, shift left, things that wouldn't make sense in a float. So that's the main reason why we can think of int as a subset of, not a subset, a subtype of float because it implements all the types, all the entire interface of float and an additional few methods. I put a couple of asterisks here because just yesterday I discovered there are a few float methods that int does not implement, okay, but they're kind of bizarre. One of them is one that generates an hexadecimal representation of the float. But anyway, as a good approximation we can think of subtypes and in general that's true for subtypes, that they implement the entire interface of the supertype in addition to more methods, right? So for instance, here I continue that example by showing that the pipe operator, which is the bitwise OR, you can use it with an integer but you cannot use it with a float, right? Another example of how it's more useful to think about the interface is that in Python's static type system, we have now the any type. And basically, because any value can be assigned to an object variable and any value can be assigned to an any variable, you can say that they are like the universe types, right? But the super important difference between them is that the object type implements a narrow interface, right? Pretty much every sub... Most subclasses that we create from objects add methods, right? That don't exist in objects. And but any, on the other hand, is this magical type that is required because of the nature of the gradual type system that we have, which means that, you know, sometimes you may have a piece of code that is not annotated or you're using a library, a third party library that is not annotated, so you get a result from there and all that the type checker can infer is that the return is any, right? And any satisfies every interface. It's like this magical interface that contains all possible methods that exist today or that will ever exist, right? So this is a crucial difference between them. So in general, if you think of subtypes as sets, you get the biggest set is in the top, right? Object is the set of everything. And then you go down, down, down, down to more specific types, more specialized types that usually have fewer instances. But on the other hand, the interface grows, right? At the top of the class hierarchy, the interface is narrower. And as you go down, it gets wider, more specialized and it gets more methods. So from that definition of PEP483, we are going to adopt that point of view that a set is a set of functions that one can apply to those values. A set is defined by the set of functions that you can apply to an object, right? And then here's another quote from Smalltalk. And I'm not talking about Smalltalk just because I have a gray beard. I'm talking about Smalltalk because it's really very, the type system of Smalltalk is very similar to the type system of Python. Just like the type system of Ruby and the type system of JavaScript. And in the Smalltalk community, they use this term protocol, right? So this quote is from one of the designers of Smalltalk. Every object in Smalltalk, even a lowly integer, has a set of messages, a protocol that defines the explicit communication to which that object can respond. In Smalltalk, there was no explicit way to declare a protocol. But some browsers that you use to code in Smalltalk did present clusters of methods as a protocol. And of course, a class could implement several protocols. So that was the way they used it more as metadata for the programmer than something that was checked, right? So the main takeaway from this first part is that types are defined by interfaces. And protocol is a synonym for interface. So that's where the name of the typing protocol class that we're going to talk about comes from. Now let's talk about duck typing. So last time I looked, Alex Martelli, who is my friend and also one of the reviewers of the first edition of my book, was credited by Wikipedia for being the first person who used in a written message that you can find online, this duck metaphor. I don't know if he invented it, but he was the first one who used it in a Python mailing list to respond to a question about polymorphism that had to do with type checking in Python many years ago when there was no static typing. So the idea is that you don't check whether something is a duck. You check whether it walks like a duck, etc., depending on exactly what subset of duck-like behavior you need, right? I started using Python in 1998, and at the time there were no ABCs in the standard library. So in the beginning I was a little bit mystified by the appearance of phrases like, oh, this function takes a file-like object. Okay, what is that? There was no definition anywhere, right? But it was a common practice to have these kinds of expressions. I think there are still phrases like that in the documentation. A buffer-like object is another thing that is super important for data science. Then there was some formalization of what that means, but the interesting thing is the precise meaning of a file-like object depended on the context, right? Sometimes all you needed was, I need something that has a read method that returns bytes, for instance. That's file-like enough for me in this moment. Does that make sense? In another situation, maybe you need something that has a close method and a write method, things like that, right? But it was always very informal, similar to how it was informal in small talk as well. So one definition that you can give if somebody asks you what's the difference between a protocol and an interface. If we are not talking about static typing, you can say that protocols traditionally were informal interfaces that were not defined explicitly in code, but where the code depended on one of those interfaces on a few methods to do something, right? So, for instance, I've taught Python to lots of people and most of my students over time came from statically typed languages like Java, mostly in C++, C sharp. E, so, I always like to show them this example of a very simple function double to illustrate duck typing, right? So I create this single function double and I can use it with an int, with a float, with a complex, with fractions, and those are all numbers, but I can also use it with sequences, right? So what is the idea here? What is the protocol that double expects x to fulfill? Double expects that the argument x is able to multiply itself by an integer. That's what is implicit in that code and this is why all of those objects support that, because under the covers, they implement a method called dundermoo, right? So dunder double underscore in front, behind, and some special name. Dundermoo is actually the method that any class that you write can implement to support the multiplication sign, this operator, right? Now here's another example, a more dramatic example. So I have this simple class that represents a train, okay? And its only attribute is a length that is established when it's constructed and so then we have the dunderlan method that returns the length and we have the dunderget item method and basically it checks whether the i is within the range, right? From 0 to length minus 1 and if it is, then it returns a string saying car number, such and such, right? Or, otherwise, it raises index error. So, for instance, if I have, if I build a train with three cars and I ask the length of the train, I get three. So this is duck typing, right? Notice this class inherits implicitly from objects. There's no other special class that I'm using here. So what makes something work with length in Python is, basically, for built trains, Python has a shortcut, okay? Built-in and extensions, like extensions written in C or Rust or other languages can implement in those lower level, using the Python CAPI, what is called a slot, that will respond to the length. E, even if you don't do that, there is also, you know, some, it's funny because the C code of Python is kind of object-oriented C, okay? Not C++, object-oriented C. There's this kind of nested structures in which the, everything that has a length, everything that has multiple items in it has a field that Python can inspect to know how many items there are, okay? So that's very fast because it doesn't involve any method call. But if that field doesn't exist, then Python will try and find whether there is a dunderlan implementation either in the low-level language or in Python if it's a Python object, right? So that's how LAN works. And it's an example of duck typing, right? You don't need to inherit from anything to implement, to support LAN, you just implement that method. The second example is what triggers the dunderget item method, right? And again, another example of duck typing. But the third example is really dramatic because the target of a for loop, in this case, the T, is supposed to implement an interface of iterable, right? And how do you implement the interface of iterable? Formal, you implement a dunderiter method that returns an iterator, which is another interface that implements a dundernext method. However, I didn't implement dunderiter here. Okay? Well, anyway, in a regular duck-type language, any class that would implement dunderiter would work for iteration. But Python goes even further than that because that's so fundamental, iteration is so fundamental to the language e also to support legacy code. It just so happens that the Python machinery, this is actually built in the iterability function. The iterability function looks to see if the object implements dunderiter, but if it doesn't, then there is a fallback and the fallback is, does it implement the dunderget item e then it tries to send a zero, it tries to call the dunderget item with a zero argument. And if that works, then it starts iterating by providing more indexes until the dunderget item returns index error, right? So this is kind of an extreme example of duck typing, right? Of the machinery of the language trying to use the object even when the object doesn't implement one particular interface, it falls back to another interface and there's what you expect, right? Okay, so now I'm going to show the first example of a protocol. And this I think is the best example of a protocol in the whole talk. It's the most similar to the kinds of protocols that you are likely to use in your job. So what I'm saying here is I have a function called run file and it takes one argument called source file, which is a text reader. And then immediately above, there's a definition, right? Of a class that inherits from protocol, right? And that definition is exactly like I wrote it here. The ellipsis in line 203 is part of the syntax, okay? The ellipsis is telling you that the body is omitted. Instead of pass, which explicitly says, this doesn't do anything, what this is saying is the body will be defined somewhere else. It doesn't matter what the body is. So in order to satisfy that protocol, an object, any object in the world can satisfy that protocol by implementing a read method that returns a string. That's it, okay? So and often you will see code like that where the interface is defined right above the place where it's used. This is common. We see that a lot in Go code. And why am I talking about Go code? Because Go is the first mainstream language that is statically typed but also supports this idea of declaring protocols. Notice that whatever class I use to pass, to produce a text reader doesn't need to inherit. In fact, they won't inherit from text reader. All they have to do is to implement the interface, okay? So the benefits of using typing protocol are that you can preserve the flexibility of duck typing. So you can let your clients know what is the minimum interface expected regardless of class hierarchies. You can support static analysis, right? Ideias e nintas can verify that an actual argument that appears in a call site for the run file function does satisfy the protocol, right? And you reduce coupling because the client classes don't need to subclass anything. Just implement the protocol. And this also makes testing easier, right? Because if you need to create some kind of mock to test, all you have to do is implement that one method. You don't need to inherit from anything, right? So let's talk about the four modes of typing. We all know there's this duality between static typing and dynamic typing. And this has to do with when the types are checked, right? Static typing is designed to support static checking, which is checking then by tools and compilers and so on. And runtime checking is... I mean, dynamic typing was invented to support runtime checking, right? The authors of the type hints, PEP, promise that Python will remain dynamically typed. But actually the situation is more interesting than this. The situation is that we have these two axes. Where besides static checking and runtime checking, we also have the issue of how the types are defined or identified. So in static typing of languages like Java, you have nominal types. Things belong to a type because it's explicitly declared. You inherit from something or you implement an interface, right? But with static typing, what we have is structural types, which means objects belong to a type because they implement a certain interface. But they don't need to explicitly declare anywhere that they do implement. They just have to actually implement it, right? So these are like the two opposites. But this diagram actually has other areas. This area, Alex Martelli, when he was reviewing the first edition of my book, he invented this term goose typing. I don't have time to talk about that. But basically that's the use of ABCs to do explicit type checks at runtime. And then Lukash Lange and others. Lukash is speaking right now at another room here. They proposed PEP 544, which implements structural typing or the idea of, I mean, structural subtyping or the idea of static duck typing, right? Duck typing, static duck typing. So, and that's supported by typing protocol. Okay? There are other languages that support things like quadrants. See, Go appears in the static duck typing quadrants. But also in the goose typing quadrant because it has mechanisms for type assertions at runtime and so on. Okay? So let's see some more examples. We were talking about the double function. How could we annotate that with a protocol? So can we do that? Yeah, okay. This has a serious problem. The problem is the type checker will complain that the object type doesn't implement the undermove. Does that make sense? So this code, without looking at anywhere else, the type checker knows that this code is invalid. Okay? Can I use any? Yes, but it's useless because it's the same thing as no annotation. Any annotation pretty much defeats type checking. You have to use it very carefully and like an emergency exit. I could do this and say that X is a sequence of T. And here I have to introduce this idea of the type var, right? Because sequences is a generic. I'm sorry, base class. So I'm saying this. If I get a sequence of floats, I'm going to return a sequence of floats. If I get a sequence of strings, I'm going to return a sequence of strings. This is what the type variable is doing. Right? When the type checker analyzes this, it gets a concrete example of a call site. It looks at the type of the elements in the sequence. E then it binds T to that type. And then it will infer that the return is a sequence of that same type. Okay? So this only works with sequences. It doesn't work with numbers. So the solution is to create a repeatable protocol. So the repeatable protocol is saying that is anything that implements the undermoo and that takes a repeat count, which is an integer. Okay? And now I can say that X is a repeatable and this returns a repeatable. Are we done yet? No. Because now when I call double, the result that I get is inferred to be something that only implements the undermoo. And if you need to do other kinds of operations with the object, you won't be able to. The type checker won't let you. Of course, at runtime everything happens, right? Because that's the characteristic of the gradual type system of Python. At runtime only the dynamic types exist. There's no constraint. But the type checker will point out that if you try to do something else with this result, no, this result only supports the undermoo because that's what it says. So, the way to solve this is a little bit more involved. You need to create another type var. So, the protocol is the same. Okay? The new thing here is this type var here. And here I'm saying that RT can be bound to any type, but it has to be limited by... I don't like the keyword argument bound. I think they should have used another word for this. Maybe limited, or upper bound, or something. But because there's this... the binding of the variable and the bounds, which is kind of a boundary, right? What this is saying is that RT can only accept something that is repeatable. But because of the way that this variable is declared, the actual type of it will be preserved as well. So, if I pass a float, the type checker will be able to infer... First, it will accept the float because the float implements the undermoo. And it will understand that this returns a float. Okay? So, that's one thing that we need to do in this case. So, when I discovered these things, I was doing research for the book and that was the time of Python 3.8 when typing protocol was introduced. There's this project called typesheds, which has the type annotations for the standard library and other projects. And I contributed several fixes by implementing protocols there. So, for instance, there was this bug that I found that there's a... The medium-low function in the statistics module actually works with strings, with a list of strings. Okay? And there's an example just like that in the documentation. So, but it wasn't... But at the time, we had a false positive, which was that if I pass... If I submitted this code to a type checker, I would complain. And why is that? Because the annotation at the time was saying that the arguments had to be uniterable of a number. And number was a type that they invented there, which was the union of int float and complex, int float and fraction, I think. Anyway, it didn't work with strings, the annotation, although the function did work with strings. So, the solution was to create this sortable protocol. And again, you see the same formula that we saw before. You have a protocol, and you have a type var, which is bound to that protocol. And then I can use that type var to say that medium-low accepts an interval of sortable t, which is the type var bound to sortable. And it returns a sortable t. Right? When I contributed that, I looked around and I found many other places in the Python 3.9 standard library where we could improve the annotations by using protocols. Okay? And then there was the biggest challenge for me at the time was the max function. The max function is super powerful, right? Like I said, I've taught Python many times. I never saw a student say that, oh, max is super complicated to understand. No. It's pretty simple. You can pass an iterable or a series of arguments. And then, if it's an iterable, then it will return the biggest item in the iterable. If it's a series of arguments, it will return the biggest of the arguments. Right? But then there's some optional arguments and so on. So it's very flexible, easy to use. I like this API. Okay? But then somebody posted a bug about that and I tried to fix it. The old type hints were like that. So overload is something that comes, this decorator comes from the typing module and its mission is to allow you to do... do multiple type annotations to different ways of invoking a single function. Right? So it was already pretty complicated. My time is running up, so I'm not gonna delve into the details here. But this wasn't working. This was producing that bug that this person reported. The bug was that if you passed five and top where top was none, it exploded. And the solution was this. It took me several hours, actually a couple days, and with support of people more experienced than I in the type system to get to this. My first attempt had eight overloads, then one of the maintainers of typesheds showed me how to simplify and reduce it to only six. Okay? In order to do that, because Max is written in C, I wrote it in Python to test and so on. And what caught my attention was that my implementation was shorter than the declarations to satisfy the type checker. Okay? So there was a lot of stuff that happened after that. If you go look at the code, it's not like that anymore. I won't go into these details here, but it's just for you to know that with time, supports less than became supports greater than and then they invented another name and then they created. Now what they are using is supports rich comparison, which is a union type of supports ThunderLT and supports ThunderGT, which means that in most of those functions that appear in this list here that have to do with ordering stuff, what they do is they accept, the current annotations accept something that supports either less than or greater than either way it will work. Okay? There are some protocols in the standard library. Most of them have to do with numbers because unfortunately the numeric tower, remember the numeric tower, it's completely broken for use with typing. And why? Because the base class of the numeric tower, which is called number, has no methods, which means that objects of that class are useless. You can't do anything with them. So, but anyway, so they invented those protocols that have to do with whether something supports a conversion, right? And one of them, for instance, I use in one example in the book, which is supports index. Supports index is a particular protocol that means that the object implements the under index, which as opposed to the under int, which floats also support to convert to int, the under index is supposed to be implemented only by integral types. So, for instance, if you use numpy, all the integral types of numpy supports implement the under index method. So they can be used as indexes. OK. So, to summarize, I recommend that you use typing protocol to build pytonic APIs. In fact, you can see that the standard library is very pytonic, although there are parts of it that I don't consider very pytonic, but most of it is very pytonic. And the standard library really cannot be annotated without the use of typing protocols, right? So, you should use typing protocol because then you can support duck typing with type hints, and duck typing is the essence of Python's data model and standard library. This allows you to follow the Integrate Segregation principle, the I in the solid principles, which means that client code should not be forced to depend on methods that it does not use. You know, I remember when I used the code in Java, sometimes you had to implement methods to satisfy the compiler, right? And this is a recommendation that I learned from studying Go, okay? Narrow protocols are the best, okay? Most protocols that come in the Go standard library have only one method, and that's how your protocol should do as well. That's why trying to think of an interface in terms of an ABC or a Java interface with many methods is not a good way of getting started with designing a protocol, okay? Closing words, I have to say these things and every time that I talk about Python typing, okay? Being optional is not a bug or a limitation of Python type hints. It's the feature that gives us the power to cope with the inherent complexities, annoyances, and limitations of static types. So please do not decide in your team that everything has to have type hints always, okay? That's the only way to use a type checking is in the strictest possible mode. This is really bad. It's like deciding that every code has to be a 100% test covered. No, 95% is good enough a lot of times, okay? So that was my talk. Thank you so much. I don't know if we have time for questions. Maybe one question. Yes. Thank you very much. We have time for one question. Just remember to go to the microphone so that everybody can hear it. Yes, please. Obrigado. Thank you for your talk, Lucian. Asking as a beginner in typing. I could not follow why when we define the repeatable protocol. Why do we say that more will return something that's also repeatable? Why don't I say that I'm just returning anything? Let me see. So the repeatable protocol this? Yeah, yeah, exactly. So what's the specific question? Why am I saying that the under mode will return t? Is it something that I decide or is it the typically the thing that makes sense? Yes, it's something that you decide. But this is how this is how I see the because the idea of this double function is that it repeats something. If it's a number that means multiplying by two but if it's a sequence it's repeating it. And that's actually why I called it repeatable and repeat count. In both of those cases the output type will be the same as the input type. Does that make sense? Yeah. Actually this is the right solution, right? Sorry. That one has the problem and this one is the one that is correct because it uses the bound type var. Okay? Okay. So you can catch up with Luciano Well, during coffee break or during lunch. Thank you very much for the presentation and let's give him round of applause.