 Hello, everyone. Hopefully you can hear me well. I'll be joining you remotely. Thank you for coming to my talk. I am aware that the last talk in this hall was also about protocols. So if you haven't had the chance to listen to the previous talk, hopefully you'll get similar benefits for mine. So thank you, Rogier. Hope I said that right. And can I just get a confirmation that the sound's working from an operator so we can start? Okay, I'll assume that it works. Okay. So a little bit about me. My name is Ron. I'm a data engineer at Bluevine. I enjoy learning about how programming languages work, new features and how I can use them better. I love traveling and hiking. I'm currently based out of California. So a little bit about Bluevine. It's a fintech company that provides online business banking and financing to small to medium-sized businesses. We use Python everywhere. We have Django on the server side and use it for our data pipelines, data science, analytics, and machine learning. So what are protocols? Let's first talk about the existing definition. So first of all, they're added in PEP 544, aptly named Protocol Structural Subtyping. They're introduced for Python 3.8 and they are an extension of the typing system. So the term Protocol is a widely understood construct in the Python community, which basically means an interface which requires the implementation of a Dunder method. So the iterator protocol implies the implementation of the iter and probably next Dunder methods. The size protocol, for example, requires the implementation of the Dunder Lend method. So for example, if we want to create a class which implements the iterator and size protocols, we would do it by inheriting the abstract base classes, size, and iterator. An iterator is any object that it can be iterated upon, meaning it implements the Dunder iter and next method, and a size is any object with a known length. So aside from these type annotations that you see in this slide, there's this is nothing new. Before we continue to the new definition, let's give a little bit more context and talk about the typing system a little bit and why it's important. So what is the typing system in Python? Type annotations make it easier to read, refactor, and validate code correctness statically. And what I mean by statically, it means that they do not get evaluated during runtime and evaluated only by static type checkers like MyPy. They also provide great means for documentation in the code. So let's look at an example here. Here we define a class that registers callback functions to receive an integer as the sole argument and return a Boolean value. And on the second line, I define a type alias which basically specifies what this type is. So in this case, it's a callable that can accept only an integer as an argument and returns a Boolean. This means that it can accept functions or sometimes even classes. So this provides better readability and to avoid repeating the long annotation. The callable annotation means that basically every object that implements the under call method. So as I said, it's either a function or a class. And you can see here that the is even function matches the type pin we define and then we can register it to an instance of the class. So with this little bit of typing annotations, we get a lot of clarity for our code. So this is kind of part of the strength for the typing system. So let's talk about the new definition for the protocols. So they provide the goal in the rationale of the PEP is to provide the static type checkers like MyPy and other tools, the ability to verify co-correctness prior to runtime. It enables the implementation of interfaces without explicitly inheriting from abstract base classes. The PEP defines this behavior of inheriting from base classes as something that is un-Pythonic and unlike one would normally do in idiomatic dynamically typed Python code. Let's say a mouthful. So let's let's look at a comparison between the old definition and the next. So let's simplify the example from before a little bit. This is how one would normally register a class on the protocol on the left side. We can see that we inherit the classes and on the right side is the same code which implicitly implements the protocols. For both cases, the code actually passes the type check, which is the new feature really. And it is important to know that the goal is only to support this behavior statically. Since abstract base classes already provide support for this during runtime. So before we continue, let's let's clarify the difference between the two forms. So of subtyping. So first of all we have the nominal subtyping which is explicit basically means that it's in the name. And one type is a subtype of another if and only if it explicitly is declared so to be in its definition. Which means that if we run the instance method on it, it gets evaluated to be correct. A structural subtyping means that one type is a subtype of another if it behaves similarly to it. So basically duck typing. So let's to illustrate the difference. We'll create a dog protocol which defines a bark method. And here we have scooby-doo which is a dog and since its parents class is a dog and therefore it must have a bark method. So this is an example of a nominal subtyping, very classic behavior. However according to structural subtyping, eucalyptus tree is also a dog because it has a bark. Now obviously this is a joke and if you get it and it's kind of obscure then you're probably in the intersection of this Venn diagram. So obviously what we want is a supports bark protocol because trees are no dogs. And in this case both of these types are considered subclasses of the supports bark class. So finally let's show how to create a new protocol. Protocols are defined by including a special new class called typing.protocol or if you're using Python 3.7 or below typing underscore extensions is the module you would import it from. All it takes for a class to be a subtype of supports close is to implement a close method with an identical signature. And yes this ellipses is a valid notation, it is the same as the pass keyword. So here we're implementing, we're actually using the supports close in the signature of a function. We define a function close all which accepts an argument named things. Which is of type iterable supports close and you can see how we use it below. We open a file and a resource and we pass them all to close all and you can see that it works. So the interesting thing here is that F is a built in type which can be accepted as an argument for close all without modifying the base class. This implicit subtyping prevented it from writing extra code to wrap it like so. So here we would have to, without protocols if we wanted to enable the same type, it would have to wrap the file type within another class and then subtype supports close class. This makes things a lot neater. This is example specifically is very messy verbose and specifically this would be in my opinion the best use case for protocols. So in order to turn a subclass into a protocol and to extend it, you must inherit typing protocol otherwise it will be downgraded to a regular abstract base class. So in this example, if we want to create a new protocol which defines that you have to define the Dunderland method and the close method, we would have to create a new class, inherit both of these protocols and then inherit protocol again. There's actually nothing wrong with not inheriting protocol but then it would just be a regular abstract base class and that feature already exists. So just be aware when you do it. You can also create generic protocols and you can do that by using type vars and specified using the double brackets notation, the square brackets notation. For example, the iterable protocol is a generic protocol that can accept a generic argument and then we can define which type it accepts. So here's how you define that. You can actually use the special type var object from typing. You give it a sort of an alias and then you can create your protocol like so. You create the class and then you inherit from protocol and then you specify the generic protocol and underneath you can see that for the variable collection we passed the iterable of size and closeable type. So let's get to a real-world example that we actually used in our day-to-day. So executing a procedure on batches of data is something that we do a lot. The data usually changes somewhat between projects depending on the use cases but I wanted to create a data batcher which can iterate over some abstract type so it can be easily reused. So here's an example of how it would be used. Here we're creating three different collection objects which are quite different from each other. One of them is a list of integers, one of them is a string and one of them is actually a data frame from a third-party library pandas. And here we're creating three data batch generators for each one and then we loop over all of them in a single loop and we get the output below. So this is kind of what we want to achieve and now obviously all of these can be iterated in some form. However if we iterate over them we probably are not going to get what we expect so we have to define this class which implements this behavior and in order to define it it needs to know which type to accept. So let's first start by defining our protocol. Let's start with the first part. So we want to iterate over all these objects and when we want to iterate over some object you need to know first of all when to stop the iteration and second how to divide it into batches of equal size. So the first part is actually easy. We already know this protocol, it's called the size protocol. We just need to know the length of the data to know when we reach the end of it which means it needs to implement the Dunderland method and we already have this built-in protocol for that so that's out of our way. And the second part is a little more tricky since we want to achieve the following in order to batch the data and this kind of looks like we're accessing the data using indexing. So let's create a supports index protocol. Here we're creating this protocol which basically states that you must implement the Dunder index method. However there's a problem with that because a Pandas data frame cannot be indexed unless you're using some other method which is like the Ilock method like a regular collection. Okay so let's try something else. Let's try the Dunder get item method instead. So this would be a container protocol but they kind of solved it but now the type checker will allow a dictionary to be used since it can be accessed using the script brackets notation. Now this is not too awful but we want to be more precise about the types we can use and also we can't really iterate over dictionaries like we can other collections. So as it turns out that this notation of I colon I plus N is actually a slice object so our protocol should look something like this. And you can see that the item it accepts in the function is of type slice. And we're able to define this protocol with really great granularity up to the type of a method argument and we can combine them both into one protocol which is our final product. Here we created the support slicing size class which inherits both from size support slicing and the protocol which makes it a protocol. So now let's take a look at the final code. So this is the final code and you can see here this is nothing really special. It's just some sort of batcher that well let's actually take a look at it a little bit and the interesting part happens in the Dunder next method. First of all the first line actually uses the size protocol and we check the length of the data and then a few lines afterward we're actually batching it. And you can see here that the in the init method that data argument is of type support slicing size. And this enables us to define quite a strict interfaces which enables built in and third party library types to be used without changing them. And thereby greatly decoupling our code. Also notice how the data batch generator actually implements the iterator protocol and the iterable protocol which is pretty cool. So this means that this example this code will now pass a type check because all these types without modifying them or anything or subclassing them are now considered a subclass of our protocol. So thank you all for listening. If you have any questions I'll be happy to take them. Thank you. Okay we do have a few minutes if anyone does want to come to the mic and ask a question. All right then I guess that is that question. Thanks for the talk you showed that seems like the signature at least the input of the methods are checked in the protocol. Is there a way to also have it for the output. As in like to check that the men like or if if I have a property that the property is a list. You mean the return type of the function. Yeah. I believe that the type checker will check for that if possible. Yeah I mean also I didn't mention that but I recommend reading the actual pep because it provides a very thorough explanation of the subject. But yes function our function return types should be checked. Thank you. Thank you. Hey thanks for the talk. So you went with a lot of the protocols like sized and iterable. How do you go about going from very broad different broad types like a of a dictionary or mapping to I only need a Lang methods or only need sized for my particular function. Do you do you have any good way to do that or do you know what I mean. Actually can you try to explain a little bit more. I'm not sure. I want to give it. I don't know something I usually give it a dictionary. But all it ever does is it only checks the length of the container or whatever is in there. So actually my type for my argument is dictionary. But all I need is size. Do you have a good way about going from the very rich definition of mapping or dictionary to sized. So if I understood that correctly you want to specify more kind of low level type in your in your function definition. Well in that case well first of all you know you got to understand you got to check that your function really does need to doesn't really care about the type other than the function defined in the protocol. And if so then you can just pass in in this case as instead of the dictionary type annotation in the function argument you can pass in the sized type annotation. Yeah if I understand that sure but like yeah I guess there's no easy way to to go like OK a type checker could tell you hey actually you're using this broad definition like the dictionary but you only need a sized. I'm not familiar with anything like that. Yeah that's that's up to you. Thanks. Thank you. OK thank you all very much. Yep thank you so much.