Today we have with us Kracekumar Ramaraju, who will be presenting a talk on type checking a Django app. Over to you, Kracekumar. Thank you.
Thanks, Kajetana. Good afternoon, everyone, and welcome to my talk, "Type check Django app". I hope you're all having a good time. Who am I? I'm Kracekumar, a software engineer based out of Bangalore, and I've volunteered for PyCon in previous editions. I work for Airbase. Airbase is a single platform to manage all your company spend, and if you are interested, we are hiring developers; you can talk to me offline.
Throughout this talk I will be using Python 3.6+ syntax. Whenever I say "the static type checker", I mean mypy. There are other static type checkers available, but all of the code examples and concepts are tailored to mypy. The Django version I'm using is 3.2. Django does not ship type annotations, so we will be using a third-party package called django-stubs.
Types at runtime. Python is a dynamically typed language: you don't need to give any type information to the interpreter. It figures out the types of objects and their attributes at runtime. But there are places where you need to know the type of an object, and for that Python has a built-in function called type(). type() takes an argument and returns the type of the value passed to it. For example, if you pass a tuple to type(), it returns tuple. The same works for Django objects: if you pass the result of User.objects.filter(...) to type(), it outputs django.db.models.query.QuerySet.
mypy has a counterpart called reveal_type. The difference is that type() is evaluated by the interpreter at runtime, whereas reveal_type is evaluated by mypy during static type checking. If you call reveal_type on a tuple, mypy reports that it is a tuple and also the types of the elements inside it.
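The sketch below shows the two tools side by side. It uses a plain tuple instead of a Django query so that it runs without Django installed. Only mypy understands reveal_type, so it is guarded with TYPE_CHECKING and the interpreter never executes it. Python 3.11 also adds a runtime typing.reveal_type, but this sketch does not rely on it.

```python
from typing import TYPE_CHECKING, Tuple

pair: Tuple[int, str] = (1, "django")

# type() is evaluated by the interpreter at runtime.
print(type(pair))  # <class 'tuple'>

if TYPE_CHECKING:
    # Only mypy looks inside this block. It reports a note similar to
    # Revealed type is "Tuple[builtins.int, builtins.str]",
    # i.e. the outer type plus the type of every element.
    reveal_type(pair)
```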
reveal_type is purely static: mypy reports the type of the expression, including the type of every element inside it. Here is an example where you pass the result of User.objects.filter(...) to reveal_type, so you can see what mypy thinks the expression returns: UserManager[User]. This is a bug in django-stubs; it should be QuerySet[User].
mypy config. mypy is a third-party library, so we install it using pip. Next we need to configure mypy to work with Django source code. As I said, Django does not ship type annotations for its code. They live in the third-party package django-stubs, so we need to tell mypy to load the django-stubs plugin, which provides all the types related to Django. How do we do that? In the mypy.ini configuration file, the [mypy] section has a plugins option, where you give the entry point of the plugin: mypy_django_plugin.main. Next, we need to pass some configuration to the plugin. A Django app works entirely from its settings, so you need to specify the settings module with the django_settings_module option in the [mypy.plugins.django-stubs] section. If your project is called, say, bookcounter, you specify bookcounter.settings there. mypy passes this setting to the plugin, and the plugin can then work out the types of Django objects, including the return values of your custom functions.
Annotation syntax. Python 3.5 added the typing module, and Python 3.6 added syntax for annotating variables. You can annotate variables, functions and classes, anywhere a value is held. Annotations are completely optional; nothing forces you to add them. What does the syntax look like? Normally, to assign a value in Python you write the variable name, an equals sign, and the value.
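The configuration described above might look like the mypy.ini below. This sketch keeps it in a Python string and parses it with the standard configparser module, only to check its shape. The section and option names follow the django-stubs setup instructions. bookcounter.settings is the example project's settings module; replace it with your own.

```python
import configparser

# mypy.ini contents; "bookcounter" is the example project name from the talk.
MYPY_INI = """
[mypy]
plugins = mypy_django_plugin.main

[mypy.plugins.django-stubs]
django_settings_module = bookcounter.settings
"""

config = configparser.ConfigParser()
config.read_string(MYPY_INI)

# The plugin entry point and the settings module the plugin will load.
print(config["mypy"]["plugins"])
print(config["mypy.plugins.django-stubs"]["django_settings_module"])
```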
With annotations there is only one small change: after the name you add a colon and the type, as in language: str, which states the type of the value. The type can be any type: a built-in type, a user-defined type, or a composite type. In the second example the type is date. date is a class, so annotating a variable with : date means it will hold an instance of date. Next, a function that takes two arguments and returns a value. The syntax is similar: both arguments are integers, so you write a: int, b: int, and the return type as -> int. A function can also return more than one value. In that case you use a tuple type with square brackets, such as Tuple[int, int], listing the type of the first value, then the second, and so on. Next, a class. Here is an initializer that takes three arguments: name, age and is_alive. self needs no annotation, because mypy already knows it is an instance of Person. We annotate the other three arguments: name, age and is_alive.
Annotating Django code. Let's start with a simple view function and see how to annotate it, then we'll move on to models. A view function takes a request and returns a response. How do we annotate the input argument? The request is an HttpRequest, which we import from django.http, and we annotate the return type as HttpResponse. HttpResponse is the superclass of specialized responses such as JsonResponse. Every view function takes an HttpRequest, sometimes along with more arguments. In this case it takes only the request and returns a response of type HttpResponse. Here is another example: a function called view_404. Its input is an HttpRequest.
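Here is a short sketch of the annotation syntax just described. The variable names, the date value and the quotient_and_remainder function are illustrative assumptions, not code from the slides. The Django view annotations need Django installed, so this sketch sticks to plain Python.

```python
from datetime import date
from typing import Tuple

# Variable annotations (Python 3.6+).
language: str = "Python"
released: date = date(2021, 1, 1)  # hypothetical value; any date instance works

def add(a: int, b: int) -> int:
    return a + b

def quotient_and_remainder(a: int, b: int) -> Tuple[int, int]:
    # "Returning more than one value" really means returning a tuple.
    return a // b, a % b

class Person:
    # self is not annotated; mypy infers that it is a Person.
    def __init__(self, name: str, age: int, is_alive: bool) -> None:
        self.name = name
        self.age = age
        self.is_alive = is_alive
```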
But look at the three versions on the slide: the return type is annotated in slightly different ways each time. The body of the function returns an HttpResponseNotFound, yet mypy accepts all three annotations. Let's see why all three are valid. First, what are the return types? The first says HttpResponseNotFound. That is clearly correct, since the function returns an instance of HttpResponseNotFound. The second says HttpResponse, and mypy does not complain. The third says object. So why are there three valid ways to annotate this function?
Python has a concept called the method resolution order, or MRO. The MRO gives you the inheritance chain of a class, from the child class up to its parents. Every class has a method called mro(), which returns the classes in order from child to parent. HttpResponse inherits from HttpResponseBase, and HttpResponseBase inherits from object. The MRO of HttpResponseNotFound shows that it inherits from HttpResponse, which inherits from HttpResponseBase, which inherits from object. That is why mypy did not complain: the function returns an HttpResponseNotFound, and annotating it to return HttpResponse, HttpResponseBase or object is equally fine. This is in line with a concept called the Liskov substitution principle.
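The sketch below uses hypothetical stand-in classes that mirror the inheritance chain of Django's response classes, so it runs without Django. For the same reason the view functions take no request argument. The point is only that all three return annotations accept an HttpResponseNotFound.

```python
# Hypothetical stand-ins mirroring Django's chain:
# HttpResponseNotFound -> HttpResponse -> HttpResponseBase -> object
class HttpResponseBase:
    status_code = 200

class HttpResponse(HttpResponseBase):
    pass

class HttpResponseNotFound(HttpResponse):
    status_code = 404

print([cls.__name__ for cls in HttpResponseNotFound.mro()])
# ['HttpResponseNotFound', 'HttpResponse', 'HttpResponseBase', 'object']

# All three type-check: the returned object is an instance of every
# class in its MRO, so any of them is a valid (if less precise) return type.
def view_404_exact() -> HttpResponseNotFound:
    return HttpResponseNotFound()

def view_404_parent() -> HttpResponse:
    return HttpResponseNotFound()

def view_404_object() -> object:
    return HttpResponseNotFound()
```

The most precise annotation, the first one, is usually the most useful. The wider ones are accepted, but callers then lose information about what they get back.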
The Liskov substitution principle states that in an object-oriented program, replacing an object of a superclass with an object of any of its subclasses should not break the program. HttpResponseNotFound is a specialized HttpResponse, so mypy does not treat the two as incompatible types, and that is why it accepted all three annotations.
Next, Django models. Let's start with a simple create scenario. Here is a simple model, Question, with two explicit fields: question_text, a CharField, and pub_date, a DateTimeField. The function create_question takes a question string, creates a Question and returns it. Since it returns an instance of Question, we annotate the return type with the class name. That indicates it returns an instance of that class.
Now let's do a read. The function get_question takes the question text, a str, and returns a Question. Internally it uses the filter method, and if a question in the database matches, it returns the first one. When you run this code through mypy, it reports one error: Incompatible return value type (got "Optional[Any]", expected "Question"). Let's look at how objects.filter works. filter is just a query: when results are found, it returns all of them. Since we call .first() on it, it returns a Question only when there is a match; if there is no match, it returns None. mypy identified exactly this, and that is why it says Optional. Optional means the value can be None. Any means mypy could not narrow the type further, even though here it is actually a Question.
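To see why mypy infers Optional, here is a sketch with a plain list standing in for the database table. It also uses a hypothetical first() helper that mirrors QuerySet.first() by returning None when nothing matched. Question is a stand-in dataclass, not the Django model.

```python
from dataclasses import dataclass
from typing import List, Optional, TypeVar

T = TypeVar("T")

@dataclass
class Question:  # hypothetical stand-in for the Django model
    question_text: str

def first(items: List[T]) -> Optional[T]:
    # Mirrors QuerySet.first(): the first item, or None if there are none.
    return items[0] if items else None

def create_question(question_text: str) -> Question:
    # The class name as return type: this returns a Question instance.
    return Question(question_text=question_text)

TABLE = [create_question("What's new?")]

matches = [q for q in TABLE if q.question_text == "Missing?"]
print(first(matches))  # None, the case mypy is warning about
```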
What mypy is saying is that get_question is annotated to return a Question, but its return statement can return either None or a Question. This is a fundamental difference between dynamic behavior and static typing. How do we solve it? mypy has already told us the function can return an optional Question. So we import Optional from the typing module and annotate the return type as Optional[Question]. The square brackets hold the wrapped type: Optional[Question] means the function returns either None or a Question. All of these typing helpers are available in the typing module.
That is one way to solve the problem. Another is through the mypy config, with an option called strict_optional that you can set to False in the configuration file. mypy will then stop complaining when a function that can return None is annotated as returning only an object. In our case, if we annotate the return type as Question without Optional, mypy will not complain. This gives you a useful trade-off between a lenient type checker and a strict one. Depending on your codebase and its maturity, you can choose which mode to start type checking with. You can start in lenient mode, with strict_optional = False, and there are various other options like it. Once your code has matured and has a lot of type coverage, set strict_optional back to True to get the most out of mypy's strict configuration.
Next, the filter method. filter is available on the model manager, Model.objects.
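Here is a sketch of the Optional fix, again with a list standing in for the database and a stand-in Question dataclass. It also shows what a caller has to do under strict_optional = True: check for None before using the result. That check narrows the type to Question.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Question:  # hypothetical stand-in for the Django model
    question_text: str

TABLE: List[Question] = [Question("What's new?")]

def get_question(question_text: str) -> Optional[Question]:
    # Like .filter(...).first(): a Question on a match, otherwise None.
    matches = [q for q in TABLE if q.question_text == question_text]
    return matches[0] if matches else None

def describe(question_text: str) -> str:
    question = get_question(question_text)
    # With strict_optional = True, mypy rejects question.question_text
    # until None is ruled out; after this check the type is Question.
    if question is None:
        return "not found"
    return question.question_text
```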
When you call objects.filter, Django returns a QuerySet, and the QuerySet contains model instances. You can think of it like a list of values, where all the values are of the same type. If you use User.objects.filter, every item inside is a User; if you use Group, every item is a Group. This matters for how you access data throughout Django. So how do you annotate a function that returns a QuerySet? As with Optional[Question], we write QuerySet[Question]. The outer type is a QuerySet, and the inner type, the type of the items it contains, is Question. You can think of it like a box: the box is the QuerySet, and the items inside are the model instances, in our case Question. Other Django methods that return a QuerySet include all(), reverse(), order_by() and union(), and the model manager has a few more.
Next, aggregate functions. An aggregate compresses the data in a table into a few values; an average is one example. To illustrate, I have two models. The first is Publisher, with one explicit field, name, a CharField. The second is Book, with six explicit fields:
- name, a CharField;
- pages, an IntegerField;
- price, a DecimalField;
- rating, a FloatField;
- publisher, a ForeignKey to Publisher;
- pubdate, a DateField.
Here is a get_average_price function, which computes the average price of all the books in the table. Since it summarizes the data, it returns only one value.
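The box analogy can be sketched with a generic class. FakeQuerySet is a hypothetical stand-in, not Django's QuerySet, and its filter takes a predicate instead of Django's keyword lookups. What it illustrates is how a generic parameter records the type of the items inside, and how filtering keeps that type.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterator, List, TypeVar

M = TypeVar("M")

class FakeQuerySet(Generic[M]):
    """Hypothetical stand-in for QuerySet[M]: a box holding items of one type."""

    def __init__(self, items: List[M]) -> None:
        self._items = items

    def filter(self, predicate: Callable[[M], bool]) -> "FakeQuerySet[M]":
        # Filtering keeps the inner type: a box of M stays a box of M.
        return FakeQuerySet([item for item in self._items if predicate(item)])

    def __iter__(self) -> Iterator[M]:
        return iter(self._items)

@dataclass
class Question:  # hypothetical stand-in for the Django model
    question_text: str

def questions_containing(word: str, qs: FakeQuerySet[Question]) -> FakeQuerySet[Question]:
    return qs.filter(lambda q: word in q.question_text)
```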
The call is aggregate(avg_price=Avg("price")), where Avg is the aggregate function and avg_price is the keyword we chose. The function returns a dictionary whose key is that keyword, avg_price, and whose value has the type of the price field. In our model, price is a DecimalField, so the summarized value is a Decimal. Now let's annotate this function. We know the return type is a dictionary, but we also need to annotate the type of the key and the type of the value. The key is the string avg_price and the value is a Decimal, so we annotate it as Dict[str, Decimal]. The key type always comes first, followed by a comma and the value type.
Next, the annotate method. Django's annotate method is essentially a GROUP BY query. You group the rows of a table by a column and apply an aggregate function such as COUNT(*) or an average. We already have a Publisher model, and we want to find the number of books published by each publisher. So we use Publisher.objects.annotate to annotate each publisher with the number of books it published, and then print each name with its number of books. The result has more than one row, because there is more than one publisher in the database. Because we use a list comprehension, the values look like a list of dictionaries, each with a name and the number of books for that publisher. There are two publishers, so the result has two rows.
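Here is a sketch of the aggregate's return shape, with a hypothetical list of Decimal prices standing in for the Book.price column. In Django the call would be Book.objects.aggregate(avg_price=Avg("price")). Here the average is computed by hand so the example runs without a database.

```python
from decimal import Decimal
from typing import Dict, List

# Hypothetical prices standing in for the Book.price DecimalField column.
PRICES: List[Decimal] = [Decimal("10.00"), Decimal("20.00")]

def get_average_price(prices: List[Decimal]) -> Dict[str, Decimal]:
    # Same shape as aggregate(avg_price=Avg("price")): the key is the
    # keyword name, the value has the field's type (Decimal).
    # Assumes at least one row; Django's Avg gives None for an empty table.
    return {"avg_price": sum(prices, Decimal("0")) / len(prices)}

result = get_average_price(PRICES)
print(result)
```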
Returning a QuerySet. We have slightly modified the count_by_publisher function so that, instead of returning the num_books values, it returns a QuerySet. We also have one more function, print_pub, which takes an argument num_books. If it is greater than zero, it prints only the publishers that published more than that many books; otherwise it prints all the publishers. print_pub has an if condition and then a for loop. If num_books is greater than zero, we filter for publishers that published more than the passed-in value and store the result in a result variable. In the else branch we do no filtering. Once we have the result, we iterate over it and print the name and number of books for each publisher. The output looks something like this: Penguin 2 and Vintage 1. Note that these are no longer dictionaries.
Now let's annotate these functions. We know count_by_publisher returns more than one value, so we annotate it as an Iterable of PublishedBookCount. In the previous example, each row looked like a dictionary with two keys. So I create a class called PublishedBookCount that inherits from TypedDict. TypedDict is for dictionaries whose keys are known in advance, each with a fixed value type; at runtime, a TypedDict value is still a plain dict. We know this dictionary contains only two keys and what their types are. So in PublishedBookCount, the key name maps to a str and the key num_books maps to an int.
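Here is a sketch of the TypedDict declaration, with hypothetical rows standing in for the grouped query result. typing.TypedDict needs Python 3.8+.

```python
from typing import Iterable, List, Tuple, TypedDict  # TypedDict: Python 3.8+

class PublishedBookCount(TypedDict):
    name: str
    num_books: int

# Hypothetical (name, count) rows standing in for the GROUP BY result.
ROWS: List[Tuple[str, int]] = [("Penguin", 2), ("Vintage", 1)]

def count_by_publisher(rows: List[Tuple[str, int]]) -> Iterable[PublishedBookCount]:
    return [PublishedBookCount(name=name, num_books=count) for name, count in rows]

for row in count_by_publisher(ROWS):
    # mypy knows row["name"] is a str and row["num_books"] is an int.
    print(row["name"], row["num_books"])
```

Calling PublishedBookCount(...) builds an ordinary dict. The TypedDict only exists to tell mypy which keys and value types to expect.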
So we annotate the return type as Iterable[PublishedBookCount], meaning an iterable container in which each item is a PublishedBookCount. When you run this through mypy, it reports two errors. The first says it got QuerySet[Any] but expected Iterable[PublishedBookCount]. You annotated the function to return an iterable of PublishedBookCount, but annotate returns a QuerySet. So count_by_publisher actually returns a QuerySet[Any], not an Iterable[PublishedBookCount]; that is quite understandable. The second error builds on the first: Iterable[PublishedBookCount] has no attribute filter. This comes from the if block in print_pub, where we filter the result further. An Iterable does not have a filter method, but we call filter on the object. Both of the errors mypy caught are legitimate.
Let's fix them. First, since mypy says count_by_publisher returns a QuerySet, let's change the annotation from Iterable[PublishedBookCount] to QuerySet[Publisher] and run mypy again. Now mypy reports only one error; the first error is gone. The remaining error says the Publisher model has no attribute num_books. It comes from the for statement, where we iterate over the result and print each row's values. mypy is saying that Publisher has only one field, name, and no attribute called num_books.
If you have worked with Django, you know the code we wrote is legitimate. num_books is an attribute that Django adds on the fly, and we don't have to declare it on the model. A test case that exercises this code passes, and it works in production. But mypy is a static type checker, and it does not understand this dynamic behavior, because Django inserts num_books at runtime. So mypy complains that Publisher has no attribute num_books, since it is not declared on the model. How do we fix this? This was actually a bug in django-stubs. It was still open when I prepared the slides and has since been fixed, but you will hit it if you use an older version of django-stubs. There is a workaround. From the typing module you can import TYPE_CHECKING, a Boolean constant that is False at runtime and treated as True by static type checkers. Code under if TYPE_CHECKING: is seen by mypy, but the Python interpreter never executes that block. Since the block exists only for mypy, we create a new class inside it called TypedPublisher. It inherits from Publisher and adds a new attribute, num_books, an IntegerField, so mypy now sees a model with both name and num_books. We also add a Meta class with abstract = True.
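Here is a sketch of the TYPE_CHECKING workaround, using plain classes instead of Django models so it runs without Django. annotate_num_books is a hypothetical stand-in for annotate(). The string annotation on count_by_publisher matters: TypedPublisher exists only while mypy analyzes the code, so the annotation must not be evaluated at runtime.

```python
from typing import TYPE_CHECKING, Any, List

class Publisher:  # hypothetical stand-in for the Django model
    def __init__(self, name: str) -> None:
        self.name = name

if TYPE_CHECKING:
    # Only mypy sees this class. In the Django version it is a model with
    # num_books = models.IntegerField() and class Meta: abstract = True.
    class TypedPublisher(Publisher):
        num_books: int

def annotate_num_books(publishers: List[Publisher], counts: List[int]) -> List[Any]:
    # Hypothetical stand-in for annotate(): num_books is attached at runtime,
    # and the result is loosely typed, like the QuerySet[Any] mypy reported.
    for publisher, count in zip(publishers, counts):
        setattr(publisher, "num_books", count)
    return publishers

def count_by_publisher() -> "List[TypedPublisher]":
    # The string annotation is never evaluated at runtime, so it does not
    # matter that TypedPublisher is undefined outside of type checking.
    return annotate_num_books([Publisher("Penguin"), Publisher("Vintage")], [2, 1])

def print_pub() -> None:
    for publisher in count_by_publisher():
        # mypy accepts num_books because the declared type is TypedPublisher.
        print(publisher.name, publisher.num_books)
```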
Now we change the annotation to QuerySet[TypedPublisher], and when we run the code through mypy, it does not complain at all. During type checking, mypy sees TypedPublisher with two attributes, name and num_books, and that covers every usage in the function.
Now some tools. If you are starting to annotate a Django project, annotating a large codebase can be very hard. There is a tool called PyAnnotate, which records the types of objects at runtime and converts them into mypy annotations. With a pytest plugin called pytest-annotate, you can run your code under test, infer the types at test time, and then use them to annotate the existing source code. Here is a function we saw earlier that has no type information. First, write test cases that exercise all the code you want to annotate. Then run the tests with your DJANGO_SETTINGS_MODULE, like this (I'm using Poetry), passing the extra option --annotate-output. It stores all the inferred types in an annotation.json file, and the output looks something like this: the path, the function name, and the type annotation inferred at runtime. Once the tests are done, you apply the changes by telling pyannotate to annotate a particular file, here views.py, from this annotation file using Python 3 syntax. This is quite a useful tool: it takes you from zero type coverage to some coverage. This is how the resulting code looks. It looks legitimate and it handles imports, but you will definitely need to make some changes yourself after reviewing it.
The last part of this presentation is learning resources. I came up with a project called python-typing-koans, an interactive way to learn Python type annotations.
To learn the concepts of gradual typing in Python, you need Python code with the proper structure. This Git repository organizes the code into three sections: Python, Django and DRF. You run each file using the command-line option shown, and it prints all the mypy errors. As a learner, you open the source code, look at each error, fix the type hints or add the missing ones, and run it again. You reduce the errors from five to three, then from three to one. Once you solve a file, you will understand how that particular concept works in mypy. Thank you. In case you have any doubts, I'm happy to take questions. These are my contact details.
Thank you, Kracekumar, for the wonderful talk on type checking. We have some questions from the audience. The first question is: how do you decide whether or not to add type checking to a Django app?
That's a good question. People in Python are divided into two camps on type hints: some don't want them, others do. My view is that you should ask a few questions.
1. Is your source code large? By large I mean you have five or six apps and 40 to 50 Django models, and it is growing every day.
2. How long will this project be active? Is it going to be active for the next two or three years?
3. Is it just a pet project?
4. Are there more collaborators?
In my opinion, type hints in Python add a lot of value if your project is going to last longer and have more collaborators. I have found that they help you catch a lot of bugs even before you write test cases. They also help you navigate a larger codebase, and they improve readability. But if you are the only person writing a small project, you can decide not to use them, because adding type hints carries a lot of overhead as well. So you have to weigh that trade-off and decide when to add type hints to a project.
Thank you. Let me check whether we have any more questions. Okay, there is one more you can take. The second question is: what's the standard way to annotate pandas DataFrames? I use pandas.DataFrame, but I haven't checked it with mypy.
As I said earlier about Django, pandas does not support type hints by default. By default I mean that the pandas source code itself has very little type information for things like pandas.DataFrame. There are third-party libraries that maintain stubs for NumPy, scikit-learn and pandas. You can use one of those, the same way I showed with the Django plugin, and start annotating your DataFrames with it, which will get you to a decent level. I cannot guarantee that those libraries will let you annotate every pandas feature or all of your pandas code, but they are catching up. As I said, type coverage is not complete even in the standard library itself.