Hi, everyone. Thanks for having me. This is the last talk of the track for EuroPython. My name is Jean-Philippe Casey; you can reach me on Twitter at JP Casey. I work at Shopify in Montreal. I'm French Canadian. I'm also an organizer of the Montreal Python user group. We do monthly meetups with little conferences and also project nights. So if you're ever in Montreal, go to our website and see if we have anything on; I would be glad to see you there.

Today, I'm here to talk to you about static type checking in Python. Before we go in depth, I want to bring in some theory. So the first thing I want to talk to you about is type systems. It seems boring, but it's fun. You'll see.

So the first thing we're going to check is what a type system is, of course. The Wikipedia definition is quite to the point. It says that in programming languages, a type system is a collection of rules that assign a property called a type to the various constructs composing a computer program, such as variables, expressions, functions or modules. So it's basically just a set of rules.

But it also has lots of purposes. Of course, the first one is that we have type systems in place to help us reduce and identify potential bugs in our programs. A type system also gives meaning to a sequence of bits. Because if I just give you these eight bits, you have no idea what they are. It could be 72 if it's an integer, or the letter H if it's an ASCII character. It could also be a true Boolean value, in C for instance, because it's not equal to zero.

So let's dig quickly into some fundamentals. Type systems are studied by type theory. It's lots of math, and lots of computer science too. A programming language also needs type checking in place. And typing just means assigning a type to a value. Type checks can be done either at runtime or at compile time.
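Going back to the eight bits from a moment ago, here is a minimal sketch of the same idea in Python. The bit pattern 01001000 is an assumption chosen to match the values mentioned (72 and the letter H); the variable names are only for illustration.

```python
# The same eight bits, read three different ways.
raw = bytes([0b01001000])

as_int = int.from_bytes(raw, "big")   # read as an unsigned integer
as_char = raw.decode("ascii")         # read as an ASCII character
as_bool = raw != b"\x00"              # C-style truthiness: anything nonzero is true

print(as_int, as_char, as_bool)
```

The bits never change; only the type we choose to read them with does.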
Types can also be manually annotated in the source code, so you declare the type in front of a variable, or the type system can infer them automatically. So without you having to declare what the type of a variable is, the type system can deduce it. And like I just said, typing gives meaning to a sequence of bits.

So with type systems, we're going to have, of course, type checking. Type checking, or type safety, is the process of verifying and enforcing the constraints of the type system. So it's just checking and making sure that the parts have been connected in a meaningful and consistent way. For instance, we cannot add a string to a list; the type system doesn't allow that.

Type checking prevents illegal operations: like I said, adding a list to an integer, for instance. It also provides memory safety measures. A good type checker for a good type system is going to reduce the buffer overflows or the out-of-bounds writes that you can do, which would lead to corrupting the running executable or the memory in place. It also helps with logic errors, by not letting you mix values that have different semantics.

So with type checking, we get type safety, like I said. Type safety is basically just enforcing the types in a programming language. It's a requirement for any programming language, and it's closely linked to the memory safety of your executable. People often frame it as strong typing versus weak typing. So the questions are: what is the level of type safety, is it memory safe, and is it static type checking or dynamic type checking?

However, the problem is that many languages are too big for human-generated type safety proofs; they would require thousands of cases. There are, however, some languages that have rigorously defined semantics, for instance some ML-based languages.
And they have been proven to meet certain definitions of type safety. Haskell is another language that, if you do not use some unsafe methods, which are mostly IO operations, can provide a good level of type safety.

I also just want to talk quickly about Coq, which is a programming language. Well, mostly it's an interactive theorem prover, which was written in OCaml. It's over 26 years old. It's a dependently typed functional language. And there's a browser written with it, called Quark, and that web browser has a kernel that has been formally verified with Coq. So it means that it should be almost bug-proof or secure, as in no buffer overflows and no bugs related to the type system.

So we have type safety and type checks. But there are two ways of doing type checks, of course. We have static type checking, which is done, of course, at compile time. In static type checking, every variable is bound to a type during the compile phase. That gives us some good things. It operates on the program's source code; it doesn't have to run the executable. And since it runs on the source code, it helps you catch bugs earlier in your development cycle. It's also going to give you a higher level of confidence, in my opinion. And depending on which type system you use, it can also be a limited form of formal verification of whether your program does what it's supposed to do.

Some quick examples of statically type-checked languages are, of course, C++, Java, Go, and, like I said, Haskell and OCaml. There are a lot of other languages that are statically type checked.

The other kind of type checking we have is, of course, dynamic. So instead of doing the type checking at compile time, we do it at runtime, while the program is executing. So it's literally the process of verifying type safety during runtime.
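As a small illustration of a check that only happens at runtime, here is a minimal sketch in Python. The function add_one is hypothetical and not from the slides; the point is that nothing rejects the bad call until the line actually executes.

```python
def add_one(items):
    # No type is declared for `items`; the check happens when `+` runs.
    return items + 1

print(add_one(41))  # int + int is fine

error = None
try:
    add_one([1, 2, 3])  # list + int is only rejected once this line executes
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```

A static checker would aim to report the second call from the source code alone, before the program is ever run.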
Compared to static type checking, here every variable is bound to an object and not necessarily to a type, because the type has to be determined during execution. Usually this allows compilers to run more quickly, because it removes a phase from the compiler, which is the type checking. It also allows an interpreter to dynamically load and run new code; in Python, for instance, we have eval. It also allows duck typing and easier metaprogramming. Those last two are not unique to dynamically type-checked languages, though, because C++ has templates, for instance, and with C++ templates the type checking is done when the template is instantiated.

Other examples of dynamically type-checked languages, besides Python of course, would be JavaScript and PHP, but we also have Clojure and Lisp, which are compiled languages. So dynamic doesn't only mean interpreted; it means dynamically type checked.

There are also combinations of both: besides the C++ templates I mentioned, in Java and C#, when you downcast a variable, the type check is done at runtime. And in C, you can just cast everything to a void pointer and the compiler will never complain.

So that was type systems. But since Python is a dynamically typed language, how can static typing help us? As I said, it helps us reduce the number of bugs, and it helps us identify bugs more quickly during our development process. With a statically typed language, people will often say, "Since my language is statically typed, I do not need to write tests, because my tests are handled by my type system," which I don't think is true. People will also say that a bug is merely a poorly checked type.
And this also brings up, I don't know if you remember, Heartbleed, which was over a year ago. When it happened, people complained that if OpenSSL had been written in a language with a better type system, the bug behind Heartbleed would never have happened. It's up for discussion; maybe yes, maybe no. We'll see.

All right, so let's get back to some Python. Let's say I have this method, a Fibonacci method. If I run it, Fibonacci of 42 is going to give me the 42nd Fibonacci number. If I wanted to test that method, one quick way of doing it would be to assert, for instance, Fibonacci of 0, 1, 2 and maybe another bigger number, just to make sure that the method does what it's supposed to do. So with only those five numbers tested, let's see what I've covered.

Let's say this is my entire set of possibilities: the various types, the various objects I can pass in. And here I have my set of integers. My tests only covered five integers, so it's almost nothing in the entire set of possibilities, which means that if I pass floats, for instance Fibonacci of 0.0 or 1.5, in the case of my method it works, but it doesn't really give me what I wanted. And if I do Fibonacci of 14.32, it's going to explode right there.

This means that, out of my entire set of possibilities, by using static type checks I would have been able to... here I have my set of floats, and I would have been able to just remove them from my set of possibilities, since a Fibonacci number cannot be calculated with floats. Same thing with strings and with lists; I don't need those. So this reduces the set of possibilities that I have to test against.

What is the current state of static checks in Python? Python is a dynamic language, but we do have some static type checking happening. One of the tools I looked at is JetBrains PyCharm, the IDE. They have had type hints and a type checker for about five years.
And the way it works is they use docstrings for Python 2, and for Python 3 they use the function annotations that I'm going to talk about a bit later. This provides the IDE with information on what types methods or variables are supposed to be, so it gives the IDE user some basic code completion. As I said, it works with the parameters passed to a function, with return values, and with local variables.

Here's an example that I took directly from PyCharm's documentation. Let's say I have this method; for PyCharm to do some type hinting, I could add docstrings with their specific syntax. So here, for instance, the parameters a, b and c would be integers, and this helps PyCharm either give me autocompletion based on those types or warn me if I pass a string, for instance.

PyCharm went from simple class types to a much more complete type checker with, as you see, tuple types, generic types, function types. And I must say that they did a pretty good job with the community's feedback; they gathered feedback from users' experience to build a better system.

Another one we have is Pylint. Pylint is a source code analyzer, a command line tool, which looks for programming errors and helps you enforce a better coding standard. The static checks it does include the basic Python PEP 8 style guide and various kinds of error detection. For instance, it tells you when you have variables that are undeclared, modules that are not imported, or unused variables. It can also tell you, if you have a return statement with code after it, that the control flow will never reach that code, so you don't need it. It's fully customizable and extendable. It's a good library, in my opinion. Last time I checked, there were over 180 different error codes that it can produce.
And the current core maintainer gave an amazing talk on Wednesday, which was really nice. It integrates nicely with IDEs and editors: Vim, Emacs, Eclipse, PyCharm, and many more. Everything is on their website, and it's nicely documented.

Another one that I want to bring up is PyFlakes. Just like Pylint, it's a command line tool that checks your Python source code. It says that, compared to Pylint, it's faster because it only parses the syntax tree of the files. It will never complain about your coding style. And it tries very hard to never emit false positives; if there's a warning, it wants it to be a real, meaningful warning to you. And that may be a problem with the parser. So this is PyFlakes.

As I said earlier, there are function annotations. This is PEP 3107. It's a syntax that was added to Python just in time for Python 3, back in 2006. What it does is let you attach arbitrary metadata to your method signatures, like you can see here. And we're able to get those annotations back from the function's __annotations__ attribute. So this means that other libraries, like PyFlakes, are able to use this syntax to help.

The next one is mypy. It's an experimental static type checker that has been around since about 2012. It was heavily inspired by a Python-like language that included an optional static type system. So it's an optional static type checker. It's a tool that you run against your files, and it uses, first of all, the PEP 484 type hints. It has a powerful type system and compile-time type checking. And the thing is, the author wants you to use the tool after writing your program. So start by writing your code, add the type hints afterward, and then run mypy to maybe catch bugs or to enforce certain type checks.
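To make the annotation syntax concrete, here is a minimal sketch. The fibonacci function below is a hypothetical iterative version written for this illustration, not the method from the slides; the annotations follow PEP 3107 syntax and use plain built-in types.

```python
def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number, counting from fibonacci(0) == 0."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# The annotations are stored on the function object itself,
# where tools (or your own code) can read them back.
print(fibonacci.__annotations__)
print(fibonacci(10))
```

Python itself does not enforce these annotations when the function is called; they are metadata that a separate checker can read.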
So once more, it uses function annotations, and it's also going to use the PEP 484 type hints. In that case, since my Fibonacci method takes an integer, if I pass a string, mypy will produce an error.

So there's also PEP 484, which is optional type hints. Since function annotations arrived back in 2006, a lot of third-party libraries and applications started using them, but that spawned lots of different ways of using them. So this PEP wants to be a standard way of doing type hinting, and I believe it can be compared to what WSGI introduced: a standard way and a baseline for tools to work with. And of course, as the authors state, Python will remain a dynamically typed language, and the authors have no desire to make type hints mandatory. I won't go too deep into this, because Guido himself gave a talk and there was a second talk about type hints, so I'll just say that this PEP aims to unify and ease static type checks.

Let's go back to our circle of possibilities. If we include static type checking in our program, the set of possibilities that we have to test against is really reduced, but we still have this huge set of integers, and we're not sure what we should do with it. We could do some formal proof of our method, but I think that goes against the principle of what we want to do.

So one way of doing it is using a library called Hypothesis. Once again, a lot of people talked about it this week, so I'm just going to go quickly over what it does. Hypothesis is a property-based testing library. It's based on Haskell's QuickCheck library, which itself is a combinator library written back in '99. It's designed to assist you in testing your software. It generates random data and tries to falsify the assertions of your unit tests. And once it finds a failure, it tries to simplify the failing case as much as possible.
So in a normal unit test, what you do is set up some static data, perform some operations, which is the method you want to test, for instance, and assert that the result you get from this operation is what you expect it to be. The difference with property-based testing is that instead of setting up static data, it tests with data matching a specification that you give it.

Let me show you quickly how it works. Hypothesis will generate random data matching the specification. If it finds a failure, like I said, it will try to simplify the failing data. So let's say that a big list of integers fails your test; it's going to try to reduce it as much as it can. And of course the data is saved locally to make later test runs faster, since it generates random data and runs lots of tests with it.

All right, so I have this huge method that we don't really need to care about. It's LZW data compression; I took it literally from the Rosetta Code website. So I have a compress method and a decompress method. If I wanted to test it the regular way with a unit test, this is what I would do: compress my text, then decompress it, and make sure that it stays the same, since it's a lossless compression algorithm. But with Hypothesis, I give it a specification with the given decorator, and I tell it that the input is text. With this, it tries to generate random text data and make this assertion fail. And if I run it, it gives me the one failure that the method has, which of course is that it fails on an empty string. And if we go back to the code, the empty string fails somewhere there. And I think I forgot to put in the stack trace.
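The round-trip property described above can be sketched with only the standard library. This is not Hypothesis and not the code from the slides: the LZW functions are a compact sketch of the textbook algorithm, and find_failures is a hypothetical, hand-rolled stand-in that tries the smallest input first and then random strings.

```python
import random


def compress(text):
    """LZW-compress a string of 8-bit characters into a list of integer codes."""
    table = {chr(i): i for i in range(256)}
    current, codes = "", []
    for ch in text:
        candidate = current + ch
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            table[candidate] = len(table)
            current = ch
    if current:
        codes.append(table[current])
    return codes


def decompress(codes):
    """Invert compress(); note that it assumes at least one code is present."""
    table = {i: chr(i) for i in range(256)}
    previous = table[codes[0]]  # raises IndexError when codes is empty
    pieces = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = previous + previous[0]
        else:
            raise ValueError("invalid code: %r" % code)
        pieces.append(entry)
        table[len(table)] = previous + entry[0]
        previous = entry
    return "".join(pieces)


def round_trips(text):
    """The property under test: decompress(compress(text)) == text."""
    try:
        return decompress(compress(text)) == text
    except Exception:
        return False


def find_failures(trials=200, seed=0):
    """Try the smallest input first, then random strings; return the failing ones."""
    rng = random.Random(seed)
    samples = [""]
    for _ in range(trials):
        length = rng.randint(1, 12)
        samples.append("".join(rng.choice("ab") for _ in range(length)))
    return [s for s in samples if not round_trips(s)]


failures = find_failures()
print(failures)
```

With Hypothesis itself, the same property would be declared with the given decorator and the text() strategy from hypothesis.strategies, and the library would take care of generating and simplifying the inputs.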
So having Hypothesis on top of some static analysis would help us to literally test the entire set of possibilities we have.

So in conclusion, in the 20 minutes we had: type systems are inherently complicated, but it's interesting to know how they work. Both dynamic and static type checking have their pros and cons, but I think that having both of them living in a type system could really help devs. PEP 484 is going to unify type hinting and give more power to develop more type checkers in Python. And lastly, of course, there's Hypothesis and other tools that can help you reduce and find bugs early in your development cycle. So this is it. Thank you. If you have any questions, feel free. We have some time, or just catch me afterward.

So you were saying that Pylint also does static type checking to some degree?

What I want to say is that Pylint wants to use PEP 484 to add some type hints and checks, because it wants to use the, I'm forgetting the words in English, sorry, but it wants to use the AST of the Python code you have to give you some hints about failures you could have. From what I heard, it's in the works. It will be done later.

Thanks. Any more questions?