And welcome to the next keynote from Bob Ippolito. I met Bob when he was involved in Vilnius, Lithuania in 2006, where he was also hacking on the PyPy project. And of course, he was also at several Python conferences. But actually, instead of continuing with PyPy, he went on to do something very commercially viable. He founded Mochimedia, a company around advertising. And as it happens, he doesn't need to work anymore to earn money. But he can now go to conferences and give interesting talks and play with lots of fun things. And one very interesting thing he's also doing is he's involved in Mission Bit, which is an educational program in San Francisco for teaching high school children programming, all kinds of stuff. And he's doing that on his own time, simply. And he's also sometimes looking for the next big thing to be involved in. But he's not eager to actually get there soon. So I'm very happy that we have him here, because he also wrote the major HTTP server in Erlang. He has helped evolve the greenlet library in Python and lots and lots of different things. I think you have also been involved in the simplejson parsing stuff. And he's done lots of stuff in Python, but also in Erlang and recently also in Haskell. So I think he's a perfect guy to tell us a bit about what Python can learn from Haskell and functional languages. Bob, it's you. Thank you very much for inviting me to speak here today. I'm going to tell you about some of the things that I don't like about Python. But starting off, I have done a lot of Python. I've been doing it for a long time, since about 2001. Most notably, I'm the author of the simplejson library, which I still reluctantly maintain. And I worked on a bunch of Mac-specific stuff, PyObjC, and all of that stuff. So either you're welcome or I'm sorry, especially to Ronald over there, who still maintains it as far as I know. And as Holger said, I founded a company called Mochimedia.
And I worked there for a while, but not so long after I sold it. I've been using Haskell for a couple of years now. Most notably, I ported the exercism.io curriculum to it. Exercism is a website which has programming practice problems in many, many different languages, including Python, but also languages like Swift or F# or Haskell, whatnot. So I worked on the Haskell curriculum, and I provide a lot of code review to people that submit solutions in Haskell. These days, I do a bit of advising and investing. I write open source code on the side. And I teach, and I'm on the board of a nonprofit called Mission Bit in San Francisco. But we hope to expand beyond that once we've sufficiently covered the San Francisco area. And so before I start ragging on Python, let's just say that Python is not all bad. But I only have 45 minutes to talk, so I'm only going to talk about the bad things. I love Python's community. There are so many great ideas and libraries that come from here. And Python works wonderfully for many users. I'm sure most of you are perfectly happy with Python. And for the issues that do exist, there are often good enough workarounds, like PyPy, Numba, Cython, et cetera. And I'm actually not going to talk too much about Haskell, but suffice it to say that Haskell's not all good. I learned a lot from it. I think it's fantastic. But notably, it has a much smaller community. This non-strict evaluation stuff is just so different from how most other languages evaluate code. Sometimes the documentation, it's not a website with some nice documentation. It's a PostScript file that you download that somebody wrote for their PhD thesis. And that's kind of OK. That's accepted in that community, although that's changing slowly. And it can be intimidating to some, because the terminology mostly comes from mathematics.
So you have to become comfortable saying things like monoid, functor, monad, et cetera, which takes people a little time to get used to if they haven't taken a category theory class. And so if you're perfectly happy with Python and don't want it to change even a little bit, you might want to leave now and go downstairs and hang out with all the cool companies in the basement. And I was a lot happier with Python until I started learning other languages, like Erlang and Haskell. And I didn't look for other languages until I really hit an area that Python just wasn't very good at. And that's concurrency and network programming. And I feel like I'm a better programmer having learned all of these different approaches to problems. And a nice little quote that I ran across recently, and I think this explains some of what we have in Python today, a nasty little paradox: the better something works, the less likely it'll be improved. And so these are some of the things that I find Python makes hard to do. Python makes it hard to write code that works. It makes it hard to maintain code. It makes it hard to create good abstractions, and it certainly makes it hard to run code efficiently. And so on the correctness topic, you're going to make mistakes in any language. It doesn't matter if it has an amazing type system or anything else. You're going to make mistakes. And the key is to have tools that tell you what those mistakes are. But Python defers all of that to runtime. So unless that code path happens, it's not going to tell you anything. The only thing it can tell you is if you use tabs instead of spaces or maybe if you put a colon in the wrong place. And sadly, the static analysis tools for Python today are very, very primitive. They could be better, but it just hasn't been an active area of research.
I think this is because the people that can understand this problem and know a lot about type systems and inference and whatnot basically move on to other languages that support these things more readily. And you can write code in Python that you know is correct. And you do that by writing a lot of tests. But some of these tests are things that it should just be able to do for you. Like when you write an integer literal somewhere, you know it's not a string. And you shouldn't be able to add it to a string. And so given that, here's a simple example of a perfectly PEP 8-compliant Python file that shouldn't do anything. The language guarantees that this code will not work, because ints and strings are closed. You can't override how they behave. You can't change what plus does. So no matter what, this code is just going to blow up at runtime. And we can very easily just glance at this and know that that's the case. There is no universe in which this code should do anything, unless maybe you're introspecting the bytecode and doing something horrible, but don't do that. And so let's see what Python thinks, all right? So we throw the Python compiler at it. We use this nice compileall thing, which I often use in automated testing, because it tells me if I screwed up the syntax. And what does it do? Well, nothing. It compiles the file. It's totally fine. I get bytecode. No error, no warning, nothing. And so another popular tool for looking at the surface of Python modules is Pyflakes. So I run Pyflakes. It also tells me absolutely nothing. And so what about Pylint? OK, so I wrote all these nice docstrings and it's perfectly formatted. So this is 10.0 out of 10. I did nothing wrong here, nothing whatsoever. It runs all these tests, and it's perfectly fine Python. But it's obviously not the case. And do better tools exist? Well, I certainly couldn't find anything. And if I couldn't find anything, that's bad. I would love to hear more about that.
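The slide code isn't reproduced in the transcript, but based on the description it was presumably something along these lines (a reconstruction, not the exact file):

```python
"""A perfectly PEP 8-compliant module that can never work."""


def main():
    """Add an int to a str. Guaranteed to raise TypeError at runtime."""
    return 1 + "1"
```

compileall, pyflakes, and pylint all pass a file like this, yet calling main() can only ever raise a TypeError.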
To my understanding, that's an IDE or something, but maybe we can talk about that in questions after. OK, so nothing that I could readily find could catch this simple case. Most of the tools are concerned with just the general case of Python, where anything can happen and dynamism is cool and you can override things at runtime and there's nothing wrong with that. So it's very difficult to write one of these tools when you accept that every feature of Python is great and should be used all the time. But I don't think that's the case. I think we should write more restricted code, just in the same way that we should indent consistently by either two or four spaces, depending on where you come from or where you work. And the thing is that Python 3 already has the hooks that we need to do very good annotations, and we can use those annotations to tell the tools what we expect to happen. And there is a tool that works in precisely this way, and I'll talk about that in a little bit. So here's what some other languages do with that example. Here's the Erlang version. So it's a little more verbose in some ways. And what does the Erlang compiler do? Well, if I try and compile this, it says: warning, this expression will fail with a badarith exception. And the interesting thing to note here is that Erlang is a totally dynamic language. It's not statically typed anywhere. It has optional type annotations, just like Python 3 does. And Erlang also has even better tools. It has this tool called Dialyzer, which does static analysis and success typing and all this fancy stuff. And this gives me a more specific error about exactly why this is wrong. I can't add one to a string, because it expects them to both be numbers, given the first argument and the fact that plus doesn't work on lists. There's a separate operator for that in this language. And here's the Haskell example. And you might be thinking, oh, wait, where's the type signature? Well, there it is.
But this is the full Haskell program. You note that it's just as simple as the Python one, if not simpler, because there are no underscores to be seen anywhere, no equals or ifs or none of that. You just write main and what main should be, and then that's that. And if I throw the Haskell compiler at it, it's going to give me an error. It says, I can't add the number one to the string one. And to explain this a little bit, we can kind of gloss over this, because it doesn't really matter. So it's saying this Num [Char] thing, because by default, lists of Char are strings. So that's what the Char in square brackets is. And operations such as plus, minus, times, negate, et cetera, belong to the Num type class. So it's saying that there is no instance of this Num type class for lists of Char. And you could, in theory, implement such a Num instance, but you really shouldn't do that. Only truly numeric types, such as Int, Double, Rational, et cetera, should implement this Num type class. The Haskell folks like to write type classes only when there are algebraic laws that you can apply to them, and there are laws for this particular type class. And so why is refactoring in Python hard? Well, it's hard to refactor without good tests, and not everybody writes good tests. The types are very obvious tests, and they're also documentation. So in one way or another, you're writing the types here anyway. And refactoring is much easier in Haskell than it is in Python due to these types and also referential transparency. Can we do something like this in Python? Well, the answer is yes. So first we have this pure Python, nothing fancy, breadth-first traversal of a graph. So we have a starting node, which is some number, and then edges, which is a dictionary of numbers to a list of outgoing edges. And we do the simplest possible bad algorithm.
We have a list of all nodes that we visited, a list that is the queue of nodes to visit, and we simply go about our business traversing that queue and appending to it as needed. And so we're going to do one little thing here: we're going to add type annotations to it from this typing module. And so it's going to tell you all of the things that I just had to tell you in order to understand this algorithm. The starting node is an integer, the edges is a dictionary of integers to lists of integers, and it returns an iterator over integers. And the only other thing that changed here is I had to annotate that visited is a list of integers, because as an empty list, this particular tool couldn't infer that. But nothing else had to pollute this code. It's not C or old C++ where I have to annotate visited and queue and node and tell the compiler exactly what I expect them to be. It can infer all the details. And then what I'm going to do is I'm going to make this algorithm a little bit more efficient. So I know that if I use a set for nodes that I visited and a double-ended queue, or deque, for the queue, then it's a more efficient algorithm, because it's not a linear traversal every time I look through it. And so all I've done so far is I just changed the types. The only changes I've made so far are just these two lines. And obviously this isn't going to work, because these data types don't implement the same methods. So I run this MyPy tool over it and it tells me exactly everything else I need to change to make this algorithm work. And the only thing I had to do was change these two little lines, just the types. That's it. And then it's going to tell me everything that's wrong with my program. It says that on line 11, this deque has no attribute __delitem__. So that's right here. And on line 14, set has no append, because it uses a different method name. So I can simply look in the documentation and see what I need to change. And so here's the correct algorithm.
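Reconstructed from the description above (the exact slide code isn't in the transcript, so the names are my own), the corrected, annotated traversal might look like this:

```python
from collections import deque
from typing import Deque, Dict, Iterator, List, Set


def breadth_first(start: int, edges: Dict[int, List[int]]) -> Iterator[int]:
    """Yield each node reachable from start, in breadth-first order."""
    visited: Set[int] = set()           # was a list; a set makes membership tests O(1)
    queue: Deque[int] = deque([start])  # was a list; deque makes popleft O(1)
    while queue:
        node = queue.popleft()          # was `del queue[0]` on the list version
        if node not in visited:
            visited.add(node)           # was `visited.append(node)` on the list version
            yield node
            queue.extend(edges.get(node, []))
```

Only the two annotated declarations change between the list version and this one; the type checker points at every call site that needs to follow.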
It's very, very simple. I just had to change this to a popleft and this to an add, and now it works. But MyPy was able to tell me exactly where to look in the code to make that change. And so is MyPy the solution to this problem? Well, it's the most sensible approach that I've seen so far. But the author needs some help. There are many tasks that need to be done in MyPy that don't really require deep knowledge of MyPy itself or type systems, such as annotating parts of the standard library or third-party libraries. And although MyPy does great stuff, it doesn't yet catch the very simple integer-plus-string error. But it can catch many other mistakes, such as the one that I just showed. And so why is PyPy or Nuitka or Cython or whatever not a solution to the problem? Well, these projects are mostly concerned with just performance. They don't really try and help you make your code more correct. They don't help with code quality. But MyPy aims to do both. But I think right now the important part is helping us improve our code quality. And the nice thing is that this is just Python 3 annotations. That code runs exactly the same whether you're using MyPy or Python 3. And Python 3, it's not going to do any of this type checking, but the code works. So it's fully portable to any Python 3 runtime. So my modest MyPy proposal here is that we adopt this as a standard. We should start using this typing module, or some standard derived from it, even in vanilla Python 3. All the code in the standard library should have these annotations. The documentation tools should be able to read these annotations and automatically write out nicely formatted documentation with hyperlinks for all the types and all this nice stuff. And we need to stop using function annotations for anything else. As far as I understand, they haven't been widely adopted for any other purpose.
So I think it's a good time to draw a line in the sand and say that function annotations are for type annotations. And just move forward with this proposal and help us improve Python's code quality. And the nice thing is we don't really have to rely on the quality of the MyPy interpreter or compiler or any of that fancy stuff. We can just use it as a linting tool, as just a step and just a set of tests to run over our code. We can continue to use PyPy3 or Python 3 or perhaps even the older Python VM once MyPy is backported. And what you can do now is start contributing to MyPy and maybe write an official PEP for this proposal, because I'm probably not going to do it. And so what? Well, the code quality tools in Python are really far behind other languages. I don't know what PyCharm does, but I suspect that it's not quite as good as Erlang's Dialyzer or Haskell. And MyPy is a huge step forward, and it can be used with our existing toolchain. The only thing we have to do is upgrade to Python 3 to make it work today. And that's something that we've all been trying to do for the past couple of years. And the only thing we'd have to give up is using function annotations for any other purpose. And I don't think that's a big deal, because I haven't really seen many used elsewhere. And in a way, without giving any meaning to function annotations, it's hard to use them for anything at all, except for some domain-specific, library-specific purpose. They're only truly useful if you can use them in all of the modules that you write. And another thing that I have trouble with in Python these days is mutability is just everywhere. You can't turn it off. And I think it's the wrong default. It's a very, very common source of bugs.
Even the beginner case where you're using default arguments and you have some keyword argument set to the empty list or an empty dictionary or an empty set, and you get behavior that you don't expect, because people expect it to always be an empty set. But if you add something to it, it won't be empty anymore. And it also prevents many optimizations. You can't do any multi-threaded stuff without locks if things are going to change all the time in unpredictable ways. And mutability should really be opt-in. And so why should it be opt-in? Well, it's hard to understand code when the underlying data might change. You can reason about it locally, but you really have no idea what else is going on, especially in a multi-threaded environment. The value of something could change from one line to the next. And this means that you either give up and just hope that nobody does anything bad, or you do all kinds of defensive programming. So anytime you return a value from your class, say you're returning self.value, you're going to make sure that that value is a deep copy or something like that. Every time you wrap it in some way, or if you want to use it as a key in a dictionary, you have to just copy, copy, copy. And copying is slow, especially for large, deeply nested objects, and sharing is prevented by copying, unless these already happen to be immutable types like frozenset or frozendict or integers or strings or what have you. And these days we care a lot about concurrent access, and concurrent access requires synchronization if everything is mutable all the time. And so can we even fix this? Well, I don't know, because to truly fix this, you're going to need some really large changes to the language and the libraries as we know them, because we've been trying to get rid of the GIL for longer than I've been using Python and it just hasn't worked so far.
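The beginner pitfall described above is the classic mutable default argument; a minimal illustration (my own example, not from the talk):

```python
def remember(item, seen=[]):
    """BUG: the default list is created once, at def time, and shared by every call."""
    seen.append(item)
    return seen


def remember_fixed(item, seen=None):
    """The idiomatic workaround: use None as a sentinel and allocate a fresh list per call."""
    if seen is None:
        seen = []
    seen.append(item)
    return seen
```

With the buggy version, the second call still sees the first call's item; the fixed version starts empty every time.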
But if we're trying to fix this, we can look towards languages like Rust or Swift for good examples, because these languages have opt-in mutability. So in Swift, for example, if you use var, then you're declaring a mutable variable. But if you use let, then you're saying that that is a constant, even for a data structure like a dictionary or a set. So maybe that's a bit more practical for Python to adopt. And another thing that really gets me about Python is that all of the abstractions are really expensive. Some things like PyPy's JIT can change some of the constant factors here and there. But ultimately, almost nothing is free. Function calls aren't free. Classes aren't free. Nothing gets optimized away. So you're encouraged to inline code all over the place and make local references to globals and all this gnarly stuff. If you want to see what this does to you, go look at the simplejson code base. Because I hate doing this, all this inlining, all this stuff, because it's not fun. And you don't want to do it with decorators or AST manipulation or anything like that. You just do it. And it makes for more code that's harder to maintain. But you do it because you get a 20% speed bump here, a 20% speed bump there. And you just live with it. And I really don't like that. And the other thing is that classes in Python are open. You can always subclass them. You can do all sorts of naughty things in classes. And it makes it really hard to analyze them for correctness. Basically, subclasses ruin everything. And so a solution that other languages, particularly the ML family of languages, have come to is having a syntax to easily declare algebraic data types. And so I'm not including Scala here. So other languages do this reasonably. And Python over the years has grown a lot of hacks to sort of give us some of this. Like the new enums in Python 3.4, namedtuple, et cetera. They all solve little pieces of this problem.
And even the destructuring assignment in Python is sort of a subset of what you get with algebraic data types in other languages. And so here's an example of what a typical problem looks like for a Haskell developer, say. You're working with abstract syntax trees. So here I am in Python. In order to say something is an abstract syntax tree node, I have to define this class. All this class does is define an eval method. And then I have to define all of the different types of AST nodes. So there's a constant, which takes some integer. There's this minus thing, which is like a unary minus to negate the node. And then I have to define add and multiply. And you can see that that is quite a lot of code to do this when ultimately it could be a lot simpler. And so here's that whole thing in Haskell. So in Haskell, I have the boilerplate to say, oh, hey, this is a module, and this is what it exports. So in Python, I would have done that with __all__, but I didn't really have room for it. And here I'm just declaring all of the different types of AST nodes. There's a constant node. There's an add node, minus, and multiply. And all that is very succinctly described. I can look at this, and I can see, oh, these are all of the things an AST can be. If I look at the Python, I think, oh, maybe I should grep this code base to see if anything else subclasses AST anywhere. You don't have to do that in Haskell. It's closed. These are the only possibilities here. There are ways to do extensibility, but you don't always need that. And it's harmful if you have that all the time, because you can't analyze this to say, oh, you didn't implement evaluation for minus or add. It can't do that statically. And so here's the whole evaluator in Haskell. You typically don't do this with separate functions for each type of node. You just do it in one function. And the compiler can tell me that I handled all these cases appropriately.
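The Python side of that comparison would look roughly like this (a reconstruction of the kind of class hierarchy described, not the actual slide code):

```python
class AST:
    """Base class: the only contract is an eval method."""

    def eval(self):
        raise NotImplementedError


class Const(AST):
    def __init__(self, value):
        self.value = value

    def eval(self):
        return self.value


class Minus(AST):
    """Unary minus: negates its child node."""

    def __init__(self, node):
        self.node = node

    def eval(self):
        return -self.node.eval()


class Add(AST):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def eval(self):
        return self.left.eval() + self.right.eval()


class Mul(AST):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def eval(self):
        return self.left.eval() * self.right.eval()
```

In Haskell the same shape is a four-constructor data declaration plus one eval function, and the compiler checks that every constructor is handled; here, nothing stops a fifth subclass from appearing anywhere in the code base.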
But the cool thing is that in Haskell here, it can do a lot for me. So here I'm saying deriving Show and Eq. And what this does is it automatically writes the code for me to make sure that these have a sensible string representation and implement __eq__ in a reasonable way. So if I were to do that in Python, this is what it would look like for just that one specific node. So in Haskell here, I have this amount of code to solve the whole problem, but in Python, it's that much code to solve the problem for just one particular node. So it's very difficult for me to go back to Python and work on these types of problems without ripping my hair out. Because Python just gets in my way and none of the tools can tell me if I've done it right. And the other thing is that in Haskell, we often write a lot of abstractions and we know that the compiler can unroll those abstractions for us. You can use something called newtype to just make a wrapper around a data type that totally evaporates at runtime. So if you, say, have a float in your library that really means time and you don't want to be able to add time to pounds and have it succeed, you can wrap the float for time in a newtype that doesn't allow you to interact with other types without unwrapping it first. And this is also used for things like booleans and whatnot. So although there is a representation for True and False as one and zero appropriately, you have to unwrap that with fromEnum. So there's no reason to have code like one plus True equals two in Haskell, because you can have this very efficient representation without paying all this cost all the time. And the other thing that languages like Haskell can do is in these algebraic data types, if it sees that the field is strict or meets the appropriate criteria, it can unbox it. And this is something that is very hard to do in Python.
You can get a little bit of it with PyPy, and you can erase some of the cost using slots and stuff, but with Haskell or Swift or whatever, you just get all this stuff for free. You don't have to think about it. And even function calls can be inlined. So you can write small functions that abstract little bits of code and you know that you're not going to pay for it, because the inlining actually works, even cross-module. Although you do get some of this with PyPy. And so on the performance topic, I'm not going to focus too much on that, because everything I've talked about provides more ahead-of-time information. And this information could be used by tools like PyPy, Numba, Cython, MyPy, et cetera. And the important part is that these features save developer time by making it easier to write code. But this information can also be used for optimization, and it can lead to better performance. And the trick here is that once it's easier to write better code, it's easier to optimize it, because you can refactor it and you can have these types guide you to the correct implementations. And the other thing is we really need to write less code in C and C++, because this is what holds us back. This is why it's hard to remove the GIL, because we all depend on NumPy or whatever. And these APIs are written in C and they need to work with the reference counting and acquire the GIL and release the GIL. And when we're held back by that, it's very difficult to change anything. But when we have implementations of all these things in pure Python, it's much, much easier to change the semantics. So thank you, PyPy, for spending all of this effort over the past decade or so making this even reasonable today. And so unfortunately, for the same basic reasons, the Python C API is really holding us back from doing concurrency properly. It's not even something I can talk about solving until some of these other things are addressed, like mutability and not depending so much on these C libraries.
But once we do those things, then fixing the other deficiencies is going to be possible. But there is somewhat of a silver lining here. It's been shown that it's possible to have a mixed approach where you write code in a subset of Python, like PyParallel, where that subset of Python can be compiled and work completely independently of the typical Python VM object system. And so in summary, I really think that we should look long and hard at incorporating all of the good ideas from MyPy into Python, possibly even just the implementation as is. And after that, we can add some more of the conveniences that we find in other modern languages. And to do that, some of us are going to have to learn languages like Haskell or Erlang or ML or Swift or F# and actually come back to Python. But it is a worthy endeavor. Even if you don't go back to Python, I highly recommend learning other languages. I have learned so much by branching out a bit. And if we manage to do some of these things, then we can enjoy a safer, faster and much more capable Python in the future. And thank you. So thank you, Bob, for the very honest analysis. I think typically at Python conferences, you get a lot of hailing of how great Python is and all this. And I enjoyed this very much, to have a long-term Python user, also involved in other languages, give this honest analysis. So, questions to Bob? Yes, Mike. Yeah, I got a question, because I'm learning Haskell myself. Do you think it would be feasible to use Haskell as an extension language for Python, for writing all those things, concurrency, recursion, trees? Haskell is really good at that. And can you write extensions and really make them nice Python objects? Do you think it's possible? I think it is possible. Certainly all of the FFI that you would need is exposed by Haskell to do that. I'm not aware of any library that makes that particularly easy.
Most of the effort is kind of the other way around, where you're writing code in Haskell and you want to interface with C or Objective-C or something along those lines. So, unfortunately, I think the upfront cost would be high to write all of the code to expose Haskell ADTs as nice Python objects. But if you were to do something that was a little more decoupled, where maybe you're expecting primitive values in, primitive values out, that would be a lot more straightforward to implement. But I think that as a learning exercise, that might be a bit tricky, like diving into the FFI and how to cross these barriers. I think it's much more reasonable to try and learn Haskell in isolation, maybe speaking over a network to Python, rather than linking to it as a C extension. Yes? Just a small comment about the static analysis thing. Jedi is going to be able to find most of those bugs you presented. It's very new, so you can't know it, but still. I'm going to talk about it on Wednesday. Okay, great, I'll have to come to that talk. Thank you. Hi, thank you for the talk. I just wanted to share something: I see some correlation between the dynamism of a language and the size of its community. If you look at JavaScript, Python, even PHP, they're all dynamic languages with huge communities. And if you look at, let's say, Scala, Haskell, Erlang, they are very small, tight communities with people focused on good programming practice, but also on correctness. Maybe there is some relation, something in the fact that they are dynamic that makes their communities grow so fast and so lively. So maybe by proposing a more static approach, introducing that would actually hurt the community side, something to think about. I think that's certainly an interesting consideration, but there are some very easy counterexamples, such as C or Java or C++, which have presumably larger communities than many of the ones that you cited.
And here what I'm proposing is not mandatory. It is opt-in; full dynamism is still permitted. It's simply encouraged to add these annotations where they're possible and sensible. And anything else would be simply inferred. So it's not like C or Java, where you have to write it everywhere. It's more like the ML family of languages, where it just happens. And I think there are other reasons that have nothing to do with the type system why those particular communities are smaller or larger than others. Thank you. I have a question myself, also because I'm involved a lot in testing and functional testing of systems. To me, it always felt that for any sufficiently large system, you anyway have to do some kind of testing, right? And you want to make sure that you have some kind of coverage on your code, so actually your code lines are touched. And as soon as you have something like that, like a test suite where you test the behavior, not just the types, but much more than just the types, you get as a side effect the type checking. So your example of the integer one plus string one would easily cause an error in this test suite, but the tests also provide much more. So basically my question is, if you have a more powerful type system, of which I can see the use of course, you still have to write tests. So you basically have to add more overhead and strictness there and you have to write your test suite. So are you actually gaining that much? I think that that's an interesting point to bring up. Of course you would write tests, but you no longer have to write so many of the obvious tests. And in languages where you do have these annotations available, there are tools that can either exhaustively or randomly generate input. Basically, it writes fuzzers for free. So in Haskell, I can write a QuickCheck property based only on the types that will generate small inputs, big inputs randomly to try and find errors.
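The property-based idea can be sketched in a few lines of Python (function names invented here; Haskell's QuickCheck, or the hypothesis library in Python, does this far more thoroughly):

```python
import random


def check_property(prop, gen, runs=200):
    """Run prop against randomly generated inputs; return a counterexample or None."""
    for _ in range(runs):
        value = gen()
        if not prop(value):
            return value
    return None


def reverse_twice_is_identity(xs):
    """Property: reversing a list twice yields the original list."""
    return list(reversed(list(reversed(xs)))) == xs


def random_int_list():
    """Generator: a list of 0-10 small random integers."""
    return [random.randint(-100, 100) for _ in range(random.randint(0, 10))]
```

Running check_property(reverse_twice_is_identity, random_int_list) exercises the property on many random inputs without hand-writing each case; in Haskell, the generator itself comes for free from the types.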
And those tests are much shorter to write because this information is available. And the advantage is that it tells you what tests you need to write additionally. Say, for example, you have a library that accepts some input as JSON and you write the tests for, okay, what if the input is a list? What if the input is a dictionary? What if the input is an integer? But you forget to test what if the input is a float? But because there are separate types for each of these things, you can try and compile it with something like Haskell and it can tell you that you don't have a case for floats. So you can very easily write a lot of tests and get a lot of coverage in Python but miss these cases that the type checker is going to catch. And these type annotations are very succinct. You don't have to write many of them, basically just on the arguments to your functions, and you end up writing that anyway in some less principled docstring format somewhere, because the user of your system has to know what goes where anyway. So this is just moving that information into something that is actually verified, either at compile time or run time or both. Thank you. Yeah, and if you allow me, I would just like to expand on your answer and answer another part of your question. It's not only correctness and testing that the extended typing helps, but also the documentation of the program. Programs become much more maintainable when the developer immediately sees what type is returned and what type is expected of any arguments that go into the function. That makes development much easier. And it also allows tools to understand the programs; it allows the tools to navigate the program much more quickly, like you can do in Visual Studio or probably in PyCharm or IntelliJ IDEA, et cetera, in Eclipse. So it helps develop much better tools that allow you to program much more efficiently. It's not only the correctness part. That's just my sense. Yes, thank you.
That is absolutely correct. Okay, then thank you again, Bob. We're just directly going to start. Yes, sorry, thank you, Bob. Thank you very much, Bob.