Awesome. So I'm going to skip the introduction. Thank you a lot for the fantastic work. What I'm going to talk about in this session is the recent work we did in Python 3.10, and now in Python 3.11, on improving the error messages in the interpreter. But first I want to start with an interesting story that also mixes in what I was doing before software engineering. In my current role, I work at Bloomberg in the Python infrastructure team, so I basically make sure that, you know, Python works at the company and whatnot. But before that, I was doing my PhD in theoretical physics, right? In a PhD you normally do pen and paper, although these days it's more like, you know, computer and pen and paper. Interestingly, I started to use Python at that time. And there is this nice story that I always think about when people ask me, oh, what do you think is the most interesting improvement you can make to the language? So picture this: we were in a room, three of us doing our PhDs. Three PhD candidates. Quite smart people. One of my friends was starting to code in Python, and he was writing some kind of script to, you know, match data or whatever. The important thing is that something was not working. In particular, my friend had a syntax error, but he couldn't figure out what the problem was, like why that line was wrong. So my friend says, oh, Pablo, can you come here and help me? I tried, we looked at the script together, and we couldn't figure out what was wrong. We started changing things randomly, as you do when you are a professional software engineer, until something happens. We even brought in a third PhD student. Okay, picture the scene, right? Three PhD students.
We could figure out the deepest mysteries of the universe, but we couldn't fix a syntax error. So not good, right? And I will show you the syntax error, because you may be thinking, ha, PhD students, they cannot solve syntax errors. So the syntax error was this one. Who knows what's wrong with it? The indentation is fine. Obviously, the error is not exactly on this line, but this is the error that you get. The line is fantastic; don't waste your time trying to figure out what is wrong with it. The line is correct. So let me give you more context. This is, more or less, the code around that line. Who sees the error? Okay. Okay, nice. Exactly. If you look at the fancy dictionary there, there is one open bracket, there are two open brackets, oh, but there is only one closing bracket. But look at the error. No bueno. This is very annoying. And what I'm showing you here is the smallest example I could make so it fits on the slide. But obviously, as you can imagine, the original code had a gigantic dictionary full of scientific whatever. Instead of two curly braces there were like 16 of them, and the one that was not closed was super deep inside the dictionary. So it was not that easy to find, right? And the problem is that, like, three million light years after the unclosed bracket, you find the function definition, and the error is reported on the function definition, right? You may be thinking, wow, Python is quite bad, huh? Well, that may be true, especially before 3.10, right? But the thing is that from the parser's point of view it makes sense, because it's trying to understand the dictionary and it's saying, okay, yeah, dictionary, I can see things here that I recognize in dictionaries and whatnot. And then it's trying to find that closing bracket, but it's not finding it.
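If you want to reproduce this kind of report on your own interpreter, here is a minimal sketch built only on the built-in `compile()` and the `lineno`/`msg` attributes of `SyntaxError`. The dictionary contents and the `analyze` function are made up to mimic the slide. My expectation, worth checking on your version, is that Python 3.10 and later blame line 1 with a message saying the `{` was never closed, while older versions complain about the `def` line instead.

```python
import sys

# Hypothetical reconstruction of the slide: the "beta" dictionary is never
# closed, so the outer dictionary swallows everything that follows it.
SOURCE = """\
data = {
    "alpha": {"x": 1, "y": 2},
    "beta": {"x": 3, "y": 4,
}

def analyze(data):
    return len(data)
"""


def first_syntax_error(source):
    """Compile `source` and return (line number, message) of its SyntaxError."""
    try:
        compile(source, "example.py", "exec")
    except SyntaxError as err:
        return err.lineno, err.msg
    raise ValueError("source compiled without errors")


if __name__ == "__main__":
    lineno, message = first_syntax_error(SOURCE)
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: line {lineno}: {message}")
```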
So when it reaches the function definition, it thinks it's still inside the dictionary. And then it says, yeah, you cannot put a function definition inside a dictionary. What are you thinking? But you, as a human, have other plans, right? You're planning to write this fancy function definition, and the dictionary is getting in the way. And this is the problem that we're trying to solve. You as a human have some idea of how the world works, and the parser has other ideas. The parser is smart; it's just that, you know, when it complains, it's not very useful. So this is what we're trying to solve, right? And not this example in particular, but all of them. So let me show you some other problems that Python had. For instance, this is the problem you just saw: if you don't close a bracket in a dictionary definition or some other collection, and then you have something else, in this case a function definition, you get this ugly "invalid syntax". This is another one: if you write a comprehension but use a comma, an unparenthesized tuple, in the value, you get a syntax error. This syntax error is quite weird; you may not have seen it before. It's because you need to parenthesize the tuple in the value. In this one, you don't close a list and then you have something else, like an equals sign, and you get the syntax error on the equals sign. Quite weird. Keep in mind that these are simplified examples, right? These things can be super complicated at the beginning, and then you get the error super far away. So not good. This one is quite funny: you have a dictionary mapping core developer names to their GitHub usernames, you forget a comma over there, and what this tells you is that Łukasz Langa is a syntax error, which is kind of rude. But, you know, no bueno. So this is another funny one.
If you look at this one, again, think about when these dictionaries are quite complicated. This can be quite difficult to debug. I mean, it won't take more than five or ten minutes, but it's five or ten minutes that you have to spend doing this instead of what you really want to do. This is another one: if you don't parenthesize multiple exception types in an except clause, you get a syntax error on the comma. Or this one: when, for whatever reason, you forget the value in a dictionary, you get a syntax error on the closing bracket. And everyone's favorite, this one. How many times have you found this one? And actually, the question is: how many times have you had to explain to someone what this means? Well, I expected more hands. Who knows what this means, actually? Oh, wow, okay. There is some explaining to do here. EOF stands for end of file, because apparently writing the whole sentence is expensive, so we put EOF. The reason you don't see anything is that the syntax error is pointing at the end of the file, and at the end of the file there is nothing; there's no code there. And if you look at the file, example.py, in reality it has nine lines, and it's telling you that the error is on line ten because that's the end of the file. Everything is wrong with this slide. The line numbers are wrong: if you try to use your editor to go to line ten, there is no line ten. Like, what the fuck is EOF? And what is this caret? Do you see this thing, the caret pointing into the void? It's like, you know, all your hopes are just lost. What is going on here, right? If you have seen this one or two times, you may say, okay, yeah, I got it, I know what this means, I'll fix it.
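The end-of-file case is easy to check with the same `compile()` approach. In this sketch, `report` is a hypothetical helper and the one-line file is made up. `SyntaxError.text` and `SyntaxError.offset` are the pieces the interpreter uses to draw the caret. My expectation is that Python 3.10 and later report line 1 with a message saying the `(` was never closed, where older versions said "unexpected EOF while parsing" and pointed at the end of the file.

```python
import sys

# A made-up one-line file whose call is never closed.
SOURCE = 'print("hello"\n'


def report(source):
    """Return (lineno, offset, msg, text) for the SyntaxError raised by `source`."""
    try:
        compile(source, "example.py", "exec")
    except SyntaxError as err:
        # `text` is the offending source line; `offset` is the 1-based column
        # where the caret is drawn under that line.
        return err.lineno, err.offset, err.msg, err.text
    raise ValueError("source compiled without errors")


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}:", report(SOURCE))
    print("lines in file:", SOURCE.count("\n"))
```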
But think about some young person, you know, full of life and excitement, who says, I'm going to learn Python. And then Python says, I have a thing for you. And then you have all these people saying, oh, man, Python is quite hard. It's not really hard; we just have these kinds of errors, right? So not good. So, apart from the many other improvements we have made to the Python interpreter over the years, in the previous version (3.10) and in the current version (we are going to release 3.11 this year), we said, okay, we need to improve this situation. Let me show you some of the new improvements, and then we are going to cover how we made them and the different problems you can run into with those errors. All of these errors are possible because of the work we did with Guido and Lysandros in PEP 617, where we basically replaced the whole parser. If you check the commits, the previous parser was an LL(1) parser, which is a kind of parser; the difference is not important here. But if you look at the dates over there, the first parser was basically committed in 1990, and the PEG parser was committed in 2020. So the previous parser was there for 30 years. It was probably one of the oldest pieces of Python, which means it was working quite nicely, because you could write Python before, right? But it had all these problems, and it was actually not allowing us to do some of the cool stuff you can see in Python 3.10 and onwards. For instance, just because we have a new parser, we could do things like parenthesized context managers, so you can put parentheses around groups of context managers. Or, for instance, you can have match statements. Who likes match statements? Well, a bunch of people. Nice, good, good.
You need to say that more on the Internet, so people don't only think that they suck. Everybody likes match statements, right? These things are only possible with the new parser. But there is a problem, right? The new parser allows all these things, and we are introducing all this funky grammar, like match statements. What is that even? And people get angry on the Internet, right? Because that's what you do on the Internet: you go and get angry, and someone gives you money or something. But that is not good. And people demonize the PEG parser, like, oh, stupid tool, it's allowing humans to do things. They think the tool is bad, right? An evil tool. But what I'm going to try to show you here is that the PEG parser is not bad. It's like a knife: it's not inherently bad. I mean, you can do bad things with a knife, but you can also do bad things with a spoon, and people don't hate spoons. So, well, don't quote me on that. Anyway, I want to show you that you can do a bunch of very cool things with this parser that are not only funky syntax you may or may not like. In particular, I want you to think of the PEG parser as enabling a bunch of super cool user-centric features, like error messages, but also other things we may be able to cover in the questions, like improved f-strings and whatnot. Okay, so let me show you some of the cool things we can do with the PEG parser. For instance, this is a very common error: you have a conditional and you forget the colon, which a lot of people do, and instead of just "invalid syntax", you get: oh, I expected a colon over there. Nice. Then imagine that, again, you forget the value in a dictionary, right? That key over there doesn't have a value.
Now the parser tells you: whoops, I was expecting an expression after the key and the colon, which is nice. This one is also very common: you have a conditional where you want to compare two things, and instead of comparing them you use a single equals sign, because it happens to everyone, even the best of us. And instead of a funky "invalid syntax" who knows where, you get this nice error saying: oh, I cannot assign to an attribute here; maybe you meant to use double equals, or the walrus operator. And here, again, maybe you forget a comma in a dictionary literal, and now the parser tells you: whoops, that doesn't look good, maybe you forgot a comma over there, which is very useful. It doesn't report the error three million light years away, so everybody wins. Here's another one: oh, you mis-indented that if, right? Bad day. Not anymore, because now the parser tells you that it expected an indented block after the if statement on line two, so you can just go and fix it. And everyone's favorite: if you could go back in time, those PhD students would be very happy, because now it tells you that the dictionary was never closed. Oh, wow. What about that? Nice. Yeah, yeah, yeah. Four years of PhD just thrown away when I could have had this. Don't do PhDs. Anyway, awesome, right? These are just some of them; I'm not going to cover all of them, because we would be here forever. But I want to teach you about why these error messages are hard. There are two components to this: the technical component, which I'm going to more or less cover, but I also want you to think about it as a human component versus a machine component. Because as a human, you know what you want to do, right? Okay, I want to write the dictionary.
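A quick way to compare these messages across interpreter versions is to compile small snippets and read `SyntaxError.msg`. The snippets below are my own reconstructions of the slide examples (they are only compiled, never run, so the undefined names don't matter), and the exact wording can change between releases, so treat any specific phrasing as approximate.

```python
import sys

# Hypothetical reconstructions of the slide examples; they are only compiled.
SNIPPETS = {
    "missing colon": "if rocket_position > event_horizon\n    pass\n",
    "missing dict value": "values = {x: 1, y: 2, z:}\n",
    "= instead of ==": "if rocket.position = event_horizon:\n    pass\n",
    "missing comma": "items = {x: 1, y: 2 z: 3}\n",
}


def syntax_error_message(source):
    """Return the SyntaxError message for `source`, or None if it compiles."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as err:
        return err.msg
    return None


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    for label, source in SNIPPETS.items():
        print(f"{label:>18}: {syntax_error_message(source)}")
```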
But the machine, the parser in particular, is going to try to understand what you are writing. And assuming you make a mistake, the parser is going to try to understand what you may have been trying to do. You know what you are trying to do, but the machine may not. And the complexity in all of this is making sure that the guess the parser makes is not very far from what you were really trying to do, right? And that is quite hard. It's not only in parsers; it's every time you emit errors in your applications. Maybe you are checking that a value is bigger than zero or something, and it turns out that to give a good error message, you need to figure out why someone would have passed you a number less than zero. Maybe they are trying to index a list from the back. Okay, maybe you don't allow that, and then you need to emit an error message. So bridging the technicalities and what humans expect is quite a hard task. That is one challenge, but we are going to focus on the technical challenges, which are more fun. Okay, so as an example of why adding error messages is difficult, think about this. Let's imagine that you want to introduce this error message: you have a list, and if someone forgets a comma between two of the elements, you want to say, oh, perhaps you forgot a comma. So let's try to implement this together, right? How do you do that? Well, you go to the grammar of the language, which is now a PEG grammar. You don't need to understand what this means, but I'll more or less give you the idea. You introduce a new rule, and we are going to call this rule invalid_expression.
And then you say that an invalid expression is an expression, like three plus two, or a, or, I don't know, a dictionary, followed by another expression without a comma in between, right? So think about x and then y, or one plus one and then three plus two, or something like that. If there is no comma between two expressions, it's very likely that someone forgot a comma, right? We capture these two expressions using this equals syntax, and if this rule matches, meaning the parser sees that this is happening, we raise a syntax error that points to a and b and says: invalid syntax, perhaps you forgot a comma? So what we expect is precisely that, right? Okay, so you implement this rule, it makes all the sense in the world, and what happens? Well, it turns out that the rule doesn't work. It doesn't work because, for instance, if you forget the in keyword in a for loop, you get: oh, perhaps you forgot a comma. No bueno. It turns out that if you write an incorrect string prefix, you also get "perhaps you forgot a comma", which obviously is not the problem. And if you forget to close, you know, a tuple, and then you have an equals sign afterwards, you also get "perhaps you forgot a comma", which is obviously wrong. Then you get this one: I don't know what is wrong with it, but apparently I am forgetting a comma. And also, you write a bunch of numbers, because why not, and you get the comma error, apparently on the right part of the expression, for some reason. Not good. And match statements don't work anymore either, because there are two names together, except that one of them may or may not be a keyword, because, I don't know if you know it, but match is not a keyword, it's a soft keyword. Ooh. So it doesn't work, right?
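To make the false-positive problem concrete, here is a deliberately naive, token-level imitation of that rule: it flags any two adjacent names, numbers or strings as a missing comma. This is not how CPython implements it (the real rule lives in the PEG grammar and, as far as I know, excludes a name directly followed by a string, and soft keywords); `naive_missing_comma_hints` is a hypothetical helper built on the standard `tokenize` and `keyword` modules.

```python
import io
import keyword
import tokenize

# Token types this naive check treats as "an expression".
OPERAND_TYPES = {tokenize.NAME, tokenize.NUMBER, tokenize.STRING}


def naive_missing_comma_hints(source):
    """Return (line, hint) pairs for every pair of adjacent operand tokens."""
    tokens = [
        tok
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        # NL and COMMENT carry no syntax; dropping them lets the check see
        # across line breaks inside brackets.
        if tok.type not in (tokenize.NL, tokenize.COMMENT)
    ]
    hints = []
    for left, right in zip(tokens, tokens[1:]):
        if left.type not in OPERAND_TYPES or right.type not in OPERAND_TYPES:
            continue
        # Hard keywords such as 'for' or 'in' are not expressions, so skip them.
        # Soft keywords such as 'match' are NOT skipped: that is the bug on purpose.
        if keyword.iskeyword(left.string) or keyword.iskeyword(right.string):
            continue
        hints.append((left.start[0], f"{left.string} {right.string}: perhaps you forgot a comma?"))
    return hints


if __name__ == "__main__":
    examples = {
        "genuine missing comma": "values = [1, 2 3]\n",
        "missing 'in' (misleading hint)": "for x range(3):\n    pass\n",
        "soft keyword (false positive)": "match command:\n    case 'quit':\n        pass\n",
    }
    for label, source in examples.items():
        print(label, naive_missing_comma_hints(source))
```

The first example is a true positive, while the other two get the same "forgot a comma" hint even though the real problem is elsewhere, which is exactly the trap the talk describes.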
Oh, damn, soft keywords don't work anymore. So, not good. And you may be thinking, oh, yeah, but this is all theoretical, right? It's not like someone actually had to fix all these problems when they introduced the comma error. Well, this is a real issue that I fixed when I introduced the comma error, and what you're seeing is me, actually, learning that error messages are hard. Yeah, not good. And these are just four of them; there were like six or seven. So it's quite hard, because you may be thinking about a super small subset of the problem, you know, an expression followed by an expression, like, why would that appear in any other case? And it turns out that you forgot about all these other cases, right? That's the problem: you may be very happy, thinking, oh, I got it, but while you're focused on what you want to match, you're forgetting that all these other cases can also match your rule. It's like writing a regular expression: you write a regular expression, it's too generic, and it matches things you don't want it to match. So it's the same idea, but with a lot more people complaining about it. So, yeah, that is quite hard. There are other problems. In particular, it turns out that PEG parsers are, by nature, potentially exponential: because they backtrack, in the worst case the time they take grows exponentially with the size of the input. Normally, you fix this problem by introducing what is called a memoization cache, which tames them to be linear. That is called packrat parsing. It's a very common technique, and it's partially what we do in Python. But you need to put that thing in yourself; you need to say, I want to use the cache here. Otherwise, if you cache everything, you will be using memory all over the place.
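To see why memoization matters, here is a toy recursive-descent PEG parser (hypothetical, not CPython's generated parser) for a grammar whose first two alternatives share a prefix. Without a cache, every failed first alternative forces the nested rule to be parsed again, so the work doubles per nesting level; caching results per input position, which is the packrat idea, makes it linear at the cost of memory. As far as I know, CPython's grammar lets individual rules opt in with a `(memo)` marker rather than caching everything, which matches the "you need to say I want the cache here" point.

```python
class TinyPegParser:
    """Recursive-descent PEG parser for the toy grammar

        expr <- '(' expr ')' '!'  /  '(' expr ')'  /  'a'
    """

    def __init__(self, text, memoize):
        self.text = text
        self.memoize = memoize
        self.cache = {}  # start position -> end position, or None on failure
        self.calls = 0   # how many times the body of `expr` actually ran

    def expect(self, pos, char):
        """Match a single character; return the next position or None."""
        if pos < len(self.text) and self.text[pos] == char:
            return pos + 1
        return None

    def expr(self, pos):
        if self.memoize and pos in self.cache:
            return self.cache[pos]
        self.calls += 1
        result = self._expr_body(pos)
        if self.memoize:
            self.cache[pos] = result
        return result

    def _parenthesized(self, pos):
        """Match '(' expr ')' at pos; return the end position or None."""
        p = self.expect(pos, "(")
        if p is None:
            return None
        p = self.expr(p)
        if p is None:
            return None
        return self.expect(p, ")")

    def _expr_body(self, pos):
        # Alternative 1: '(' expr ')' '!'
        p = self._parenthesized(pos)
        if p is not None:
            end = self.expect(p, "!")
            if end is not None:
                return end
        # Alternative 2: '(' expr ')'. After backtracking, the shared prefix
        # (including the nested expr) is parsed again unless it is cached.
        p = self._parenthesized(pos)
        if p is not None:
            return p
        # Alternative 3: 'a'
        return self.expect(pos, "a")


def count_calls(depth, memoize):
    """Parse '(' * depth + 'a' + ')' * depth and report how often `expr` ran."""
    text = "(" * depth + "a" + ")" * depth
    parser = TinyPegParser(text, memoize)
    if parser.expr(0) != len(text):
        raise ValueError("input did not parse completely")
    return parser.calls


if __name__ == "__main__":
    for depth in (5, 10, 15):
        print(depth, count_calls(depth, memoize=False), count_calls(depth, memoize=True))
```

For a nesting depth of 10, the uncached parser runs `expr` 2047 times (2^11 - 1) while the cached one runs it 11 times, once per input position it visits.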
But you can think that you have all the cases covered and still forget about some of them. For instance, take this expression, which is a syntax error because it's a bunch of open brackets followed by a colon, which you cannot write. Python took two seconds to tell you that it is a syntax error. You may be thinking, well, I have two seconds. But this other expression took over an hour. Well, it's fixed in Python 3.10; we fixed it. Apparently someone here was very happy, because they like to write a lot of brackets, and now they can get their syntax errors in nanoseconds. Hooray! But it's hard, because you may think you have everything covered, and then the bracket guy comes in and says, hey, what about my brackets? They take over an hour, and people are not happy. What I want to teach you with this is that there are a lot of things to take into account with syntax errors: not only rules that match things you didn't have in mind, but also the very core technicalities of the parser. In this case, it turns out that the very technical details of how the parser works, like exponential time, et cetera, can percolate into GitHub issues or whatever. So those are hard. But once we had a lot of syntax errors covered, we said, oh, why stop here? We can also do runtime suggestions. And this is very interesting as well, because runtime suggestions are a different beast. Parser errors happen when the parser tries to understand your program, and that happens only once. Normally, the Python compiler then produces .pyc files, which contain the compiled bytecode. So the second time you run your programs or import your modules, they don't need to be parsed again; Python just loads those .pyc files from disk, and everyone is happy. Runtime suggestions, on the other hand, are for errors that are detected and generated at runtime, which means you pay for detecting the error and producing the message while your program runs.
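To see the "parse once, then reuse bytecode" step explicitly, here is a sketch using the standard `py_compile` module, which does by hand what the import system does automatically. `byte_compile` and `compiles_cleanly` are hypothetical helpers; the module contents are made up. With `doraise=True`, a syntax error in the source surfaces at this compile step as `py_compile.PyCompileError`.

```python
import importlib.util
import os
import py_compile
import tempfile


def byte_compile(module_source, module_name="example"):
    """Write `module_source` to a temporary module and byte-compile it.

    Returns (source_path, pyc_path).
    """
    directory = tempfile.mkdtemp()
    source_path = os.path.join(directory, f"{module_name}.py")
    with open(source_path, "w", encoding="utf-8") as handle:
        handle.write(module_source)
    # Parsing and compiling happen here, once; the result is cached as a .pyc file.
    pyc_path = py_compile.compile(source_path, doraise=True)
    return source_path, pyc_path


def compiles_cleanly(module_source):
    """True if the module byte-compiles, False if the parser rejects it."""
    try:
        byte_compile(module_source)
    except py_compile.PyCompileError:
        return False
    return True


if __name__ == "__main__":
    source_path, pyc_path = byte_compile("ANSWER = 42\n")
    print("bytecode written to", pyc_path)
    # By default py_compile uses the same __pycache__ location as imports do.
    print("same as import cache path:", pyc_path == importlib.util.cache_from_source(source_path))
    print("broken module compiles:", compiles_cleanly("ANSWER = (42\n"))
```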
And I have to say something here, because when people see these errors, a lot of them say, oh, this is just like Rust. I mean, sure, yeah, Rust is cool and has a lot of nice error messages, but come on. We don't go around copying other languages; we can also do independent work. We can all be nice programmers and languages, right, without having to say, oh, Python is just copying Rust. In particular, runtime suggestions are something that Rust, or other compiled languages, normally don't need to care much about, because the nice error messages you get from the Rust compiler happen at compile time, which means not while your program runs. These errors have the extra complexity that we need to make sure your application doesn't get slower when it runs. Let me show you some of the ones we packed into Python 3.10. For instance, if you mistype an attribute you are accessing on a module, for instance collections, here I wrote namedtoplo, which is wrong. Instead of just telling you, okay, we don't have namedtoplo here, now we say: oh, maybe you meant namedtuple? Okay, yeah, nice. I like that. And this also happens with variables. For instance, if you try to write Schwarzschild black hole, which I never know how to spell, right? You have the variable written correctly, then you mistype it somewhere else, and instead of just telling you that this variable doesn't exist, at the end it tells you: oh, maybe you meant the correct spelling. Nice. So how do we do that? Well, this is a problem, right? Let me show you how we do it, and you will understand what the problem is. The first thing we did concerns AttributeError. For instance, in this case I'm trying to access the attribute something on the variable x, which doesn't have it.
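The two typos from the slides can be reproduced with a small sketch like this one; the function names are made up. One assumption worth checking on your version: my understanding is that in 3.10 and 3.11 the "Did you mean" text is appended only by the interpreter's own top-level exception printer, while from 3.12 the standard `traceback` module computes it as well, so on older versions this helper shows only the plain message.

```python
import collections
import sys
import traceback


def last_report_line(action):
    """Run `action` and return the final line of its formatted traceback."""
    try:
        action()
    except Exception as exc:
        # Pass the traceback too: NameError suggestions look at the failing frame.
        lines = traceback.format_exception(type(exc), exc, exc.__traceback__)
        return lines[-1].strip()
    return None


def typo_in_module_attribute():
    return collections.namedtoplo  # typo for "namedtuple"


def typo_in_variable_name():
    schwarzschild_black_hole = None
    return schwarschild_black_hole  # typo: the "z" is missing


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    print(last_report_line(typo_in_module_attribute))
    print(last_report_line(typo_in_variable_name))
```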
Then, in Python 3.10, we added two more special attributes to AttributeError: the name you were trying to access, and the object. These two were not available before. Therefore the exception itself knows the name you were trying to access and the object you were trying to access it on. Here, the name is something and the object is this x over here. Okay, nice. The problem is that, at this point, you could say: oh, when we construct the AttributeError, we can do some super fancy computation and try to find the attribute you may have been trying to access. The problem is that this code is valid. Someone may be doing an attribute access, catching the failure, and the program keeps running nicely. So if you do the computation there, everything becomes super slow, because now you're paying for computing an error message that is never going to be shown. That is the challenge: how do you produce these nice errors only when people will actually see them and the program won't continue? The algorithm we run is an edit distance, so it's nothing fancy. As you can see, it fits on a slide, and you don't need to understand what's going on. The distance we use in Python is not plain Levenshtein distance; it's a modified version inspired by GCC and other compilers that have thought about this much more than us. But the idea is that you basically have a bunch of words, which are the attributes of the object, and the attribute you were trying to access, and you find which of them is the most similar based on something called string distance. You can look this up on Wikipedia; it's not important here. It's just the basic algorithm.
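A pure-Python sketch of that idea, assuming Python 3.10+ for the `name` and `obj` attributes on `AttributeError`. It uses plain Levenshtein distance, not the modified, weighted distance CPython implements in C, and the cutoff, `suggest`, `failed_lookup` and the `Spaceship` class are all made up for illustration. The loop mirrors the steps described: list the real attributes with `dir()`, compute the distance to each, keep the smallest.

```python
import collections
import sys


def levenshtein(a, b):
    """Classic edit distance: insert, delete and substitute each cost 1."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def suggest(err):
    """Suggest a replacement name for an AttributeError using err.name / err.obj."""
    name = getattr(err, "name", None)  # populated by the interpreter on 3.10+
    if not isinstance(name, str):
        return None
    best_name, best_distance = None, None
    for candidate in dir(err.obj):  # every attribute the object really has
        distance = levenshtein(name, candidate)
        if best_distance is None or distance < best_distance:
            best_name, best_distance = candidate, distance
    # Hypothetical cutoff: give up if more than about a third of the name differs.
    if best_distance is not None and best_distance <= max(1, len(name) // 3):
        return best_name
    return None


class Spaceship:
    def __init__(self):
        self.velocity = 0.0


def failed_lookup(obj, attribute):
    """Return the AttributeError raised by getattr(obj, attribute)."""
    try:
        getattr(obj, attribute)
    except AttributeError as err:
        return err
    raise ValueError(f"{attribute!r} unexpectedly exists")


if __name__ == "__main__":
    if sys.version_info >= (3, 10):
        for target, typo in ((Spaceship(), "velocty"), (collections, "namedtoplo")):
            err = failed_lookup(target, typo)
            print(err.name, "->", suggest(err))
```

Nothing here runs unless `suggest` is called, which is the property the interpreter needs: the cost is only paid when someone actually wants the hint.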
Then we say, okay, we initialize the current distance to minus one, and we use the dir function on the object to give us all the real attributes that are there. Then we try all possible attributes, calculate the edit distance, take the smallest one, and return it as the suggestion. That is the idea. Obviously, this is done in C, where the code is insane, but in Python it even looks reasonable. And the problem is that this still needs to be fast. If someone accesses an attribute, the AttributeError is raised, and then they do something legitimate and the program continues, that cannot be slow. Not only can it not be slow, it cannot even be a bit slower, because people will care a lot. I don't know if this is a pattern people use a lot, but they perfectly could, and you need to care about what happens there. So here is how we do it. This is C code, so don't freak out; it's very small because you don't need to read it. The idea is that there is a function inside the interpreter, in C, called print_exception. It is executed when the exception has reached the top level: nobody has caught it, and we're just going to print it. This is the traceback that you normally see. So at that stage, the interpreter is done. The party is over; everyone is going home, and we are printing the exception. At that point we can take a bit more time to calculate stuff. If you look at this thing, what it does is get the exception, print the file and line where the exception happened, and then print the exception message. It does a bunch of things. And then we added this extra step called print_exception_suggestions, which is the line over there.
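The same "only pay when printing" idea can be imitated in pure Python with `sys.excepthook`, which the interpreter calls only for exceptions that nobody caught. This is a hypothetical sketch, not how CPython does it internally: it swaps in `difflib.get_close_matches` instead of an edit distance, and the caps (`MAX_CANDIDATES`, `MAX_NAME_LENGTH`) are illustrative values in the spirit of the limits the talk mentions. On recent versions the standard traceback may already print its own "Did you mean" line as well.

```python
import difflib
import io
import sys
import traceback

# Illustrative caps: don't scan huge attribute lists or compare huge strings.
MAX_CANDIDATES = 750
MAX_NAME_LENGTH = 40


def suggestion_for(exc):
    """Suggest a close attribute name for an AttributeError, or return None."""
    if not isinstance(exc, AttributeError):
        return None
    name = getattr(exc, "name", None)  # set by the interpreter on 3.10+
    if not isinstance(name, str) or len(name) > MAX_NAME_LENGTH:
        return None
    try:
        candidates = dir(exc.obj)
    except Exception:
        return None
    candidates = [c for c in candidates[:MAX_CANDIDATES] if len(c) <= MAX_NAME_LENGTH]
    matches = difflib.get_close_matches(name, candidates, n=1)
    return matches[0] if matches else None


def suggesting_excepthook(exc_type, exc, tb, file=None):
    """Print the traceback, then compute a hint; runs only for uncaught errors."""
    file = sys.stderr if file is None else file
    traceback.print_exception(exc_type, exc, tb, file=file)
    suggestion = suggestion_for(exc)
    if suggestion is not None:
        print(f"Hint: did you mean {suggestion!r}?", file=file)


class Telescope:
    def __init__(self):
        self.aperture = 0.2


def demo_hint():
    """Simulate an uncaught error by handing a caught one to the hook."""
    try:
        Telescope().aperure  # typo for "aperture"
    except AttributeError as exc:
        buffer = io.StringIO()
        suggesting_excepthook(type(exc), exc, exc.__traceback__, file=buffer)
        return buffer.getvalue()
    return ""


if __name__ == "__main__":
    # Caught exceptions never reach this hook, so they cost nothing extra.
    sys.excepthook = suggesting_excepthook
    print(demo_hint())
```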
So this is basically the code I showed you, except that it's a lot of C code that I'm not going to show. But the main idea is that the way we make sure your programs are not slow is that this only runs when exceptions are being printed: nobody has caught them and the interpreter is going down. At that stage we can take a bit more time doing this computation. We also have a lot of extra checks. For instance, if you have an object with six million attributes, we only take some of them. Also, if the strings are very big, we don't compare them, because that could take a lot of time. So there are a bunch of extra things we take into account so that this doesn't take forever. The point I want to make is that even for something that only runs when things go wrong, you need to be very careful so it doesn't impact everyone. Cool. So the last thing I'm going to show you is better tracebacks in Python 3.11, which is something I worked on together with Ammar and Batuhan. Batuhan is over there, so you can thank him too after the talk. And this is super cool. It has already landed in 3.11, and I'm going to show you what it means. Imagine I have this traceback. You can see you have two points, and you're adding up expressions that access their attributes, and the error is: oh no, 'NoneType' object has no attribute 'x'. So one of these guys over here is None. But which one is it? Oh, you don't know. But with the new and improved fine-grained error locations, you get this nice underlining telling you that this guy is None. So you don't need to attach a debugger; you can just see it in the traceback. How cool is that? It's cool. Awesome. But wait, wait. It gets better. What about this other error?
Oh no, I have this gigantic JSON with many levels, like a, b, c, d, and then I get this horrible error saying 'NoneType' object is not subscriptable. Which one is None? This one. Nice. You don't need to attach a debugger; now you can just see it. It's there. Awesome. And it works with everything: it works with your libraries, it works with your code, it works with absolutely anything. You have this weird, super simple computation? Division by zero. Which one is zero? That one. It's awesome; you can just see it there. Very cool. So how do we do this? When you write this code, the Python interpreter generates a bunch of bytecode instructions, and those instructions are basically things like: okay, load this name, access this subscript on the name, and so on. What we do is attach positions from the source code to every one of these bytecode instructions, so we know which chunk of the code generated each of them. You can access this in 3.11, for instance, through the dis module: it has a function called get_instructions, which gives you a bunch of Instruction objects. And you can see that, for instance, this BINARY_SUBSCR instruction, which is basically the square brackets, has a new attribute called positions. Positions tells you the line number and the end line number, plus the column offset and the end column offset. So you know exactly which chunk of the code is associated with each bytecode instruction. And then, when the exception is raised, we have this ridiculous amount of code. It's very ridiculous; you don't need to look at it. But it extracts the chunk of the code that is raising the exception, and we re-parse it so that we can also add extra information. We worked very hard to make this right.
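A small sketch of inspecting those positions yourself with `dis.get_instructions` and the `positions` attribute (Python 3.11+). The `instruction_spans` helper is hypothetical; it slices the source with each instruction's column offsets, which only works this simply for single-line ASCII source, because the offsets are UTF-8 byte offsets. Opcode names change between versions, so the sketch prints them rather than relying on a specific one.

```python
import dis
import sys

SOURCE = "x['a']['b']['c']['d']"


def instruction_spans(source):
    """Map each bytecode instruction to the slice of source it came from."""
    code = compile(source, "<example>", "eval")
    spans = []
    for instr in dis.get_instructions(code):
        pos = getattr(instr, "positions", None)  # attribute added in Python 3.11
        if pos is None or pos.col_offset is None or pos.end_col_offset is None:
            continue
        # Single-line ASCII source: the column offsets index the string directly.
        spans.append((instr.opname, source[pos.col_offset:pos.end_col_offset]))
    return spans


if __name__ == "__main__":
    if sys.version_info >= (3, 11):
        for opname, text in instruction_spans(SOURCE):
            print(f"{opname:<20} {text}")
```

Each subscript instruction should map to a progressively longer prefix of the expression, which is how the traceback knows which pair of brackets to underline.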
And it's very good code, but it's C code, so everybody hates it. We have a nice comment over there; you can see the structure. So, you know, someone could do their PhD on this. But the idea is that we put in a lot of hard work, so you get all these nice squiggles and we tell you exactly what's going on. The way we do it is that we produce the positions, so we know, for every bytecode instruction, what positions it has. Then we find the instruction that raised the error. For instance, in my example, accessing a subscript fails because the key is not there, so that instruction raises an error. Then we re-parse, and we can use the AST of the expression and the positions to rematch and understand: oh, actually, what is failing is a subscript, this is the subscript, and the one that is failing is the last one. Using that, we can point to exactly where the expression is failing, so we can also add this extra context with different kinds of squiggles pointing to it. Very cool. Coming in 3.11. Awesome. So, last thing: how can you help? As you can see, this is quite cool, and it takes a lot of work, but it turns out that a lot of people are very, very excited about it. I get a lot of people coming to me and saying, oh, man, I love these new error messages. And interestingly, what I found is that we have been working very hard on many other parts of the interpreter, like making Python faster, new modules, new APIs, or fixing old bugs. But, by far, the thing that got people most excited, in my experience, is this. So I can tell you that we are trying to put more effort into making the interpreter smarter at telling you about errors, and hopefully in 3.12 you'll get even more improvements. But it would be cool if you gave us a hand, right?
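A hypothetical sketch of the re-parsing step, in Python rather than C: given the column span of the failing instruction, re-parse the line with `ast`, find the `Subscript` node covering exactly that span, and mark the value part with `~` and the failing `[...]` part with `^`. `underline_subscript` is my name, not CPython's, and the real implementation handles other node types too, such as binary operators.

```python
import ast


def underline_subscript(source, col_offset, end_col_offset):
    """Return a '~~~^^^' marker line for the subscript spanning that column range.

    `source` must be a single line; returns None if no Subscript node has
    exactly that span.
    """
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if (isinstance(node, ast.Subscript)
                and node.col_offset == col_offset
                and node.end_col_offset == end_col_offset):
            split = node.value.end_col_offset  # where the value ends and '[...]' begins
            return (" " * col_offset
                    + "~" * (split - col_offset)
                    + "^" * (end_col_offset - split))
    return None


if __name__ == "__main__":
    line = "x['a']['b']"
    # Pretend the instruction covering the whole line is the one that failed.
    print(line)
    print(underline_subscript(line, 0, len(line)))
```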
So, how can you give us a hand? The first thing you can do, if you want to get your hands dirty, is say: okay, I want to add a bunch of new syntax errors. I wrote a big document in the Python Developer's Guide, at devguide.python.org or something like that; if you search for "devguide python", you will find it. There you have a document called the Guide to CPython's parser. It is a very technical document, so if this is not your cup of tea, you don't need to go through it. It explains how the parser works, how you add new grammar, and a bunch of other things. Also, if you like those talks (there is one at every PyCon) where someone modifies the language to add pipe operators or new lambdas or whatever (I have seen all of them, and I have even given one myself, obviously), you can read this and you will find out how the new parser works, so you can implement very funky new grammar. And at the end you will find a section on how we add new error messages. It explains all the problems you may run into, how you can test new error messages, how to make sure your error messages are good, et cetera. Actually, a lot of people have done that. For instance, in Python 3.11 we have these (this is just a bunch of them, but there are more): people said, oh, I keep finding this problem and this error message is not good, and they suggested improvements or even opened pull requests against CPython themselves adding new error messages. I have to say something here. This sounds very exciting, and you may have an error that you really, really, really hate, and then you spend 10 million hours trying to fix the grammar, and, unfortunately, we need to reject your PR.
The reason is that, even if we understand that it took a lot of time, that it's not easy, and that you're very excited, the change may have side effects or may make the parser slower. As you saw before, there are a bunch of things to consider, and you can get weird surprises you may not be aware of. So, if you try this, we really, really want you to try and to help us. Just come with an open mind, because we may reject your suggestion or ask you to modify or change it, simply because these things are quite hard when you are dealing with one of the most popular languages in the world, if not the most popular. There are a lot of users of the language, and if you get it wrong only once, all of them are going to be at our doorstep with torches and pitchforks, and we are the ones that are going to face the pitchforks, right? Not you. So this is not meant to deter you from doing it; just, you know, light some candles and then open the pull request. But yes, you can do that. And if you don't like getting your hands dirty, or you don't like C code, or you don't like parsers and grammars, there is something else you can do that is super useful as well. For instance, maybe you are a teacher who teaches people Python, or you interact with Python a lot yourself, and you have seen people struggle a lot with a particular kind of error, or you struggle a lot with it yourself, whether it's a syntax error or another kind of error. You can open issues on the Python bug tracker, which is now on GitHub: go to the issues of the CPython repo and tell us, hey, I hate this error, can you help me? Sometimes we will tell you no, obviously, but mostly we try to tell you yes. So why is that very useful?
Because we, as developers of the language itself, are quite biased when it comes to errors. Some errors are super weird, but we have seen them so many times that we don't mind them anymore; we know what they mean, so why would we fix them? But maybe you don't, right? Or maybe your students don't. Or maybe you find something particularly weird. For us it's very difficult to identify and prioritize which errors need to be fixed first, and with your help we can do it, if you tell us which ones are the worst. In fact, one of the reasons we prioritized the errors we fixed is that people told us they were the worst. So that's the whole talk. To summarize: PEG parsers are cool. Error messages are cooler. 3.11 is going to be incredible, if we manage to release it, because if you have followed the latest developments, it's getting a bit difficult to release, but we are getting there. And as you have seen, we have put a lot of work into making sure that your experience when dealing with errors, when things don't work, is a good one. I suppose the moral of the story is that if you are doing your PhD and you find a syntax error that you cannot solve, you can cry in a corner. Alternatively, you can study a lot of Python and parsers and grammars, become a core developer of one of the biggest languages in the world, and then fix the error. Or you can wait for someone else to do it for you. I hope you have enjoyed the talk, and that's it. Okay. So we're a bit ahead of schedule, so we have time for plenty of questions. I'd ask people in the room who have a question to line up in front of the microphone here. And people online, just let the online organizer know that you have a question so that they can pass it on. Yeah, go ahead.
Thank you for the talk and for your work. I was just wondering if Python 3.12 will write itself. Well, no. But Python 3.10 is already writing itself in some ways. For instance, the parser we have is written in C and whatnot, but we don't actually write the parser by hand. What we have instead, and you don't find this very often in the wild, is a parser generator: a program that reads the grammar and then generates a C parser. And it turns out that the generator can also read its own grammar and generate itself. So in some ways, Python is already generating itself, which is quite cool. Thanks for the question. Do we have any questions online? No. Okay. Go ahead. Thanks for the talk and the work. It's great. I was wondering how you do regression testing, or how you judge the effects of a potential new error message. To judge the effects of a potential change on correct Python code, there are 10 million, billion lines of correct Python code out there you can check. But do you have a corpus of incorrect Python code? I imagine you have individual tests, but do you have a large body of statistically useful incorrect Python code that you can actually work from? What an excellent question. This is actually a very, very, very hard thing to solve. We have some things. But first, a bit of background on why this is a problem. The set of valid Python programs is already... well, it's not small; technically you can write an infinite number of valid Python programs. But invalid programs are far less constrained, so they are another infinity you would need to test, a much bigger one, so to speak. So the problem is this: imagine that you have a syntax error that you want to add, and you know exactly the kind of code that will fail there.
What you do is start mutating the strings to see whether the syntax error still happens, and then there is a manual step where you take the biggest errors you found at the beginning and start trimming them down. After that, you analyze the resulting grammar. This is what I do, and we have a small program that does this. You analyze the resulting grammar, basically doing a bit of graph analysis over which rules are going to be affected by the new one you are adding, so you know exactly how the syntax error may propagate. This is insufficient in most cases because, again, the number of invalid Python programs is gigantic. We have tried to find a corpus of invalid Python programs. We found some from research groups, and from Stack Overflow: people have collected all the Stack Overflow questions that are about syntax errors. But the problem is that most of them are indentation errors, or missing colons, and those we already know about and are quite easy to handle. The comma one, though, is very hard. The problem is understanding all the ways it could go wrong (like all the different problems I showed you) and all the ways the suggestion may be incorrect. That's quite hard, and the only way to do it is a bunch of graph analysis and trying to find a way around the problem, because there is no way to fuzz this: you cannot automatically check whether a suggestion makes sense. So what we do is a bunch of that, and then we wait for users to report, oh, what a ridiculous suggestion. What we also do is make sure the error doesn't say "this is wrong". What we say is "Perhaps you forgot a comma?" It's just "perhaps", right? So you cannot be super angry if the suggestion is wrong, because we said "perhaps". So, thank you, it's a very good question. I'm just going to check again if there are any questions online.
No? Okay, so please go ahead. Thank you for the talk. I have a question. I have the feeling that in the past couple of years there has been more focus on the user experience of exceptions; for example, pip has better error messages now, and so does Python. What are your thoughts on that? When did that start to happen? For Python, is it just because of the new PEG parser, or is there some other movement going on that focuses on the UX of exceptions? I can tell you exactly what happened, because what happened is that someone complained on Twitter. Yes, it was Anthony Sottile. It was the error at the end of the file when you forget to close a parenthesis. He said, oh, this is horrible. And I said, bam, okay, let me fix this. Then I fixed it and showed it to people, and people were incredibly excited. And I said, okay, let's do more of this. And it turns out that Pradyun, the pip maintainer who did the improvements in pip, saw that people liked this, so he said, okay, let me try that in pip. So, as you can see... please don't complain on Twitter; that's not a way to solve problems. As you can see, a small spark, followed by what I think is the most important part: a lot of excitement from users telling us, we like this work. The other work you do is nice, but this is very nice. That's the fuel of open source: we see our users excited about something, and then we put a lot more effort into it. Then very smart people like Batuhan and Ammar and others joined the effort, and we started thinking about how we could do more and more. If you trace this back to the beginning, it's just someone complaining on Twitter. Awesome. Thank you for the talk. So Python is still evolving, the syntax is evolving, new PEPs get accepted, and that changes the syntax of the language.
So how does that affect the work you've been doing, and how do changes in the language need to be maintained in this respect? Great question as well. When we add new grammar to the language, for instance when we added exception groups or the match statement, the authors are normally core developers, or a core developer is the implementer. So we work with them to make sure that the new grammar also has error paths. Because these implementations are already quite big, the first implementation normally has only a minimal subset of error messages, just the obvious cases, and as people use the feature more and more, we start adding more refined error messages around it. So the idea is that we normally coordinate with the core developers, or sometimes it's just us: people are excited about a particular feature and say, okay, this is a very good opportunity, because there are few error messages here. But there is another interesting case. New PEPs can bring in several features; for instance, the new PEG parser allows this concept of soft keywords, which are maybe a keyword, maybe not, like match. You can use match as a variable or as an argument, but you can also use it as a keyword. This is cool, because we don't need to forbid everyone from using match as a variable name, and it's a cool technology. But the problem is that it makes the parser much trickier around the soft keyword, because the parser needs to figure out more: it may need to reparse that part a bunch of times, with potentially a lot of backtracking, because it needs to try everything with the keyword and then try again without it. So adding soft keywords is very tricky and can be dangerous. And what I have seen is a lot of people thinking that now it's a free-for-all of soft keywords for everyone, but that is complicated, right?
Even if it's very exciting that you don't need to make something a full keyword, it turns out that having a good analysis of the grammar and of the impact on error messages is important, because adding soft keywords may invalidate a bunch of other error messages around them. You saw it with the match statement: the comma error interfered with the match statement, right? So it's very tricky to fix and take those cases into account, and the more soft keywords you add, the trickier it becomes. So one of the things we are doing right now is being very careful when people propose new soft keywords, because this cost needs to be taken into account. People are super excited because now they can propose new syntax, but this cost is not free for the parser, and at the end of the day, what do you want: more grammar or better error messages? It turns out this is a decision you may need to make, right? And that's one of the challenges we have. Thank you very much. Let's give Pablo a warm round of applause and thank him for the talk.