Cool, so who's ready to hear about f-strings? Nice, cool. So we are going to talk about f-strings, and in particular some changes you're going to see in Python 3.12, which is cool. But in case you don't know what f-strings are, let me make a super fast introduction.

This is the old way you format things in Python: you use this .format() thing, you specify the format parameters you want, you pass in a bunch of values, and the string ends up with the value 2. This is kind of boring. So in Python 3.6 we added this cool thing called f-strings, which allows you to drop the .format(): voilà, you put an "f" in front, you put whatever you want inside the braces, and, magic, it just appears. Like science. Cool.

That's kind of an easy example, but you can do very cool things with f-strings. For instance, you can have a datetime object, like today, and then provide this cool format specifier over there, and it will let you format your datetime object inline. So you can select, right in the f-string, how you want that date to be displayed, which is also cool. And you can do all sorts of things. For instance, here I have a bunch of songs and I want to display each of these songs on its own line, so in my string I say "this is my playlist", and then I join the list of songs with a newline character, and I have the songs. It's cool.

Do you like it? Well, maybe you need more coffee, because you cannot do this. It's wrong. It turns out that you get this weird error (the error message, by the way, is at the end; it's a bad error message, I should fix that). So apparently, yes, f-string expressions cannot have backslashes. Why? I don't know. Apparently they cannot. And this is a problem, and there are many other problems.

You see, now you should feel fear, because you have been living in a lie. You think that f-strings are cool, but they are not as cool as you think, and we are here to make them cooler. How cool is that?
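Here is a minimal sketch of everything so far (the song titles and the date format are placeholders of mine; the last line is the one that explodes before Python 3.12):

```python
import datetime

# The old way: str.format() with a replacement field.
print("one plus one is {}".format(1 + 1))   # one plus one is 2

# The f-string way (Python 3.6+): the expression goes inline.
print(f"one plus one is {1 + 1}")           # one plus one is 2

# Format specifiers work inline too, e.g. for datetime objects.
today = datetime.datetime.now()
print(f"today is {today:%B %d, %Y}")        # e.g. today is July 19, 2023

# The \n in the literal part was always fine; the one inside the
# braces was a SyntaxError before Python 3.12, because f-string
# expressions could not contain backslashes.
songs = ["Song A", "Song B", "Song C"]
print(f"Playlist:\n{'\n'.join(songs)}")     # works on 3.12+ only
```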
Awesome, so let's talk about the problem. Okay, so f-strings are kind of cool, but they have some problems, and the problem is how they are parsed in CPython; it's just the way we implemented them in the first place. So, for instance, this is an f-string. How do we know it's an f-string? How does the parser know it's an f-string? Well, it's quite simple: it has an "f", and then there is some Python code inside those curly braces. That's kind of the important thing. But that is how we look at f-strings; it's not how the language looks at f-strings.

The language uses this parser technology which I'm going to show you. Don't go crazy: this is a chunk of the grammar for CPython, using the PEG parser that we added in Python 3.9. This is basically telling you how the language reads the code that you write. In this case this is the rule for an atom, and it's saying that an atom can be many things: it can be a name (your variable, whatever name you have there), it can be True, False, None, and a bunch of other things, and among them there is this rule for strings. So this is how Python reads strings. And if you go to the rule for strings, you see that a string is a bunch of STRING tokens. It can be a bunch of STRING tokens because, I don't know if you know this, but if you put two string literals together with nothing in the middle, Python joins them. Which is a lovely behavior when you have a tuple and you forget a comma. It's fantastic.

So that's how Python parses this, and the problem is the following: when the tokenizer sees an f-string, it emits a token called STRING. It just says "STRING". Whatever f-string it receives, it will emit this one token with the contents of the string. So how does the parser see an f-string? Here is an f-string, and it says: well, this is a STRING token, and the token's contents are the "f" and the quotes and everything inside. And what about this different f-string, with another f-string inside? How does the parser see that one? It also says STRING. One token, the same, even if there are two f-strings there, one inside the other. And what about this one, an incorrect f-string with invalid syntax inside? Bad f-string. How does the parser see that? Also STRING. Very good, very good. (This is technically the tokenizer, not the parser, but the distinction kind of depends on who you ask.)

This is kind of a problem. And there is another problem, which is that the code that creates this fantastic behavior looks like this. It's horrendous. It's really bad, all manually written C code: around 1,400 lines of hand-written f-string-parsing code in C. The technical term for this is "not ideal". If you don't believe me, allow me to quote Donald Knuth here: "parsing code in C is the root of all evil". Or most of it. And it's true: if you try to do anything in C you may succeed, except if you try to write parsers; then you will be miserable, believe me. I maintain the Python parser. So it's bad, right? It's manually written code.
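You can actually watch that old single-token behavior with the tokenize module. A small sketch; run it on Python 3.11 or earlier, since 3.12 changes exactly this:

```python
import io
import tokenize

# On Python <= 3.11, each of these comes back as ONE opaque STRING
# token, whether the inside is valid Python or complete garbage.
for code in ('f"some words"',
             'f"a {1 + 1} b"',
             'f"a {1 +} b"'):   # invalid expression inside!
    tok = next(tokenize.generate_tokens(io.StringIO(code).readline))
    print(tokenize.tok_name[tok.type], repr(tok.string))
```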
Okay, but what is this code doing? How does it work? Because even if Python sees those f-strings as single STRING tokens, f-strings work: at the end of the day you put a value there and it kind of works. So how does it work? Basically, the first thing that happens is that the tokenizer grabs the whole thing and emits the STRING token; that is what we just saw. But this is not good enough, because that f-string may contain code. So we notice that the STRING token starts with the letter "f", and then we pass the thing to this horrendous, manually written f-string parser, and it produces a bunch of char* pointers (you should start fearing the moment you see that) and metadata. And then it passes the expression parts back to the parser, because we've identified things like "okay, here is a Python expression, 1 + 1", and that needs to be re-parsed: right now "1 + 1" is just text inside a string, and we need actual AST nodes for it. So we need to tell the parser: hey, here is more code that you need to parse. And then the f-string parser needs to join those chunks together into the final structure that CPython needs to produce, which is called a JoinedStr node. Which is a horrendous name, but, you know, computer science.

And you will say: well, but you have this documented somewhere, right? Right? Yes, we have it: if you go to the Python docs there is this grammar for f-strings, and it's wrong. It just doesn't make any sense. For instance, look at this rule: a literal character can be anything except this, or NULL. What is NULL? I'm a Python programmer; I don't know what NULL is. And if you actually try to make sense of this, sitting back with a glass of, I don't know, juice or champagne or whatever you drink in your hand, it doesn't make any sense. It's wrong. It doesn't work. But it's there.

And there are other problems. For instance, this is actually not valid code currently, as you saw, because f-string expressions cannot have backslashes. I could explain why, but it would take me a while; the thing is that that manually written parser code doesn't work with backslashes, because they make things difficult, so we said: well, let's not do it. It's difficult, so this doesn't work.

This also doesn't work. What is this? I'm reusing the quote: as you see, I'm using double quotes for the string, and then in the expression part I want to access this key in the dictionary, and I just want to reuse the same quote. This doesn't work, because the string opens and closes at the quotes. So this is not an f-string with a dictionary inside: this is a string here, then some nonsense here, then another string here. So this doesn't work. And look, man, I'm a rebel, I don't care: I want this to work, because I'm just typing this thing, copying it from somewhere. This should work, right?

Okay, this also doesn't work. What is this? Well, it turns out that you can use \N to write a unicode character by name; for instance, you can print a black heart suit, and this creates the unicode character for it. Ha, it has a backslash, so it doesn't work. So you can also not do that.

Also, you cannot do this, which... you may say: well, that's a good thing, Pablo. And I'll say: well, for you. I'm a parser implementer; I like this. So you cannot do this, which means, I don't know, you cannot horrify people. What about Halloween? What if you want to do this nesting at Halloween? You cannot, so it's incorrect.
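To recap, a sketch of the three failures just mentioned (the dictionary is a made-up placeholder of mine); all three are SyntaxErrors on Python 3.11 and earlier, and work fine on 3.12+:

```python
data = {"name": "Ada"}

# 1. Reusing the outer quote character inside the expression part:
print(f"hello {data["name"]}")           # 3.12+: hello Ada

# 2. Backslashes in the expression part, e.g. a \N unicode escape:
print(f"my heart: {'\N{BLACK HEART SUIT}'}")   # 3.12+: my heart: ♥

# 3. Nesting f-strings with the same quote character:
print(f"{f"{f"{1 + 1}"}"}")              # 3.12+: 2
```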
Okay, so we are here to solve that. And you will say: but that's horrendous! Strings should be these simple things that start and end, and they shouldn't contain other strings inside. No other language does this, right? Well, it's not like this language, or that language, or that other language, or that other language does it... That horrendous Halloween nest of strings? You could do that in any of these languages, and many more. Because it's a good thing. That's what I'm trying to convince you of.

There are other problems too. For instance, error messages are bad. Why are they bad? Because we use this manually written code, and we didn't put good error messages in it. For instance, this is an incorrect f-string, because there's a bunch of numbers separated by nothing, and the error message is... voilà. There is nothing; this is all it shows. Mysteriously, there is a bunch of parentheses around it; why, I don't know. Mysterious. And the string is not shown, and it says "line 1" even if this is on line 15. So what is the problem? I don't know; it's inside an f-string; it's your problem now. So it's kind of bad. And in Python 3.10 and 3.11 we have these nice new error messages, and we cannot use them here, because that machinery lives in the PEG parser, and this is manually written code, which is bad code and we don't like it. In Python 3.10 we fixed this a bit, and now it shows you this thing, but the problem is the same. What you want to see is this, right? You want to see the whole line: for instance, you want to see this part where you are assigning to it, because this is the line, not just the inside of the string. And maybe you want to see a suggestion: here it says "perhaps you forgot a comma", or something like that. Some of the cool new error messages that we do since Python 3.10.

So to solve this we created PEP 701, with all these lovely people (Lysandros is also here in the audience), and we basically said: well, it's time to kill that horrendous C code. It's always a good day when you do that. And for that we wrote this long document that basically specifies how we're going to turn that manually written code, and that hand-wavy grammar, into the real grammar. The real grammar. Nice.

So what we are going to do is that instead of emitting a single STRING token for this, we're going to emit several of them. Now, for instance, the "f" and the quote are going to be their own token; we call it FSTRING_START. The middle chunk between the start and one of those curly braces we are going to call FSTRING_MIDDLE. The end is going to be FSTRING_END. And anything else is going to be its own token, because we already have tokens for the curly braces, the numbers and all the rest. So instead of emitting one single STRING token, we emit these little ones, and we are going to teach the parser to make sense of them. For instance, this is what you got before: for that f-string, you got one STRING token, and that token contained the whole f-string. Bad. And now you get this lovely thing: you get FSTRING_START for the start, then FSTRING_MIDDLE for the literal part, then an OP token for the brace, a NUMBER, et cetera.
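You can see the new tokens come out yourself with the tokenize module on Python 3.12+; a quick sketch (the names a and b are just placeholders):

```python
import io
import tokenize

code = 'f"some words {a + b} more words"'

# On Python 3.12+ this prints FSTRING_START, FSTRING_MIDDLE, OP,
# NAME, OP, NAME, OP, FSTRING_MIDDLE, FSTRING_END, ... instead of
# one big opaque STRING token.
for tok in tokenize.generate_tokens(io.StringIO(code).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))
```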
So this is much better, because the parser can see this and says: I like this, thank you very much. Well, the parser is not a human, but you get the idea. And the grammar rule is super simple: you transform this rule that says "a string is multiple strings", which doesn't make a lot of sense, into something that looks a bit more complicated, but it's not that complicated. Here is the whole grammar. I'm not going to go through it, but you could probably make sense of it in ten minutes or so, and even easier if I'm with you and I explain it to you. You can sit down and read it; it's quite simple. And it completely covers everything that you expect in f-strings: format specifiers, the funky exclamation-mark conversions, and all that stuff.

And now the only thing that you need, now that you are emitting these tokens, is an algorithm that can make sense of what's going on. Because before, the tokenizer had a very easy life: it finds a quote, and it says "until I find the matching quote, it's a string", and it just keeps going, sees the matching quote, and says: a string, here you go, give me a raise or something. Now you cannot do that, because now the tokenizer needs to emit these tokens over here, so it needs to know that this is its own token and this is its own token; it needs to know when to stop. And for that you need an algorithm.

The algorithm is very simple. Let's say we have this f-string that has another f-string inside. You need a stack. Why do you need a stack? Because f-strings can be nested, and you need to keep the state around. When you enter a new f-string, you need to say: well, I have a new kind of world here that I need to tokenize; and when I finish with that f-string, I go back to the previous one, and I need to know where I left off. That's why you need a stack. And the stack is this thing over on the right.

So we start on the f-string, and the tokenizer says: wow, an f-string, because it starts with the letter "f" and it has a quote. It finds the quote, and once it finds the quote it says: well, I have an FSTRING_START. That was easy. Then it keeps going and says: well, I'm parsing an f-string chunk. And look, we put on the stack that the tokenizer is now in f-string mode. This f-string mode says: I'm not parsing Python right now; I'm inside an f-string; I'm just reading a string, so I'm going to keep going. Another character: I'm in an f-string, so I'm not going to make sense of this. It goes and goes until it finds either a closing quote or a curly brace. Here it finds a curly brace, so it stops there and says: what I found is an FSTRING_MIDDLE, because I found one of those stop points. And because it found a curly brace, it enters normal Python mode. Why? Because inside this curly brace, this is Python. Here it happens to be another f-string, but it could be 1 + 1. So now it needs to say: I'm parsing normal code again. You push a new mode, you emit the token for the curly brace, which is an LBRACE in the C tokenizer, and now it goes inside, and it's just Python parsing Python code. And there it finds another f-string.

So it says: well, I have another f-string, so I need to emit another FSTRING_START, and because I'm entering the f-string I need to push f-string mode here. Which doesn't contain anything, because it immediately hits the next curly brace; so after emitting the LBRACE we push normal mode again on top. We parse the Python code inside, and that's very easy: it's a number, then a plus, then a number. This is the old tokenizer, nothing new here. Then it finds the closing brace and says: well, I finished with my expression, so I need to pop this thing from the stack, because I need to go back to f-string mode. So it emits the RBRACE and pops that mode from the stack, and it knows it's at the end of the inner f-string. It finds the inner closing quote and emits an FSTRING_END, because that matches this entry over here on the stack, and pops it. Then it goes on to the next closing brace, emits its RBRACE and pops again, finds the outer closing quote, emits the outer FSTRING_END, and that's it; it pops everything. And at the end you know that you succeeded because you're in normal mode again, which is where you started. Maybe I made it sound very easy; it's still horrendous C code, but now it's hidden under the rug, in the tokenizer, where nobody sees it. So it's fine, and everybody's happy.
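To make the mode-stack idea concrete, here is a toy sketch in Python. This is emphatically not CPython's code; the mode names and the simplifications (double quotes only, no escapes, no format specifiers) are mine, just enough to show why nested f-strings need a stack:

```python
REGULAR, FSTRING = "regular", "fstring"

def scan_modes(src: str) -> list[str]:
    """Return the mode pushes/pops while scanning `src`."""
    stack = [REGULAR]
    events = []
    i = 0
    while i < len(src):
        ch = src[i]
        if stack[-1] == REGULAR:
            if src.startswith('f"', i):     # FSTRING_START
                stack.append(FSTRING)
                events.append("push FSTRING")
                i += 2
                continue
            if ch == "}":                   # end of expression part
                stack.pop()
                events.append("pop REGULAR")
        else:  # FSTRING mode: just consume literal text
            if ch == "{":                   # entering an expression
                stack.append(REGULAR)
                events.append("push REGULAR")
            elif ch == '"':                 # FSTRING_END
                stack.pop()
                events.append("pop FSTRING")
        i += 1
    return events

print(scan_modes('f"a{f"{1 + 1}"}b"'))
```

Running it on the nested example shows exactly the pushes and pops from the walkthrough, and the scan ends back in regular mode, which is how you know it succeeded.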
So those were the C tokenizer changes, which is fantastic, and that is what now allows you to write these f-strings. But it turns out that Python has more than one tokenizer. It has two tokenizers, because why not? And the other tokenizer is the tokenize module, which you can use to tokenize your own code; linters use it, like flake8 and all those tools, and maybe even Black, I don't know. So this other tokenizer is around there, and it also needed changes. And Marta here is going to talk about how we changed the Python tokenizer, which you're going to find is even funnier.

So, this is the result of what Pablo has shown, on the C tokenizer, and this here is the result of applying the same code to the Python tokenizer. The difference is that instead of having one token type for each different operator, they are all grouped into this generic OP token here. But there are also more differences. For example, imagine this code. This is how it would be tokenized by the C tokenizer, and this is how it would be tokenized by the Python tokenizer. You can see that here there is a comment, and this is important, because the interpreter doesn't care about comments, but a linter or a code-highlighting tool does care about comments. There are more differences, but we are short on time, so I won't explain those.
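A quick sketch of those differences, using the tokenize module itself (the snippet is a placeholder):

```python
import io
import tokenize

code = "x = 1 + 2  # a comment\n"

# The tokenize module reports '=' and '+' with the generic OP type
# (the specific one is still available as tok.exact_type), and it
# keeps the COMMENT token, which the interpreter itself never sees.
for tok in tokenize.generate_tokens(io.StringIO(code).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))
```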
So, how did we implement all the changes that Pablo talked about in this Python tokenize module? It was already built, and this was my second contribution to CPython, so Pablo promised that it would take like a minute or two. Well, this was the code I was supposed to modify in a minute or two: more than 200 lines of Python code, strings are handled in three different places, and it uses a ton of regular expressions that were written a decade ago or so. So, to me, the best idea was to treat this as a black box: okay, I have this code that I prefer not to modify, so instead I write a different function that calls it and then post-processes each string token that it returns. If it is an f-string, it applies the sub-tokens that Pablo just showed.

So let's see an example. I have this input, and this function says: this is a string, so okay, I'm going to post-process it. First I check whether it is an f-string or not; since it is an f-string, I enter this function and then I do recursive calls, one recursive call for each level in the tree. An easy recursive implementation.

But it had a problem. One of the things that PEP 701 promises is that you can reuse the same quotes, and the thing is that this tokenizer returned two STRING tokens for that, and post-processing this was hard. We were also very tight on schedule when this issue came up: the release was two weeks ahead, and we knew we had to be effective; we couldn't keep trying and iterating until we got something working. We had to do something that worked. So I told Pablo: hey, there's already this C tokenizer working; why don't we use it? And Pablo said: well, it's not a bad idea; actually it's been planned for some time, but it's a hard problem. And I was like: okay, why is it a hard problem? So Pablo was like: yeah, there are some differences, like the comments, the grouped tokens and stuff. So he told me everything, and I was like: okay, and what was this hard problem you talked about?

So in the end we decided: okay, let's go ahead and use the C tokenizer to implement this. This slide tries to summarize what we did: there's the C tokenizer, and then there's an intermediate layer, implemented in C, that adapts the output of the C tokenizer to the Python one. We were very happy, because all the tests passed, and we believed it was done. The tests pass, the public API of the tokenize module is the same, so no one should notice these changes.

Well, the very next day we merged this on the main branch, we already got a bunch of bug reports. So I'm going to comment on some of these bugs. The first bug that we got was about tokenizing invalid code. One of the things that changed is the way errors are handled. Right now, if you try to tokenize this, the current tokenizer will raise a SyntaxError, but the previous one just returned an error token, and it allowed you to tokenize everything: when something is incorrect, it just returns the token and that's it. But now it raises an exception. There was a big discussion here, and in the end the conclusion was that the set of invalid Python code is way too big to actually support it.

Another bug was that we were adding a newline character at the end of every tokenized chunk of code, even if it wasn't there. And here, another bug: we were raising a SyntaxError when you tried to tokenize code that mixes spaces and tabs. And you will say: okay, but this is invalid. Yes, but it broke pylint... no, no, it is not pylint, it's pycodestyle, sorry. It broke one of the checks in pycodestyle.
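Stepping back to the first of those bugs, here is a sketch of the error-handling change (the exact inputs that trip it vary; this one is just an unterminated string literal):

```python
import io
import tokenize

bad = '"abc\n'  # unterminated string literal

# On older Pythons the tokenize module yielded an ERRORTOKEN and kept
# going; since the 3.12 rewrite, inputs like this are rejected with
# an exception instead.
try:
    for tok in tokenize.generate_tokens(io.StringIO(bad).readline):
        print(tokenize.tok_name[tok.type], repr(tok.string))
except (SyntaxError, tokenize.TokenError) as exc:
    print("rejected:", type(exc).__name__, exc)
```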
And this one is the craziest for me. This is some code extracted from pycodestyle as well. We broke pycodestyle because we changed the way the buffer you pass to the tokenize module was being consumed: we consumed everything up front and tokenized the string we got, and pycodestyle was depending on this buffer being consumed line by line.

So, yeah, we didn't realize that Python is one of the most popular programming languages, that there are millions of Python programmers, and that changes like this, which go to the very root of the language, affect way more things than we first thought. We were just like: yeah, tests pass, no one will notice. Well... I don't know if you know this; this is something called Hyrum's Law. Hyrum's Law says that with a sufficient number of users of an API, any observable behavior will be depended on by somebody. The example here is an imaginary piece of software where they fix a bug that overheated the CPU, and someone says: no, but I was using the CPU overheating to do something; please take it back. And so, after all these discussions, we added this warning to the documentation, to say: hey, we are not supporting invalid code.

So, questions?

Thank you very much. We still have a couple of minutes, so maybe if someone wants a question... up here in the front, please.

Q: Thank you for making f-strings more powerful. Do we have to be afraid of f-string injection, like SQL injection?

A: No more than before.

Do we have any other? Ah, there again, please.

Q: I'm interested to know if this will affect syntax highlighters in a meaningful way, because syntax highlighters had a problem when f-strings were introduced. Will this affect that again?

A: This was discussed in the PEP, actually. So yes, it affects syntax highlighters, because if they were using regular expressions to identify string chunks, now they cannot; they need to use a parser. But it turns out that it's not really a problem, because most syntax highlighters support multiple languages, and they already need to support this kind of syntax, like nested strings or nested parentheses, because all the languages you saw, JavaScript or Ruby or all those things, support it. So the syntax highlighter in VS Code, or PyCharm, or any general-purpose syntax highlighter, like the one in Vim or the one in Emacs, will need to support this. Pygments, I think, already supports this in some way, because they already had the technology even before, and they were restricting it to the way f-strings used to work. But in general, yes: if you have a very simple syntax highlighter based on regular expressions, you will need to change it.

Got it. So we have a question in the back.

Q: Thanks for the talk. (Is it me or Mark? No, no, in the back. I just saw Mark and... sorry, sorry. Now I have fear.) When you say that you triggered some bugs when you changed the way things work internally, do you look at it as your "fault", with air quotes, or is it then on the other people to adapt to the fact that you had to change the internals? How do you usually see it?

A: That's a fantastic question. It turns out that when you're a CPython core developer, everybody will think it's your fault.
Even if it's not. So it's interesting. In this case, for instance, even if we didn't really need to solve most of the bugs that Marta was talking about, we actually solved most of them. Lysandros made this herculean effort to fix IPython, because IPython was one of the packages that were using the tokenize module to parse invalid code. So we actually did the effort, because we care that when 3.12 comes out, nobody suffers the consequences, even if it's not our fault. We could define things in terms of "it's our fault" or "it's your fault", but I don't think that's an interesting discussion; the discussion is: how are you going to make this thing work? Sometimes we need to go a bit of an extra mile; sometimes it's impossible for us to do it and they need to make some changes. It's this kind of balance. And convincing people to be nice to us on the internet.

Q: Thanks, that makes sense.

All right. Okay, we still have like a minute and a half.

Q: So, having done all that work to redo the parser, are you planning to do something similar with the tokenizer, and have that generated?

A: Yes. The thing is, I know you're probably asking this because you already did something like that in the past, and it's a nice idea; actually, the idea of using a stack here was nicely borrowed from your idea. But yes, we have that plan. The problem is that the tokenizer of Python keeps a lot of state, and it does a lot more things than just tokenizing, and automatically generating the tokenizer is a bit more challenging, just because you need to keep all this nice behavior. For instance, one of the things that we have around (which, by the way, we should get rid of) is that you can ask Python to treat async and await as soft keywords again, so you can assign to async and to await; there is a secret way to do that, and I'm not going to tell you how. The tokenizer has code that can parse async and await as their own thing, so you can say "async = 3" and it works again, as it used to in 3.6. That code is still around, and I don't want to automatically generate that madness. So first we need to do a cleaning pass to get rid of all these stupid things, and once that is gone, then we can automatically generate the tokenizer, probably as a state machine, which is a beautiful construct, or a stack of state machines, or something like that. Why not? But yes, there are plans to do that.

Okay, we have almost no time, so last two questions, maybe really fast.

Q: You mentioned that the tokenizer has modes, and the modes are put on a stack. What's the maximum stack depth? How many f-strings can you nest?

A: Beautiful question; I am very happy that you are asking this. I'm very happy to report that you can compile Python with as many nested f-strings as you want, for your own particular enjoyment. The default, I think... I don't know, I think it's five. No, sorry, okay: the PEP specifies that any language implementation that supports it needs to give you at least five levels, and then the maximum limit is up to the implementation; I don't remember what CPython gives you, but it's five or ten or so. But the number can be changed: the code is generic, and there is a constant that says how many levels we keep. And yes, it's a compile-time constant, because we keep the stack static; it's not dynamic. Thank you. Okay.
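(For reference, a quick sketch of the kind of nesting the question is about; it needs Python 3.12+, and the exact depth limit is the compile-time constant just mentioned:)

```python
# Five levels of f-string nesting, all reusing the same quote
# character -- a SyntaxError before Python 3.12.
print(f"{f"{f"{f"{f"{1 + 1}"}"}"}"}")   # prints: 2
```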
Q: Last question. Marta, which quotes are better, single quotes or double quotes?

Good. Thank you very much!