Yeah, thank you very much for that introduction. So, you've made it: we're nearly at the end of EuroPython. I'm so glad that there are still people here; I noticed some people going off to the airport already, so I'm really happy that you're all here. Thank you very much for joining my talk. I won't do an anecdote, I think you've heard enough of those during the conference, so I'll just dive into my talk. I'll first briefly introduce myself, and then we're going to talk about the actual substance. My name is Sebastian. I'm from the Netherlands; I live near The Hague, I love it there, and I work for a company called Ordina, more specifically the Ordina Pythoneers, which is a smaller practice within our company that really focuses on Python development. I'm also one of the Code Smiths, which means that I get some time for innovation and for conferences, like being here, but also just to explore Python with all the other Python developers at my company, and at other companies as well, just to have a little bit of fun with Python. I think it's really important to have fun with Python, to really dive into it and make it your passion; at least for me, that's what I love. In my spare time, I'm one of the volunteers for EuroPython. This year I was the finaid lead for the financial aid program, and I also did some session chairing, which was really interesting. I'm really happy that we've made it so far into the conference. I'm also one of the founders of Python Discord, which is a larger online Python community with a lot of teenagers, but also other folks who are trying to learn Python, discuss Python, and do other stuff with Python, so definitely check it out if you haven't heard about it. Anyway, that's enough about me; let's talk about Python. So what are we going to do today?
Basically, we're going to make a journey, a journey all the way from the source code of Python to its execution. For me, it's kind of like writing magic spells. I used to play games like D&D, and with programming languages you get to write something and then something happens. Often it's not what I want, but at least something happens. That's what we're going to see today: I'm going to take you on a journey and show you how Python gets from the source code all the way to the magic, all the way to the execution. I'm not just going to give you a lecture about that, I think that would be rather boring. Instead, I'm going to show you how to implement a new operator in Python, and that will naturally take us all the way from the source code to the execution. However, do note: I have a serious disclaimer for this talk. We have time constraints, so I'm going to omit things. There will be blatant omissions, there will be gross oversimplifications, and I will take shortcuts. The implementation will not be ideal, but hopefully it will give you an overview of what happens within the Python internals. All right. If you'd like to know more details after this talk, definitely check out the Python Developer's Guide at devguide.python.org. It contains a lot of information about how you can mess with the Python internals and how to compile Python, but it also has excellent explanations of the parser, the grammar, and everything else. And then, obviously, there's Anthony Shaw's book, CPython Internals, published by Real Python. It's a great book; definitely check it out if you're interested in this. All right, so what is our journey today?
Well, we're going to go from source code to execution, basically in two parts. In the first part, we're going to look at the tokenizer and the parser to create something called an abstract syntax tree. Then, in the second part, we're going to take that abstract syntax tree and go all the way to the magic, using the compiler and the evaluation loop. By the way, if you want to check out the slides or the source code for the version of Python that we're creating today, check out the GitHub repository. Everything's there: a couple of versions of the slides and one version of the source code. But do note, it's an educational implementation. Don't use it in production; I'm not maintaining it, it's a crappy implementation, so please don't use it for anything serious. All right, what is a pipe operator? Well, a pipe operator is something that's part of a lot of languages, but not Python, and it's basically another way of calling functions. Say you have a function that takes one argument, like this double over here: it just takes a number, multiplies it by two, and returns the result. The pipe operator allows you to call this function in a different way. You provide the value on the left-hand side of the operator, then you use the operator itself, |>, then you name the function, and the value is passed into the function as its argument and you get the result out. On its own, this isn't really interesting: 1 |> double is just like calling double(1). But what you can do with it is build pipelines. You start with a single value, pipe it into various functions to process it, and get something out the other end. In this case, 1 |> double |> double would be equivalent to calling double(1) and then passing that result into another double call: the nested function call double(double(1)). So this is what we're going to build today. Well, just to be clear: this is not a part of Python.
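Because |> is not valid syntax in stock Python, here is a minimal sketch of the same semantics written as an ordinary function. The helper name pipe is hypothetical (it is neither part of Python nor part of the talk's implementation); it feeds a value through each function from left to right, so pipe(1, double, double) plays the role of 1 |> double |> double, which is double(double(1)).

```python
from functools import reduce
from typing import Any, Callable


def double(number: int) -> int:
    """The talk's example function: multiply a number by two."""
    return number * 2


def pipe(value: Any, *functions: Callable[[Any], Any]) -> Any:
    """Hypothetical helper mimicking `value |> f |> g`, i.e. g(f(value)).

    Each function receives the result of the previous one, left to right.
    """
    return reduce(lambda result, func: func(result), functions, value)


print(pipe(10, double))         # same as double(10) -> 20
print(pipe(1, double, double))  # same as double(double(1)) -> 4
```

With no functions at all, pipe simply returns the value unchanged, which matches the idea of a pipeline with zero stages.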
I don't think it will ever be a part of Python. There were a few proposals for this, and they were rejected for good reasons. Maybe in the future, who knows; ask a core developer, not me. And the implementation is purely educational, as I've already mentioned. So, in the first part, we want to get from the source code, the characters on the right-hand side, all the way to the abstract syntax tree that you see there: a tree-like representation of your source code. If you have a single Python file with only this line in it, you'll get something like what you see there. You have a module with a single expression. It's a binary operation, an operation with two operands, a value on the left-hand side and a value on the right-hand side. That binary operation has a constant, a 10; it has the operator itself, the call pipe; and then it loads the name of the function. Obviously, this won't work in a file on its own, because no function is defined here, but this is the general idea of what we're trying to do in the first part: get from that source code to that nice tree over there. So let's first look at our source code. We humans are very good at reading; we recognize patterns. We immediately see that we have a 10, that we have a name, double, and that we have a weird operator in the middle that is unfamiliar in Python, but at least we immediately recognize the bits. But if you think about it, for Python at the start this is just a stream of individual characters. There are 12 characters here that make up our source code, and Python has to understand them. The first step Python takes is extracting the tokens from this source code, and tokens are the minimal parts that still have a meaning. For instance, we have a number token over here: if you split it up even further into a 1 and a 0, you lose the meaning of the number 10. We have a name over here.
Obviously, if you remove a character, it's no longer the same name, so this is a name token. The first thing that we need to implement is that Python recognizes the token here in the middle, this two-character operator, as one single token. This isn't very difficult to do; it's just configuration in a file. If you look in the Python repository, there's a file called Grammar/Tokens, and it's just a big mapping of token names to token character sequences. So all we have to do is add our own token to this file. There's no meaning here yet; this is just a description of the token. I've called it VBARGREATER, the vertical bar followed by greater-than, and I've mapped it to the character sequence that we want to introduce. That isn't all of it: we still need to regenerate the tokenizer itself, the actual piece of Python that extracts these tokens from your source code, because Python will not read this file every time it parses source code. For that, we just have to run some commands, and this will be a recurring theme in this presentation: change something, run a command, and then you can see the result. I'm not going to go into detail on all these commands, because you can find them on the internet and you're not going to remember them here anyway. But after we've done this, Python will be able to recognize our new VBARGREATER token, and that's already a big step. It doesn't really know what to do with that token yet, though: there's no grammar rule that tells Python "if you see this token, then this has to happen". So the next thing we need to do is add support for this new token to Python's parser by adding it to Python's grammar. The grammar also has to produce something, namely the abstract syntax tree that we saw earlier, so we also have to tell the grammar how to generate the abstract syntax tree and make sure that there's something for our call pipe available
in that abstract syntax tree. So let's look at our grammar. Since Python 3.9, the new PEG parser has been introduced, and it comes with a completely new grammar. It's very flexible and very powerful, but we first have to look into the syntax of this grammar just a little bit to add our own rule. So I'm going to take a very small dive into the parsing expression grammar that Python uses, just enough to define our new rule for the call pipe operator. Imagine a very simple programming language that only has two expression types: a sum and something called an atom. In this very simple language, there are only two grammar rules, sum and atom. There might be some others for statements and stuff like that, but we're going to ignore those for now and focus on these two. This is basically what you can define for the PEG parser; it's a little bit simplified, but these could be grammar rules. Grammar rules can reference each other, and this is how Python goes down all the grammar rules to see what matches. It starts by trying to match the sum rule, and within the sum rule there are alternatives that reference the atom rule, which is just a number; that is how the atom rule ever gets considered. So if we have this single piece of source code, just a number, Python will first try to match the sum rule. The sum rule has two alternatives, which is what the vertical bars mean: an atom plus an atom, and an atom on its own. Our number matches the second alternative, an atom on its own. Then we look at the atom rule, and because this is just a number, we can parse it. So in this very simple grammar, we can parse a number on its own. What about this one?
We have an atom, a number, then a plus, and then another atom, which matches the first alternative of the sum rule. So it's very easy to see that this grammar can match this simple piece of code as well. But now, what about this one? Think about it for a moment: are our grammar rules able to parse this very simple expression? I see some people nodding yes, others shaking their heads no. Well, the problem is that when you start parsing the expression, you consume these parts; we can parse those very easily, and what we're then left with is a plus and a 3. We have no rule that matches a plus and a 3 on their own. So how are we going to match something with two pluses? Obviously, we could add another rule, but what if we want an atom plus an atom plus an atom plus another atom? Do we have to add another rule if we want four pluses, or five, or six? If we want an arbitrary number of pluses, we'd be busy for quite a long time adding alternatives for all those different scenarios. But there's a very simple solution for that, and you're probably going to love this: it's just recursion. What we can do is change the first alternative in the sum rule.
We can make it reference itself. Now we can have sums embedded in other sums embedded in other sums. And remember, a sum can also just be an atom on its own, so we can still match an atom plus an atom, but now we can also match a sum contained in another sum. This is basically all you need to get an arbitrary number of operators in a row, and if you think about it, this is precisely what we need for our new pipe operator: we also want to be able to build an arbitrarily long pipeline. So the grammar rule that we need to add has exactly this kind of recursive relationship. Let's look at the existing grammar file in Python, Grammar/python.gram. Here you see the shift expression, for bit shifting, and the sum expression, and I'm just going to insert our new grammar rule between those two. So let's make some space. This probably isn't the best place to insert it, but it makes the insertion very easy, which is why I've chosen it. Here we add a new rule; let's call it pipe. As you can see, this is just our recursive relationship. There's just one problem: our grammar rule isn't referenced by any other grammar rule in Python, so the parser will never consider it when parsing source code. What do we have to do? If you look at the shift expression, it references the sum expression; that's how the grammar rules flow down. So to insert our rule, we change the references in the shift expression to point to pipe, and pipe can then reference sum. After this change, the shift expression references pipe, pipe references sum, and our grammar rules flow down again. This is basically all we need to do to add a grammar rule for our new call pipe operator. So now we can parse this, right?
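To make the recursion idea concrete, here is a toy recursive-descent parser for the two grammar levels discussed above, assuming rules shaped like pipe: pipe '|>' sum | sum and sum: sum '+' atom | atom, where an atom is a number or a name. This is a hand-written sketch, not CPython's generated parser: CPython's PEG parser generator supports left-recursive rules directly, whereas a hand-written parser like this one typically turns each left-recursive rule into a loop that keeps folding the next operand onto the left. The small tokenizer also shows |> being matched as one token.

```python
import re
from typing import List, Optional, Tuple, Union

# Toy grammar (an illustrative assumption, not CPython's python.gram):
#   pipe: pipe '|>' sum | sum
#   sum:  sum '+' atom  | atom
#   atom: NUMBER | NAME
Node = Union[int, str, Tuple[str, object, object]]

# One capturing group per token kind; '|>' is matched as a single token.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\|>|\+))")


def tokenize(source: str) -> List[str]:
    tokens: List[str] = []
    source = source.rstrip()
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        tokens.append(match.group(match.lastindex))  # text of the group that matched
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def parse_atom(self) -> Node:
        token = self.peek()
        if token is None or token in ("+", "|>"):
            raise SyntaxError(f"expected a number or a name, got {token!r}")
        self.pos += 1
        return int(token) if token.isdigit() else token

    def parse_sum(self) -> Node:
        # sum: sum '+' atom | atom
        # Parse one atom, then keep wrapping the tree built so far as the left operand.
        node = self.parse_atom()
        while self.peek() == "+":
            self.pos += 1
            node = ("+", node, self.parse_atom())
        return node

    def parse_pipe(self) -> Node:
        # pipe: pipe '|>' sum | sum  (same shape, one level up, so '+' binds tighter)
        node = self.parse_sum()
        while self.peek() == "|>":
            self.pos += 1
            node = ("|>", node, self.parse_sum())
        return node


def parse(source: str) -> Node:
    parser = Parser(tokenize(source))
    tree = parser.parse_pipe()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(tokenize("10 |> double"))         # ['10', '|>', 'double']
print(parse("1 + 2 + 3"))               # ('+', ('+', 1, 2), 3)
print(parse("1 |> double |> double"))   # ('|>', ('|>', 1, 'double'), 'double')
```

Both loops produce left-nested trees, which is the same shape the left-recursive rule describes: the earlier part of the pipeline ends up as the left operand of the later one.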
We're done? Well, not quite, because we still need to be able to create the tree structure that you see there on the right; we have to fit it into our abstract syntax tree. So how are we going to do that? In the old parser there used to be an intermediate step, the concrete syntax tree, but with the new PEG parser we don't need that anymore, because we now have something called grammar actions. If you look here, this is a grammar action, between the curly braces. Because we're targeting CPython, this is basically just a piece of C code embedded in the grammar file. Whenever Python matches the sum rule, it calls this C function to create the classes that we see there, the tree structure. To create a binary operation node, we need the left-hand side of the operator, the right-hand side, and the operator itself, so that's the information we pass into this function. We do that by assigning names to the parts that we match in our expression: we match a to the left-hand side and pass it into the function, we match b to the right-hand side and pass it into the function, and we know that this is an Add operation because this is the sum rule, so we can hard-code the Add operator. There are some extra bits having to do with line numbers and stuff like that; we're going to ignore those for now, but they're very handy for tracebacks and other interesting things. So this is all we need to create a binary operation with an Add operator. Can we now do that for our own rule?
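Stock Python exposes the same AST classes through the ast module, so the shape of the node that the sum rule's grammar action builds can be inspected, and even built by hand. This sketch uses an ordinary + (stock Python can't parse the new operator) and a placeholder name x; the node holds the left operand, the operator, and the right operand, and the operator is an instance of one of the classes generated from the operator list in Parser/Python.asdl.

```python
import ast

# Parse an ordinary binary operation with the stock parser.
tree = ast.parse("10 + x", mode="eval")
binop = tree.body
print(ast.dump(binop))

# The op field holds an instance of a class generated from Parser/Python.asdl.
operator_names = [cls.__name__ for cls in ast.operator.__subclasses__()]
print(operator_names)

# Build the same node by hand, mirroring the left / operator / right
# arguments that the sum rule's grammar action passes along.
manual = ast.Expression(
    body=ast.BinOp(
        left=ast.Constant(value=10),
        op=ast.Add(),
        right=ast.Name(id="x", ctx=ast.Load()),
    )
)
ast.fix_missing_locations(manual)  # compile() needs line numbers on every node
print(eval(compile(manual, "<manual>", "eval"), {"x": 5}))  # 15
```

The line-number attributes that fix_missing_locations fills in here are the "extra bits" the grammar action passes along in the real parser.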
Obviously, we can just copy the approach. Our new operator is also a binary operation, so we can make it call the same function, but instead of the Add operator we use the CallPipe operator here. Just one tiny problem: the AST uses classes, and there is no class called CallPipe yet. It just doesn't exist, so we have to create it before we can build this abstract syntax tree with a CallPipe node in the operator field. How do we do that? Is it difficult? Do we have to write a lot of code? Luckily not, because this is another configuration file: Parser/Python.asdl, where ASDL stands for Abstract Syntax Description Language. As you can see, somewhere in the middle there's a section for the operator, and here's our Add option, just in a list of options. This is all a generator needs to create the classes for you. So what do we have to do? Just add our new operator at the end as another option, then regenerate the AST by running another command, and our classes will be created for us. That is all we have to do to add support in the AST. Now we can regenerate the entire parser, and we are actually able to parse expressions containing the call pipe operator. You can see here that we have a BinOp node somewhere in our AST, and it actually uses a CallPipe object to represent the operator. So that's all we need to do for the new grammar. Now we can go from source code to an abstract syntax tree, but this is all fairly static; nothing happens yet. To make something happen, we move on to part two, where we transform this into something we can run and then actually execute it. In part two, we'll look at the compiler. If you've ever joined an online discussion about whether Python is a compiled language or not: please don't, they're all very toxic. But we are going to compile this into some form of an intermediate
language: we're going to compile it into bytecode. To do that, we need a bytecode instruction for our new operator, because bytecode is just a long list of instructions for Python, a long sequence of bytes, of numbers. Each instruction that can be executed has its own number, its own byte, and we need one for our new operation. There is, of course, already bytecode for calling functions, but I'm going to ignore that, because it isn't fun to use what's already in Python. So we're going to add support to the compiler by creating our own instruction, then make the compiler actually use that instruction and write it into the bytecode, and then add support in the evaluation loop to actually do something with it. Let's do that. This is the only Python file that we'll see today, Lib/opcode.py, which we use to define our operation codes, or opcodes. Our opcode doesn't take an argument. Don't worry about that for now, but it means that it has to have a number lower than the HAVE_ARGUMENT constant. So I'm just going to add it here and call it BINARY_PIPE_CALL. That name is for us, so that we can understand the bytecode, and the number we associate with the operation is 90. I have to increase all the numbers below it; that's a tedious job, but we have to do it. Now we can regenerate all the opcodes. Do you see the pattern yet? And now the opcode will actually be generated for us. Right, now we have an opcode. We still need to make the compiler actually write that opcode into the bytecode. So how are we going to do that?
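To see what the compiler writes for an existing binary operator, stock Python's dis module can disassemble a function. This sketch only inspects existing bytecode and does not involve the new opcode. The exact opcode names are version-specific: Python 3.9 and 3.10 emit a dedicated BINARY_SUBTRACT instruction, while 3.11 and later use a generic BINARY_OP instruction whose argument selects the operation.

```python
import dis


def subtract(a, b):
    return a - b


# dis.opmap maps opcode names to numbers (in Python 3.9 these come from Lib/opcode.py).
print("HAVE_ARGUMENT =", dis.HAVE_ARGUMENT)

# In 3.9, opcodes numbered below HAVE_ARGUMENT take no argument,
# and dis shows their arg as None.
for instruction in dis.get_instructions(subtract):
    print(instruction.opname, instruction.opcode, instruction.arg, instruction.argrepr)

binary_ops = [
    instruction
    for instruction in dis.get_instructions(subtract)
    if instruction.opname.startswith("BINARY_")
]
print(binary_ops[0].opname)  # BINARY_SUBTRACT on 3.9/3.10, BINARY_OP on 3.11 and later
```

Whatever the version, the operands are loaded first and the operator instruction comes after them, which is the order the compiler writes them in.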
Well, the compiler is just going to visit all the nodes in our AST, and it has functions that handle those visits. Somewhere in the compiler there's a function called compiler_visit_expr1, which eventually gets called to handle expressions. We don't actually have to change it, but there's an important fact here. If you look at this function, it's just a massive switch-case statement, and for each kind of expression, including BinOp_kind for binary operations, there's a special case. Here's our case: we first visit the left-hand side of the operator, which is really important because we need that value, and write all the instructions for it. Then we visit the right-hand side and write all the instructions to evaluate it, and then we add the instruction for the operator. To know which opcode to write, there's a helper function, binop, and this is the function we actually need to change. It's another switch-case statement: it gets an AST operator node, like this Add here, and returns the opcode that we want to add to the bytecode. So we add another case for our BINARY_PIPE_CALL opcode, and now it can be written into the bytecode. There's one final thing we need to change in the compiler: the stack effect. Python uses a value stack, you'll see that later, and the effect of our operation is that the value stack shrinks by one value. Don't worry about that for now. That's the compiler: now we can go from an AST to a long list of instructions, and this is all we need to write our new opcode into the bytecode. Now we get to the evaluation loop, and this is really where the magic happens. The evaluation loop, like the name says, is just one giant loop that goes around and around, executing all the instructions in the bytecode. And inside of that loop...
There is a massive switch-case statement, and I'm not exaggerating, it's really massive; look at the source code in Python/ceval.c. For each opcode it has a case. For instance, here is BINARY_SUBTRACT, and this is the code that actually gets executed whenever Python sees that opcode: something minus something else. But here we have a problem: how do we get the values that were on the left and on the right of the operator back here, so that we can actually process them? Well, this is why Python uses a value stack, and from that value stack we can get the values we need to perform the operation. How does that work? Say we have this simple expression, 4 minus 3. As we saw earlier, Python first writes the instructions to evaluate the left-hand side; when that's done, it puts the value onto the value stack to remember it. Then it evaluates the right-hand side and also puts that value onto the value stack. The value stack really is a stack: the 3 goes on top of the 4, just so that we're able to use them later. So when we enter the case for the operator, the 3 and the 4 are on our value stack. Now we can use the POP macro to take the 3 off the value stack and make right point to that value. For left we do something similar with the TOP macro: we leave the value on the value stack, but we make left refer to it. Now we have our values, and we can call a C API function, PyNumber_Subtract, passing in the two values. We get a result out of it, and the result will now point to a 1. We are now done with those values, so we can decrease their reference counts. And then, at the end, we need to do something with the resulting value. So what do we do?
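The POP, TOP, and put-the-result-back sequence described here can be mimicked in a few lines of Python. This is a toy model, not ceval.c: instructions are (name, argument) tuples, the names LOAD_CONST, LOAD_NAME, BINARY_SUBTRACT and RETURN_VALUE merely echo real CPython opcodes, BINARY_PIPE_CALL is the talk's new opcode, and the bytecode lists are written out by hand in the order the compiler would emit them (left operand first, then the right operand, then the operator). Reference counting is left out because Python manages it for us.

```python
from typing import Any, Dict, List, Tuple

Instruction = Tuple[str, Any]


def run(code: List[Instruction], names: Dict[str, Any]) -> Any:
    """Toy evaluation loop: one big loop over instructions with a value stack."""
    stack: List[Any] = []
    for opname, arg in code:
        if opname == "LOAD_CONST":
            stack.append(arg)
        elif opname == "LOAD_NAME":
            stack.append(names[arg])
        elif opname == "BINARY_SUBTRACT":
            right = stack.pop()       # like POP(): take the top value off the stack
            left = stack[-1]          # like TOP(): look at it, but leave it there
            stack[-1] = left - right  # put the result where left was: net effect -1
        elif opname == "BINARY_PIPE_CALL":
            func = stack.pop()        # the function was on the right-hand side
            value = stack[-1]         # the value was on the left-hand side
            stack[-1] = func(value)   # call the function with one argument
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {opname!r}")
    raise RuntimeError("code ended without RETURN_VALUE")


# 4 - 3: push 4, push 3 on top of it, then subtract.
subtract_code = [
    ("LOAD_CONST", 4),
    ("LOAD_CONST", 3),
    ("BINARY_SUBTRACT", None),
    ("RETURN_VALUE", None),
]
print(run(subtract_code, {}))  # 1

# 1 |> double |> double, written out by hand as a hypothetical compiler would emit it.
pipe_code = [
    ("LOAD_CONST", 1),
    ("LOAD_NAME", "double"),
    ("BINARY_PIPE_CALL", None),
    ("LOAD_NAME", "double"),
    ("BINARY_PIPE_CALL", None),
    ("RETURN_VALUE", None),
]
print(run(pipe_code, {"double": lambda n: n * 2}))  # 4
```

Each binary case removes two values and pushes one back, which is the stack effect of minus one that the compiler has to know about.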
We just put it back on the value stack, replacing the value that was already there. So after performing the operation, there's one less value on the value stack, and that is the stack effect of minus one that we saw earlier. Then there's some error handling, and then we DISPATCH, which tells the evaluation loop to move on to the next instruction. So how do we do that for our BINARY_PIPE_CALL? We just copy and paste this code, because it does basically what we need. We change the target to our BINARY_PIPE_CALL opcode and get the values like we did before, but now we don't want to subtract them; we want to call a function with a value. There's a very handy C API function in Python 3.9 for that, PyObject_CallOneArg, which calls a function with one argument. It's not part of the stable API, so it might change, but it's very handy here. Remember, the function was on the right-hand side, so we pass in the function, right, first; the value was on the left-hand side, so we pass it in second. We get the result, decrease the reference counts, and put the resulting value back on the value stack. And this is all we need to do to get from the AST to the magic. Now we've completed our journey: we can go all the way from source code to execution, with all the steps in between. There's just one thing left to do: compile our new version of Python. While you're waiting for that, you might as well have a sword fight. After that's done, you can run your new version of Python and use your operator, and it will actually work if you follow these steps. There are some quirks, especially with operator precedence in this implementation, but it works in principle. And basically, that's it. We've seen a lot of Python internals in a short time. Don't worry about remembering it all; this is just a framework. Check out the book and the devguide. Source code and slides are available.
And if you get weird errors, try running make clean (or a clean-all build on Windows), and look into something called the magic number: because we've changed the bytecode, we also need to change the bytecode version in CPython. Before we go: I hope you've enjoyed EuroPython, it's been great for me. But we also need to organize the next edition, in 2023. I don't know when or where it's going to be, but if you want to be a volunteer just like me, join us to help organize EuroPython 2023. You can send an email to the address on the slide, or just talk to me after my talk. And that's it. I hope you've enjoyed my talk. Thank you very much.

Thank you very much. Wow. I expect we're going to see a lot of forks of Python now. So, do we have questions? When you're ready, just queue up. Okay, go on.

Just something silly: in the beginning, when you created the token, you gave it a symbol name. Did you use that afterwards? I don't remember.

The name, VBARGREATER? Well, we don't use that name in the grammar. We could use it, for instance, when tokenizing and printing the output, but we haven't used it in the grammar. The token sequence itself is obviously our new operator, so that's what we actually use.

So why did we give it a name?

Mostly for us humans, to make it readable. So, yeah.

Another one, please. Thank you for the talk, I really enjoyed it. I have a maybe stupid question, as always: the bytecode evaluation is just a loop looking at the top of the stack. That seems so limiting; you can only look at one value. How does it still get normal performance?

It's very interesting.
So there are obviously a lot of optimizations in there, but basically this is just a kind of virtual machine: you have instructions, you have a value stack that you keep track of, some kind of registers, and that's really powerful; you can do a lot with it. In Advent of Code, which is an online puzzle event, I think in the 2019 edition, you actually build your own kind of Intcode computer with its own kind of bytecode. I recommend you check that out; you can really see how powerful it can be. I'm not quite sure how much performance you can get out of it, but have fun.

Yeah, thanks.

So, I don't see anyone else queueing, but I have a question. In one minute, can you describe what the magic number is?

The magic number. I actually had a slide about this. Whenever Python compiles your code, it writes a .pyc file; you've probably seen those somewhere. That bytecode is versioned: it has all the instruction numbers in there. But if you change the instruction numbers, all the old bytecode is no longer valid, because it uses the old numbers. So there's a kind of versioning in Python, called the magic number, and it versions that bytecode. If you change the opcodes, you're supposed to change the magic number too, so that all the old .pyc files are marked as, how do you say it, stale, and Python recompiles them. That avoids very weird things happening when old bytecode meets changed opcodes; it gets messy. So that's the magic number.

Thanks. Well, thank you, and this is all the time we have, so please give a round of applause. Thank you very much.
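Following up on the magic-number answer: the running interpreter's magic number is exposed as importlib.util.MAGIC_NUMBER, and it forms the first four bytes of every .pyc file that interpreter writes. This sketch compiles a throwaway module into a temporary directory with py_compile and compares the header; the file names are arbitrary.

```python
import importlib.util
import py_compile
import tempfile
from pathlib import Path

# Four bytes: a 2-byte little-endian version number followed by b"\r\n".
magic = importlib.util.MAGIC_NUMBER
print(magic, int.from_bytes(magic[:2], "little"))

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp, "example.py")
    source.write_text("print(4 - 3)\n")
    # doraise=True turns compile errors into exceptions instead of printed messages.
    pyc_path = py_compile.compile(
        str(source), cfile=str(Path(tmp, "example.pyc")), doraise=True
    )
    header = Path(pyc_path).read_bytes()[:4]

print(header == magic)  # True: the .pyc starts with the interpreter's magic number
```

A modified interpreter with a bumped magic number would see a mismatch in this header for old .pyc files and recompile them from source.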