Hello everybody. I will introduce myself quickly, because we don't have much time to write a JIT compiler, so let's start very soon. I am Antonio Cuni. I have been a PyPy core developer since 2006, and I also do some other open-source stuff.

The scope of this talk is to show that JIT compilers are not as hard as you might think at first glance. We are going to write a very, very simple JIT compiler for 64-bit x86. We are going to compile a subset of Python, and we are going to make some assumptions to make it easier: all the variables will be of type float, and we won't compile every construct, just a very simple subset, but it's enough to show how things work. We are using a library to encode the assembly instructions, because for the Intel architecture that is a hard task. And that's it.

The disclaimer is that before writing this talk I had never written any assembly code, so it really shows that it is easy enough if you know more or less what to expect. Since I didn't know much about assembler, what I did was to write C programs, compile them with GCC, look at the generated assembler, and basically copy and paste. For example, you start with this very simple function which adds two double numbers and returns the result. I compiled it like this, and then looked at it with objdump. These are basically the bytes that you need to put in memory in order to execute this logic.

The next thing you want to try is how to execute something like this from Python. These are the bytes that I showed you before, really a copy and paste of the objdump output, and they correspond to these assembly instructions. What you need to do is allocate some memory, and you need to allocate it with mmap, because you need to tell the kernel that you want to execute this part of memory.
You put the code inside a buffer, and then you use, for example, CFFI to tell Python that this particular piece of memory contains a function which takes two doubles and returns a double. So let's see if it works. You see that here we have our buf, which is a buffer. With CFFI we can get a pointer to this memory, we can cast it to a function pointer, and we get a CFFI function pointer. Then we can call it, and it works. This is the second time I do this talk, and the first time I got a segfault at this point, so I'm improving.

Back to the code. This is the basic idea: we need to allocate the memory, fill it with the right instructions, and execute it. That's how you write a JIT compiler, but then of course it's much more complicated than this.

Let's start from the tests, because I want to test that it is going to work. I'm going to write a CompiledFunction class which wraps all of this. So let's run py.test on the JIT tests. What is it? Okay, I don't have this class, so let's write it. I'm going to cheat a bit, because I didn't have enough time to type everything live. What we want to do is something like this, I think: we allocate the buffer, we copy the code inside, and we cast it to the right pointer type. When you call the compiled function, you just call the CFFI function. Let's see if it works. And it doesn't, right? Okay, better.
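
The allocate, copy, cast, and call steps described above can be sketched as follows. This is only a minimal illustration under stated assumptions: it uses the standard-library ctypes module in place of CFFI, the names `ADD_CODE`, `CAN_RUN` and `try_add` are hypothetical, and the five bytes are the usual x86-64 encoding of `addsd xmm0, xmm1` followed by `ret`. The bytes are only executed on an x86-64 machine whose mmap module can request executable pages.

```python
import ctypes
import mmap
import platform

# Machine code for: double add(double a, double b) { return a + b; }
#   f2 0f 58 c1    addsd xmm0, xmm1
#   c3             ret
ADD_CODE = bytes([0xF2, 0x0F, 0x58, 0xC1, 0xC3])

# Only run the bytes where they make sense: an x86-64 CPU and an mmap
# module that exposes PROT_EXEC (Unix-like systems).
CAN_RUN = platform.machine() in ("x86_64", "AMD64") and hasattr(mmap, "PROT_EXEC")


class CompiledFunction:
    """Hypothetical wrapper: copies machine code into executable memory."""

    def __init__(self, nargs, code):
        prot = mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC
        self.buf = mmap.mmap(-1, mmap.PAGESIZE, prot=prot)  # anonymous mapping
        self.buf.write(code)
        # Address of the first byte of the buffer.
        addr = ctypes.addressof(ctypes.c_char.from_buffer(self.buf))
        # C function pointer type: takes `nargs` doubles, returns a double.
        functype = ctypes.CFUNCTYPE(ctypes.c_double, *([ctypes.c_double] * nargs))
        self.fptr = functype(addr)

    def __call__(self, *args):
        return self.fptr(*args)


def try_add(a, b):
    """Return a + b computed by the raw machine code, or None if unsupported."""
    if not CAN_RUN:
        return None
    try:
        fn = CompiledFunction(2, ADD_CODE)
    except OSError:  # the system refused an executable mapping
        return None
    return fn(a, b)


if __name__ == "__main__":
    print(try_add(1.5, 2.25))
```

Keeping the mmap object alive as an attribute matters: if the buffer were garbage collected, the function pointer would point at unmapped memory.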
So we are running some code and it's working. The next thing we need to do to write a JIT is to handle registers. Modern CPUs have many registers, and when you run your compiled code you need to decide which value goes in which register. We are going to write a very simple register allocator in which you assign a variable to a register, and it's so simple that we just assign each of them once. Real JIT compilers have to decide when to use a register, when to reuse it for something different, when to move the value of a register to some temporary location in memory, etc. But since this is very simple code, we just decide to use one register for each variable, and if you use too many variables, we give up.

For example, we maintain a dictionary which maps a variable to a register. We use a defaultdict so that we don't have to maintain it ourselves, and if we have no registers left we just raise NotImplementedError, because it's simple. In real code you need to handle this case and save the value of a register to some location in memory, with all the other complicated logic. Then, to get the register for a variable, you just return it from this dictionary. When the variable is not in the dictionary, it goes through the defaultdict logic, allocates one, etc. So let's see if our tests work. You see, we are checking that the variable a gets the first register, then the second, the third, etc. Okay, live coding.

So we are slowly getting to a point at which we can actually compile things, because now we have a CompiledFunction, which takes care of handling the low-level stuff, and a register allocator, which we will use later. Let's try to write a real AST compiler. I'm not going to parse the Python code myself, because it's complex and there is no time.
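
The register allocator described above might look roughly like the sketch below. It is an illustration only: the class and method names are assumptions, and registers are represented as plain strings rather than real assembler operands.

```python
from collections import defaultdict


class RegisterAllocator:
    """Hands out one register per variable name and never reuses any."""

    # Names of the 16 x86-64 SSE registers; plain strings in this sketch.
    REGISTERS = ["xmm%d" % i for i in range(16)]

    def __init__(self):
        self._free = list(self.REGISTERS)
        # defaultdict calls _allocate() the first time a variable is looked up.
        self._vars = defaultdict(self._allocate)

    def _allocate(self):
        if not self._free:
            # A real JIT would spill a register to memory here.
            raise NotImplementedError("too many variables")
        return self._free.pop(0)

    def get(self, varname):
        """Return the register assigned to varname, allocating on first use."""
        return self._vars[varname]


if __name__ == "__main__":
    regs = RegisterAllocator()
    print(regs.get("a"), regs.get("b"), regs.get("a"))
```

Because registers are handed out in order, asking for the function arguments first puts them in xmm0, xmm1, and so on, which is the property the talk relies on later.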
So I'm using the ast module. Basically, with ast you take Python source code and you get a tree which represents your instructions. I will show you some pictures. What I want to do is to have a class to which I can pass some Python code, compile it, and get a function that I can then call. I want to check that the empty function just returns zero: in my toy language you have to return a number, so if you don't return anything it will return zero. Of course, if we test it we have an error, because this class doesn't exist. So let's cheat some more.

I'm using the visitor pattern; I will show you in a while. The idea is that this class takes the source code, which is parsed with the ast module of Python, and when I compile it I do some magic to generate the assembler code as bytes, put them in the buffer, and then return the CompiledFunction I just wrote. As for the visitor pattern, it basically means that, from the tree that I will show you very soon, I will call a method of this class for every node of the tree. If I test it, I see that I get a NotImplementedError for the Module method, so let's write it. Now I can use self.show to show the AST, and it should appear there. Yes. This is the tree representation of our source code. You see that you have a Module which contains a FunctionDef, which is named foo.
It doesn't have any arguments, and it contains one statement, and the statement is pass. With the visitor pattern I am using here, you say: okay, we have a Module, and the Module node has an attribute, body, which contains all the statements. So what I want to do is: for child in node.body, self.visit(child). This is going to recursively visit all the nodes, and the idea is that when you visit a node you emit the assembler code for the statement that you are visiting. Now I expect that if I run it I will get an error because FunctionDef doesn't exist, and indeed. So let's implement FunctionDef(self, node). The argument names of the function are something like arg.arg for each argument in this node. I create a new function with new_func, whose implementation I will show you in a while, and then the important thing is again to recurse into the children of this node and visit them.

Since we need to return something, because in assembler we can't return None, we return zero by default, and we return it using the asm object, which I will create shortly. The usual way of returning zero in assembly is to XOR a value with itself, so we XOR the xmm0 register with itself and we return it. Then of course we need the new_func function. What it does is create the asm object, which is the function assembler, something that I didn't show, but basically it makes it possible to write nice mnemonics as methods instead of having to put the bytes in the buffer.
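
The tree and the per-node dispatch described above can be reproduced with the standard ast module alone. The sketch below is an assumption-laden stand-in for the talk's compiler class: `NodeLister` is a hypothetical name, it only records what it visits instead of emitting assembler, and it relies on `ast.NodeVisitor`, which silently recurses into unknown nodes rather than raising NotImplementedError as the talk's class does.

```python
import ast

SOURCE = """
def foo():
    pass
"""


class NodeLister(ast.NodeVisitor):
    """Records a short description of every node it visits, depth first."""

    def __init__(self):
        self.seen = []

    def visit_Module(self, node):
        self.seen.append("Module")
        for child in node.body:  # recurse into every statement
            self.visit(child)

    def visit_FunctionDef(self, node):
        argnames = [arg.arg for arg in node.args.args]
        self.seen.append("FunctionDef %s(%s)" % (node.name, ", ".join(argnames)))
        for child in node.body:
            self.visit(child)

    def visit_Pass(self, node):
        self.seen.append("Pass")  # a compiler would emit no code for pass


tree = ast.parse(SOURCE)
print(ast.dump(tree))  # textual form of the tree shown on the slides

lister = NodeLister()
lister.visit(tree)
print(lister.seen)
```

The `visit` method looks up `visit_<ClassName>` for each node, which is the "call a method of this class for every node of the tree" behaviour the talk describes.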
The assembler comes from the PeachPy library I was telling you about. Then we need to create our allocator, and then we need to allocate a register for every argument. The calling convention says that the first argument is in register xmm0, the second in xmm1, etc. Since our register allocator returns registers in the right order, if we assign the first register to the first argument we have already ensured that the values of the arguments are in the correct registers. And since I'm going to use two temporary registers during the compilation, I just allocate them here. See, these are like having two temporary variables. Hopefully the source code will not use these names; if it does, you get problems. Of course, in a real-world example you need to handle this case.

So let's see if we are improving. Yes. Now we just need to implement the pass statement, which is the leaf of our body. We don't need to emit any code for the pass statement, of course. And success. So by writing these methods and following the visitor pattern we managed to pass this very first test, which took a while of putting things together, but now we have a framework in which we can easily implement all the other features of our JIT.

The second test we want is to return a simple constant. Let's see what happens. Okay, I need to implement the Return statement. No, I want to show you the AST again. This is the AST I have now, and you see I have the Return statement, which takes the Num expression, which holds the constant 100. What I need to do is write methods for all these node classes. The idea for evaluating an expression is to use the stack. Think of a tree of expressions, with binary expressions, operands, etc.
What you want to do is evaluate the two children, put them on the stack, then pop the two values from the stack and compute the result. This way you can easily handle deeply recursive data structures like binary expressions, and expressions in general. So for the Return statement I want to visit node.value. This is going to put the value 100 on top of the stack. Then I want to pop the value off the stack into register xmm0, which is the register used to return a value from a function, and then emit the ret assembly instruction.

This of course is going to fail now, because I need to implement the Num node. When I visit a Num node, what I want to do is put a number on the stack. So I need to use the MOVSD instruction: I load the value into temporary register zero, and I want it to be a constant value, and the value is node.n. In my example, node.n is 100. Once I have the value in the register, I can push it on the stack.

So you see what happens: by using visit I'm walking down the tree, and when I emit the code for the leaves, I put values on the stack. The stack exists at runtime. Then, visiting the nodes further up, you pop values off the stack, you compute, you put the result on the stack again, etc.
And by doing this, at the end you have the result you are supposed to have. So let's see if it works. Okay, pushsd, I think it is. And it works.

So we basically need to continue. I have more tests. I want to handle arguments. In this test case we pass two arguments, which are going to be in registers xmm0 and xmm1, as I said, and I want to return one of the two. If we execute it, we don't have Name. Let's see what's happening. This is our tree now. You see the FunctionDef contains the arguments; these arguments are going to be put in registers by the register allocator. Then I want to return an expression which is a Name. So I need to write the logic to load the name from the register and put it on the stack. Which register? Well, of course we have a register allocator, so the register allocator takes care of giving me the correct register from the name of the variable, which is node.id. Then I can pushsd it, and that should be enough to pass the test, because now we are doing the same as before, but instead of a Num we have a Name which puts the right value on the stack. And it works. I'm impressed that it works and that I'm not making any dumb mistakes.

Let's try something more complex; now things are getting interesting. We want to handle operations now. What does it look like? We need a BinOp node, so let's see what it looks like. This is the tree, and now you can start to see its recursive structure. You have a FunctionDef which has a Return statement, the Return statement has a BinOp expression, whose operation is Add, and the two operands of the expression are two names, a and b. So the idea to implement a binary operation is this.
We want to visit the left child, to evaluate it and put it on top of the stack, then visit the right child and put it on top of the stack. Then we pop the two values from the stack, add them together, and push the result onto the stack again. This way, if you have a complex expression with many binary operations, parentheses, etc., by doing this push and pop you get the right precedence of operations. [inaudible aside]

For example, in this case, the opname: the Add that you see in the picture is basically the name of the class of the op attribute of the node. For now I'm just implementing Add, for addition. What we want to do is visit the left child, visit the right child, then pop the right value into temporary register one and pop the second value into temporary register zero. I want to add them together. The ADDSD assembly instruction is basically a plus-equal, so it is going to sum tmp0 and tmp1 and put the result in tmp0. Now I want to put the result on the stack, right? If everything is okay, it should work. Okay, ADDSD, and it works.

Next test. I want to check that I can use more instructions, and not only Add. The assertion is failing because the operation is no longer Add. So what we want to do is something like this. Let's see if I can copy and paste to speed things up. For each operation I get a different instruction, so what I can do is something like ops[opname], and then call it. And see, it's working.

The next test is Assign. I want to be able to assign values to variables, so I need to implement this Assign node. Again, I think I'm running out of time, so I'm going to cheat a lot, and instead of typing I will just copy and paste.
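
Stepping back to expression evaluation, the push and pop scheme for Return, Num, Name and BinOp nodes can be simulated without any assembler at all. The sketch below is not the talk's code: it emits made-up pseudo-instructions as tuples (the names `PUSHCONST`, `PUSHVAR` and the `run` interpreter are assumptions for illustration) and then interprets them with a Python list as the stack. It uses `visit_Constant`, which is what current Python versions call for number literals; older versions used Num.

```python
import ast
import operator


class ExprCompiler(ast.NodeVisitor):
    """Emits pseudo-instructions for one function body (a sketch, not x86)."""

    OPS = {"Add": "ADDSD", "Sub": "SUBSD", "Mult": "MULSD", "Div": "DIVSD"}

    def __init__(self):
        self.code = []

    def visit_FunctionDef(self, node):
        for child in node.body:
            self.visit(child)

    def visit_Return(self, node):
        self.visit(node.value)             # leaves the result on the stack
        self.code.append(("POP", "xmm0"))  # xmm0 is the return register
        self.code.append(("RET",))

    def visit_Constant(self, node):        # number literal, e.g. 100
        self.code.append(("PUSHCONST", float(node.value)))

    def visit_Name(self, node):
        self.code.append(("PUSHVAR", node.id))

    def visit_BinOp(self, node):
        opname = node.op.__class__.__name__
        self.visit(node.left)
        self.visit(node.right)
        self.code.append(("POP", "tmp1"))  # right operand was pushed last
        self.code.append(("POP", "tmp0"))
        self.code.append((self.OPS[opname], "tmp0", "tmp1"))  # tmp0 op= tmp1
        self.code.append(("PUSH", "tmp0"))


ARITH = {"ADDSD": operator.add, "SUBSD": operator.sub,
         "MULSD": operator.mul, "DIVSD": operator.truediv}


def run(code, **variables):
    """Interpret the pseudo-instructions with a list as the runtime stack."""
    stack, regs = [], {}
    for instr in code:
        op = instr[0]
        if op == "PUSHCONST":
            stack.append(instr[1])
        elif op == "PUSHVAR":
            stack.append(float(variables[instr[1]]))
        elif op == "PUSH":
            stack.append(regs[instr[1]])
        elif op == "POP":
            regs[instr[1]] = stack.pop()
        elif op == "RET":
            return regs["xmm0"]
        else:  # arithmetic: destination register is also the left operand
            regs[instr[1]] = ARITH[op](regs[instr[1]], regs[instr[2]])


def compile_source(src):
    compiler = ExprCompiler()
    compiler.visit(ast.parse(src))
    return compiler.code


if __name__ == "__main__":
    code = compile_source("def foo(a, b):\n    return (a + b) * 2 - a / 4\n")
    print(run(code, a=8, b=1))
```

The `OPS` dictionary plays the role of the `ops[opname]` lookup mentioned above: supporting another operator only means adding one more entry.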
Sorry. But by doing it this way, we are going to have more time for questions. As you can understand by now, the logic is similar. node.targets contains the name of the variable for the assignment, so I get the register corresponding to the variable. I visit the value, which puts it on the stack, then I pop it and move the value from the stack into the register which represents the variable.

Then we have something which is a bit more complex, which is If, so now we have to deal with labels. Again, I'm going to cheat for a while. Okay, I'll show you the node. You see that the tree is getting complex, but basically the If has a test, which is the expression in the if, and it has a body attribute, which contains what you want to do in case the condition is true. It also has an else clause, but we are not going to implement it. So what we want to do is check that we don't have any else, because we are not going to implement it, then declare a label for the then and a label for the end. These are points to which we can jump. This is the assembler code we are going to generate, and now I'm going to generate it for real.

The test operation is inside the test attribute, so we are going to visit it, evaluate it, and put it on the stack. We are going to implement only the less-than comparison for now. So if it's a less-than, I have the value on the stack, so I can jump if it is below, and I jump to the then label. If I didn't jump to the then label, I want to jump to the end label, so it's an asm.jmp to the end label. Then I can declare where the label is, at this point in the assembler.
This is where my then label is. It means that if the condition is true, I jump from this point and I don't execute the jump to the end. This is how it works at a low level. Then I can visit the children. By doing this, if the condition is true we jump from here to here, and then we emit the code for the return, for example. If the condition was false, we jump to the end label, which is here, so basically we don't execute anything. This is the logic.

Again, I am running out of time, so I will copy and paste the Compare, which is the same logic again. I visit the left node of the comparison, I visit the right node, and I use this instruction to say: okay, I want to compare them. Then, depending on what the comparison was, in this case it was Lt, I know here that I had to use jump-below. If it was LtE, which is less-or-equal, I would have to use jump-below-or-equal, or something like this. So let's see if it works. AssertionError. Maybe... I don't think it's None, but probably it's an empty list, now that I think of it. Yes. No... self.asm. Isn't this pair programming? You should have told me that I was wrong. I mean, this is not pair programming, it's like a hundred people programming. Okay, it's working.

Basically you then have the same thing for a while loop, which I'm not going to write live because it doesn't make sense: it's really the very same logic. I can just copy and paste and implement it this way. So now we have a JIT compiler which can do all these things: we can do while, we can do ifs, we can do assignments. This is the logic for the while. You see, we can do comparisons, etc. So we have a complete enough system.

I also want to add a decorator, so I can put a nice compile decorator around a Python function. How do I implement it? Well, I implement the compile function here. It takes a function.
I get the source code from inspect.getsource(fn), then I instantiate an AST compiler and I compile it to a JIT-compiled function. Let's see if it works. It does. So basically now my JIT compiler is complete and I can try it with something real.

I have this benchmark. This is a function which computes the digits of pi in a very inefficient way. I'm going to run it on top of CPython as normal, and then I am going to JIT-compile it using my compiler here, and then let's see if it's faster or not. Disclaimer: it is. If it doesn't segfault. Okay, so you see it's more than 10 times faster. Basically it's because we are compiling it to assembler and executing the assembler, and of course that's faster than CPython. Just for the record, if we run it on top of PyPy it's even much faster, because PyPy is a real JIT compiler. This is a toy JIT compiler, which is not optimized for speed. By the way, in a few minutes there will be another talk about the PyPy JIT compiler by Ronan, so if you are interested in seeing how PyPy manages to be ten times faster than this simple thing, you should listen to him.

So basically that's it. Before ending, I want to challenge you: we have this game, write your own JIT. This is the GitHub repo containing this code. You are welcome to fork it and add new features, new statements, new operations, whatever you want. You send me a pull request. I won't accept the pull request, because I need the toy language for the talks, but I will maintain a list of interesting pull requests to show people how to extend this. So I think I am done, and if you have questions...

Thank you, Antonio, for the amazing talk. Any questions? Please come to the microphones. There is a question there.

I don't think I understand the return statement.
I'm not sure: does it actually use a return register, or does it just take the last thing that was on the stack and return it?

The return statement is here. Basically, the x86 calling convention for a function returning a double is that you need to put the return value in the xmm0 register and then execute the ret assembly instruction, which is this. So what I'm doing is visiting the node containing the expression. When I do return 100, 100 is the node.value in this example. I visit it, I compute the value, I put it on top of the x86 stack, then I put this value in the xmm0 register and execute the ret.

And where do you actually put it in that register?

It's here, the popsd. popsd is a helper that I wrote to pop the value from the stack into a register.

Okay. Thank you very much for the impressive live coding session. There is no handling of overflow? Like, if instead of returning a hundred you had written, I don't know, 10 billion, is it going to segfault or something?

No. I mean, we are using floating point, so even in Python there is no handling of overflow if you're using double variables, I mean float variables. They just overflow, or go to inf, or whatever.

Okay, so the Num class actually converts everything to floating point?

Yes, for simplicity we are using only floating-point variables here.

Okay. Any other questions? We have time.

Okay. Since you're a PyPy developer, my understanding was that you were working on this kind of code, yet you said that you never wrote assembly code before. So what are you doing?
I'm pretending to work on PyPy, and then the others do the real work. No, actually PyPy is a complex project and it has various layers. What we do for JIT compiling is that you start from the Python code, you go through many layers, and at some point you end up with an intermediate code which is similar to assembler, but generic. Then we have many backends: from this intermediate code you can compile for x86, you can compile for ARM, you can compile for PowerPC, and that is the job of the backend. When you write a new JIT backend, you turn this intermediate code into real assembler. But I never worked on any backend for real CPUs. I worked on the backend for the .NET virtual machine years ago. Most of my work in PyPy is above this level, so I do things which produce the intermediate representation, but then it's the job of the backends, and I'm not an expert on them, and there are many. That's why I never had to write any real assembler. But of course the intermediate language is close enough that you still need to think about registers, variables, stacks, etc.

Okay, any more questions? Okay, so then I guess, thank you again.