Thank you. So I'll be talking about writing a Python interpreter in Python itself, from scratch. Let's get the hard question out of the way first: how do we distinguish between the different prompts during the demo? We have Python and we're writing Python in it, so otherwise we're not going to know where we are. I'm going to use IPython for the regular Python interpreter. So if you see "In [number]:", that means we're in the regular Python interpreter, and if you see "p>>", then we are in our interpreter, the one we're trying to write.
Let me clarify a bit what I mean by "from scratch". We're not going to use exec or eval, of course, that would be too easy, or even the ast module for interpreting, and we're not going to use any existing internal libraries or external tools for parsing either. But we are going to use Python strings, Python lists, and all the functions associated with those.
Let me just show you what the interpreter looks like now so we get a better idea. You can't hear? Can you hear me now? Okay, so we're in the interpreter here, so let's try some simple things: x = 3, and then maybe print x. We're in Python 2 here, so print doesn't need parentheses. And oh, we got an error. What a terrible way to start the demo. Of course, this is deliberate. Also, now you can distinguish between when we are in one interpreter and the other, right? We had "p>>" here, and here I have "In". So let me quit this and restart the same thing. Now, as a first line, I'm going to write simport simple_ast. simport works more or less like import, but I'm going to explain how that works slightly later on. If I now do the same demo, print x, I don't get an error and it prints 3 as expected. The "command" something is just for debugging. Let me move that a bit higher.
Okay, and to get an idea of the scope of what the interpreter can do, let me just compute the squares of all odd numbers between 1 and 5 and see that it gives me the right result: 1, 9, 25 here. Also, for those of you for whom writing a Python interpreter is too boring, you can try to figure out before the second half of the talk why it gave me an error when I didn't run that first line and why it did not give me an error afterwards. I mean, it's not just an if check that says "if you didn't run this magical thing, then give a deliberate error".
Okay, so we want to write a Python interpreter. The user is going to write something, like commands in a file or in the REPL as we've seen here, so something like x = f(3). We don't want to treat this as a string all the time, although in theory it would be possible. So the first thing we usually want to do is turn this into a data structure that's more malleable for execution, and that's an abstract syntax tree, which we have on the bottom left there. For this statement we have as a root a regular assignment node, with a first child which is a name node with the value x. The second child is a call node, which itself has two children: a name node f and an argument list node, arglist, with a single argument, the number 3. So an abstract syntax tree is just a tree like this where each node has some kind of name, like regular assignment, call, arglist and so on. If you see it in the console during the demo, you're going to see what's on the right, which is the textual representation of the same tree we have on the left.
So to write our interpreter we're going to have two steps. The first step is parsing: we turn Python source code into an abstract syntax tree, as we've just seen. In the second step we take this abstract syntax tree and run it, so run the corresponding program. Okay, so let's start with the first step, parsing.
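To make the tree for x = f(3) concrete, here is a minimal sketch of how such a tree could be built and printed. The Node class, the pprint method and the node names (regular_assign, NAME, call, arglist, NUMBER) are assumptions chosen for illustration; the interpreter shown in the talk may spell and structure them differently.

```python
class Node(object):
    """Hypothetical AST node: a name plus a list of children."""

    def __init__(self, name, children):
        self.name = name
        self.children = children

    def pprint(self, indent=0):
        """Return an indented, one-node-per-line rendering of the tree."""
        lines = ["  " * indent + self.name]
        for child in self.children:
            if isinstance(child, Node):
                lines.append(child.pprint(indent + 1))
            else:
                # Leaves hold plain Python values such as 'x' or 3.
                lines.append("  " * (indent + 1) + repr(child))
        return "\n".join(lines)


# The tree described for the statement: x = f(3)
tree = Node("regular_assign", [
    Node("NAME", ["x"]),
    Node("call", [
        Node("NAME", ["f"]),
        Node("arglist", [Node("NUMBER", [3])]),
    ]),
])

print(tree.pprint())
```

The printed form plays the role of the textual representation on the right of the slide: same tree, just flattened into indented lines.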
Of course I'm only going to give glimpses of this, because there are many parts to this interpreter. So, parsing: that's when we want to transform text into the abstract syntax tree. The core of it is an algorithm which takes as first input a description of a language, in this case Python (I'll describe that in a bit more detail later), and as second input some piece of text, which we think of as the source code, and which gives us either an abstract syntax tree or a syntax error if the second input string didn't match the grammar that was given. I'm going to call this a grammar matcher, and we see a black box here because I won't have time to go into detail about how it works. Usually this step involves multiple sub-steps like tokenizing, lexing, parsing and so on, but in this implementation we'll have just one step to go from these two inputs to the output we want.
Let me talk a bit about the grammar. The grammar we have to give as first input is also going to be represented as an abstract syntax tree; it's just an abstract syntax tree describing some language. It also usually has a textual representation, which can be turned into this tree. So the main function we're going to use for parsing takes an abstract syntax tree and a piece of text, and turns them into the abstract syntax tree of the second input. This here is the textual representation of the grammar, which I'm not going to go into detail about, but this is the piece of text we're going to start off with. So let me just show you how the parsing works.
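As a rough idea of what could be inside the black box, here is a minimal sketch of a grammar matcher that interprets a grammar given as a tree and does everything in one step, with no separate tokenizer. It is not the talk's implementation: the tuple encoding of grammar nodes, the node kinds (exactly, and, or, star, apply), the NoMatch exception and the tiny example grammar are all assumptions made for illustration.

```python
class NoMatch(Exception):
    """Raised when the text does not match the grammar (a syntax error)."""


def match(grammar, rule, text, pos=0):
    """Match `rule` against `text` at `pos`; return (tree, new_pos)."""
    kind = rule[0]
    if kind == "exactly":                 # literal string
        literal = rule[1]
        if text.startswith(literal, pos):
            return literal, pos + len(literal)
        raise NoMatch(pos)
    if kind == "and":                     # sequence: all parts in order
        out = []
        for sub in rule[1:]:
            tree, pos = match(grammar, sub, text, pos)
            out.append(tree)
        return out, pos
    if kind == "or":                      # ordered choice: first success wins
        for sub in rule[1:]:
            try:
                return match(grammar, sub, text, pos)
            except NoMatch:
                pass
        raise NoMatch(pos)
    if kind == "star":                    # zero or more repetitions
        out = []
        while True:
            try:
                tree, new_pos = match(grammar, rule[1], text, pos)
            except NoMatch:
                return out, pos
            if new_pos == pos:            # avoid looping on empty matches
                return out, pos
            out.append(tree)
            pos = new_pos
    if kind == "apply":                   # reference to a named rule
        tree, pos = match(grammar, grammar[rule[1]], text, pos)
        return (rule[1], tree), pos
    raise ValueError("unknown grammar node: %r" % (kind,))


# A toy grammar (hypothetical), itself just a data structure.
grammar = {
    "digit": ("or",) + tuple(("exactly", d) for d in "0123456789"),
    "number": ("and", ("apply", "digit"), ("star", ("apply", "digit"))),
    "pass_stmt": ("exactly", "pass"),
    "small_stmt": ("or", ("apply", "pass_stmt"), ("apply", "number")),
}

print(match(grammar, ("apply", "small_stmt"), "pass"))
print(match(grammar, ("apply", "small_stmt"), "42"))
```

The matcher never generates parsing code; it just walks the grammar data while consuming the text. Backtracking falls out of keeping pos in local variables, so a failed alternative leaves the caller's position untouched.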
So let me exit here. I'm going to run the first few lines of the demo we've seen earlier. Let's import some stuff from our library, and here I'm defining a match function, which corresponds to that black box we've seen. I'm defining it here just so that it looks more like what was on the slide. Oops. So here I have a piece of text I've imported, about 20 lines, and it's going to describe a first language that we're going to turn into a tree. And we have some kind of start tree, which I'm not going to show you yet.
Now the first step is to match, so this is the black box we have here. The first input is this start tree, and the second input is this grammar, which is just this piece of text. Now match tree should be a tree which corresponds to this piece of text, so let's see what's in there. We have an abstract syntax tree as I've shown earlier, except that this is a tree describing a language instead of a tree describing a Python program. One interesting thing: now I'm going to show you what was in start tree, which was imported, and it looks exactly the same. In fact if you check whether start tree equals match tree, they're exactly equal. So the only information contained in start tree is that piece of text, but in tree form. So this step was unnecessary, because we already had start tree and could have just used that.
In the second step I'm going to give as first input the match tree we got in the first step, and as second input the grammar we had, so the piece of text we had, plus some extra stuff. I can show you what that is, but, oops, that's not too important. And as third step we're going to take what we got from the second step and give it as first input, and then as second input we're going to have something that actually describes the Python programming language, so let me show you what
that piece of text is. Python is still fairly complicated as a language, so we have lots of things in this piece of text. Actually, something very similar is in the CPython source. Let me just focus on a few things. Here we have file input, for when you want to parse a file, and it says: what's a file input? It's either an empty line or some indent followed by a statement, repeated many times. The star is a repetition star, as in regular expressions. A statement is just a compound statement or a simple statement. A simple statement is just small statements separated by semicolons, and a small statement is either a print statement, a delete statement, a pass statement, and so on. A pass statement is just the string pass followed by nothing. A delete statement is just the string del followed by some spaces followed by a list of expressions, that's all. So you have this entire string which describes the Python programming language, and now we have that string in tree form in match tree three.
So let's actually try to parse some Python program. We're going to use a slightly more advanced black box, although we could have used this one the entire way instead of match, and we're going to try to parse this file. To parse it we do exactly the same thing: we run match, as first input we give it the tree describing the Python programming language, and as second input we give it the text which is the content of the file. And now if we look at the result, oops, yeah, it's what you expect from a Python program. We have regular assign, as in the example we had at the very beginning.
That's all I'm going to show for the parsing part, but I should mention that the black box isn't that complicated. Actually the entire thing isn't that
complicated. We can take everything in the parsing library and put it into a single file, which is about 510 lines of Python, and that file is able to parse itself. Let me show you that now. So this is the single file, 508 lines. This part you don't need to read; it's the start tree, and these are all the strings we had in there. At the beginning you have the semantics of what to do with each of these trees, like the match function. Let me just run it. So here it parsed itself and printed itself as an abstract syntax tree, just to show that parsing worked. This is what we had at the very end here, so it looks like that. And the bootstrapping part, the part where we started, fits in about 120 lines.
Maybe a few things that were interesting. The thing which allows us to do this in a single step, instead of what it says there, is the fact that we're using a parsing expression grammar instead of one of the other kinds of grammars. The other interesting thing is that this parser is a parser interpreter in some sense, not a parser generator: we're just making abstract syntax trees, but we're never generating Python code for parsing in the next step. Okay, so that's all I'm going to say about the parsing part. If you got lost, you can wake up again, because I'm going to talk about running now.
So let's assume that if the user types some text, we can get it as some kind of tree which represents the program, and now we just want to run it. Did anybody think about that simport puzzle from the beginning? Why did it give an error if I didn't run simport, which works more or less like import, and why did it not give an error if I did? Let me give the answer right now. So, simple_ast.py: if you import simple_ast, it looks for this file, and it contains the following: a
function definition. It defines a function called print statement, which takes a single argument and then does the right thing to print its content. What happened here is that some of the semantics for the abstract syntax nodes, so how we're supposed to run them, are not included in the core of the running part of the interpreter. They're included in the library. Since they're in the library, if you don't import them, the interpreter doesn't know what to do with those nodes. So they're going to be run through our interpreter instead of being hardcoded into the interpreter.
Let's see another example. That was print statement, so let's look at and test. If you're writing some condition and some other condition, how are we supposed to run this? There's a function defined in there called and test which takes two parameters, which we think of as the first argument and the second argument, but the arguments it receives are actually the children in the abstract syntax tree. If the first argument evaluates to true, then we evaluate the second argument, and if that evaluates to true we return True, and otherwise we return False. So this corresponds to our implementation of and.
Now that we've seen two examples, let me go into the details of the main loop. Here's a slightly simplified version of how the running part of the interpreter works. It initializes the scope first, so some environment with your globals in there. Then it initializes the stack with a single element, a single frame which just contains the root of the file we are trying to run. And then it goes into this while loop. It looks at the top of the stack and at the name of the node there. If that name is in some module called boot, then it calls that function in boot, passing
*node, so passing the children of that node as arguments to that function. Otherwise it looks in simple_ast: if there is a module called simple_ast and there is a function corresponding to the node's name, then it calls that function and passes the children. Otherwise it looks in the builtins, so the things hardcoded into the interpreter, and if it finds the name there, it calls that function and passes the children of the node. And otherwise it raises the error that's in there, which is the first error we've seen.
Okay, so let's look at some more examples. If statement: this is normally something you would need hardcoded, but we're going to be able to get away with having it in the library instead of in the core of the running interpreter. An if statement just has a list of pairs of conditions and blocks. We go through all these conditions and blocks in order, and if a condition evaluates to true then we evaluate the block and return. Oh, and by the way, evaluate, as a function, with respect to this main loop: all it does is add that node at the top of the stack. It doesn't really do anything else, and then the main loop takes care of evaluating it as expected.
Let's take a look at the for statement. There's an index variable, some iterable, the block we want to execute, and maybe an else block for when we want to execute some code if we didn't break from the loop. The loop works this way: evaluate the iterable and take an iterator of it, then run the following indefinitely. Try to assign the next value of the iterator to the index variable. Here we can't write index_var = iterator.next(), because that would assign the value of iterator.next() in the wrong scope. It would assign it in the scope of the for statement function, whereas we want it in the scope of the function which calls it.
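The lookup order of the main loop described earlier (boot, then the library, then the hardcoded builtins, else an error) can be sketched as follows. This is a simplification under stated assumptions: it uses plain Python recursion instead of the explicit stack of frames from the talk, dictionaries stand in for the boot and simple_ast modules, and every name here (Interpreter, Node, assign, suite, print_stmt) is hypothetical.

```python
class Node(object):
    """Hypothetical AST node: a name and its children."""

    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)


class Interpreter(object):
    def __init__(self):
        self.scope = {}      # globals of the interpreted program
        self.output = []     # what the interpreted program "printed"
        self.boot = {}       # stands in for the boot module
        self.library = {}    # stands in for simple_ast, empty until loaded
        self.builtins = {    # semantics hardcoded into the core
            "NUMBER": lambda value: value,
            "NAME": lambda name: self.scope[name],
            "assign": self.assign,
            "suite": self.suite,
        }

    def assign(self, target, value):
        self.scope[target.children[0]] = self.evaluate(value)

    def suite(self, *statements):
        for statement in statements:
            self.evaluate(statement)

    def evaluate(self, node):
        # Same lookup order as in the talk: boot, library, builtins.
        for namespace in (self.boot, self.library, self.builtins):
            if node.name in namespace:
                return namespace[node.name](*node.children)
        raise KeyError("no semantics for node %r" % node.name)


interp = Interpreter()
program = Node("suite",
               Node("assign", Node("NAME", "x"), Node("NUMBER", 3)),
               Node("print_stmt", Node("NAME", "x")))

try:
    interp.evaluate(program)     # library not loaded: print_stmt is unknown
except KeyError as error:
    print("error:", error)

# Rough analogue of "simport simple_ast": provide the missing semantics.
interp.library["print_stmt"] = (
    lambda arg: interp.output.append(interp.evaluate(arg)))
interp.evaluate(program)
print(interp.output)
```

This reproduces the shape of the opening demo: the same program fails before the library is loaded and succeeds afterwards, without any special check for whether the library was imported.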
So, sorry: we assign iterator.next() to the index variable, then we run the block, evaluate block at the bottom, and we do this until we get a StopIteration, which means iterator.next() ran out of values. Then we run the else block if there was one, and return.
Let's look at one more example, the try statement. A try statement is just a block and then a bunch of exception clauses, so some list of exceptions we're looking for and the block to run if we catch each one. It works like this: evaluate the block, and if we catch some kind of exception, put it in a variable called error and check it against each of the clauses of that try statement. If we find one that matches, evaluate that exception block, return, and stop checking the others. Otherwise just keep going.
Does anybody notice any problems with the last three definitions we've seen? If we have this one in the library, it's going to loop indefinitely. When we translate this function into an abstract syntax tree, the first thing in the function is going to be a try statement node. If we think about our main loop for that try statement node, it's going to look in simple_ast for a function called try statement and call it with the node's children. But that's this function, so it's going to call itself, and then call itself again, and it's never actually going to run anything. This was a problem with the if statement too: our definition of if statement included an if, so if we put it in the library it's going to call itself and loop indefinitely. That's not what we want. The call graph looks like this: try statement calls itself, if statement calls itself, and for statement calls both of the two others, which call it back, so
that's also going to loop indefinitely. So it seems we're in a bit of a bind here. One way out would be to just have all of this hardcoded into the interpreter. If this was run in CPython it would be okay, because the try statement inside would be run by CPython. But then evaluate is going to be hard to implement, because currently evaluate just adds the node we give it at the top of the stack, and then what's supposed to happen is that we run a few iterations of the main loop, and when that's done we extract the return value and do something with it. So if we have a Python function inside the interpreter, then when we run evaluate we have to exit that function, and when evaluate is done after a few iterations of the main loop we have to come back. So we'd have to do some fancy state tracking even if we wanted to put this into the core of our interpreter.
The actual solution we're going to use is the following. We're going to write three more functions which are simpler versions of the three we've seen. The simpler version of if statement is single if, which is only able to handle a single block. The simpler version of for statement is simple for statement, which is only able to handle iterables which are indexable. And the simpler version of try statement is try statement error, which only does the second part, when an exception happens; the first part we're going to run inside the core. There was also a while True in there, so we'll also have a simpler version of while which is only able to do while True.
So this is what single if looks like. It's only able to take one condition and one block. It evaluates the condition and turns it into a boolean. Then we build a dictionary here such that if that boolean is true, we evaluate the block that was passed in.
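The dictionary trick in single if can be sketched like this. It is only an illustration under assumptions: blocks and conditions are modelled as zero-argument Python callables rather than AST nodes, evaluate simply calls them, and the False entry maps to a block that does nothing, which is the case explained right after this. The names single_if, evaluate and PASS_BLOCK are hypothetical.

```python
def evaluate(thunk):
    """Stand-in for the interpreter's evaluate: run a condition or block."""
    return thunk()


def PASS_BLOCK():
    """A block containing a single pass statement: does nothing."""
    return None


def single_if(condition, block):
    # Branch by dictionary lookup, so no `if` statement is needed here.
    branches = {True: block, False: PASS_BLOCK}
    return evaluate(branches[bool(evaluate(condition))])


log = []
single_if(lambda: 1 < 2, lambda: log.append("ran"))      # condition true
single_if(lambda: 1 > 2, lambda: log.append("skipped"))  # condition false
print(log)
```

Since the function body contains no if statement of its own, translating it to an abstract syntax tree produces no if statement node, which is what breaks the self-referential loop.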
So this is a bit of a Smalltalk-inspired function. And if it's false, then it evaluates a block which contains a single pass statement, which basically does nothing. So that's the definition of single if.
Here's the definition of simple for. It takes no else block, but it still takes an index variable and the block, and here we assume that the iterable is indexable. So we can have an index which starts at zero. We evaluate the iterable, evaluate its length, and then do the assignment we had before, evaluate the block, increment the index, and check if we have reached the length. In that case we return from this function, because that should be the end of the loop.
And try statement error is more or less the same thing we had in try statement. The last few lines of try statement stay the same, there's no problem there, but here you can see that it's calling simple for instead of for. There's no simple for statement in Python normally. We should also write single if instead of if on this line here, but we'll delegate that to the parser: when the parser sees that an if statement has only a single block, it puts a single if abstract syntax tree node instead of an if statement node. So here's the new call graph. By doing this we're able to put all of these things in the library instead of in the core of the interpreter.
Okay, I want to talk about one last thing, and that's post-mortem debugging. Here I have a file that I'm going to run, which is buggy. There's a typo here, ii, and there's a misspelled name there which should be total. Let me run this in CPython first, so the Python interpreter we're used to. We got an error here, as expected: it printed the first few iterations and then we got an error at the first typo. So, normally in regular
Python, what we can do when we get an error is import pdb, the Python debugger module (I think someone's going to talk about this later), and run pdb.pm() for post-mortem debugging. If we run this, we get the Python debugger prompt here, and now we can look at the values at the last error we got. We can print i, print total, and if we had function calls we could go up and down the stack so that we can evaluate expressions in different contexts. So that's how post-mortem debugging works in Python normally.
Let's see how post-mortem debugging works in our interpreter instead. Oh, whoops, I forgot to uncomment simport simple_ast, so let's do that again. Okay, so it prints the first few lines like regular Python, and then we get this error. It's a KeyError, because it tries to find this variable ii in scope and can't find it. Let's first see a stack trace. Here you can also see that it alternates between going into this erroneous file and lib/simple_ast, which is what we've been talking about. It's currently in the augmented assign function because it's trying to run total += ii, and you can see it also has a column index instead of a line index. Let's look at the last stack frame, where we got this error. Here's the node associated with it. It says name ii, and that's wrong, so let's fix that. The correct value is i, and now what we're going to do is just continue. Now we hit the second error, but before that it printed total, which means it got past this line, and it didn't print all the rest of the stuff again, so it really did just continue. Let's examine this one, fix it too, and continue. And now it goes along its way, and in the last iteration it didn't hit the bug we already fixed. So this is something which, as far
as I know, you cannot do with just the regular Python debugger.
Just to say that everything I've shown today, even though I didn't show you all the details, isn't that complicated: here's the current line count. As I've mentioned, for the first part, if you only wanted parsing, you could have had all of that in about 510 lines instead. I tried to move as much as possible into the library, and of course it's not fully featured right now. As the last thing, I can show you that it's able to run its own parser in itself. This is probably going to take some time, so let me just mention that this project is looking for help, so please talk to me. We, and by we I mean me, are mainly interested in reducing the complexity of everything we have here, so reducing these numbers of course, and also reducing the dependency on CPython and making more steps sort of self-generating. You know, each of these library functions, like single if, helped us make if statement and so on, so have more things created in steps like that.
Okay, so it almost started here. This was just some test to see if my implementation of objects works. So here it is, now it's trying to parse itself, matching things from the beginning. The number here is just the index in the input. Of course this is going to be very slow: I'm running an interpreter inside another interpreter, and the parser is also a parser interpreter instead of a parser generator. It's not going to end, so I'm going to just stop it here. Okay, so that's all I wanted to say. Thank you. Anybody have questions?
So this is part of something bigger. I guess I showed it about two months ago. First of all, this is so that it's possible to experiment with changes to the programming language more easily,
and so, if you wanted to do that with CPython, you'd have to add things in multiple places in the source and then recompile every time you want to test out a change. Here, since it's all in Python, you can sort of just test it out right away. You don't even need to exit the interpreter, in theory. But it's actually also to write something else, I guess with a graphical layer, after this, so to make changes so that the next step is easier to write, or can be written in a more succinct and natural way.
Well, actually, the new statements are just for bootstrapping for now, like single if or the simple for statement. I haven't added anything else. The things I wanted to add are this debugging thing and also late binding, so changing how class lookup works so that I can reload my class definitions and all my instances use the new definition. But I haven't added any other particular statement. It's made so that it's easy for anybody else to add new statements.
For the Python part, yes, it's derived from the Python 2 grammar in the source. If you look at the one in the Python source, it's of course not exactly the same thing, and it also handles indents slightly differently, but otherwise it looks more or less the same, except for the things I talked about, like single if, which are here and which are of course not in the normal Python interpreter.
In the back? Sorry? (Audience: do you intend to support all the features of Python, like generators, dictionary comprehensions?) So there are dictionaries. But the primary goal isn't to have all the features of Python. It's really to make experimenting with languages easier and to get a shorter description of the interpreter. Maybe a subset, but a subset where it's easy to add new things, right,
because now even the abstract syntax is... right, so generators, for example, are not there, but if you wanted them, you could just add a definition in the library at some point and get that. But the primary goal isn't to be fully Python compatible, no. So thank you.