So, it's another boring Saturday afternoon. Let's do some more coding. Incidentally, I was looking at the YouTube analytics of the last video I did, and bizarrely, people appear to be watching these things, like, all of them. I can't imagine why you'd want to do that. You're all insane. Anyway, thanks for watching. And today I'm going to write a compiler.

So, the backstory of this is that a while ago I wrote this compiler suite called Cowgol. It's a simplified Ada-based language, well, Ada-inspired language, really, which is intended to generate code for 8-bit micros while also being self-hosted. So, the compiler is in fact written in itself. This is part of the source code here. And I did get it self-hosting, technically. I had it rebuilding parts of itself on a Z80 CP/M machine and also a BBC Micro second processor system, that's a 6502 with 63.5 kilobytes of available RAM. And it worked, and it even generated reasonably good code. But it's just very, very big. It's something like eight different binaries all pipelined together. And compiling Hello World takes seven minutes on a BBC Micro. It's just too big and too complicated.

As a result, what I want to do now is prototype a compiler for a subset of the Cowgol language, which is the minimum viable product for a compiler. I want it to generate, as simply as possible, the worst code possible. And then I will try applying some simple heuristics to see if I can make the code better. The intention is to compare it with the original Cowgol code generator, which was a traditional compiler suite: a tokenizer and a parser generating an AST, a type checker on the AST which generates register-based bytecode, then several stages of code generation on the bytecode generating actual assembly code, and then a complex linker to actually make it work. What I'm going to do instead is simplify as much as possible and see how it goes. So I'm actually going to cheat slightly.
I'm not going to write this completely from scratch. I already have boilerplate for a Lex and Yacc-based compiler framework. The reason for this is that parsers are irritating and fiddly and it's just not fun watching people write them. Yacc will generate a much better parser, which is so much easier to use and understand, and this is a prototype. Likewise for Lex. Lexers are subtle and quick to anger. And I did this ahead of time because there's a lot of subtlety in getting stuff just right that's not entirely obvious, and that would be boring to watch.

So let us get started. We've already got some skeleton and a build script, so let's just build and run this. At this point I can type in code, press Ctrl-D, and it will compile it, and that has done nothing because our program is empty. So how do we define a program? Well, a program is a series of statements, and a series of statements is either a statement or a series of statements followed by a statement, and for now we'll define a statement as being empty. Oh yes, and of course after running the program we have to stop running the program.

Incidentally, I'm going to be targeting my favourite terrible machine code architecture, the 8080, for this. One of the reasons is it's incredibly simple. Another is that if I can make it work on the 8080 it should work on anything, poorly. But the third is I already have a nice emulator and machine code debugger, and of course the CP/M assembler that I wrote a couple of videos ago. So let's try doing that. The shift/reduce conflict is because statement here is blank. It can't tell the difference between a blank statement followed by a blank statement, or a blank statement, so we can ignore that, and yes, that has produced our ret. So let's actually have a statement. Actually no, let's not do that. Let's actually see if we can build and run this.
There we go, and that has generated a test.bin file, and I should be able to load that into my emulator, break at address 100, go, and there is our ret statement. So we already have a working compiler generating running code. This is a good start. Let's run it, and it works.

So we actually want to add some statements. Cowgol is Ada-inspired. It's got semantics not too dissimilar from C, however mutated considerably to work better on small machines. So it's got nested functions. It's got multiple different data types. It does not do any form of implicit type conversion, because that makes the rules simpler when you're juggling 8-bit types and 16-bit types and 32-bit types. It doesn't support any kind of recursion. It's got a few other things like this.

Now we want to start by defining a context in which our code will run, and this will contain our symbol table and everything like that. In Cowgol the fundamental unit of code is the subroutine. Here we have some of the original Cowgol language. You have subroutines, one after the other, but we also have statements in the main body of the program. Cowgol supports nested subroutines, so in fact there is a top-level subroutine containing all the code, which these are nested within. So the very first thing we need to do is create context for our subroutine. Now, a subroutine can have a name. It has a link to the subroutine in the immediate lexical scope, that is, the one it is nested inside. And it has a symbol table. So we also need to define symbols. A symbol has a name, obviously. Symbols are held in a linked list, so each symbol has a link to the next symbol on the list. And there will be some stuff here for the actual definition of the symbol, which we will come to later. So each subroutine has a symbol table, which is represented as a link to the first symbol. So when we initialize our program we need to create the root subroutine.
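
The two records described above could be sketched in C roughly as follows. This is a minimal sketch, not the code from the video: the struct names, field names and the `new_subroutine` helper are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* All names here are hypothetical; the transcript does not spell them out. */
struct symbol {
    const char *name;         /* identifier text */
    struct symbol *next;      /* next symbol in the same subroutine */
    /* definition details (kind, type, ...) get added later */
};

struct subroutine {
    const char *name;             /* the root subroutine is anonymous */
    struct subroutine *parent;    /* enclosing lexical scope; NULL for the root */
    struct symbol *first_symbol;  /* head of the symbol list; NULL when empty */
};

/* Allocate a subroutine nested inside `parent`, with an empty symbol table. */
struct subroutine *new_subroutine(const char *name, struct subroutine *parent)
{
    struct subroutine *sub = calloc(1, sizeof *sub);
    assert(sub != NULL);
    sub->name = name;
    sub->parent = parent;
    return sub;
}
```

Because `calloc` zeroes the record, a freshly created subroutine starts with an empty symbol list, and the root is simply the one created with a NULL parent.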
And we also need to remember which subroutine we are currently generating code for. So, just current sub. So here we want to create our subroutine. The name is, let's call it that. The root subroutine has no name in Cowgol; it is anonymous. The parent is null because there is no parent to the root subroutine. And its symbol table is empty. So in fact all we need to do is that. So that should work. Actually, let's just make a test program, which for now will be empty. So, build, compile test.cow. Did I? There we go. So yes, it's produced our ret statement. So we now have some context for actually writing the rest of the code.

The first statement we're going to implement, without which we can't really do anything else, is defining variables. Now, the Cowgol variable syntax is a departure from the Ada style. It's more like JavaScript. We have, sorry, let me just get something to drink. We have a var keyword, the identifier followed by a colon, a type specifier, an assignment operator, and an initializer. So we have two syntactic tokens, which are var and colon, and we have the identifier itself. So we go to our lexer and we define a token for var. Is it going to work? Yep. Okay. So we tell the parser that we have a new token called VAR, and we tell the lexer that when it sees var it returns a VAR. So this is why we're using Lex. It just makes it so much easier. And we also want a colon, but we don't need to define a named token for that. All right. Now we also need an identifier. In Cowgol an identifier is an alphanumeric followed by more alphanumerics, so that gives us a pattern for the token. And yes, we want to define ID. Do we need to define a colon? I don't think we do. It's been a little while since I've touched Lex. We don't define an actual name for this token because we can use the character constant instead, which makes life generally easier.
One of the things we do need to do is copy the actual value of the identifier that we've read into storage so that the compiler can get at it. So we have globals here. This is just a simple intermediate buffer containing characters, and we're going to stick just an arbitrary size on it. We need to define the storage for the buffer. And in the lexer, we copy the text into the buffer. Now let me just double check the definition of strncpy. What I'm looking for is, if it actually overruns the buffer, does it store a trailing zero? No, it doesn't. So we can test for this like so. fatal is a function I've written here that just reports an error message. Okay, what's that complaining about? 12. What is that complaining about? A pointer from integer? Oh, oops. Size goes last. I was thinking of snprintf. Can we put this on the next line? No, we can't. Seriously, I need more caffeine, or possibly less caffeine. Right. Okay, so we can now enter identifiers.

So we can start by actually making our var statement work. A var statement is a VAR token, followed by an ID, followed by a colon, followed by an assignment operator, and I don't know if I've put that in yet, back to the lexer, followed by an expression. And we haven't done an expression yet. So is that going to work? That has complained because we do not have a facility here for an empty statement. Our test program is empty, so it's got nothing in it. Therefore none of these rules match, because a statement list should be allowed to be empty. And in fact, because statement lists can be empty, we don't need the extra rule for a single statement, because this will match a single statement: this one is empty, followed by this. And actually, so that works. So do we actually want to do it that way round? I think we may actually want to do that. So, yeah, Yacc likes these lists to collapse either to the left or to the right, and I cannot quite remember which one it is.
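
Going back to the identifier buffer from earlier in this passage, the overrun check can be sketched as follows. It relies on standard `strncpy` behaviour: short sources are padded with NUL bytes, while a source that fills the buffer leaves no terminator. The buffer name, its size and the `copy_token_text` helper are assumptions, and the sketch returns -1 where the compiler in the video would call its `fatal` function.

```c
#include <assert.h>
#include <string.h>

#define TEXT_SIZE 32   /* arbitrary buffer size (assumed value) */

char text[TEXT_SIZE];  /* shared token-text buffer (hypothetical name) */

/* Copy a token's text into the shared buffer.
 * Returns 0 on success, -1 if the token does not fit. */
int copy_token_text(const char *src)
{
    assert(src != NULL);
    /* strncpy pads with NULs when src is short, but writes no terminator
     * when src fills the buffer, so the last byte shows which happened. */
    strncpy(text, src, TEXT_SIZE);
    if (text[TEXT_SIZE - 1] != '\0') {
        text[TEXT_SIZE - 1] = '\0';   /* keep the buffer printable */
        return -1;                    /* token too long */
    }
    return 0;
}
```

Note the argument order: destination, source, then size, which is the detail that caused the "pointer from integer" complaint above.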
Once we get onto multiple statements, I can debug that and actually figure it out. And we also need a basic expression. So let's just do a number for now. And again, back to the lexer. We're going to start with decimal numbers. A number is a sequence of digits, and we're actually going to rename that; we're going to call that text. So the idea is that if the tokenizer reads a text object of some description, the value goes in here. If it reads a numeric object, the value goes in here. This becomes text, and number. In the lexer, number equals atoi. And Cowgol supports various different numeric syntaxes, including hex and binary and octal. You can also have underscores inside a numeric constant, which are ignored. We're going to stick with decimal for now just for simplicity. So does that build? Not quite. We have a warning: rule cannot be matched. You can't start an identifier with a digit, because if you do, it can't tell the difference between an identifier and a number. So we do it like that. Better. Right.

And in fact, we go to our test program. We can do var i: int32 := 1. Oh, yeah, and you also need a semicolon to go at the end of the statement. Unparsable character. It doesn't like the space. Do I need a lexer rule for whitespace? Pretty sure I don't. This is not actually the first time I've been through this. I have actually experimented with Lex and Yacc and Cowgol before. So this is the, there we go. We do need to tell it to ignore whitespace. And I actually appear to be defining some character classes up here, so let's just do that. So we define a character class for a line feed and for whitespace. There we go. So this line matches a hash followed by any character up until a terminating newline. That's a comment. This one matches any number of spaces and does nothing, which makes it ignore whitespace. Token too long. Yeah. I think I actually want this, to be honest.
If the length of the source is less than the destination, additional null bytes are written. So strncpy should actually clear the text buffer completely. So now what is it complaining about? Oh yeah, I forgot about the type spec. So we define a type here. We'll get on to types in a bit, but for now a type is an ID. I'm not sure where the spaces came from, but this has actually compiled our program, which is nice. If you set yydebug to one, we can actually see it do it. So this is all the details of the parser parsing our text. We're not actually doing anything with any of this yet, of course.

So before we do anything else, we need to add some types. Do we want to do types now? At this point, we need to insert a, I'm going to have to break this up a bit. I'm just trying to decide the best order to do things. We've got a number of things that all depend on each other. We need to be able to add symbols to the current symbol table. We need to be able to look up types. We need to have some types in the symbol table, and we need to be able to evaluate expressions. So let us start with actually adding things to the symbol table.

Now, what we'd normally do here is stick some code in that actually does the variable definition. However, we can't do that right here, because this ID has copied the identifier text into the buffer, but then any ID that's processed after this will overwrite that. So in fact, we need to do this slightly differently. The var definition rule, which is processed whenever VAR followed by an ID is seen, will add the variable to the symbol table but will not actually do anything with it. We can then store the reference to the symbol in the var definition rule so that the code here can do something with it. So to add to the symbol table, we need to define a function that actually does it.
New symbols always get added to the current subroutine. So what we need to do here is walk through the symbols in the current subroutine to make sure that there are no name conflicts. We can't add two symbols with the same name. We can have a symbol in a nested subroutine shadow a symbol in an outer subroutine, so we just want to look at the current subroutine. So if strcmp of name and sym name is zero, fatal, symbol. Okay, so that checks to see if the symbol already exists. Now we create a new symbol, very simple. So here what we do is we just, and now we need to stick the actual symbol definition somewhere.

So we now start needing to use Yacc types, and I can never remember how these work. And of course neither Yacc nor Bison actually have man pages, and the info reader is terrible. I wish the FSF tools would just write flipping man pages. The info pages are terrible. I just want a plain single piece of text that I can search through and read. Okay, so %type, var definition, type, symbol star, var definition. I think that's, it isn't going to work because I need to define my union declaration. Here it is. This is defining the data structure that the data attached to each rule is actually stored in. So I think that will do it. Not quite. Expected identifier before struct, really? That noise was my screen recorder. I don't put the type here; I put the name of the member. Right, that works. Let's actually just turn this off. I've got this implicit declaration of built-in function because I forgot to include string.h. Right.

So that has actually added this new symbol here. Let's actually just do a bit of logging. Okay. So we've created a new symbol i. And in fact, if I go over here and I try to do var i again, this will actually produce our very first compilation error. Of course it doesn't. Of course it doesn't. Yes, that's because we forgot to actually add the symbol to the symbol table.
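
The add-symbol routine described above, including the list insertion that was initially forgotten, might look something like this. It is a sketch under assumptions: the names are invented, the structs are trimmed to the fields used here, and a duplicate name returns NULL where the compiler in the video calls `fatal`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct symbol {
    char *name;
    struct symbol *next;
};

struct subroutine {
    struct subroutine *parent;
    struct symbol *first_symbol;
};

/* Add `name` to `sub`. Returns the new symbol, or NULL if `sub` already
 * defines that name. Only `sub` itself is searched, so an inner subroutine
 * may shadow a name from an outer one. */
struct symbol *add_symbol(struct subroutine *sub, const char *name)
{
    struct symbol *sym;

    for (sym = sub->first_symbol; sym; sym = sym->next)
        if (strcmp(sym->name, name) == 0)
            return NULL;

    sym = calloc(1, sizeof *sym);
    assert(sym != NULL);
    sym->name = malloc(strlen(name) + 1);
    assert(sym->name != NULL);
    strcpy(sym->name, name);   /* private copy: the lexer buffer gets reused */

    /* The step that is easy to forget: link the symbol into the list. */
    sym->next = sub->first_symbol;
    sub->first_symbol = sym;
    return sym;
}
```

Copying the name matters because the text buffer filled by the lexer is overwritten by the next identifier.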
Yeah, so because it's a linked list, let me just do this. Why is this? So this should be setting up a linked list in the current subroutine. We're only defining the symbol once, but we actually have multiple statements. We don't have any compilation errors. So why is that not working? Right. There we go. Symbol i is already defined, which is just what we want. So let us chop that out.

Okay. Now we're going to want to start doing some actual types. Types in Cowgol live in the same namespace as everything else, so you can't have a type and a variable of the same name like you can in C. So we can have different kinds of symbols: a type symbol, a subroutine symbol, or a variable symbol. And that's stored in kind here. So do I want to do it like that? No, I don't want that enum. I'm actually going to use tokens instead. This makes the code slightly simpler. Also, tokens will have numbers assigned to them, and they're not zero, so an uninitialized symbol, like the one returned by add new symbol, will have a kind of zero. So we can identify them. Subroutine. Oh, of course, a subroutine is not actually a symbol. But we do want to add some symbols, and the first one is going to be int32. So that will create the symbol. We tell it it's a type. We actually now need type-specific information. The only two things that you need to know in Cowgol about types are the width, and whether it's signed or unsigned. We're just not going to bother with the signed or unsigned bit for now. So this should create a symbol, named int32, in the current subroutine, kind type, width of type is four. Or not. What's it complaining about now? No. Okay, and here you can see it's created a symbol.

So we now get to the point where we want to look up our type. We're going to have a function called lookup symbol that actually does the work. And when we process a type ref, it will attempt to look up the symbol.
If the symbol doesn't exist, there'll be an error. If it's not a type, then it's an error. And we want to declare that a type ref will return a symbol. So, lookup symbol. We actually want to look through all the lexical scopes, starting with the one we're currently in. The scope is tied to a subroutine. For each sym, if the symbol name matches the actual name, return this symbol. When we run out of symbols, go to the parent scope. If we've run out of everything, then produce an error. Okay, so this should work. Control reaches end of non-void function: the compiler doesn't realize that fatal doesn't return. Let's rename log, I think. It's conflicting with the maths log function. Okay, so that has actually looked up the symbol. So if we change this to z, then we get the expected error.

Okay, back to our variable definition here. Now, this contains the symbol of the variable we're defining. This contains the type of the symbol. So we do need to remember to tell it that our new symbol is a variable. Tell it that the type is $1. Tell it what the type of the symbol is. And did I remember to... Am I doing this right? I think... Yep, okay. So if the symbol is a type, we use this to store the width of the type. If the symbol is a variable, we use this to store the type of the variable. And this actually points at the type symbol of the variable. So that actually goes to u.type equals $3. Is that going to work? No, it's not. Because this is actually a struct member. No, this is a struct member. Okay, so this should now have defined a variable of the appropriate type.

So now what we need to do is assign the value to it. Right, now we're starting to get to the interesting bits, because we have to actually generate code. Now, when dealing with a Bison/Yacc parser like this, each of these rules is processed in sequence. An expression here is actually going to be defined mostly in terms of other expressions. We are essentially generating an abstract syntax tree, except that the tree is being collapsed as we generate it.
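
Before the code generation discussion continues, here is a sketch of the scope walk and the kind-specific symbol data described above. The names are hypothetical, the kind constants stand in for the nonzero Yacc token values used in the video, and a failed lookup returns NULL rather than calling `fatal`.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for nonzero parser token values; 0 means "uninitialized". */
enum { KIND_TYPE = 1, KIND_VAR = 2 };

struct symbol {
    const char *name;
    int kind;
    struct symbol *next;
    union {
        struct { int width; } type;           /* KIND_TYPE: size in bytes */
        struct { struct symbol *type; } var;  /* KIND_VAR: its type symbol */
    } u;
};

struct subroutine {
    struct subroutine *parent;
    struct symbol *first_symbol;
};

/* Search `scope`, then each enclosing scope in turn, for `name`.
 * Returns NULL once every scope has been exhausted. */
struct symbol *lookup_symbol(struct subroutine *scope, const char *name)
{
    assert(name != NULL);
    for (; scope; scope = scope->parent) {
        struct symbol *sym;
        for (sym = scope->first_symbol; sym; sym = sym->next)
            if (strcmp(sym->name, name) == 0)
                return sym;
    }
    return NULL;
}
```

A type reference would then call `lookup_symbol` and additionally check that the result has kind `KIND_TYPE`.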
And it is processed in left-hand, depth-first order. So everything happens left to right. Now, we can exploit this, because it actually turns out that this is ideal for generating stack-based bytecode. What happens is an expression pushes its result onto the stack. The order of evaluation means everything is exactly where you expect it to be, and the code generation is really simple. So we're generating a constant here. Our code generation is actually going to be just about as simple as, I'm trying to remember the 8080 machine code, LXI, which is load 16-bit immediate. To push something onto the stack, we just want to do something as simple as this. Load value, push value, done. Remember I said this was the minimum viable product for a compiler? This is going to generate terrible, terrible code. It is of course not quite as easy as this, because expressions have types and we need to know what the types are in order to generate the right code. Cowgol supports 8-bit, 16-bit and 32-bit types; we're not going to bother with the 32-bit for the prototype, at least not yet. So it's not quite that easy. But essentially, actually if I run that, yeah. So it's evaluated the expression and we get the ret. Why are we getting some newlines? That is interesting. I don't know where they're coming from.

So let us rewind a bit to our variable definition. Now, remember that the action here gets executed after all these rules get processed, so the expression here has been pushed onto the stack. So all we need to do for our code is pop the value off the stack and SHLD to the variable name, like this. So the generated code pushes the value onto the stack, pops the value off the stack and writes it into the variable storage. Easy. Okay, not quite that easy, and we're now going to do the bits we need. Expressions are actually a little bit more complicated than they look, because we don't know what type this number is until we reach this point.
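
The naive "load value, push value" and "pop value, store value" sequences described above can be sketched as a pair of emitter functions. This is an illustration only: the helper names are invented, the output goes into a string buffer rather than wherever the real compiler writes, and the lower-case mnemonic spelling is an assumption about the assembler's syntax.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Append formatted assembly text to `out`, a NUL-terminated buffer of
 * `size` bytes. */
void emit(char *out, size_t size, const char *fmt, ...)
{
    size_t used = strlen(out);
    va_list ap;

    assert(used < size);
    va_start(ap, fmt);
    vsnprintf(out + used, size - used, fmt, ap);
    va_end(ap);
}

/* Evaluate a 16-bit constant: load it into HL, then push it. */
void emit_push_const(char *out, size_t size, unsigned value)
{
    emit(out, size, "\tlxi h, %u\n", value & 0xffffu);
    emit(out, size, "\tpush h\n");
}

/* Variable definition: pop the initializer into HL, store it at `label`. */
void emit_store(char *out, size_t size, const char *label)
{
    emit(out, size, "\tpop h\n");
    emit(out, size, "\tshld %s\n", label);
}
```

Calling `emit_push_const` and then `emit_store` for a definition of `i` initialized to 1 yields the four-instruction sequence lxi, push, pop, shld, which is exactly the kind of redundant push followed by pop that later heuristics could target.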
So what we're actually going to do is say that the expression node carries some state with it, and every expression node has a type, which, if it's a numeric constant, is going to be a special constant type we're going to deal with later. And there'll be some more stuff later, but we're just going to stick with this for now. So for just doing a number, do I want to immediately push that onto the stack? No, I don't. Okay. So when we evaluate a number, the type is going to be null, which will mark this as being a constant, and then we will put the number itself into the expression node. So this means that when we reach this code, nothing's actually happened. We're going to have to generate code a little bit lazily. So when we evaluate this expression, it may have pushed something onto the physical stack or it may just return a constant. This is possibly a little early to be doing this kind of thing, but we're going to need it for constant folding, and we really need it in order to resolve types correctly.

In this assignment, we do need to make sure that the type of the expression matches the type of the variable. So what do we do here? We're going to define a function, ResolveExpressionType. The purpose of ResolveExpressionType is, given a, bit of refactoring, that's an expr node. Given an expression node and an actual type, this will ensure that code is generated that makes sure the thing in this node is pushed onto the physical stack as a thing of this type, or it will produce an error. This sounds complicated; it's actually going to be much simpler than it looks, because right now we have two choices only. If type is empty, then this means it's a constant. We also only have one type, which really helps. So, sorry, if the type of the node is empty, then it's a constant, therefore push the value onto the stack.
Otherwise, the type of the node must match the type of the variable, because remember that Cowgol does not do any kind of implicit type conversion. Expression was a node, type name used when a type name was expected, and in the case when a concrete type was returned, the value is always pushed onto the physical stack, therefore no code generation needs to be done. So that may work. It does not work. That's because I'm getting my C type names and my Cowgol type names mixed up. And of course that doesn't work. Why doesn't that work? So we just run this with Valgrind, because that will generate a stack trace. Right, size 8. Ah, right. Yeah. That's not a pointer, that's an actual physical object. Field expr has incomplete type. Expr node is defined here. I think that's done something. Oh, so let's just take a look at the generated code. Okay, it's defined there. Interesting. This is happening inside the lexer. Yeah, okay. So I'm actually generating two source files, look at the build script, and compiling and linking them together. So what's happening is that the header file contains the union structure in it, but the expr node here is not actually defined there. So yeah, we're actually going to have to move these into globals here. Yeah, let's do that, because that's seen by both files. And of course this doesn't work. Incompatible type for argument 1. So, $5. 1, 2, 3, 4, 5. That is an expr node. It wants a pointer to an expr node. Okay, better. 76. Okay, that has worked. So we have pushed a one onto the stack, and we've popped a one into our variable.

Now, the reason for doing this stuff here now is because this allows us to do a few slightly nifty things, including arithmetic. So our first actual expression is going to be a simple arithmetic operator, which is add. Now, an expression can either have a type or not have a type. If neither item has a type, then we're adding two constants. In this situation, the result is also a constant.
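
The lazy constant handling and the type resolution step described above could be sketched like this. It is an assumption-laden illustration: the struct layout, the snake_case function name and the choice to return 0 on a mismatch (instead of a fatal error) are mine, as is marking the node as typed once its constant has been pushed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct symbol { const char *name; int width; };   /* a type symbol */

struct expr_node {
    const struct symbol *type;  /* NULL marks an untyped numeric constant */
    int value;                  /* the constant; valid only when type is NULL */
};

/* Make sure `node` is on the physical stack as a value of `type`.
 * Appends any code needed to `out`. Returns 1 on success, 0 on mismatch. */
int resolve_expression_type(struct expr_node *node, const struct symbol *type,
                            char *out, size_t size)
{
    assert(type != NULL);
    if (node->type == NULL) {
        /* Constant: nothing has been emitted yet, so push it now. */
        size_t used = strlen(out);
        assert(used < size);
        snprintf(out + used, size - used, "\tlxi h, %u\n\tpush h\n",
                 (unsigned)node->value & 0xffffu);
        node->type = type;   /* assumption: the node now counts as typed */
        return 1;
    }
    /* Typed value: already on the stack, so the types only have to match. */
    return node->type == type;
}
```

With nodes shaped like this, adding two untyped constants needs no code at all: the parser action can add the two `value` fields and leave the result untyped, which is the constant folding mentioned above.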
Otherwise, we actually have to generate some code. Of course, when generating code, we have to resolve the two types. So there's two different options here. The item on the left can be a constant and the item on the right can be an expression, or the other way around. So we're going to do that with... Okay, if the item on the left has a type and the item on the right has no type, then we push the item on the right as the type of the thing on the left. And similarly the other way. So at this point, our two values are actually pushed onto the stack. We have the real one that's on the stack here, and we have the constant that we've just pushed onto the stack, and vice versa. So we now actually generate code.

And this is going to be a little aside on the 8080 instruction set, actually. If you've used the Z80, it has the A, B, C, D, E, H and L registers and nothing else, and the instruction set is limited by comparison. What the 8080 really has is a single 8-bit accumulator through which all 8-bit arithmetic flows, and a single 16-bit accumulator through which all 16-bit arithmetic flows, plus two spare 16-bit registers. Now, DE and HL, which in 8080 terminology are just called D and H, are related. You can swap these very easily using the exchange instruction, which is in here somewhere, exchange, this one. BC is much harder to deal with, because in order to get stuff in and out you have to actually copy values. In addition, the only register that you can do 16-bit loads and saves in is HL.

So now two values are on the stack. We want to pop into D, pop into H. Of course, this is an add, so we don't actually care which way around they are. Double add D, push the result onto the stack. Okay, is that going to build? That works. Let's change our program to 1 plus 3. Unparsable character 2B, yep, that's because we forgot to add our operator to the table. Let's just actually put some...
This just prevents them being mis-parsed as regular expressions. Okay, that's worked. 1 plus 3 is 4. Our constant folding has successfully pushed a 4 onto the stack. We've initialized our i variable to 4. That should be an int16, actually. And let's actually just change this, because I said we weren't going to do 32-bit variables. Okay, let's try and exercise the 16-bit code. To do that, we want var j: int16 := 2. Okay, var k: int16 := i + j. Right, and what's this going to do? Well, it's created a symbol i and initialized it. It's created a symbol j and initialized it, and then we get a syntax error because we don't support variable loads yet.

So let us do that. An expression can be a number, or it can be a variable reference. This should be easy. Look up the symbol based on the value of the text. If the symbol is not a variable, produce an error. Otherwise, load the value of the variable and push it onto the stack. Report the type of the expression node to be the type of the variable. We run that, and that doesn't work because this is in capital letters. Okay, so what's this done? Load i, push i; load j and push it. Pop our two values, do the add, push the result. Pop the result and store it into variable k. Right, we now actually have code. Let's remove that log line. Good.

Now, it won't run, because we're not actually defining anywhere to put the variables yet, which is going to bring us to our next bit. Now, there's our subroutine definition. Subroutines consist of some code plus some variable storage. Other languages would call this the stack frame, but we're going to call it the workspace, because we don't have a stack frame. Because Cowgol doesn't have recursion, it just doesn't use the stack for anything other than, you know, expression evaluation. So we actually just define the size of our workspace. Now, symbols get stored in a workspace.
So, we need to refer to, sorry, let me just rephrase that: variables get stored in a workspace. So a variable needs a reference to the subroutine that it's defined in and the offset into the workspace for that variable. When we define a variable, we need to allocate its space in the workspace. Now, that will actually happen here, because we need the type in order to do this. So, what do I call that, sub. You define the variable, therefore it belongs to the current subroutine. The variable lives at the end of the current subroutine's workspace. The workspace size then gets increased by the size of the variable, which is width. Okay. Let's just check to see if this works. No, it doesn't. Current subroutine undeclared; that's because I called it current sub.

Okay, now, here we are just using the variable name as the assembler identifier for the variable, which is wrong; we're just doing that for testing. So what we're actually going to do is, let me think what the best way to do this is. It'd be nice if C let you have functions that return strings without needing to allocate memory for them. Anyway, all our variables are going to be referred to by the address of that particular subroutine's workspace and the offset into the workspace. So every time you refer to a variable, we're actually going to have to have this nonsense here. And there should be a better way to do this. Yeah, let's just do a function. So we're just going to write a simple function where you tell it what opcode you want and it does this thing. Okay, so instead of doing this, we do var access. Here we want to store the variable. Here we want to load it. And those are, in fact, the only two places we're referring to variables. That's not right. That should be sym. Okay. So, and you know, I do not like the, let's do that instead. So, the workspace for main plus zero. Yeah.
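
The workspace allocation and the opcode-plus-address helper described above can be sketched as follows. The function names and the `ws_<subroutine>` label scheme are assumptions made for illustration; the video only shows that a variable is addressed as its subroutine's workspace plus an offset.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct subroutine {
    const char *name;
    int workspace;           /* bytes of variable storage allocated so far */
};

struct symbol {
    const char *name;
    struct subroutine *sub;  /* subroutine whose workspace holds the variable */
    int offset;              /* byte offset into that workspace */
};

/* Place `var` at the end of `sub`'s workspace and grow it by `width` bytes. */
void allocate_variable(struct subroutine *sub, struct symbol *var, int width)
{
    assert(width > 0);
    var->sub = sub;
    var->offset = sub->workspace;
    sub->workspace += width;
}

/* Append "<opcode> ws_<sub>+<offset>" to `out`; the opcode would be
 * lhld for a load or shld for a store. */
void emit_var_access(char *out, size_t size, const char *opcode,
                     const struct symbol *var)
{
    size_t used = strlen(out);
    assert(used < size);
    snprintf(out + used, size - used, "\t%s ws_%s+%d\n",
             opcode, var->sub->name, var->offset);
}
```

Three 16-bit variables allocated this way land at offsets 0, 2 and 4 and leave the workspace six bytes long.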
And just for debugging purposes, we are going to put the name of the variable in a comment. So we now have it referring to variables in the subroutine's workspace. What we haven't done is actually emitted the workspace anywhere. And that actually needs to happen here. It's going to have to happen after every subroutine, but we'll get on to that when we actually do subroutines. So, current sub name, current sub workspace. Right. So now we've emitted a block of uninitialized memory six bytes long, which will store our variables. And we now have a working assembler file. So let us assemble it. It doesn't like that. Line 11, label already defined. It's not a, no, not an exclamation mark. Exclamation mark is the assembler's newline separator. Okay. So that has assembled our program.

Break at 100. Go. Load HL with one, push HL. This debugger here is using Z80 opcodes just to be, you know, useful. Here's the program that we actually assembled, that we actually generated with the compiler. So we're here. Pop HL. Oh, that's not right. That's very not right. Why is that? So I forgot to tell it that our program starts at the 100 base address. I thought that was the default, which it is not. So let's just reassemble, reload. There we go. Yeah. What it had done was default to address zero for assembling all this stuff, but we'd loaded the program at address 100. So this is an absolute reference to an address in memory, which was wrong because of that. Now it's correct. i is stored at address 121. Load two. Push two. Pop two. Store it into j. Load i, which is one. Load j, which is two. Push. Pop our two variables. You can see them here in these two registers. Add them. Here is the result. Push it onto the stack. We're now here. Store it into 125, which is k. And stop. We have a running program. Fantastic.

Okay. I have completely missed out 8-bit types, which are a little bit interesting because 8-bit values can't really be pushed onto or popped off the stack.
So, we actually have to redirect through a 16-bit register. This will become easier later, actually, once we make things a bit more sophisticated. So, I'm wondering whether to go for 8-bit types now, or start adding stuff like more operations, or go for subroutines or control flow. Let's do... actually, let's just check that in. Let's do subtract, because subtract has special needs.
So, subtraction is slightly trickier. Addition is easy because it doesn't matter which order you do it in. So, we were able to do this. This code here: if one of the values was an actual expression node and the other was a constant, then this makes sure that they are both pushed onto the stack, and it handles both orders. So, the thing that is a real value always ends up on the stack first and the constant gets pushed later, regardless of the order. Now, with subtraction, we actually have to do it in the right order. So, that makes life more difficult. But the constant folding is easy. Oh, yeah, and let's just add our operator. So, we actually want to subtract the second value, the right-hand side, from the first value. So, we want to make sure that this one ends up in H. So, if we do it in this order, then this is going to be the right-hand side and this the left-hand side. Oh, hang on a second. No, actually, this is worse than it looks. I keep bringing up the wrong thing. Because I'd actually forgotten that the 8080 doesn't do 16-bit subtraction. So, all right. So, in the case when the left-hand side is a value and the right-hand side is a constant, then we can actually do this by pushing the negative of the constant and doing an add. So, what we do is negate the right-hand side, make sure that the right-hand side is pushed, and add. If it's the other way round, and we are subtracting a value from a constant left-hand side, then we actually have to do things the long way. So, resolve expression type here has just pushed the constant value onto the stack, which is now, like, the wrong way round.
That is... so, we want the constant to go into D. Hang on, sorry. We want the right-hand side, which is the expression, to go into D. But we know that the right-hand side was pushed first. So, the right-hand side is the next thing on the stack. And the left-hand side is... sorry, it's the other way round. Okay. Otherwise, both things on the stack were values, and therefore they were pushed in the right order. So, pop the right-hand side, followed by the left-hand side. And now we actually do the subtraction. And we're going to subtract D from H. And this is a little bit gruesome. So, what we do is we start with the low byte, and we move that into A, and then we subtract the right-hand side, move the result back into L, then repeat for the high byte, and push the result onto the stack. Two values, I minus J. Oh, yeah, and I actually need to... let's do that. Right. What's that actually done? So, load one and store it in the variable. Load two and store it in the variable. Load our two variables and push them onto the stack. Right. J goes into D, I goes into H. Now we go through the subtraction, push the result, and the result goes into K. And we are subtracting one minus two, so that should give us minus one. So, let's actually assemble it, run it, and see what happens. That has actually produced a zero. That's not right. So... the best errors are the stupid ones. Compile, assemble, run. And our value is FF. That's not right. That should be FFFF. But I know what I did. So, the SUB instruction is sub ignoring carry. So, it will leave the carry set, but it won't honor it. SBC is sub including carry. So, the carry set from this 8-bit subtraction will flow through into the high byte. So, that should work. Line 22, "values are not instructions". This is because SBC is the Z80 opcode, and the 8080 version is, of course, SBB. Everybody knows that. Get that assembled, debug, and HL is FFFF.
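To make the SUB versus SBB business concrete, here's a small C model of the byte-at-a-time subtraction. It's only a sketch: `sub8`, `sub16` and `sub16_no_borrow` are made-up names, and the comments show the 8080 instruction sequence each step is meant to mirror, not the compiler's exact output.

```c
#include <stdint.h>

/* 8-bit subtract with an incoming borrow; *borrow_out models the carry flag. */
static uint8_t sub8(uint8_t a, uint8_t b, int borrow_in, int *borrow_out)
{
    int r = (int)a - (int)b - borrow_in;
    *borrow_out = (r < 0);
    return (uint8_t)r;          /* wraps modulo 256, like the real register */
}

/* HL - DE done correctly: SUB on the low byte, SBB on the high byte. */
static uint16_t sub16(uint16_t hl, uint16_t de)
{
    int carry;
    uint8_t lo = sub8((uint8_t)hl, (uint8_t)de, 0, &carry);                /* mov a,l / sub e / mov l,a */
    uint8_t hi = sub8((uint8_t)(hl >> 8), (uint8_t)(de >> 8), carry, &carry); /* mov a,h / sbb d / mov h,a */
    return (uint16_t)((hi << 8) | lo);
}

/* The mistake: SUB on both bytes, so the borrow from the low byte is lost. */
static uint16_t sub16_no_borrow(uint16_t hl, uint16_t de)
{
    int carry;
    uint8_t lo = sub8((uint8_t)hl, (uint8_t)de, 0, &carry);
    uint8_t hi = sub8((uint8_t)(hl >> 8), (uint8_t)(de >> 8), 0, &carry);
    return (uint16_t)((hi << 8) | lo);
}
```

For 1 minus 2, `sub16` gives 0xFFFF, while `sub16_no_borrow` gives 0x00FF, which is the wrong answer seen in the debugger.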
So, here you can see the Z80 version of the code over here on the right being executed. Good. Let us just do a quick... just quick sanity checks. Let's do i minus 1. What this should do is add minus 1 to i. And that is indeed what it's done, which is nice. And let's do 1 minus i. What that's done is push i, which is the right-hand side, push 1, which is the left-hand side, pop H, pop D, and then do the subtraction. Good. I mean, we could, at this point, check for adding or subtracting constants of 1 and then use the INX or DCX instructions, which increment and decrement a 16-bit value. But that's, like, a cheap optimization. I'm not going to bother just now. Okay, let us do subroutines. Actually, no, let us check in.
So, nested subroutines. Now, with a nested subroutine, the code normally looks like this. And a proper compiler will take this code, with the subroutines nested inside, and emit it out of line, so you end up with a whole bunch of non-nested subroutines. To do that, you end up having to buffer the code you're currently generating so that you can, like, stop... you know, you need to generate them all separately. But we're not going to bother. We're going to do this the cheap and nasty way. We are just going to emit the code inline but then jump over it when you actually execute it. And that means we're going to start needing control flow. So, we need to generate a series of identifiers for labels. A subroutine needs to remember the label of the code after the subroutine itself. And let's add some keywords. Oh, yeah, we've just put in parentheses. So, let's just do this. That gives us parenthesized expressions, which is nice. Why doesn't this work? The R subtype end. Oh, that's because I haven't actually done any subroutines. Okay, so a subroutine consists of a sub keyword followed by the name of the subroutine, followed by a list of input parameters.
Well, followed by a list of input parameters, followed by an optional set of output parameters, which we're going to ignore for the time being, followed by statements, followed by end sub. Now, a parameter list for the time being is just going to be empty. We're not going to have parameters just yet. Just like here, we're going to factor this out into a sub-definition rule, and for the same reason. So, sub ID: $$ equals add new symbol... We could probably use the same definition, really. Yes, we could. So, if I go for new ID... so we can actually use our new ID. Yeah, that should work. So, a sub-definition is now a sub followed by a new ID, followed by a parameter list, followed by statements, blah, blah, blah. Now, we're going to have to be a little bit sneaky here, because statements here will actually generate code, but we want to do stuff before that happens. So, immediately... let's actually do it here. So, immediately after the parameter list, we set up the subroutine itself. We say it is a sub. The symbol has already been added. We need to... do I need anything in here other than the... Yeah, I can just do... let us create our new subroutine. I wasn't actually expecting that to work. I'm not sure any of these values helped me at all. That was, like, pop-up help. I've spent a little bit of time trying to make it work in Vim, but, you know, it still doesn't. Name... yeah, okay. So, let's set up our new subroutine. So, the name is the name from the symbol. The parent is, of course, the current subroutine. Symbol table defaults to empty. Workspace defaults to zero. Label after: this is going to be the label immediately after this subroutine that we're going to jump to. And we've got current label here, which has got the set of labels in it. So, let's just define a new label. And we do, of course, now want to jump over the subroutine proper. Right, now we're in a good position to actually generate the code, which will happen in statements.
After statements, we then need to do a bit more work. We need to emit the return for the subroutine, which is simple enough. We need to emit the workspace. We're just copying the code from the main program here. The reason we're going to have two copies is because the main subroutine is a little bit special in that it doesn't actually have any sub and end sub syntax around it. It may be possible to factor this out, but I'm just not going to bother for now. So, after the workspace, we then need to emit the label, like so. And then we need to return to the subroutine we were emitting code for before we started the nested subroutine. So, I think that will work. Of course, it doesn't. I haven't defined new ID as being a symbol. Oh, yeah, an important thing I forgot to do: after this jump I do actually need to generate a symbol for the code itself. We now have a nested subroutine. So, if you look at the code it's generated... actually, I will bring it into the editor proper. Our main subroutine starts at line two. This is where the nested subroutine goes. So, it jumps over to the code after the subroutine. This is where the subroutine starts, and it terminates with that RET, and here is its workspace. Now, where did that syntax error... that's actually still a syntax error. Why am I getting a syntax error? So, it's getting to here, we have the end sub, so we should then start doing more statements, including the var. Interesting. There's a way of getting line number information out of the lexer. I just haven't done that yet. This is the debug information for the parser. So, we can see it read a sub, we read an ID, we collapse that into new ID, parentheses, blah, blah, blah. Parameter list, statement, executing... yes, this isn't helping. Right, that has actually worked now. Okay. So, the main program starts at line 2. We jump over the subroutine to X0 here, and we proceed with the rest of our program, which is as before. The subroutine itself lives here.
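The cheap-and-nasty "emit inline, jump over it" scheme can be sketched in C like this. Everything here is an assumption for illustration: the `emit` helper writing into a buffer, the `begin_sub`/`end_sub` names, the `f_`/`w_`/`x` label prefixes and the `ds` directive are stand-ins, not the actual code or assembler syntax from the video.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static char out[1024];      /* emitted assembly accumulates here */
static int next_label = 0;  /* source of fresh label numbers */

/* Append formatted text to the output buffer. */
static void emit(const char *fmt, ...)
{
    size_t used = strlen(out);
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(out + used, sizeof out - used, fmt, ap);
    va_end(ap);
}

struct subroutine {
    const char *name;
    struct subroutine *parent;  /* subroutine we were compiling before */
    int workspace;              /* bytes of variables */
    int label_after;            /* label placed after the subroutine */
};

static struct subroutine *current_sub;

/* Called after the parameter list, before any statements generate code. */
static void begin_sub(struct subroutine *sub, const char *name)
{
    sub->name = name;
    sub->parent = current_sub;
    sub->workspace = 0;
    sub->label_after = next_label++;
    emit("\tjmp x%d\n", sub->label_after); /* enclosing code skips the body */
    emit("f_%s:\n", name);                 /* entry point for callers */
    current_sub = sub;
}

/* Called at "end sub": return, workspace, skip label, back to the parent. */
static void end_sub(void)
{
    struct subroutine *sub = current_sub;
    assert(sub != NULL);
    emit("\tret\n");
    emit("w_%s: ds %d\n", sub->name, sub->workspace);
    emit("x%d:\n", sub->label_after);
    current_sub = sub->parent;
}
```

The point of the `parent` field is just that once the nested subroutine is finished, code generation carries on in whichever subroutine it was in before.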
Let us actually do a var i, int16, equals 42. Compile that, and here you see the code to initialize the variable is referring to the "something" subroutine's workspace, which is 2 bytes wide here, and main here is unrelated. Fantastic. We are now actually generating a subroutine. Next stage is to call a subroutine. There are two ways to call subroutines in Cowgol and they look very similar. You can either do this... Now, unlike C, where everything is an expression except for the things that aren't expressions, in Cowgol expressions and statements are distinct. You are not allowed to have a bare expression as a statement. So, there's a big distinction between doing this and doing this. And the distinction is that this one lives inside the expression tree and returns a value. This, however, only works with void functions that don't return anything, and we're just going to do void functions just for now. Now, that's a statement. So, that is an ID followed by two parentheses followed by a semicolon. Now, I can be cleverer here, actually. We're going to have some rules. New ID is an ID that defines a new symbol. Old ID is going to be one that looks up a symbol. So, type ref here is an old ID, our variable lookup here is an old ID, and this is an old ID. We check to see if it's a subroutine; we expect this to be a subroutine, and then we just call it. And what does that program look like? Yup, that works. Call something; something is here. Let us assemble it. Something threw "expected identifier". What's wrong with that? Does it not like leading underscores? Hmm. I don't think the assembler likes leading underscores. I think that's a bug in my assembler, to be honest. Yeah. Okay, let's work around that and just prefix these with... okay, that assembled correctly. Okay, we're just going to prefix the actual function entry points with an F. I don't want to use bare names because they will conflict with other things. So, load it, set breakpoint, go. Okay, we are here.
We're jumping over to 010E here. We're calling our subroutine at 0103. Load 42, store it into our workspace at 010C, return, program finished. Good, we've got subroutines. We don't have, like, parameters for subroutines yet, but we have subroutines. I think let's just turn this off again. Shift-reduce conflicts? Why are we getting shift-reduce conflicts? Now, we were getting shift-reduce conflicts earlier because my expression was, like, limited. But you can get yacc to tell you where the conflicts are: a report file. Okay, let's do that. Did that generate a report? It did not. Do I need... I think I need minus r as well. "Valid arguments are..." okay. State 36, conflicts: shift-reduce. So, this is the raw grammar of my program, and then here are the states of the automaton that's actually making the code work. So, state 36 is this, and... ah, I know what's going on. Right, it's because my arithmetic operators are defined as, like, expression, operator, expression, so it can't tell... this expression here may contain an additional expression with a plus in it, and it doesn't know which order to collapse them. So we need to give it precedence statements and tell it that these are left associative. There we go. And now it's happy, because it knows the order in which to collapse them. Okay, does that work? Okay.
Let's do control flow now. It's been a little bit of a while since I've worked with Cowgol, so let me just look up the loop syntax. Here's a simple one: loop, end loop is an infinite loop. That's very easy. So, that's a loop followed by some statements followed by end loop, and we need to define the... define the keywords. Now, these are control flow things; we need to store a label. So we're just going to do... am I allowed to define a type for a... yep, I can define a type on a basic token. So, let me just walk you through what I'm doing here. This is an infinite loop, so what we do is, when the end loop happens, we jump back to the beginning of the
loop. To do this, we need to emit a label at the beginning of the loop, then we do the statements, followed by the end loop, and then we need to emit the jump. So, we define a label and actually emit it, and at the end we do the jump, like so. And that builds. So: loop, something, end loop. And what code did this produce? Yep, that looks like a loop to me. Good, that works. While we're at it, let's actually just put in an empty statement that does nothing, because that will allow us to do this sort of thing, which will work fine. Do I want to do a while loop now? Now, a while loop involves conditionals. Conditionals are a bit weird; they also involve comparison operators, which are extremely weird and unpleasant on 8-bit machines. Let us ignore those for now; we'll get on to those later.
Let's make this compile. So, this involves actually doing an expression evaluation and an assignment, and assignment is an additional operator. Assignment is an old ID followed by an assign operator, followed by an expression, followed by a semicolon. And it actually shares quite a lot of code with the var here, because this has an expression in it. By quite a lot, I mean these three lines. So, old ID has already done the work of looking the thing up. You do want to check to make sure it's a variable. We then want to resolve the expression on the right, $3, and $1, and a store. Okay: "type mismatch, expression was an int16 used when an i was expected". The thing I'm passing in to resolve expression is wrong: I'm passing in the variable's symbol rather than the type of the variable. So, this actually wants to be u dot var dot type, the type of the variable. Okay, that looks better. And what code does it generate? So, here we've got our loop, and we load I, increment I, and store it back again. Fantastic, that works. At some point I'm going to have to tackle 8-bit variables. So, I did have a plan in mind for optimizing things. The code generated right now is, like, dreadful. I mean, it's simple: our main compiler is
318 lines long, and it works, but it's not what you would call good. So, the question is, do I want to do 8-bit variables now, or wait until after... no, actually, I'm going to do them now. But first I think I need a rehydration break, so be back in a moment.
A cup of tea has been acquired. Let us proceed. So, 8-bit types. The reason why I've been holding off on doing these is that there are a couple of different ways we can implement them on the stack. The 8080 doesn't have a native way of pushing an 8-bit value onto the stack. You can either copy it into a 16-bit register and push that, or you can push the processor status word, which contains the A register and the flags. And remember, all arithmetic has to happen in the A register. The problem is that the A register ends up in the high byte of the word if you do that. So, anyway, let us add an 8-bit type, like so. Now, the only bit where this matters... actually, let's just write some test code. So, an int8, equal to 4. Now, this isn't going to work; this is going to produce incorrect code. So, it has successfully allocated a single byte. The workspace here is 1 byte long, a single byte for the A variable. Let's just actually call that V, so we don't confuse it with the A register. However, the code here to actually assign it is all the 16-bit stuff. So, the assignment code here assumes a 16-bit value. Now, assignment is relatively straightforward. All we need to do is look at type dot width: if it's a byte, then we put the byte code here; if it's a word, we put the word code here. Now, we're going to need to decide how the value is stored on the stack. So, the two real options are... so, let's say we have something we want on the stack. We can push it: this pushes the AF register pair, so we end up with A in the high byte and F in the low byte. Now, that's absolutely fine, because if we pop it back again it pops back the same way, and A is therefore equal to 1. The trouble is that if we then pop into a normal 16-bit register, then the byte value goes into H rather than L, which is surprising. Now,
I do not believe that there should ever be a case when this will happen, so I think we can probably make this work. Well, let me go into the other one first. The other thing you can do is do this... and, sorry, not that, do this. This will push AF onto the stack and then increment over the flags byte, so you end up with a single byte pushed. In order to pop the value back again you do DCX SP, POP H. That will still put it into the H register. Yeah, I saw this in code generated by SDCC, but it's obviously more complicated. So, the other thing is the brute force approach, where you just do this. I think, to be honest, I will go for this one, because we can push and pop the A register as we like. If we wish to put a value into a 16-bit register, then we just have to remember it's in the high byte rather than the low byte, and we will end up corrupting the low byte. As we have no register allocator and all my registers are hard-coded, that is fine. So, we pop our 8-bit value into AF, and we store our 8-bit value... STA is correct, is it not? Yes, it is. And we store our 8-bit value into memory. And we're actually going to need to do the same code here, so that makes me think: factor this out a bit. The expression, the type and the variable... the variable, you can get the type from the variable, so all we need is the expression node. So, the variable is $2, the expression node is $6. Function: symbol, node. So, this is the node and this is the type: var, u.var.type... u.type.width. Okay. And where's my assign? It is here. And this just becomes assign: var is $1, expression node is $3. Why hasn't that worked?
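The byte-order surprise with PUSH PSW is easy to see in a toy model. This C sketch only models the four registers and stack behaviour needed for the point; the `struct cpu` layout and function names are invented for illustration, but the byte order follows the 8080's documented PUSH PSW and POP H behaviour (A is stored at the higher address, flags at the lower).

```c
#include <stdint.h>

/* A tiny slice of 8080 state: just enough to model PUSH PSW and POP. */
struct cpu {
    uint8_t a, f, h, l;
    uint16_t sp;
    uint8_t mem[65536];
};

/* PUSH PSW: A goes to the higher address, the flags to the lower one. */
static void push_psw(struct cpu *c)
{
    c->mem[--c->sp] = c->a;   /* high byte of the pushed word */
    c->mem[--c->sp] = c->f;   /* low byte of the pushed word */
}

/* POP PSW: exact mirror image, so A comes back unchanged. */
static void pop_psw(struct cpu *c)
{
    c->f = c->mem[c->sp++];
    c->a = c->mem[c->sp++];
}

/* POP H: low byte into L, high byte into H. */
static void pop_h(struct cpu *c)
{
    c->l = c->mem[c->sp++];
    c->h = c->mem[c->sp++];
}
```

So a byte pushed with PUSH PSW round-trips fine through POP PSW, but if it's popped with POP H it lands in H, and L gets whatever the flags happened to be.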
The original Cowgol language was a two-pass compiler and you didn't need forward declarations. My prototype here is very much a one-pass compiler, so at some point I will need... what am I doing... at some point I will need a way to do forward declarations. Okay, so what's this done? We have loaded a constant into H, we've popped it... now, that's wrong, because of the byte order issue described earlier. So, this is actually resolve expression here: if the expression node has no type, then we push the constant value onto the stack, but we need to know what kind of thing to push. So, if it's a byte, then... so, now what's that done? Move 4 into A, push PSW, pop PSW, store. Okay, let's try assembling that, just to make sure everything's right. Yep, that works. We don't care about the contents of the flags word. The 8080 doesn't actually store anything useful in there; you know, things like the interrupt flag and so on are all stored out of line. So, this contains solely, like, condition code flags, so provided we don't rely on any of them we can just corrupt them at will, which we are doing. Okay, and there are some more bits to do. Loading things also needs to be aware of the type: is it a byte, in which case we load it and push it; is it a word... So: load 4, push, pop, store. Here is our loop: load variable, push, load constant, push, then we get the 16-bit add, which is wrong, and then we assign. So, here is our add code. At this point we have successfully resolved things onto physical stuff on the stack, so we know that the two types are the same, so we can actually do type, u.type.width. Okay, add. 8-bit add: now, pop the right-hand side into A, pop the left-hand side into HL, add A and H, push the result back onto the stack. Looks correct. Now for subtract. Subtract is more complex, and it also shares code with the add, so I wonder about taking this stuff all out of line into helper functions. I think I probably do. So, let's create... so, this is the left-hand side of the
expression, the right-hand side of the expression, and the destination node, which we will need. So, here is our add, and let's just copy this: expr dest, expr left-hand side, right-hand side, and the add code. If this one actually works, then I will rewrite the whole thing in Cowgol and see if I can compile it with itself. And this also has the advantage of being significantly easier to read. And now the call to the function itself. Let's add... destination, left-hand side, right-hand side. Okay, now we do the same thing for subtract. Actually, just see if I can do a search and replace: replace this with this, replace this with left-hand side, yep, right-hand side. It hasn't quite worked, but never mind. Okay, now the reason for this is that it allows us to chop this piece of code out completely and just replace it with expr add: dest, left-hand side, right-hand side. Okay, that seems to be working. Let's just change my code to do a subtract. This has... oh yeah, it's added minus one, which is exactly what I wanted. Let's do one minus v. Okay, that has done the 16-bit add, which we want to avoid. So, let's try and think of the cleanest way. Possibly we just want to wrap the whole thing, although we don't know the... so, H is going to be the left-hand side in both cases, so I think we can probably do... so, this will pop the right-hand side into D, actually into D, where E will be corrupted, and the left-hand side goes into either PSW or HL. So, that then allows us to... actually, this is not right. So, resolve expression here doesn't actually change the type of the thing here. So, this gets called if RHS type is null and this gets called if LHS type is null, but it doesn't actually set LHS type, so if it flows through line 213 we'll then crash here as we dereference LHS type. So, what we actually want to... nope, I apparently don't have cross-referencing enabled. So, we're actually going to do node, node type equals type, so that will set that. So, at this point we know that LHS and RHS both have the same type,
so I can do... so, we're doing an 8-bit subtract, and the left-hand side is already in A, so the code is really simple: it is simply SUB H, and that leaves it in A, so we can then push this onto the stack with a simple push PSW. For the word case we use the existing terrible code, and this one. So, is that going to work? Yes, it is. Nope, sorry, hang on: the right-hand side is actually in D, so that should be SUB D. So, I believe that has worked, but it's nasty, really nasty. Now, I'm hoping that a future step will actually simplify things and make the code much nicer, but we'll see. So, what does our test program look like? I do want to do some actual testing, so let's define a variable called one and do v equals v minus one. So, that compiles into this code. 100. Okay, what does that compiled code look like? So, the first thing we do... okay, I did not want to actually run my program, but it hasn't crashed or anything. I'd be slightly surprised if it could. Okay, step. We've loaded A with one, push AF, pop AF, store into variable 0122, which is the one variable. Four, store into 0123, which is the v variable. Now we start our loop. Load 0123, push. One... D is one, which is what we wanted. A is four, which is the value expected in v. Subtract D: it's now three. Push, pop, store, loop. Okay, now we go through again. Yeah, that is working. Good.
This is what our program looks like in a hex editor. This much is code, we then have two variables, two bytes of workspace, and the rest of it is padding to get it up to a 128-byte record. Okay, so, things to do next. Conditionals give us flow control. Subroutine parameters allow us to call subroutines that actually do real work, which we'll need to get things done. But the actually interesting bit is dealing with the registers, because we want to try and get this code as simple as possible. I mean, this is kind of dreadful; let's try and make it smaller. Oh yeah, how big is our program? 17K of text. If you compile with -Os... we did, oh, and no
debugging. Let's actually just change that. It's pretty small. I mean, there's a ton of stuff that needs doing to make this actually something you can really write code in, but it is looking promising. Calling subroutines is a bit tricky because we need to start dealing with lists of parameters. I'm just going to pass all the parameters on the stack, and then the subroutine, when called, will pop the parameters off the stack and put them into its workspace. This has some advantages over just storing them directly into the subroutine's workspace, which is what the old Cowgol did: both code size, in that storing a variable is 3 bytes whereas pushing it onto the stack is 1, and also it allows subroutines' workspaces to overlap in a safer manner.
Yeah, the old Cowgol had this really cunning, if I do say so myself, algorithm called the placer, which walked the... it's this. It's actually pretty small. Is it this? Is this the placer? No, it's not the placer. Is it the classifier? It's the classifier. Yeah, it's a much chunkier piece of code; this is just, like, one file of many. What this did was walk the call graph. Now, Cowgol does not support recursion. This means that each subroutine can appear only once in a given set of subroutine ancestors; subroutines may not call themselves. This means that a subroutine's workspace can be shared with any other subroutine, provided you can guarantee that the two subroutines will never be active simultaneously, that is, one calling the other or vice versa, with any calls in between. And what this did was actually work this out, and the result is amazingly effective. Even with big chunky programs like, you know, the Cowgol compiler itself, it was reducing the amount of variable storage enormously. And on platforms like the 6502, where pointer variables have to live in zero page because they won't work otherwise, since the instructions are not there to do variable dereferencing unless the address you're dereferencing lives in zero page, it just
wouldn't have worked otherwise. You only have, like, 200-odd bytes of zero page memory, and yeah, this was really effective. There are all sorts of cool things you can do once you get rid of recursion. We're not going to do anything with that in this prototype; besides, I probably can't remember how the algorithm works anymore. What was I saying? Oh, subroutines. So, the trickiest thing with subroutines is type checking the parameters, because we have to store the formal parameter types when the parameter is defined, but also, in a function call, each of these things needs to be pushed onto the stack, correlated with the parameters of foo, type checked, and potentially, if it's a constant, coerced. And that's actually a little bit hard to do in Bison. I would need to do some experimentation to see what the cleanest way to do it is, and that's not going to be interesting to watch. But let us commit this and let's start looking at the generated code.
Now, in a stack-based architecture like this one, one of the common tricks is that you don't actually push the value on top of the stack. This comes from Forth, where top of stack is used a lot. You can see here: here we do a push and then a pop, here we do a push and then a pop, here we do a push and here we do another push, but this one's followed immediately by a pop, and likewise push and pop. So, by avoiding doing this push, we just defer it: we keep the value in a register. Then, if the next thing is a pop, and it's in the right register, we just omit it, and if it's like this and it's a different register, then we can copy it. Much faster, simpler, and let's implement that now. The way we do this is in our expression node. We already have: if type is unset, then the value is stored in the node itself. So, we're actually going to put in an additional parameter called reg that just tells you where this expression is stored. Let's call this location. And the location can be either on the stack, which is the
default, in HL, or in A. Yeah, we probably want to expand at some point to more registers, but let's just keep it like that for now. So, let's... okay, so this has just loaded a value, so rather than do a push we simply say it's in HL. I don't think we've got any other references here. I don't think we do. This is going to generate bad code, but it does at least compile. So, now this is where resolve expression type goes, and likewise we do location equals in A. Okay, I'm just going to ignore this stuff for the time being. So, let's focus on the slightly simpler case, which is assignment. So, here we're going to want something like: if location is not in A, then copy the value into A somehow. But we are going to want to farm that out to a helper function. So, this is actually the node that we are wanting to store, so: put node in register. Where's my assignment gone? Here it is. So, rather than just popping it, we just say put node in register; the node is node, and the location you want is in A, or in HL. Okay, and somewhere we have put node in register. So, if it's already in the right place, do nothing. If... what did I call it, stacked? If it is on the stack, then pop it from the stack into the register. If it is not stacked, then it's already in a register but it's the wrong register, and we currently do not support this: 8-bit values will either be on the stack or in A, and 16-bit values will either be on the stack or in HL, but not in any other register. So, change the location to where it is now; otherwise, fail. So, this has actually reduced the size a lot. Relatively optimal. Now, this is... I screwed up. If something is already in a register, then we cannot put something else into the register without pushing the old thing first. So, it's not enough to store the location in the expr node. We also need to store what node is top of stack, or rather, we want to store what node is stored in which register, so we can say: I'm about to use A, and therefore A gets pushed if it needs to
be. But this starts getting complex when there's more than one live value in registers, because they must be pushed in the right order to be popped back again later. So, it may be a useful rule that you're not allowed to push something that we don't know is top of stack. Okay, let's keep track of what the top of stack is. In any new expression... I'm going to do it like that. So, we're going to need an invariant that if there is no expression currently in operation then top of stack is null, and we're going to enforce that by setting it to null at the end of anything that completes an expression. And right now the only way we have of doing that is the assignment operation. So, by doing this we ensure that top of stack is reset. So, now all the places where we actually push something onto the stack need to be annotated. Of course, the expression nodes live inside yacc's stack, so we can't just take the address of one. Curses. I think that we don't necessarily need to keep track of what the value is; we just need to know that the register is in use. So, this is probably eventually going to turn into a really basic register allocator, but again, ruthless simplification applies. So, let's have a register... in fact, no, no, no, this is not right. What we're going to do is just do TOS location, and this is going to be the location variable of where the top of stack is. So, there is only ever one top of stack. So, TOS location equals, like so. So, in fact, this is going to be just evict top of stack. So, if it's in A or in HL, push; otherwise do nothing, there's nothing that needs doing. Once it's finished, it's stacked. We put this assignment here so that if you try to evict TOS when it's none or stacked then nothing happens. So, what we're going to do here is just do evict TOS. We're not going to evict TOS for the numbers, because those live outside the stack completely; they get coerced onto the stack separately. But the fact that they get coerced separately is actually what caused us
our problems in sub here. What I've done with other compiler projects in the past is have a pseudo-stack, so the compiler keeps track of what the stack actually looks like without actually generating code to put things on the stack. You need this if you're going to map stack entries onto memory locations statically, which we will actually have to do if we ever make this work on the 6502, so probably I actually want to go down that route. So in the interests of ruthless simplification, because remember this is supposed to be the minimum viable product for a compiler, we are actually going to... hmm. Thing is, if we do this then this will completely wreck our ability to do constant folding. So either we go for the fake stack option, which is perfectly viable and not particularly difficult, or we don't do that at all. If we want the fake stack option, we shouldn't need an actual fake stack, because we can use the yacc expression node stuff to do this instead, because this does actually correspond to the stack. The issue is that we don't really have control of when yacc will create and destroy stack items, so our ability to do lazy pushes is a little bit minimal. Let's keep that as it is for the time being. So: loading a variable, we evict top of stack. Expression adding... resolve expression type, this definitely is going to evict top of stack, but this is not. I forgot to set this. Okay, let's look at add. This is the constant version. At this point, what you want to get the top of the stack into is PSW. So let us try and remember what I called my function... register, yeah. That's not quite right: we don't want to put the node, this has to be the top of stack. Let's just bodge this for a moment. In register is going to be... of course, I do not know which one of PSW and H is the left hand side or the right hand side, because at this point most likely one of them is on the physical stack and the other is probably in a
register. So RHS is the one that's probably in the register, so we do "in"... so it's either A or HL, depending on... So at this point we know that one of our values is actually in PSW or H, so we don't need that, we don't need that. TOS location is in A, location is in HL. Yeah, not happy with this; neither is the compiler. So: load a value into A, the value is then evicted onto the stack... oh, this is the subtract code that I haven't done. Okay, what does this do? Load v into A, evict it onto the stack, load one into A, pop this into H, add H to A, store back. So that has generated correct code, and it's actually commendably small. The one thing we could do to fix this is, rather than having a push and a pop, we just do a register move into H. That has actually worked. So can we apply the same code to this? Yeah, let's just optimize this a bit. So for sub we want the right hand side... left hand side, even. Yeah, we want the left hand side to be in A or HL, and the right hand side is going to be in D. Otherwise the two values are both... no, no, we can't do this, because this is no longer top of stack. Hmm. So in fact this has to be the right hand side, and it's actually going to go in... "in PSW or H" is the right hand side, because I'm taking the type from the right hand side, but it's actually the left hand side you want to put into PSW or H, and the left hand side is in fact our constant. So this is correct: the left hand side goes into A or HL and the right hand side goes into D. Now, in the other direction... so, not the other direction: when the left hand side and the right hand side are both values, the right hand side has been popped first. So this is going to be... and we can't do this here because... so actually we want the right hand side to be in D; that is the thing that got pushed last. So we are actually going to want "in D" and "in DE", and we're going to have to add constants for that later. Then we pop the right hand side into
A or HL. Yikes. Okay, the sub code is fairly straightforward. Did I add a location to the expression node? Yes. Did we ever use it? Yes. It does need to be right. So where's our add and sub? So this... okay, um, yeah, this is sketchy as hell. I am quite convinced a lot of this stuff is wrong, but that's generated the same sort of code, which is nice. Let's just try something a little bit more sophisticated that's going to push a number of things. Okay, what's this done? Push v, add one... right, it's left associative. I just put quite a lot of effort into making it left associative, so it's not actually stacking up. Let's do this. Okay, right: load v, push, load one, push, load v, push, load one, add, add, add, store. That looks right-ish, and to be honest the code's not bad. Of course, I don't have anything that does real work, so I don't have anything representative other than stupid little bench tests. What could I do to actually do work? I'm going to have to add more features for that to work. So let's just commit this sketchy attempt at optimization. So let me just show you a thing which I may or may not have prepared. No, not that one. Okay, so this is a little copy memory routine I did in original Cowgol. What you can see here are pointers, so source is a pointer to an int8. Here's a subroutine to actually copy the memory, and here we invoke it with some random values just to generate some code we can look at. Now I want to try and compile this code with new Cowgol so we can compare the code quality. And while I'm there... I did precompile this, so dz80... er, yeah, here is the code that got generated. Original Cowgol generated Z80 code rather than 8080 code, so it's going to be a bit different, but anyway, you can see here is the main program, which is the copy memory here, and it just loads up values in; these are the workspace variables for source, dest and length. This version is poking directly into the subroutine's workspace, and it's tail-optimized the call to the subroutine. The subroutine
here actually does the work. So, er, load length, check for greater than zero; that's terrible code, I should use not-equal-to. Here's the code that actually does the copies. The Z80 allows you to load and save BC and DE via memory addresses and the 8080 doesn't, so it's going to be a bit different, but let's see if we can make this work. Now I need pointers to make it work. So the pointers are interesting: pointer types do not have names in Cowgol, they're all anonymous, and we need to create them lazily. So whenever you reference a pointer to a particular type, it will generate the new type and then cache the pointer to it here, so the next time around you don't need to do it again. But let's find our parse tree. Type reference, right: a type reference can be either an ID, which is just a straight reference to a type, or a type ref wrapped in square brackets. So if, no... the type ref has to be a pointer. So if this is correct then we just do that; otherwise we need to create a pointer, which we do as: create a new symbol, the width is always going to be 2. The other thing that pointers always have... to change the name of this. So if it is a pointer, we also need to keep track of the thing it is pointing at, which is another type. So let's do this. So "pointer to" and "pointing at", I think, is suitably confusing. Pointing at is of course our current type, and I think that's all we need. Oh yeah, we need to put the things in the lexer. Why doesn't this work? "$1 of type ref has no declared type". Really? Type ref has no declared type? It's got a type right here. Oh, $1 of type ref. Yeah, okay: $1 is the square bracket, $2 is the wrapped type. Okay, now we're going to change this to equals. We're going to have to put a lot of features in. We're going to finally need to do our conditionals, and we're going to need some unsigned types. On these 8-bit machines signed types are really tricky, so let's ignore those for the time being and just make everything unsigned. So it's going
to get as far as the while and then fail, because we haven't done while. So let's just... there's a bit more we need to do here. So the other thing you need to know about types is... the arithmetic type checking rules are more complex. Right, so resolve expression type currently was focused solely on width. So if the right hand side is a constant then we just load it; the left hand side is... yeah, okay, that works. However, we're going to need to do some more work here. So this is based around arithmetic. What this is doing is adding a constant to a value. Now, you're allowed to do that with numbers, you're allowed to do that with pointers, but what this is going to do is try to coerce the right hand side from a constant into whatever the type of the left hand side is, which... well, you can't add two pointers; you have to add a pointer and a constant. So we're going to have to do: if the left hand side is in fact a pointer, then the right hand side needs to be coerced to a uint16 or an int16. If the left hand side is a constant and the right hand side is a pointer, then we are simply not going to allow it. Yes, I know addition is supposed to be commutative, but when you're adding integers to pointers it really isn't. Okay, let's do a little bit of testing: pointer to an 8-bit type equals 0. "Symbol uint..." ah, right, what this has done is it's tried to actually find a name for our pointer type. Now, we do kind of want a name, but we don't really want it to be in the symbol table, because it's anonymous; we need the name for debugging purposes. So do I have an asprintf? I don't have an asprintf. Let's make one. The symbol... let me do the symbol check: only check the current symbol table if there is a name, and yeah, only add it to the symbol table if there is a name too. So this means that pointer symbols will not appear anywhere in the symbol table; they won't be part of
any scope chains; they're just, like, floating. Oh yeah, asprintf. So asprintf is a common extension; it's an allocating printf. I don't know why it's not in the spec, it's really useful. I know that glibc has several different versions of them, but for this kind of purpose we just want to generate a string and have it hang around forever. It's so useful. So let's actually just do that, and I should probably split this file up at some point. Here we go. "The functions snprintf and vsnprintf do not write more than size bytes." The return value is the number of characters which would have been written. Return value of snprintf: when snprintf is called with size 0... a return value less than 1; C99 allows str to be NULL and gives the return value, as always, as the number of characters that would have been written. C99 is now the correct one, so let's go with that. Including the terminating null byte... alloc len, like so. Intriguing. I believe I know what might be going on. Let's try that. Yeah, okay: unmatched characters in the lexer, and for some reason, I don't know why, .
here wasn't matching them, but by default the lexer will just, like, output unmatched characters, so it was printing all the line feeds. Okay, so that seems to have worked. Let's try some arithmetic. What have we got here? Store in the pointer, load pointer, push it, load 1, add, store back to pointer. I should add that Cowgol's pointer arithmetic is not the same as C's. The numbers in Cowgol are always addressing-mode units, in other words bytes, like actual bytes and not C bytes. Did you know... if you're confused by what I just said, the C specification redefines the word byte to be a synonym of char, so that if you are using C on a platform where char is, say, 16 bits wide, then a byte is also 16 bits wide. It is incredibly confusing if you're on one of these odd systems, where everybody refers to a byte as being an 8-bit unit and a char as being the character-specific unit. I don't know why they did that. So what was I doing? Oh, pointers. We want to dereference a pointer. So, what I want... 8 equals... right. Now, the syntax here uses C-style array syntax for dereferencing pointers. I have actually changed my mind about that one, so I'm going to turn it into this. The reason is that array-style dereferencing for pointers has got an implicit addition, that is, you are indexing the array, and these 8-bit machines don't like doing that. So what I'm going to do instead is: if you want to dereference the pointer, then you can only dereference the actual value of the pointer. If you want to index it as an array, then what you need is a pointer to an array with a known length; the length may be infinite. Now, this is so that platforms like the 6502, which have the ability to do built-in indexing where the register contains an 8-bit-wide unsigned offset... this allows Cowgol to generate better code, because it knows that the thing being indirected is an array of 256 bytes or fewer. Without that knowledge it would have
to assume that the thing being pointed at extended forever, and it would have to do a 16-bit addition to the pointer, which involves quite a lot of bytes. Now, I don't know if the prototype will ever get that far, but let's use the new syntax anyway because it's simpler. So let's go back to our code, and we want here... so, dereference: we want square brackets plus the expression. So if $2... if it is not a pointer, give up. Otherwise, now generating code: we need to get the value into HL, put node in register, and now we need to dereference it, and on the 8080 this is moderately annoying. What you do is you use mov... you could actually do... yeah, let's use mov. So M is a pseudo-register that means the thing being pointed at by HL. The Z80 changed this syntax; this is equivalent to that, which is frankly easier to understand. Increment HL. So A is the low byte. Move the high byte into H, which of course corrupts HL so we can't use it again for this, and then move A, which we loaded here, into L. Yes, it's that bad. Type is the thing being pointed at. TOS location equals... dot location equals in HL. If it's a byte then just do that; only go through this stuff if it's a 16-bit value. What's this complaining at? "Struct symbol has no..." yeah. What's this done? Load 1234 and store it into our pointer, increment the pointer, load the pointer into HL... why have we just pushed it? And here we do the actual dereference, and we store the value back into i in a reasonably optimal manner. But why have we just... I think I've done this in the wrong order. In fact, I don't think I want to call evict TOS at all there. Yeah, the evict TOS was what was causing the push. But I think this... okay, the whole top of stack stuff is completely borked. I mean, the principle is all right, but the actual implementation is a mess. I think I am going to have to implement my own stack. I don't want to, because I'm sure I should be able to get this information out of yacc's stack somehow, but yacc's stack is based on the syntax tree and it's
also full of stuff like other identifiers. What I'm actually trying to do is keep track of which registers stack elements are in, and this is rapidly getting towards the kind of register allocator that I was using in original Cowgol, which I was not wanting to do here. The other thing I could do is not try to keep track of top of stack at all, and just use a peephole optimizer to remove pushes and pops. That might be good enough and would produce much simpler code. I think for the time being... well, that actually sort of works, except for whatever's going on here. You know, I know that the expression there cannot be anything in A, because the expression must be a 16-bit value, therefore it must be either in HL or on the stack. You can't have A and HL in use at the same time, so I do not need to evict anything at this point. So I think that code is actually, like, right. And while I'm at it, this thing's annoying me. So if the left hand side is a value and the right hand side is a constant, put a special rule in that if the right hand side is the value one, then the left hand side... I think inr A, else inx H. I think what I'm going to do is actually strip out all the top of stack stuff and go back to a raw push thing, because I'm not sure this approach is actually simple enough and clean enough. The other thing I would like to do is be able to remember what values were in what register. Now, this is actually different from the stack stuff. Remembering what value is in which register means that we could tell that we saved it here and we've loaded it here, therefore we don't actually need to load it again. Again, this is something a peephole optimizer would be able to do rather more simply. Proper Cowgol worked by deferring all saves until the last possible moment. So in this situation it would have loaded 1234 into H and recorded that H contained a dirty value, that is, H contained the current value of this but this was not
up to date. So it would then have suppressed the load, because it didn't need to do that; it was already in the register. The increment here would have told the register allocation system that this value was changing and therefore did not need to be written back. Both the save and the load would be suppressed, as it knows that H still contains the current value of this. Then when we come to this code, actually loading it... actually, now I'm thinking of loading a 16-bit value; this is loading A. So it still would not have needed to write back this value. Only when we come to the end of the basic block, the ret statement, does it write back this to the actual variable. Now actually, I don't think I implemented it, but it would have been able to suppress that write as well, as this is a local variable and this is a ret statement, so it knows the variable will never be used again. So that would have done everything here in registers. But that's starting to get really quite complex in different ways. Just knowing what a value is is tricky. Cowgol had a really complex way of doing this that was kind of... I shouldn't have gone that route. I mean, it worked. But that's a slightly different problem than keeping track of the stack. So Cowgol didn't use the stack at all for arithmetic; that is, this is old Cowgol, not new Cowgol. What old Cowgol did was statically map all temporaries to variables, so that this system of deferring loads and saves actually applied to all temporaries. I got into a right muddle with things like aliasing, because there's a number of real subtleties around pointer dereferencing. Like this piece of code here: if we're dereferencing source and reading from it, and source can point at length, then we have to write back length so that we know that when we reload source we get the right value. And old Cowgol didn't have good enough rules to be able to tell, in a reliable fashion, whether this pointer and length aliased, so it just had to write back whenever a pointer
dereference happened; it would have to write back all variables. Likewise writing to a pointer: if dest happens to point at length, then writing to dest will change the value of length, but the compiler doesn't know this. So yeah, it made dealing with pointers very expensive in old Cowgol. I think the solution here is to simply declare that aliasing cannot happen unless explicitly declared, so that we know that dest and source do not point at length, because length is not declared as being aliased. And you'd have to do it like that, and then every time you touch the pointer, length will be flushed back to memory and flushed from the register cache. Dealing with the stack... anyway, I have that bit working, including the increment, so let's just check this in, and then let's go and strip out all that top of stack nonsense. I don't know what I just did. Oh, pointers. Okay, let's get rid of this, get rid of this, don't want that anymore; we do want this. Here we're going to have to start pushing stuff again. Incidentally, the reason for having a leading space on all the opcodes is that the assembler actually requires it to distinguish between an opcode and a label. So we want this to be in either PSW... PSW, HL. Lose that, lose that. Okay, there's this horrible thing again. Right, this seems to be swapping the order of the values, so let's get the order right again. Left hand side is a constant, right hand side is the value. Resolve expression type will push the constant we want, so the constant is top of... yeah, okay, nah, that's wrong. So the top of stack is the thing that wants to be the value modified, so that's the first... in this situation the top of stack is the right hand side. So now I do want to swap these round. And now we get onto the actual code. So PSW... Okay, interesting thing about the 8080 and the Z80: you cannot do 8-bit reads or writes using an absolute address except through register A. Very odd omission; like, you can read and write 16-bit values but not 8-bit. A common trick I've seen is to load an 8-bit
register using a 16-bit load and corrupt the other register in the pair. What's that complaining about? 316, 0. Move that, move that, PSW. All right, what's this done? That may be a typo. Why is that popping? Resolve expression... yeah, that's a typo. So load, push, pop, save: a peephole optimizer will simply make that go away. Pop HL: that's another typo, I think I was talking about the Z80 when I was doing that one. Push H, yeah. Push H, pop H, push H, pop H, push PSW, pop PSW. Yeah, there is nothing here that a peephole optimizer couldn't do vastly more simply, more efficiently, and in a way that's easier to understand. In fact, we could do that using code that would be even better; likewise the stores and loads. Try that. I was thinking of creating functions to do pushes and pops, but actually that's a little bit dangerous, because I need to eliminate these only when I see a push immediately followed by a pop. So in fact, to do that I would need to wrap all my code emission in a helper function, which is, like, worth doing anyway. So this would actually look something like that... actually, because I need an arbitrary parameter on the right-hand side, but I also need to keep track of whether it's an H or a whatever. One way to do it is simply the brute force approach, so I just remember the last string printed; that's kind of nasty. Another thing is just to abstract all the code generation out completely, away from strings. So this would be an enumeration that told it what the actual opcode was, and this would be the primary parameter, the register, really; but that wouldn't actually help with this. Or just have a family of functions that each does different things; that would make it type safe. So that would be this sort of thing. Likewise, because I actually tell it what register I'm using, the emit code can keep track of what registers are changing. That would probably help future register allocation stuff if I ever
decide to do that. I think I prefer that idea. These would all be lots of little small functions, but that's probably a good thing. Okay, I am actually going to need to take a break soon in order to make dinner. It is now nearly seven o'clock; that means I've been going for four and a half hours. Am I getting anywhere closer to actually running our code? Well, we don't have any subroutine parameters yet, so let's just, like, mock it up. Oh yeah, Cowgol will let you cast a numeric constant to a pointer; it's very useful. Let's commit this. Oh yeah, one very important thing I missed, apart from the while itself, which is assignment via pointer indirection. I will need special code for that. Let's see if we can make this work. "Unparseable character": it doesn't like this. Anyway, let's do the while loop, shall we? We need some operators. This is going to involve doing conditionals: equals, not equals. "Unparseable character 3D", oops. Yeah, assignments always use colon-equals and equalities use double-equals. There is no single-equals operator in Cowgol, and this is not an operator, so you can't use assignments inside expressions, because it just makes everything complicated. So this has... that's what I expect. Okay, let's do while. So the syntax for while is: while, conditional, loop. While of course has a label on it; in fact it's got two labels, because we have the label at the beginning and the label at the end. Repeat-until only requires one label, which makes it much easier to work with, but that's not what we've got. So we actually want code before the conditional: we want to create a label here. So this is going to be the label that actually starts the while loop, and the end of the while loop will jump back to this. Then we have the conditional. This is going to be a bit awkward, actually. What the conditional does is it evaluates the special conditional expression you get and decides whether to jump to one thing or the
other. Now, I don't think I can generate code without telling it what the labels are. Oh yeah, better names for these, actually. So the true label is the one that you jump to if the expression is true, and that makes the while loop continue execution; false label is the one at the end that stops execution. So in fact the false label goes here. Hang on, I've forgotten a huge chunk of stuff. So: while, conditional, loop, statements, end loop. There we go. So the false label goes right at the end. So we need to somehow persuade the conditional that these are the two labels which it should jump to, and this would involve telling... hmm. The simplest thing to do is to leave a value on the stack and just jump according to that, but this doesn't allow you to do conditional short-circuiting, which I rather want. So I believe the conditional needs knowledge of the thing it's embedded inside, and that's beginning to get to be an advanced bison topic. For the purposes of making this bit of code work, let's just assume that the expression value is pushed onto the stack and it's a... or a condition code; yeah, I can use a flag. Okay, so we're going to assume that the Z flag is set for true and not-Z for false. We're just going to do some basic comparisons for now, so let's just go with that. So let me just look up the mnemonics: jz, jnz. Yeah. So NZ means false; otherwise it will fall through to the body. Oh yeah, and I forgot a bit here: we jump back to the beginning of the expression. Right, let's do a conditional. In Cowgol, conditionals form their own separate expression tree to make short-circuiting work, but we are just going to do expression equals expression, or expression not-equals expression. Just make sure that builds. You know, it doesn't. "$$ for the midrule ... of statement has no declared type"... $1, because we're actually sticking this data on the while itself. Now, you know, I need to get access to the while from the conditional
here. The code is going to be garbage, but at least it's got a bit further. So at this point we have two things on the stack, so this is actually very similar code to the add. So let's just take that out of line, and as I said before, this is a bodge. So this will actually emit the comparison and set the flags. So for not-equals we wish to emit the comparison and then flip the flag. And now to flip the carry flag, using CMC, possibly CMA; I can't remember what these are. I go to this page so often. Also, why do I not have a horizontal scroll bar? Yeah, the piece of text I actually want to read has come off the side of the screen. So: "contents of A are inverted", "inverts the carry flag". So I remember seeing a page of really useful Z80 flag tricks you can do. So how do I invert the Z flag? Oh, of course, I NOT it, don't I? I'm an idiot. So that is... now, it uses XOR; XOR with minus 1. Okay, no, that does not invert the Z flag; that inverts the accumulator and sets the Z flag off the value of the accumulator, which is not what I wanted. Right. I'm just trying to remember my yacc. I believe that I can use $-1 to get at the thing the conditional is nested inside, which is going to be the while, and the while is going to have my two labels on it. But again, that's not quite right, so I'm going to need more labels. So loop label is going to be the beginning of the loop. The conditional here is going to jump to either true label or false label: false label is after the loop, true label is the loop body. Now, this is all predicated on the conditional having access to $-1, which I believe is true. I think I can do this. I might actually need to... I think I need to do this. So if this doesn't work it's going to be an instant crash. Okay, so I think we can do: construct loop labels, labels equals... all right. Now let's find my add. Actually it's more like a subtract; in some situations it actually is a subtract. And because my piece of code actually has got a not-equals in it and not an equals, let's just cut this out for the time being and
just implement not-equals there. Right, so if the left hand side and the right hand side are constants, then... and not-equals is interesting, because not-equals actually jumps the other way around to the loop, so if it's true then we jump to the false label and vice versa, just to make things more exciting. So I could use a not-equals here and put them the right way round, but that's actually going to make life more complicated later; we don't want that. We are comparing 16-bit values, so what we do is... what do we do? I do need this extra bit of code; we need this both ways around. If the left hand side is a value and the right hand side is a constant... I'm getting myself quite muddled. Yeah, it's been a while, and I'm a bit low on blood sugar. I mean, I've been muddling myself by thinking about magnitude comparisons, that is, less than or greater than, which are horrible, but this is an equality operation, which is much simpler. Now, for the simple version we just want the two things on the stack, and we pop them off and then compare them a byte at a time. On the Z80 you can actually do a 16-bit subtract and then test the Z bit, but I can't do that here; I don't believe dad sets the Z bit. Now, if we want to compare with a constant we can also optimize a bit by using an inline constant; we're not going to right now. So anyway, if the left hand side is a value and the right hand side is a constant, then the right hand side gets coerced. Simple enough. If it's the other way around, do it like that. Simple enough. Now both values are on the stack. So if it's a byte, it's called compare, yeah, then compare A with H. If it's a word, then move L into A and compare with E. If they are different then we jump immediately to true label. Actually, you know, this doesn't make things easier at all. If they are different, jump immediately to true label, short-circuiting the rest of the comparison; otherwise do it all again with the high byte. If they are different, jump
immediately to true label; actually, if they are not different, jump to false label. Why are we getting a syntax error there? Oh, I'm sorry, it's actually compiled; it's complaining at this bit now. So what have we got here? While length is not equal to zero: load length, load zero (that bit's also optimizable), do the 16-bit compare; if they are different, jump to x0, which is the true part of the code, otherwise jump to x1, which is the end of the loop. Okay, that looks like it's working. So now we just need our dereferencing assignment. Now, I'm actually cheating here, so the thing on the left is not just an ID, it's actually an lvalue, and an lvalue is a specific class of expression that refers to an assignable object. Languages distinguish between rvalues and lvalues. Old Cowgol managed this because all values were backed by memory locations, so there was no difference between an rvalue and an lvalue, but we're a little bit different with new Cowgol, so we're probably going to need some more work here. But for now let us bodge it. So this... and I expect that is now true. And this is not a normal assignment anymore, so let's take this out of line. This is going to be a deref assignment; this is going to be the pointer. So the lvalue stuff will come back to haunt us if structure support and arrays and so on show up, because we'll need to have some kind of abstraction around all the different things you can assign to. It is entirely feasible that... well, assignable things boil down into two varieties: there are the things with static addresses, where we want to use, like, shld (a variable is one of those), and the things that do not have static addresses, which will end up at deref assignment here. And for now the only thing that you can do a dynamic assignment to is a pointer, but that's all we have in this version of the language. Resolve expression type... now, node here is on the right hand side; it is the top of the stack. So we can do this: pointer type, new type, pointing
out we want the pointer type, new type. That's a dot. Actually, yes it is. So top of stack is the address. This is always going to be in HL; it has to be, so let's just put this in. So the next item is the thing we're storing, and this can either be an 8-bit value, and to do the dereference we then just do mov m,a, or a 16-bit value, so we pop that into DE, store the low byte, increment HL, and store the high byte. This is so much easier than loading because we don't corrupt the registers as we do it. Does this work? No. Really, I am getting tired: "case label not within a switch statement". Hey, it compiled. Like, all of it.
Wow, that's dreadful code. However, we did take out all the optimizations, and now I'm going to go through and manually peephole it. So push, pop, push, pop, push... not that one, that's not right. This is my variable dereference. So what it's done is... right, I know what's happened. We don't want the width of the pointer, we want the width of the thing being pointed at. Let's try that again. Yep, that looks better. No push psw, pop h. This is decrementing length, to here. Right, now we do the actual copy. So this is the bit that actually loads it. Okay, that's fine. And then we push it onto the stack, and this is the bit that does the store, based on the address that's here. The value actually being stored is second on the stack, so yeah, that's wrong. The value goes first, always, and the address goes second. So we actually want to do this. Looks more like it.
Okay, peephole time. dd: push h, pop h, push h, pop h, push h, pop h, and so on through the rest of the push and pop pairs. Okay, and we've got the store, load. Just looking for... okay, we have to keep all those stores and loads. This is a comparison against zero, and we could actually optimize that considerably. Yeah, you can do it in like three instructions: you load it, you copy one byte to A, you compare the two, and you're done. You don't need the 16-bit comparison at all. In fact, if you
compare with a constant you can use inline stuff and this gets much smaller, but let's ignore that for the time being. So this could trivially be turned into a load, decrement, save; that's very little code. Here is the code that actually does the copy, this line, and yeah, that's about as good as you're going to get. I mean, you could shift this destination load down here. That would be preferable because then we wouldn't need these pushes and pops. The current code can't do this at all. If I were to use a stack automaton, then we'd just remember that the value on the stack was actually pointing at dest here, so we wouldn't emit code at this point or push it, but when we needed to get access to it here, we would then materialize the value off the stack as a load. But that's a little bit risky, because the 8080 is sufficiently unorthogonal that if we needed it in a register that wasn't HL, we'd be out of luck, because you can only load values into HL, given the way the 8080 works, and that you basically only have HL and a handful of secondary parameters. We're barely using it elsewhere. But that would save two bytes. That looks okay, to be honest.
Let's compare it with the Z80 code. So that's over here. We've got the initialization, here's our loop, and this is where our loop starts. So what we're doing is we're loading length. Oh, I never got round to optimizing that. Okay, this is our 16-bit comparison in Z80 code... let me rephrase that. This is our 16-bit comparison against zero, and what we're actually doing is subtracting the two values and branching on the result. So this is actually four bytes for this instruction, four, five, six, seven, eight, nine, ten, eleven, twelve. Twelve bytes to do the comparison against zero. And what we've got here is three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty-one, which is rather more. However, we could actually simplify this to... it would
be easy to simplify this to, let's see: if they are different, go to x0; if they are the same, go to x1. Three, four, five, six, seven, eight, nine, ten, eleven bytes. But these three instructions could be peepholed into that, so that's one, two, three, four, five, six, seven, eight bytes. So that's actually smaller than old Cowgol was producing here, and considerably faster. Of course, there's no reason why old Cowgol couldn't have done this as well; I just apparently never got around to it.
So what about the actual copy? So we decrement length. Now this is... well, that was pretty obvious, and yeah, turning this into a decrement would be trivial, so that would produce exactly the same code. We now get the actual copy, which it does by putting the two pointers into BC and DE. So this is four bytes, one byte, four bytes, one byte, so ten bytes. What do we have here? Three, four, five, six, seven, eight, nine, ten. Identical, but this is faster. I discovered later these dereferencing through BC... oh no, I tell a lie, those are only three bytes. No, sorry, I'm getting myself confused again. These instructions are loading BC and DE from memory, and these are extended instructions, these ones, and these are four bytes long and 20 cycles. Meanwhile, loading HL from memory is three bytes and 16 cycles. So new Cowgol wins there.
And the incrementing of source and dest is exactly the same code, although the saves are interleaved due to keeping things in registers until the last possible moment. So here we load the first parameter and increment it, here we load the second parameter into BC and increment it, and of course this is using the slow increment, and then we write them back. So again, new Cowgol wins. And then we jump back to the beginning.
So what I'm getting out of this is that all that work I did for old Cowgol was a complete waste of time, because this new compiler is generating pretty adequate code at a fraction of the size. How big is this? 700 lines. Okay, I didn't need to
write my own parser or lexer, but the code generator alone is... that's not the actual... yeah, getting a line count for old codegen is actually pretty hard because there's lots of stuff elsewhere, so we've got all the architecture-specific stuff. How are we going? Yeah, okay. How big is this thing? 20K of x86 code. Well, I would be compiling it with itself, and I can't really estimate the code size, to be honest, but assuming it's comparable, then even on a 6502 machine that's not bad. 20K is a little bit chunky, but there's plenty of space for the actual program, and we don't have to store anything in memory except for the symbol tables. We don't keep an AST. Yeah, well. Interesting: how would I do the peephole optimizer? I would probably want to go and do the stuff I talked about earlier with a complete code emission library, so that would add a bit of space. Hmm.
So the other thing I want to think about is portability. The 8080 is actually fairly easy to write adequate code for, because there are no possible choices: everything has to go through HL, everything has to go through A. If I wanted to generate code for an ARM, this would be atrocious, so I would need the ability to use more than one register. The simplest thing to do there would probably be to keep using new registers for every new value and remember the values in the old ones, so that we wouldn't need to touch the stack too often. But we have to do all this... yeah, I need to put more thought into that.
My other target architecture, though, is the 6502, which has even fewer registers, and on that platform we would just give up completely with the stack. We would statically map the existing stack into an additional block of subroutine workspace, so it would end up having the workspace containing the local variables, but also a workspace containing the stack. So whenever we do a push or a pop, we simply load and store static values. We can do that because we have no stack frame. The thing there is that loading and saving a 16
bit value on a 6502, like, if you had it actually in registers, and realistically you wouldn't have, would end up being six bytes. Using a real stack you can just... now, the 6502 does have a stack, so you can do PHA, PHX; that's two bytes. However, the way the code we've been generating here works is that we do all operations in registers. So we pop our two parameters into registers, we do the addition, HL plus DE, and we push the value back again. The 6502 has only got three 8-bit registers, so there's not enough space to actually do that. So even if we had four bytes of stuff pushed onto the stack, we'd still have to pop them off and put them somewhere to do the work. So I would need to do more thinking about that.
Okay, I am going to, I think... oh yeah, check this in. I'm going to go and make and have dinner, and then I'm going to come back and think about my stack automaton to see whether I can make this work. Yep, alright, be back later.
Okay, so dinner has been had, tea has been acquired, blood sugar has been elevated, and I've been thinking about this problem, and I realized that there are in fact three distinct problems. One is dealing with the virtual stack: we want to try and minimize the use of pushes and pops where possible, because there's a lot of them. Let me actually just rerun the compiler to generate the code. There we go. And we want to clean this up. The second problem is making the peephole optimizer work, because we're probably going to need one. The third problem is dealing with... let me start that again. The third problem is trying to preserve useful values in registers, to stop us having to reload them either from memory or via an instruction. For example... that's not a good example. I'm not actually sure this code has any, but the previous times we were seeing a store immediately followed by a load. Likewise if you push a constant and the constant's used in multiple places. Actually, I can do a demonstration: 16 equals 0, i
equals 1, i equals 1. If I compile that, you see here it keeps loading the same constant.
Now, as I have limited time for the prototype, I think I'm going to focus on the first one, because that seems the most interesting. The second one is probably just brute-force programming. It needs a code emission library that keeps track of the code, like keeps track of the last thing emitted, etc. That's not very interesting, it's just work. The third one I am not actually sure will be particularly useful beyond maybe this specific use case here. The problem is the 8080 has so few registers, and everything passes through either HL or A. Whenever we work with a value it's going to end up in HL, and chances are the next thing you're going to do with it is going to corrupt HL, so there's no point remembering what the value was. I mean, we could start copying the value into other registers; I think we haven't used BC ever on this. But as this is a single-pass compiler, we don't know if we're actually going to be using BC in the future... let me rephrase: we don't know if we're actually going to be using that value in the future, so copying it into BC may be completely wasted work. So I don't think I'm going to tackle that at all.
But I am going to look at the stack, because I think that's actually a lot simpler than I had made out. And in the interest of code organization I'm actually going to create a new source file, which is the virtual stack. So let's... okay. And what this will do is it will allow you to push and pop registers, it will allow you to push and pop various values, and it will keep track of the things that have been pushed and popped, and either track them virtually, if they are trivially rematerializable values such as constants, or actually push them onto the real stack. And I believe this will help code production considerably, but we'll see. And now, registers: we need to be able to refer to our various registers, rega, reghl, regde, regbc. There aren't very many registers we care about. So
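As a rough illustration of the virtual stack idea just described, here is a minimal C sketch. It is not the code from the video: the names (`vpush_reg`, `vpush_const`, `vpop_reg`, `struct slot`, the `out` buffer) are assumptions made up for this example, and only two slot kinds are modelled. Constants are remembered in a table and generate no code until they are popped, while register pushes go to the real stack.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names throughout; the real compiler's identifiers may differ. */
enum reg { REG_A, REG_HL, REG_DE, REG_BC };
enum slot_kind { SLOT_PHYSICAL, SLOT_CONST };

struct slot { enum slot_kind kind; int value; };

#define VSTACK_MAX 16
static struct slot vstack[VSTACK_MAX];
static int vsp = 0;       /* points at the first unused slot */
static char out[1024];    /* emitted assembly, collected so it can be inspected */

/* 8080 register-pair names as used by push, pop and lxi. */
static const char *pair_name(enum reg r) {
    static const char *names[] = { "psw", "h", "d", "b" };
    return names[r];
}

static void emit(const char *fmt, ...) {
    size_t n = strlen(out);
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(out + n, sizeof out - n, fmt, ap);
    va_end(ap);
}

/* A register's contents cannot be recreated later, so push it for real. */
void vpush_reg(enum reg r) {
    assert(vsp < VSTACK_MAX);
    vstack[vsp++] = (struct slot){ SLOT_PHYSICAL, 0 };
    emit("push %s\n", pair_name(r));
}

/* A constant is trivially rematerializable: remember it, emit nothing. */
void vpush_const(int value) {
    assert(vsp < VSTACK_MAX);
    vstack[vsp++] = (struct slot){ SLOT_CONST, value };
}

/* Pop into a register, materializing virtual slots as immediate loads. */
void vpop_reg(enum reg r) {
    assert(vsp > 0);
    struct slot s = vstack[--vsp];
    if (s.kind == SLOT_PHYSICAL)
        emit("pop %s\n", pair_name(r));
    else if (r == REG_A)
        emit("mvi a, %d\n", s.value & 0xff);
    else
        emit("lxi %s, %d\n", pair_name(r), s.value & 0xffff);
}
```

With this sketch, pushing HL, pushing the constant 1, then popping into DE and HL would emit `push h`, `lxi d, 1`, `pop h`: the push and pop that would have carried the constant are gone.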
that pushes a register, or rather, that will also emit code to push that register. Pop into a register. It may also be interesting to have a pop into a register class, where if it's already in a register then we leave it there; that kind of overlaps with problem three, of keeping old values in a register. vpush const. vpush address: this is going to be the address of a symbol. And the last one I'm thinking about is the value of a symbol. This one is tricky because, I mean, rematerializing this should just be a matter of loading it into a register, except of course we can only load into HL, so I'm not sure whether this will actually be viable, but let's try. And of course we can only pop into registers, though it might be interesting, if this works, to allow popping into a variable. No, that wouldn't work, because we would need to have a register to put it in.
Okay, so each stack slot can contain a physical stacked object, an 8-bit constant, a 16-bit constant, an address, or a value. So that's got the kind, and it's got the numeric value or the symbolic value. And we actually also want to add a vpush reset. Okay, and we also need the actual stack itself, and we need a stack pointer. Okay, reset is easy. The stack pointer points at the first unused item. Now, this is actually going to physically push the register onto the physical stack, so this isn't going to help our peephole optimization issue; I'm hoping it will help other issues. We'll see. We want to keep track of the size separately... no, let's not. Okay, so that just pushes the value. Stack depth, vstack overflow. And this is going to be much the same code for each of these. So constant 8... now we only need to take care of one, that makes it simpler. The slot's u.const value records the constant, and that generates no code. So pushing the address, and likewise the value, is very similar.
Alright, now to pop. So if it's a physical thing, then generate the code to pop into the
desired register. If it's a constant, then we do need to know if this is register A or not, because the 8-bit value will generate a different opcode, so mvi a, and the u.const value; otherwise it's an lxi, like so. If it's an address, then it's just a simple lxi. This is going to refer to a symbol in a workspace, so it's actually the same code that we didn't find a way to abstract out, more or less from here. That's entirely the wrong piece of code. I want a var access. I want this. In fact, I actually want this, the var access lxi. And if it's a value, then I need to actually get the register in there. Now, I can't use var access; I am actually going to have to cut and paste this code completely. Let's put that back where it was. So that's actually lxi %s, the register is regname(reg), and the value. So if the register is not HL, that's actually an internal compiler error, and this is going to be lhld, no regname involved.
We can actually load into DE using two extra bytes. What we do is we swap with DE, load into HL, and then swap with DE again. But we can't load into BC because there's no swap there. We could do it by doing a push h, lhld into HL, two more bytes to move into BC, and a pop h, but that's like four extra bytes; that's ridiculous.
Okay, cut and paste errors here. "var undeclared": oh yeah, that wants to be slot u.sym. Now, I met somebody once who said that they didn't understand why you would want to do more than one pointer dereference in a row in a single expression. "Control reaches end of non-void function": oh yeah, an old error. So, okay, right, that is now generating code, and now we have to go through this and change all our pushes and pops to use the new source. So this is loading a variable. Do I want to deal with 8-bit variables in that? Because we could just push the... yeah, I'm not going to do that now. So this is going to be vstack pushreg A, vstack pushreg HL, vstack pushreg A, HL. Hang on, this is loading a constant, so we're actually going to do
vstack... give me a sec. vstack push, vpushreg. Okay, so this is actually going to be vpush const, node value. That will add a constant that will later be materialized. In fact, we need to do this. Okay, let's go for pops. So what I'm hoping this will actually achieve is that the code that pushes and pops constants will just go away, or at least a lot of it will go away. It's going to need a bit more work, to be honest, but this should be a good start. Oh yeah, let's just copy, before I rebuild anything; I do want to keep hold of the old source. Why is that pop R? What was I possibly even intending to type there? This is sub, so that's got to be HL, surely. Yeah, it has to be. How many of these are there? Okay, so what did that come up with? A compilation error: vreg pop, really. And that fails because it doesn't like vpush const.
Okay, so we actually generated some code. How does that look? Well, it has successfully eliminated the pushes and the pops for all this stuff. Let's actually just do a diff. So that's removed all these, removed a push/pop here, though I am interested that this one hasn't gone away as well; that should have turned into an lxi d. Well, this has changed from minus 1 to 65535; yeah, that's fine. And it's removed a push/pop immediately after it. Why are there two push h's? That's not right, that's actually incorrect code. Down here, this is still hanging around, which it shouldn't be. No, this actually worked, this is correct. This has remained because the thing being pushed is not a trivially rematerializable thing. I could change that to push value, I suppose; that might also help here. So this is popping this. So there is only actually one situation, I think, it's here... two situations where we're popping into DE. I do want to know what happened here: where did that extra push h come from? Let me check for trivial typos. We are actually decrementing the stack, we are popping the right register, and now it's not being called. So what actual bit of code is this? One thing it's worth adding is line number tracking in
the parser and sticking in comments for each line of code, or maybe even just copying a line of code into a comment. So this is the beginning of the loop. This is decrementing length, this chunk here. So what that's actually done is, it's a sub one that's turned into an add minus one, so we get the 16-bit add. So there are two pushes there. That's because I forgot to remove this line of code. Oh, and here's some I didn't replace: vpushreg rega, vpopreg regh. Oh, okay, these are still wrong. This looks better.
Now, here's a situation where the trivial rematerialization ain't gonna work, where we're pushing the value of a variable onto the stack. It's not gonna work because this loads the value and immediately pushes it, then pops it into a different register, and we can't get our value into register D. So let's just stick with that for now. Now, this piece of code has... something. It's replaced a push with a push psw here. Yeah, that is just wrong. What's this doing? This is incrementing source. Add HL; that's wrong. Okay, that looks better.
To be honest, that has removed a whole bunch of pushes and pops and hasn't actually affected the code elsewhere in any way, which is just what I want. This is much better. This has remained. Why does that remain? That's my sub expression, so this should be pushing a constant which is then being popped into D. So that has pushed the constant and that... oh right, okay, I was actually misled about what was going on here. This is fine. The constant is being pushed and then immediately popped into H, which is rematerialized directly into H, so this pop d here corresponds to that push h. Was that actually the issue I was complaining about earlier, or was I thinking of a different one? Okay, I think that's relatively okay. Because this annoys me, I'm going to just do a quick check-in.
And where's my comparison? Okay. This is kind of grotesque. That's just a cheap and nasty way to make sure that if one of these is a constant, it's always the one on the right. So if the
right-hand side is a constant, if the right-hand side expr node looks like... if the right-hand side value is zero, then this is a comparison with zero. So we pop into HL, we move H into A, we compare against L, and we branch if not zero, that is, if it's different, to true, and then stop. Otherwise, resolve the expression. And this is our entire comparison routine. It is three, four, five bytes. That is a lot better, and with the peephole optimizer that will be smaller because we can remove these. And that's actually quite a small piece of code. We'll just check that it assembles; it won't do anything useful if I run it. "924: unrecognized instruction". bne... jnz; I'm thinking of the 6502.
Fun fact: the 8080 instruction mnemonics and the Z80 instruction mnemonics are not quite completely different. There is an overlap, which is that the Z80 uses jr to mean a generic jump, like an unconditional jump... no, sorry, I'm thinking of the wrong one. The Z80 uses jp as an unconditional jump. The 8080 uses jp as a jump if the negative flag is not set. And as a result, if you use one in your code in the wrong place, you will never spot the error and your code will behave really weirdly. Ask me how I know.
Okay, so that looks good. What's the next thing? That has significantly improved our code generation and was very cheap. About the same size it was before. Yeah, that was a tiny piece of code that's made a substantial difference. So this thing, not that thing, this thing: can we do anything about that? We need to get this value into DE. Well, I mean, the obvious thing to do is simply, if we were writing this by hand, this would become an exchange and do that. So this would load it into HL and then swap it with DE. But we don't actually know at this point that we're going to be loading something into DE here. Old Cowgol had a register locking facility used by the register allocator, so it would be possible, possible but nasty, to say that we pop this into HL, and HL is now in use, so we pop this
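The compare-against-zero special case can be sketched in a few lines of C. This is only an illustration under assumptions: `emit_ne_zero` and `hl_is_nonzero` are hypothetical names, and the sketch uses `ora l` rather than a compare, because a 16-bit value in HL is zero exactly when H OR L is zero. The second function is a plain C model of what the emitted sequence tests.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Writes 8080 code for "HL != 0" into buf. The label names are whatever the
 * caller's code generator uses; x0/x1 in the transcript are one example. */
int emit_ne_zero(char *buf, size_t size,
                 const char *true_label, const char *false_label) {
    assert(buf != NULL && size > 0);
    return snprintf(buf, size,
                    "mov a, h\n"   /* copy the high byte into A        */
                    "ora l\n"      /* A = H | L, sets Z only if HL==0  */
                    "jnz %s\n"     /* non-zero: condition is true      */
                    "jmp %s\n",    /* zero: condition is false         */
                    true_label, false_label);
}

/* C model of the test the sequence performs on a 16-bit value. */
int hl_is_nonzero(uint16_t hl) {
    uint8_t a = (uint8_t)(hl >> 8);   /* mov a, h */
    a |= (uint8_t)(hl & 0xff);        /* ora l    */
    return a != 0;                    /* jnz      */
}
```

The `mov`, `ora` and `jnz` together are one, one and three bytes, which is where the saving over a general 16-bit comparison comes from.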
thing into DE. And yeah, I don't know, when I do this, that we need the value to be in DE here. We won't know that until here. At this point we know that we need to stack the thing, or at least virtually stack the thing. We could put the value on the virtual stack, but then when we get to here and we need to pop it, HL is already in use; we can't do anything about it. We then have to start fiddling with exchanges, and it'll end up looking like that. That's two extra bytes. I mean, it is two faster bytes than our pushes and pops, to be honest. So a pop is 10 and a push is 11, while an exchange is 5, so that's like half the number of cycles and it's the same size of code, so that might be worth doing. But as I said before, we can't do that with BC, but we've never used BC. And of course, in the cases when we do want it in HL, we luck out.
Let's look at this bit of code. So what would this look like with our new model? Well, this gets virtually stacked, this gets virtually stacked and then rematerialized into HL, which is the right register, so we get this. This gets rematerialized into the right register here. Sorry, I need to just do this to make the code the same, so we're comparing the same thing. So we're comparing this with this, and these are irrelevant because the peephole optimizer will take them away. So that's actually a pretty big win. Well, it removed all the stack operations; that's four bytes. Okay, that's worth doing. Let's give that a try.
So this happened here: vpush value $1, vpush value $... oh yeah, okay, as we expected. So if register is BC... if reg equals regde, printf an exchange, like so. Okay, we got some code. It's smaller. I did forget to take a copy of the previous code, but this is the version before we did the vstack stuff. So here's our vstack changes, here's our conditional optimization, here is actually... that's better. So we've replaced all of this with a single exchange instruction, and we've put the... oh yeah, it's a little
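To make the register restriction concrete, here is a small C sketch of materializing a 16-bit variable into a chosen register pair. It is an assumption-laden illustration, not the code from the video: `emit_load_value` and the label strings are made up, and it uses the three-instruction form (exchange, load, exchange) that preserves HL, whereas the version typed on screen appears to use a single exchange when HL is free.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum reg { REG_HL, REG_DE, REG_BC };   /* hypothetical register identifiers */

/* Appends 8080 code that loads the 16-bit variable at `label` into `r`.
 * Returns 0 on success, or -1 if the register cannot be reached cheaply. */
int emit_load_value(char *buf, size_t size, enum reg r, const char *label) {
    assert(buf != NULL && label != NULL);
    size_t n = strlen(buf);
    switch (r) {
    case REG_HL:
        /* lhld is the only direct 16-bit load from a static address. */
        snprintf(buf + n, size - n, "lhld %s\n", label);
        return 0;
    case REG_DE:
        /* xchg parks HL in DE, lhld loads the value, xchg swaps back:
         * DE ends up holding the value and HL is unchanged. */
        snprintf(buf + n, size - n, "xchg\nlhld %s\nxchg\n", label);
        return 0;
    default:
        /* There is no exchange with BC, so treat it as unsupported. */
        return -1;
    }
}
```

Each `xchg` is a single byte, so the DE form costs two bytes more than the plain `lhld`, matching the "two extra bytes" mentioned earlier.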
bit misleading. So a sufficiently intelligent peephole optimizer could replace these four instructions with lxi d... no, it couldn't. But a sufficiently intelligent peephole optimizer could replace these four instructions with lxi d and then remove this exchange, and it would have to keep the other two. It would save one byte, but that would be doable. Of course, now the compiler is getting more and more sophisticated, the chances that it's gonna break are increasing, so at some point I'm going to have to find out how to run actual proper code in it and do some compiler tests. I have a fairly robust set of compiler tests for old Cowgol. They wouldn't run on this yet, and they would need overhauling anyway; the language is going to change a bit.
These operations can actually work with any 16-bit register, so if the value happened to be in any register at all, we wouldn't need to do any movement. But the value will never be in a register other than A or HL. In fact, this would end up being the only operation that leaves a value in a different register. The one thing we could do is vpop here could rematerialize it into... except you can only load values into A. So, okay. So that has replaced this with a dcx. Just manually take out some of the pushes and pops that are remaining, and let's take a look.
Let's compare with the Z80 code. So this is our while. That condition is wrong. I don't think I want a compare there, I want an or. So if it's not equal to zero, if it's not zero, jump to the true label. Oh, I didn't put in the... we didn't drop out the bottom here, so we didn't get the false label jump. Okay, I won't rebuild that now, but I'll just put the x1 in. Again, a peephole optimizer would turn this into that. So we've got three bytes, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty-one, twenty-two, twenty-three, twenty-four, twenty-five, twenty-six, twenty-seven,
twenty-eight, twenty-nine, thirty, thirty-one, thirty-two, thirty-three, thirty-four, thirty-five, thirty-six, thirty-seven, thirty-eight, thirty-nine, forty bytes. That seems more. We've got over here four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty-one, twenty-two, twenty-three, twenty-four, twenty-five, twenty-six, twenty-seven, twenty-eight, twenty-nine, thirty-two, thirty-three, thirty-seven, thirty-eight, forty-one, forty-five... forty-one, forty-five, forty-seven. Well, new Cowgol that I wrote in a few hours is handily, well, once I've done the peephole optimizer, is handily beating old Cowgol, which was months of work. Fantastic. Also, this is Z80 code, which is technically denser than this, which is 8080 code, so that's not so brilliant.
This chunk here is probably a really good place where remembering values would be useful, because we can eliminate this, because HL is still correct. Actually, we only remove that one; that's only another two bytes. We still have to save all the rest. It's very tempting to try and preserve a value, like, from here to here. The thing is, there's a label here, so that's a basic block boundary, so we can't guarantee that when the flow of execution arrives at this label it's actually gone through here; it may have gone through any other piece of code. Real compilers do data flow analysis and produce a basic block graph, but this isn't a real compiler; this is a single-pass toy. But this is actually looking pretty good.
I've probably got another hour or so before I want to call it a day, so what could I put in? Well, more actual compiler features, but adding compiler features I can do anytime. This particular slot is to do prototypes. The peephole optimizer: if I was going to do it, I'd want to do it now, before writing much more code, just to make it easier to redraft. The other thing is that if I'm going to make this work for real, which I actually think I do now, I am
going to write something in Cowgol. I'm also going to have to split things up so that there's a platform-dependent layer and a platform-independent layer, so that I can put all the Z80 code, well, the 8080 code, in one place and then keep the rest of it generic, so I can drop in more code generators. So adding C to it is not necessarily what I want to do. I know, I will actually try and make this one real code... no, I won't. I will do parameters, because I can't do anything useful without the ability to actually do subroutines.
Okay, now, with my new knowledge of how negative indices work, parameter list minus one is probably going to be this ID. So how are we going to represent parameters? Well, a parameter is a local variable which exists in the scope, so inside the subroutine scope there are a certain number. Can I just simply say that the first n symbols are the parameters? No, because we always add new symbols to the beginning of the linked list, so that's irritating. But we can keep a pointer to the first parameter. But we add parameters to the beginning of the list, which means that once the list is complete, they're in reverse order. Does the order in which we type check matter? Because the obvious thing to do is that the parameter list rule just adds another parameter to the subroutine. But then we get to the call. Now, where is our call? Here. So this thing is going to read expressions and stack them one at a time, and as we do so we really want to be able to walk through the list of parameters and do type check and coercion. In fact, it's more than that: we have to walk through the list of parameters and do type check and conversion right there, because we need to be able to convert things like constants into the appropriate type. So when we're looking at the first parameter, this is actually going to be a... let's call that an arguments, and let's call this a... I don't want to do it like that. Yeah, why not. So as we walk
through the arguments, we need to be able to walk through the parameters in lockstep. And this does need to be a list, because we're going to have a parameters rule for actually collecting the parameters themselves. The convention is that a plural thing like this is just literally one of these, it's a list of things, whereas a parameter list will be an actual concrete object with a type, with data in it. So inside argument list, the parameter $-1 is going to refer to old ID, which will therefore give us access to the subroutine proper.
This hasn't solved our problem of how we are going to represent parameters. The simplest solution is to create another linked list of... that's not going to work. Create a linked list of parameters, but of course we add things to the beginning of a linked list, not the end. One thing we could do is to simply change the symbol table here so we add things to the end rather than the beginning. That's not a stupid idea. That would allow us to maintain a pointer to the first parameter and then just keep a count of the parameters. The other thing is to just keep an array of them, and either resize the array if it gets too big or just have a fixed number. I think I'm going to go this route. I mean, there's only like two bits of code that touch it. This just becomes a bit more complicated: if there is no last symbol, then that means there is also no first symbol, so this symbol is the first one; otherwise, add the new one to the end of the list. I believe that is all I need, to be honest. Except something is broken. Oh yeah, I need to update that pointer.
I wish C had the ability to do, like, data structures. You spend so much time in C coding just doing low-level stuff like this. I mean, it's an amazing feeling of power to be able to create an actual data structure in three lines of code, like this kind of linked list, and they're stupidly fast, but they're also stupidly time-consuming. And no, I don't count the C++ STL as a meaningful data structure. One
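The append-to-the-end change described above can be sketched as follows. The struct and function names (`struct symbol`, `struct scope`, `scope_append`) are assumptions for illustration and are not taken from the actual source. Keeping both a first and a last pointer means symbols stay in declaration order, so the first n symbols of a subroutine's scope can serve as its n parameters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical symbol and scope layout; the real compiler's differs. */
struct symbol {
    const char *name;
    struct symbol *next;
};

struct scope {
    struct symbol *first;   /* head of the list, in declaration order */
    struct symbol *last;    /* tail, so appending does not need a walk */
};

/* Adds sym to the end of the scope's symbol list. */
void scope_append(struct scope *s, struct symbol *sym) {
    assert(s != NULL && sym != NULL);
    sym->next = NULL;
    if (s->last == NULL)
        s->first = sym;        /* no last symbol means no first symbol either */
    else
        s->last->next = sym;   /* otherwise hang it off the current tail */
    s->last = sym;             /* the pointer that is easy to forget to update */
}
```

Appending `src`, `dest` and `length` in that order leaves `first` pointing at `src` and `last` at `length`, so a call site can walk the parameters front to back while it reads arguments.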
of my other projects is using that. Oh, what a bind. Not only is it painful to use, but it's so, so easy to get something trivial wrong, and then your program crashes six weeks later in ways it's impossible to debug. I wouldn't mind C++'s complexity if it gave me good programs. It doesn't; it gives me annoying programs.
Anyway, so all we need here is to keep track of the number of input parameters, and eventually the number of output parameters, because Cowgol supports multiple output parameters, because, you know, it's a feature I really like and it's my language. So a parameter list can consist of nothing, that is, empty brackets; that is perfectly valid, it is a subroutine with no parameters, as you'd expect. Or it is a parameters object. Actually, we can do it like this. So a parameters object can consist of nothing, or it's a parameter followed by some parameters, and a parameter consists of a new identifier, which is the name itself. Let me just double check. This needs to go here, because we're going to need access to the subroutine in order to parse the parameters, and that's set up immediately beforehand. A parameter is a new identifier followed by a colon and a type ref, and they are separated by commas. Nope, I think this is going to give me a shift/reduce error. It's not giving me a shift/reduce error. Okay, that's legal then. And we are going to have to put in the empty parameter list here. What I forgot about was the comma. So having to have a comma means that I can't allow an empty parameters to be valid, otherwise you could have a trailing comma. I'm going to allow a trailing comma to be valid. Yep.
Okay, so a parameter adds the identifier to the subroutine, and this is the easy bit, because the subroutine is the one that we're actually currently creating. So all we do is we say input parameters plus plus. This is a variable, the type is one of them, and I think that's it, to be honest. So that gives us parameters. Is that going to work? Okay, let's do a
test. We can actually build our... is that going to compile? Unparsable character. That's because I forgot to have a comma in my symbol list. Syntax error. I need line number support, but for now let's just turn this on. So what's it complaining about? What have we read? Comma, ID, colon, ID, reduce, and we have a type ref on the stack. Reduce, and we get a parameter on the stack. Next token is close parenthesis. Why doesn't it like that? So, interesting. There should be a rule to allow a parameters to collapse into a... what is state 58? So what this is telling me is state 58 represents when we have a parameter on the stack. Right, the cursor is here, and it's saying that if you receive a comma, then push it onto the compiler stack and switch to state 68, which will then be for reading the parameters. What's happened is it's read a parameter and there is no comma immediately following, there's the close parenthesis, except that all parameters have to be followed by a comma. That's because I forgot to allow a single parameter to be a valid parameters. You get used to this stuff after a while. How exactly does it die? Trace: vpopreg, equals, number, loop, equals, expression. We are here. We should be at the point... oh yeah, we're trying to do the not equals. I believe that it's tried to do something with length and has failed. It made nothing useful. Okay, I could always load it into a debugger. Line 121. Okay, this has probably failed because the subroutine name... ah, okay, I have forgotten to tell it... yeah, I haven't actually initialized the variable at all. I need this stuff. So the name's already set, so all I need to do is... So what we had was a variable that didn't have any of the variable metadata attached to it, so it didn't know what subroutine it was attached to or anything like that. Okay, so there's our parameter. Our type is three. Okay, that worked. So here is our subroutine, here is our memory copier, and length is in variable four in the workspace, source is in zero, dest is in two. That's
exactly where we would expect them to be. Of course, we still haven't emitted any code to actually put them there, so we are going to do that now. That actually has to happen as we read the parameter list. Right, now what's going to happen is that we pass parameters on the stack. They get pushed left to right, so the rightmost parameter is the one that is lowest in memory, so the first one popped off is the wrong one. Okay, we're going to have to do this the old-fashioned way. So after we've read the parameter list, we now need to walk through the parameters and pop them off the stack. Now, this is slightly more fraught than it first appears. The reason why is that when you call a subroutine on the 8080, it pushes the return address onto the stack, so that is between us and all the things we want to pop off. Traditionally, on most architectures, it is the caller's responsibility to adjust the stack. So if they want to call something, they push n parameters onto the stack, they call the subroutine, and on return from the subroutine they then retract the stack over the things they pushed. Our calling convention is going to be different. What we're going to do is the caller pushes them onto the stack and the callee pops them off the stack into the workspace variables. The reason for this is that when we return, we will push the output parameters onto the stack. So from the caller's perspective, they push n input parameters onto the stack and call the thing, and then when the function returns, all the output parameters are right there on the stack, ready for consumption. So this is actually a little bit exciting, because we need to save... hey look, we're using BC for the first time. So this will save the return address, and once we've finished, we pop it back on again... or rather push it back on again. And so the first n symbols are the parameters. Very nice. And we have this if statement so that if a routine doesn't actually take any parameters, we don't bother with the
prologue and epilogue. Except we wish to pop them in reverse order. Oh dear, we're going to have to walk backwards through the parameter list. Does anyone know how to spell n squared? Okay. Yikes. We don't want to worry about the virtual stack at this point, so pop H is our... what was our routine for accessing variables called? Var access. Okay, let us try this. So on entry to our routine, we pop the return address, pop length, pop dest, pop source, push the return address. Now, one of the great things about this is that this will allow us to pass parameters in registers quite easily. All we need to do is to decide which ones go in registers, and, I mean, they're already pushed onto the stack, so we just pop them into the appropriate registers rather than into... uh, hang on, let me start that again. You can pass parameters in registers: the caller leaves them in registers when it calls the function, and up here we decide that these parameters were passed in registers, and so we just save them. I reckon that probably only HL is worth doing, because of the difficulty in saving any of the others, but that would help. Okay, well, that actually looks like code. So how about the call? Where's our call? This one. So argument list here is going to push our expressions onto the stack. And this is very nearly the same rules: it's an expression, or it's an expression followed by more arguments. Let's just do argument. The tricky bit is that here we need to try and correlate the expression we are reading in with the thing that we're actually trying to call, so we can type check it. But that's just... so, copy memory, 1000, 2000. Interesting. I thought we would have enough there to actually push... ah, right. So what I was expecting to see was that it would actually push the expressions, but of course what it's actually done is that they're being pushed onto the virtual stack but haven't actually been pushed onto the real stack. So we
really need to pop the thing into a register and then explicitly push it. Stack underflow. This expression should have actually left something on the stack. Why hasn't that actually done what I expected? So this is just a number. Oh, good grief, how did anything ever work? Oh, no, no, no. Numbers don't get pushed onto the stack; they need to be resolved first. So I am not quite as confused. And just for the time being, let's just do that. RHS is... oh, it's a pointer to the expression node, that's what I'm doing wrong. No, that is a pointer to the expression node and the type. Okay, well, apparently I must have mistyped something. Okay, so that's a really crude bodge. So you can see it here: loads a constant, pushes it onto the stack, loads a constant, pushes it onto the stack, blah blah blah, calls the routine. So that's pretty much what we need, except here we want to extract the type from the subroutine. Now, if $$ is this, then what is $-1? Is it an argument? It must be an argument. So an argument spec is going to contain the list of arguments, the subroutine that we're actually trying to call, and the current parameter that we're actually looking at. Or... I don't know if that's going to work, because the arguments here is actually going to be different from the one that this will see. I don't really know enough about Bison for this. Also remember that this is an expression, and eventually we're going to have nested subroutine calls. That's one reason for putting them on the stack rather than just poking the values into the function workspace. More likely this would be a problem, because we would evaluate this by poking one and two into the workspace, which would be fine, then we would get the result and poke it into bar's workspace, then we start evaluating this and poke three into bar's workspace and overwrite the result of this. I spent ages trying to work around that.
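The clobbering problem just described can be modelled in a few lines of C. This is only an illustrative sketch, not the compiler's code: `bar` is assumed to be a routine that adds its two parameters and keeps them in a fixed workspace, the call being evaluated is assumed to be `bar(bar(1, 2), bar(5, 6))`, and every name below (`bar_ws`, `call_bar`, `nested_by_poking`, `nested_by_stack`) is hypothetical.

```c
#include <assert.h>

/* Hypothetical model: bar(a, b) returns a + b and keeps its
 * parameters in a fixed, non-reentrant workspace. */
static int bar_ws[2];

static int bar_body(void) { return bar_ws[0] + bar_ws[1]; }

/* Strategy 1: poke each argument straight into bar's workspace
 * while evaluating bar(bar(1, 2), bar(5, 6)). */
static int nested_by_poking(void) {
    bar_ws[0] = 1;
    bar_ws[1] = 2;
    bar_ws[0] = bar_body();   /* outer argument 0 is now 3 */
    bar_ws[0] = 5;            /* the second inner call overwrites it */
    bar_ws[1] = 6;
    bar_ws[1] = bar_body();   /* outer argument 1 is 11 */
    return bar_body();        /* adds 5 + 11 instead of 3 + 11 */
}

/* Strategy 2: the caller pushes arguments; the callee pops them
 * into its workspace and pushes its result. */
static int stack[16];
static int sp;

static void push(int v) { assert(sp < 16); stack[sp++] = v; }
static int pop(void) { assert(sp > 0); return stack[--sp]; }

static void call_bar(void) {
    bar_ws[1] = pop();        /* rightmost argument comes off first */
    bar_ws[0] = pop();
    push(bar_body());
}

static int nested_by_stack(void) {
    push(1); push(2); call_bar();   /* leaves 3 on the stack */
    push(5); push(6); call_bar();   /* leaves 11 above the 3 */
    call_bar();                     /* pops 11, then 3 */
    return pop();
}
```

With these made-up constants the poking version yields 16 and the stack version yields the intended 14, because the intermediate result sits safely on the stack while the second inner call reuses the workspace.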
This is one really good reason for passing parameters on the stack. So really I want a reference to this argument list here. I need to get at that from here. Okay, when all else fails, look at the documentation. If anyone is still watching, then they're just watching me read things. $n with n zero or negative is allowed, to reference... yeah. So $0 is this. I've been doing this wrong. Okay, I have apparently done this right by accident. So yeah, I thought minus one referred to the previous item in the list. It doesn't; zero refers to the previous item in the list. It's just I'd miscounted here. I'd forgotten that these actions also count as numbered things, so zero was referring to this and minus one was referring to this. Okay, so this is not going to work reliably, I think. Also, I should find out if stock yacc supports named references, because those are very cool and it would be nice to use them and avoid the numbers completely. Is there a different way we can do this? Well, when evaluating argument list I need old ID, I need the actual subroutine object. Well, I can actually get it here using $-1... well, $0, or here rather. But I then need to somehow get it into the arguments. Yeah, it may not refer to subsequent components because it runs before they're parsed. You can have a semantic value, and it sets its value with $$. Can I use this? So this is using a stack to add things. We could do that. We can do that. It's nasty, but we can do it. Actually, it's not actually that nasty. What we do, instead of this being arguments, we call this argument list. We then have current call. Current call will always point at the current call statement that we're working on. So argument list has a mid-rule action. It pulls the subroutine from minus one... zero. Zero, it's the thing immediately before the argument list, which is the old ID. So yes, this is a subroutine. That's actually the symbol, but we
know it's a sub. So, and this actually needs to be... except this is a mid-rule action and we can't actually access this. So we're actually going to... this is thoroughly grotesque. So we're actually going to attach all this data to the value for this mid-rule action here. Sub; number, we have not processed any arguments yet; this is the first symbol. And then this is the really nasty bit: we store the previous call that we're currently handling in a global. Okay, so now we can access the current call via this. And once we've finished with our argument list, as an end-rule action, this is $2. Now we also ensure that the number of arguments we've read equals the number of input parameters. Okay, this is sketchy as hell. Why is that failing? Argspec. I misspelled that. Argument spec, even. It's spelt with only two s's. Okay, warnings. This is an argument spec, previous call, argument spec, argument spec. Yeah, double s's in symbol names are not a good idea. Okay, that seems clean. 'Expected three parameters, only got zero.' Yes, that's exactly what we wanted, and that actually means it successfully, like, referenced all the nonsense, so I'm actually kind of pleased there. Anyway, at this point we know that current call contains the call, so the struct symbol parameter is current call parameter. Advance to the next parameter. And before we do that, if current call... So param here should now be the parameter for this particular argument. So what we want to do is simply resolve the expression to the type of the parameter. And it crashes. Oh dear. That's a bogus pointer. So what's going on there? In fact, we want some tracing, because we want to be absolutely sure that these things are getting the right pointer throughout. Looks okay. So this is the one where it has created the thing. This isn't going to collapse the value... no, it can't, it's in use. So why are we getting a segmentation fault? Before we exit, we haven't printed this, but the Valgrind stack trace was pointing here, to the .y file, line 220. Okay, current call sub is
wrong. That is in fact the only thing it could be. It is null, but we are setting it here to a thing. Now it is not that thing. Now, I do not believe that something is scribbling over this, but I could be wrong. Is this the first call? Okay, so it can't have collapsed the stack over that, because we may want to refer to it here. In fact, we are referring to it here. Let's turn the debugging on and see exactly what's going on there. So here we start our call. The compiler stack is: 0, program, empty... no, that's not what I thought it was. Rule 28 is... really rule 28? These are parameters, not arguments. 'Reducing stack by rule 28, line 218.' Slightly confused, unless this is also an argument list. Okay, well, rule 24 is all sorts of things. Oh, this is the expression handling, so it's actually going to go via rule 31. Shift, and 17, yeah, it's just eating the number to turn it into an expression. Then it sees a comma. Oh, these are states, not rules. So what state are we in? State 56. Okay, that explains a lot. Right, we are here, and it sees a comma, which is this, so reduce using rule 28, argument, yeah, as we expect. Now, I'm expecting to see slightly more on the stack, to be honest. So 'reducing stack by rule 24', state 25. So it runs this. So this is here. Okay, then go to state 38, and it's ready to receive an expression. So after running rule 7 we should have a 25 on the stack. What's a 25? So, okay, my suspicion is this hasn't worked the way I thought it has, and this thing on the stack has just been eaten. So at this point it's been corrupted. Now, I'm really surprised, because I'm pretty sure you're supposed to be able to use typed mid-rule actions. Now we can add a destructor. Now we can put types in. Yeah, this is actually doing exactly what we talked about. So, argspec, 5. Okay, I mean, we told it it's an argspec, so we should be able to make that $2. It's because yacc does not support typed mid-rule actions. I'll find a proper solution for this at some point. Oh, that's $... yep,
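Setting the parser-stack problem aside, the bookkeeping being attempted here can be sketched in plain C. This is a hedged illustration only: the struct layouts, the field names and the helpers `begin_call`, `next_argument` and `end_call` are all assumptions made for the example, and the spec storage is supplied by the caller so that its lifetime does not depend on what the parser generator does with its own value stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified versions of the compiler's structures. */
struct symbol {
    const char* name;
    struct symbol* next;           /* parameters come first, in order */
};

struct subroutine {
    const char* name;
    int inputparameters;
    struct symbol* firstsymbol;
};

struct argumentspec {
    struct subroutine* sub;        /* routine being called */
    int number;                    /* arguments consumed so far */
    struct symbol* param;          /* parameter the next argument must match */
    struct argumentspec* previous_call;
};

/* Always points at the innermost call being parsed. */
static struct argumentspec* current_call = NULL;

/* Mid-rule step: start tracking a call, remembering the enclosing one. */
static void begin_call(struct argumentspec* spec, struct subroutine* sub) {
    spec->sub = sub;
    spec->number = 0;
    spec->param = sub->firstsymbol;
    spec->previous_call = current_call;
    current_call = spec;
}

/* Per-argument step: returns the matching parameter, or NULL if the
 * caller supplied too many arguments. */
static struct symbol* next_argument(void) {
    struct symbol* p;
    assert(current_call != NULL);
    if (current_call->number >= current_call->sub->inputparameters)
        return NULL;
    p = current_call->param;
    current_call->param = p->next;
    current_call->number++;
    return p;
}

/* End-of-rule step: returns 1 if the argument count matched, then
 * restores the enclosing call. */
static int end_call(void) {
    struct argumentspec* spec = current_call;
    int ok;
    assert(spec != NULL);
    ok = (spec->number == spec->sub->inputparameters);
    current_call = spec->previous_call;
    return ok;
}
```

The `previous_call` link is what makes nested calls work: finishing an inner call hands control back to the outer call's spec, with its argument count and current parameter untouched.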
right, that's dying exactly the same way, but it is a bit clearer. Okay, this is not going to work. I'm going to need a different way to do this. I mean, I did think this was really sketchy. I have to say I'm surprised; I thought this would work. There's actually one thing I am going to try, and of course I have to type my code in manually. Okay, we are here. This is our spec object. Fill out... okay, I stepped a little bit too far. What I wanted to do was watch this variable. That is the first one, so I can probably do... I'm just going to do that again. So I want to watch that variable. Go. Continue. Okay, we are now here. It's just shown me that we've changed the variable. Continue. Yeah, okay, it's done a reduce and has eaten the variable. It's eaten this particular structure. That is a damn shame. I wonder... the other thing I can do is to put it in the call itself. Where is it? See, I can do it in here, but I have a feeling that's going to just have the same effect. Fantastic. Okay, yeah, I am surprised. All right, let's do something else. For the time being, I am just going to leak. And in fact I don't need this, because this is current call. This is a perfectly reasonable solution. What you'd end up doing is just creating a linked list of objects and reusing them wherever possible. I'm just hoping not to. So that should be better now. Wait, what? Why am I still seeing the debugging information even though I've turned debugging off? I have saved, so that has written the file. Right, this is why I use yacc. Okay, what's happened is when I changed that from yacc to Bison in order to get the typed mid-rule actions, the name of the output file changed by default. So instead of generating y.tab.c, it started generating parser.tab.c. Thank you, the Free Software Foundation. Right, that is better. Okay, that did nothing. There you go, and it's even generated the right code. Probably the right code. Let's remove some of that tracing. So if the value was actually pushed onto the stack, then when we resolve it, that will do
nothing. We will then pop it into HL and then immediately push it again. The peephole optimizer will fix that, I believe. It's not that I've actually seen it happen or anything. Let's just try something that's not a constant. Hello... that is the wrong test file. I want that one. Yeah, that's better. uint8. Okay, so you can see: 'expression was a close square bracket, used when a close square bracket was expected'. Uh, interesting. That looks like my pointer-to code isn't working. Yeah, and printf is broken. Have I got my varargs wrong? No, that looks right. Yes, in printf. That's better. 'Expression was a uint8, used when a uint8 was expected.' Okay, now, this is my pointing-at. So yeah, when I created the pointer type, I forgot to set pointer-to to point at the pointer type, so it just created a new one the next time it was used. There we go. And here we have it: initialize the i variable, push a parameter, load the parameter and push it, push another parameter, call the function, return. Okay, let us commit that. Now we have callable subroutines. No return types yet, but we have all the pieces necessary to make them work; it's just a matter of arranging them in a slightly different order. Okay, now there are two more features I want. One of them I am actually just going to copy from my old attempt at the tokenizer. I'm going to copy from the old Cowgol lexer, because it's kind of fiddly. You see, this is actually the entire language, all the tokens. There's a lot of stuff that I just don't have yet, except I've got these string things, because I want string constants. Okay, we're going to be able to use some of that, but we're going to have to do a lot of it the hard way. Anyway, let's copy this. So in order to read strings, we have to teach the lexer how to parse our strings, and we're going to chop out some of how the comments work in this. No, no. So when it sees a double quote, it switches to state string, and then this code here will actually
start writing things to the string itself. However, we're going to use a simplified version of that, because we've got the text buffer. So if we see a backslash, a literal backslash n in the string, add a newline character to the buffer. Let me just check I've got the name of the buffer right. Text, yes. So this works too. And yeah, there should be bounds checking here. I'm bashing this out so that I can get the proof of concept done. This code on the right here that I'm copying from, this is actually Cowgol; I did a lex and Bison port to Cowgol. So yeah, I actually have to change... how do I change state in stock lex? There's a macro to do it. Man page. State... BEGIN, that's the one. Okay, and what's that saying? Backslash, open square brackets... oh no, the square brackets are not escaped. Backslash followed by any of backslash or double quote. So that's saying that these things, escaped, are themselves. INITIAL, I think. BEGIN INITIAL, right. And at this point we want to stick a terminator on the string and return the token. Anything we don't understand is an error. If we see a newline, it's an error. So if that's a backslash dot, it's an error. If it's not a newline, return, backslash or double quote, then just add it. Okay, I think that will do us for a string. So let's just add that here. Okay, that seems to have done something. So let's chop all this out, because the purpose of this is string constants. So s equals hello world. Okay, so that's got a string but doesn't know what to do with it. So we need to find our expression node, because strings can be expressions. Now, the way we do this is the string data gets emitted into the program. Ideally this would be put into a data segment, so out of line with the machine code, but this is a bodge, so we're not going to do it. We're just going to create a label, we then emit a jump to it... actually, we create two labels. We then emit the label, we then emit the... oh yeah, it's db, the actual
string data, and... let me think how to do this. So this will emit all the bytes in the string. We then put in the point the jump goes to. We don't have a symbol for this, so we can't actually push the constant value, so we're going to have to do it the old-fashioned way. I missed one of these. So what this is for is it just says, like, we've just manually pushed a thing onto the stack, deal with it. And this is going to be dodgy as hell, because everything here is... lxi h, and so this is the address of our string. We say that we've pushed it. The type is a pointer to uint8 type, and we don't have one of them. So this actually needs to be taken out of line and replaced with a subroutine, because we're going to need to call that. $2 equals make pointer type $2. And this one is going to return the new type; it takes the old type symbol. This is a type sym... oh yeah, okay, we don't need that. We do need this. uint8 type. Okay, so up here in our string constant code... and yeah, just like C, the string constant is going to be the only string operation the language supports. uint8 type. Do I need anything else? I don't think I need anything else, apart from, you know, one of those. Is that going to work? So, 'may be used uninitialized'. No, it really can't; code flow has to go through here. Interesting. All right, and what's that done? Not much. Have I got this... oh no, wait. Has that just generated a single byte? Have I done something really dumb here?
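The inline string constant scheme described above (jump over the data, emit the bytes with db, then load the data's address with lxi h) might look roughly like this as a C emitter. It is a sketch under stated assumptions, not the compiler's real code: the `struct buffer` output target, the `emit` helper, the `x<number>` label naming and the trailing zero byte are all choices made for this example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical output target: a fixed buffer of assembly text. */
struct buffer {
    char text[512];
    size_t used;
};

static int next_label = 1;

/* printf-style append; aborts rather than overflowing the buffer. */
static void emit(struct buffer* b, const char* fmt, ...) {
    va_list ap;
    int n;
    va_start(ap, fmt);
    n = vsnprintf(b->text + b->used, sizeof b->text - b->used, fmt, ap);
    va_end(ap);
    assert(n >= 0 && (size_t)n < sizeof b->text - b->used);
    b->used += (size_t)n;
}

/* Emits a zero-terminated string inline with the code and leaves its
 * address in HL. Returns the number of the data label. */
static int emit_string_constant(struct buffer* b, const char* s) {
    int data_label = next_label++;
    int skip_label = next_label++;

    emit(b, "\tjmp x%d\n", skip_label);      /* hop over the data */
    emit(b, "x%d:\n", data_label);
    for (; *s; s++)
        emit(b, "\tdb %d\n", (unsigned char)*s);
    emit(b, "\tdb 0\n");                     /* terminator */
    emit(b, "x%d:\n", skip_label);
    emit(b, "\tlxi h, x%d\n", data_label);   /* address of the string */
    return data_label;
}
```

After this the compiler would record that a pointer to uint8 is sitting in HL. Putting the bytes in a separate data segment would avoid the jump entirely, which is the cleaner approach the narration says it is deliberately skipping for now.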
Okay, the string is actually present in memory. Okay, I'm confused here. While c is not equal to zero. See, I'm well aware that a while on its own like this is actually a completely valid loop that does nothing, but with the do in front of it... am I just too tired to write C? That's at least looping. I got that wrong; I was staring at the wrong thing. Can you spot the bug, ladies and gentlemen? That's the bug. That's better. There we go. So we jump over our string constant, and the string constant is emitted inline, and yeah, that's a bit sucky. Never mind. I should probably generate a symbol for that, or at least add the ability to push these things. Yeah, and while I am at it, let's actually just do a thing. There's my assignment. Let's just do this, because I find it so much easier to work like this. One, two, three, four. Okay, let's do that. That's not right. Try that. Okay, what's that done for us, you might ask? Wrong file. It allows us to do this. Look: type inference in about four lines of code. Six lines, including two that contain nothing but braces. Okay, so we now have string constants. The next thing we need is an extern statement, some way to call actual machine code. I've been slightly wondering about how to do this, because you need to call things in order to, like, make things happen, and this starts getting into the nightmare which is calling conventions. And I want to call CP/M. The CP/M entry point has got one parameter and one opcode. The opcode is passed in the C register and the parameter is passed in DE, and the response is returned in HL. Now, I don't think it's Cowgol's job to know about calling conventions this way, but I do want the ability to call external routines. The simplest way to do this is... I am emitting text, so doing inline assembly is easy, but I need a way to get at the variables and the workspace. So what I would probably do is have the ability to do inline assembly with callouts, where each
callout allows you to embed an expression, or more likely a simple identifier, and the identifier would be looked up and the textual address of the identifier would be inserted into your inline assembly. This seems like the simplest and easiest way to do things. However, I don't think I am tracking sufficiently well to do that. So is there an easier way to do it? We could just poke things into the symbol table. We could have an external statement which just allows you to declare external routines; then it's up to the implementer to actually do the right thing. Of course, the implementer is me. We're going to do external. The reason for this is that eventually this version of Cowgol is going to support multiple compilation units, and we are also going to want forward calls. It's not quite as easy as that, because of the whole nested subroutine thing. In fact, there's a hideous bug, which is that the name of the subroutine is not scoped, because it's just emitted into the machine code. So if you have two nested subroutines with the same name inside different outer subroutines, that's perfectly valid, each subroutine has its own namespace, but that's not going to work. Now, the external subroutine is actually... no, actually, this is going to copy quite a lot of this code, but differently. And the stuff that's embedded inside parameter list... oh no, that will work, that's fine. Yep. So this is going to be external. I was going to just do external sub. I wonder now if I want to do ID blah blah blah equals and then an address, or some external identifier. I think the external identifier would be nice. So this stuff needs to be brought out of line into an internal... sub consists of... can I do a mid-rule action with nothing following it? Let's not do that. Of course, we don't need this jump statement here. So what I'm actually going to do is just cut and paste the whole lot. So sub new ID... it's going to have to be 'external sub', because if this mid-rule action fires then it's now too late to back
out and try something else; the parser won't let me. So we don't need a label. We do need a current sub. We need a parameter list. We don't want any of the parameter stuff. We do want an equals sign and then a string constant, and we do want to override the name of the subroutine with the string constant, and then we want to back out of the stacked subroutines. Is that going to work? Okay, so our routine here is going to be: external sub putc equals putc. We now define a routine to print a string, and what this will do is... I'm going to need an if/then, and I'm going to need equality operators. Uh, I've got 20 minutes till midnight. I'm not staying up until 1:30 like I did for the last one, but let's see if we can make this work. If c equals 0 then break, end if. putc c. s equals s plus 1. End loop. puts hello world. Okay, what's this going to do? Complain: no if. So let's add if. We've actually reached the point where adding new functionality to the compiler is fairly straightforward. So an if is a conditional. I need a then, too. If, then, end if. I'm not going to do else right now; it's not difficult, but I don't need it, and right now I'm doing only the things I need, because I'm going to go to bed. So: if, conditional, then, statements, end if. So an if gets labels, but there is no loop label, so we don't have that. However, we do have the true label, which goes here, and the false label goes here, and I believe that is our if/then/end if. That has complained because the equals operator doesn't work now. So we've got not equals, and we also want to do an equals, but these are actually the same piece of code. So rather than just passing all the labels, we're actually just going to pass in true label, false label, because this then allows the not equals here to be true label, false label, so both are the same routine, really. True label, false label, cond not... okay, and change this to be in true label, in false label. Right, and now remember this is
now condition equals, so this is actually going to change our logic. So if the left-hand side is equal to the right-hand side, jump to true label, otherwise go to false label. Okay, now we get to the zero test. This one's relatively straightforward: if they are not the same, jump to false label, otherwise jump to true label. This is where it gets a little hairy, so we need to be really careful here. If the first byte is not equal to the second byte, then we know the condition is not true, so... yeah, okay. Okay, that's not so bad. Again, it's the magnitude operators that are really painful. What's that complaining about? Types. I forgot to change the name. Okay, one more thing, which is break, and then this may actually work. Now, break... oh heavens, break's a pain, because it's another of these things that needs a stack. And we are going to attempt to do this by... uh, we can't do it there. No, we can't do it there. So where's our loop? Our loop is here. Oh, that's not right. Uh, no, that's doing it all wrong. Um, so the things you can break from are loops, which is this, and whiles, which is this. So what's a loop? A loop is a label. Let's make that a list of labels, and we are going to put in here... we've got a break label. No, we don't; we have an old break label. We need to rename all of these symbols for consistency. And here we have a break label. Now, we also need to remember that when we enter a subroutine, we also don't want to break out of the subroutine. That would be bad. And we are also going to do this so that zero is no longer a valid label. Okay, so subroutine: old break label equals break label, break label equals zero. So we cannot break from inside a subroutine. At the end of a subroutine, break label equals current sub old break label; we restore it. In here we do old break label equals break label, break label equals... we want a label to here. Uh, do we have one? So when we are inside a loop, you can break out by jumping to here. And remember to restore the break label. Okay, while: old break label equals break label,
break label equals... we already have one, which is false label, so we can do that. And here, break label equals old break label. Okay, I think that's all our loops. We now actually need the break statement itself, and the code is actually simple: it's just a jump. Okay, now what's that going to do? Symbol break is not defined. Wow, it compiled. What have we got? Let's look at the source code. So we've got external sub putc; that will not appear in the code, which is good. We have our puts routine, which is a pretty chunky thing. It could be better. We have our function prologue, where we take the parameter and we stick it into the workspace. Then we have our actual loop, which is this bit here. We load the variable, load the character, store it into a variable, load it with LHLD... why are we doing that? How are we doing that? I assume that it's the expression variable lookup, old ID. We push... oh, okay, that needs fixing. Right, yep, I forgot, there needs to be a code path for 8-bit registers. So what this does... we can materialize directly into A. I should put my vreg access back again, to be honest. Which is LD... LDA. Okay, let's try that. I never actually wrote any 8-bit comparison code. No, I didn't. That's what the problem is. We're trying to compare this against 0, but we have no 8-bit code. LHS type, uint8 type... well, so we want to pop it into A, OR it with itself... this code is common. Okay, that's better. So store it in c, we immediately load it out of c, test it for 0. If it's not 0, we print it; if it is 0, we jump to x4 and then we jump to x3. Peephole optimizer for the win. c is an 8-bit value; why are we loading it here with a 16-bit load? Also, what is this doing? That's not right. We want to pop one value into A and the next value into HL and then we compare A with HL. Let's try that. That's not actually our bug. So what's happening here? This is calling our... right, this is pushing the parameter on the stack to call our routine. So what we want to do
is find parameter... no, we don't want to find argument. Right, what's happening here is that we're loading the argument into HL rather than into PSW. So do we need to... yes. But why is it loading it into... ah, it's materializing the symbol into HL. Yeah, you can't do that, because this is an 8-bit value. The actual value is in the low half of the pair, so it's going to end up in L rather than H, but it needs to be in H in order to push it correctly. So we really need to have code here: if it's an 8-bit value, pop it into reg A; if it's a 16-bit value, then put it into HL. Okay, let's try that. Var type, uint8 type, to reg A. So: we load c, we push it as a pair with the thing in the high byte, we increment s, we jump back to our loop. Down here we load our string, we push it, we pop it, we push it and we call it. Okay, let's assemble it. Assembled, and it's a small program. Okay, let's run it. Does it work? Expecting more than that. I know what I've done. Nope, no, I did do that right. Okay, yeah, I know what I've done: I haven't written the putc routine. How did that compile? So it's supposed to call f_putc, but there is no f_putc. Oh dear, my assembler fails to check for undefined symbols. Oops. Okay, right, we do actually need to put a bit of boilerplate in. So just after we finish writing the program, we're just going to do f_putc. This should all go out of line and be separately linked. We wish to take the value to be written off the stack. I believe there's a swap instruction, EX (SP),HL, and on the 8080 that's called XTHL. Okay, so what we do is we pop the return address into HL, we use XTHL to swap what's on the stack, which is the byte we want to write, with the thing we've just read, which is the return address. So now we have the return address on the stack and the thing we want to write in HL. This is the equivalent of Forth's NIP, I believe. Let me try to remember how to print a byte in CP/M. So the code wants to be, I think it's 2, I'll check that in a moment, to
print a character. The parameter wants to go into DE. The byte we want to write is in H and we want it in E, so we move H into E, and then we just jump to the call address. And we don't need to pop anything off the stack; we've done that. Invalid digit in character constant. I think I need another... I know I've got the right number of newlines. CP/M end of file character. Okay, that's not it, but we really want that anyway. Character constant... okay, that looks like another bug in my assembler. So this is, like, the first time I've used it for real. Okay, test. It works! We have a fully functional compiler capable of compiling small but non-trivial programs, capable of doing almost real work. I mean, it's a Hello World. That counts, right? There's tons of things missing. I haven't done anything with structures. It needs magnitude comparisons. There's 32-bit arithmetic, which is not fun, and you need to be able to do more than add and subtract. It needs a whole huge library of multiplication and division operators and shifts and all that sort of thing. But it works. How big is it? 24k of code. That's quite big. 35k binary. I imagine it would probably be another 10-15k with all the rest of the features in it. We can cut down the size, probably, by removing some of these strings. The real thing would need a rather different way of writing object files than just dumping assembler text, because it's going to need a custom linker to manage the call graph stuff I described earlier. But that actually works. Whoa. And the technology will easily work on other architectures. It should be really easy to port. Did I remember to assemble that? Yes. Okay, I had a sudden horrible moment that maybe I hadn't assembled my test program. It actually works. It needs some diagnostics. We've got a Hello World that fits in a single 128-byte record. Let me just assemble... dz80 is very picky about its file extensions, but here is the code, assembled and then translated into Z80 and then disassembled into Z80 machine
code, so you can compare. It needs that peephole optimizer. It really needs that peephole optimizer. That's going to have to be the next thing to do. This stuff is actually my string data. I need to stop targeting my little assembler and target proper zmac, which supports things like relocatable objects and multiple segments, so I'll be able to put the string data and the workspace data into additional segments so they aren't inline. One of the things this is doing is that the workspace data, of which there's not a lot, to be honest (we've got three bytes of workspace data), is emitted with the code binary, so it's occupying three bytes of disk that we don't need. So that can go into BSS. But yeah, fantastic. Let me save that. And with that... yes, I am going hiking again tomorrow, but I don't know where yet. I am going to call it a night. I hope you enjoyed watching this video, assuming anyone still is. Please let me know what you think in the comments.
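The generated code walked through above contains redundant push/pop pairs, a store that is immediately reloaded, and a jump to the very next line. Those are the kind of thing a peephole optimizer removes. The following is a minimal Python sketch of that idea, not the compiler's actual optimizer, which has not been written at this point in the video. The function name `peephole`, the three rewrite rules, and the lower-case 8080 mnemonic layout of the input lines are all assumptions made for illustration.

```python
def peephole(lines):
    """Apply a few two-line rewrite rules to 8080 assembly text until
    nothing changes. This is a hypothetical sketch; the rules and the
    line format are assumptions, not the compiler's real output format."""

    def norm(line):
        # Drop any trailing comment, then split into lower-case tokens.
        return line.split(";")[0].strip().lower().split()

    out = list(lines)
    changed = True
    while changed:
        changed = False
        result = []
        i = 0
        while i < len(out):
            cur = norm(out[i])
            nxt = norm(out[i + 1]) if i + 1 < len(out) else []
            two_ops = len(cur) == 2 and len(nxt) == 2

            # Rule 1: "push rp" then "pop rp" of the same pair does nothing.
            if two_ops and cur[0] == "push" and nxt[0] == "pop" and cur[1] == nxt[1]:
                i += 2
                changed = True
                continue

            # Rule 2: a store followed by a load of the same address;
            # the register still holds the value, so keep only the store.
            if (two_ops and (cur[0], nxt[0]) in (("sta", "lda"), ("shld", "lhld"))
                    and cur[1] == nxt[1]):
                result.append(out[i])
                i += 2
                changed = True
                continue

            # Rule 3: a jump to a label defined on the very next line.
            if len(cur) == 2 and cur[0] == "jmp" and nxt == [cur[1] + ":"]:
                i += 1
                changed = True
                continue

            result.append(out[i])
            i += 1
        out = result
    return out


code = [
    "  lda c",
    "  sta tmp",
    "  lda tmp",
    "  push h",
    "  pop h",
    "  jmp done",
    "done:",
    "  ret",
]
optimized = peephole(code)
print("\n".join(optimized))
```

The outer loop reruns the rules because removing one pair can expose another, for example a nested `push h` / `push h` / `pop h` / `pop h`. A pair such as `push h` / `pop d` is left alone, since that sequence copies HL into DE and is not a no-op.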