I have a nice hot cup of tea, which means it's coding time again. The other day I did a disassembler for the TLCS90 CPU architecture, which I'm not going to summarize here because I've just made two other videos about it. Today I'm going to attempt to write an assembler for it. This is going to be moderately exciting, because the TLCS90 architecture is a bit bizarre: it has opcode bytes in the middle of instructions and things like that. But I do want one, so I'm just going to have to give it a try and see what happens, and I'm not going to try to be too clever about it. I tried to be clever with the disassembler and it didn't really work. Then I just brute-forced the disassembler and ended up with a clever disassembler anyway. So I'm going to try brute-forcing this and see what happens. The instruction encoding is odd, but regular enough that I think I can make it work without too much difficulty. Let's give it a try. Over here I have the usual setup. Up at the top left is the autobuild window, which uses entr to rerun a command whenever it sees that files have changed. Over here I have Vim, and I've already set up the skeleton for the assembler. Since the last time I did an assembler here, I have in fact refactored the framework and fixed a lot of terrible bugs. So now, rather than writing the entire program each time, you just fill in a skeleton. You get stuff like symbol tables, the input/output layer, multiple passes and so on all for free. So all I need to do ("all", in scare quotes) is add stuff to this table. Over here I have the 6303 assembler I did in the same way. It's only 330 lines. It's got a custom read operand subroutine and some instruction-emitting subroutines. Then here we list all the registers the 6303 has, which is one. No, hang on, it's got more than one. Why is there only one there? Oh yes, there is only one register that appears as an instruction operand.
The other two registers, A and B, are encoded into the instruction opcode. So it's just a big table of opcodes. Each table entry gets a callback that's invoked when the instruction is seen, plus one machine word of payload. Here I'm using the payload to encode the actual opcode itself, and I don't think I'll be able to do that with the TLCS90. Anyway, let's give it a try. The TLCS90 is essentially a reimplementation of the Z80 with different instructions and opcodes. So it's got the same registers the Z80 has; there's a table of them somewhere, here we go. Each register has a particular number associated with it, which is used in the instruction encodings, so I'm simply going to use these. The first thing we need is a set of registers. A, B, C, D, E, H and L are the 8-bit registers: B to L have the values 0 to 5, and A is 6. In fact I'm going to put these at the front, for reasons. Then there are the 16-bit registers: BC, DE, HL, IX, IY, SP and AF. I'm going to use the top nibble of the value to indicate that it's a 16-bit register. The numbers associated with them are essentially 0 to 6 but missing 3, because 3 has a special meaning in the encoding. So that's 0, 1, 2, 4, 5 and 6. AF is special: it's equivalent to SP in the encoding, but it's only used by a couple of instructions, PUSH and POP. So just to save time I'm going to make AF equivalent to SP. Over here there's also a different register encoding, which is used by some instructions. Luckily those numbers are the same as these numbers with bit 2 unset, so I can just subtract 4 to get them, and I don't need a way to distinguish between the two tables. So there are our registers, and it builds. In fact the assembler will already run, after a fashion. I have a test file set up which is empty, and it uses four passes to assemble an empty file.
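To make the numbering concrete, here is a minimal Python sketch of the register table as described above. The dictionary layout and the helper names (is_reg16, reg_number, index_code) are hypothetical. Only the numbers come from the session: 8-bit registers are plain values, 16-bit registers are flagged in the top nibble, AF is aliased to SP, and the alternate index-register numbering is obtained by subtracting 4.

```python
# Hypothetical register table following the numbering described above.
REG16_FLAG = 0x10  # top nibble set marks a 16-bit register

REGISTERS = {
    # 8-bit registers: B..L are 0..5, A is 6
    "B": 0, "C": 1, "D": 2, "E": 3, "H": 4, "L": 5, "A": 6,
    # 16-bit registers: 0..6 with 3 skipped (3 is special in the encoding)
    "BC": REG16_FLAG | 0, "DE": REG16_FLAG | 1, "HL": REG16_FLAG | 2,
    "IX": REG16_FLAG | 4, "IY": REG16_FLAG | 5, "SP": REG16_FLAG | 6,
    # AF shares SP's number; only PUSH and POP accept it
    "AF": REG16_FLAG | 6,
}


def is_reg16(value):
    """True if the table value carries the 16-bit flag."""
    return (value & 0xF0) == REG16_FLAG


def reg_number(value):
    """The register number used in the instruction encoding (bottom nibble)."""
    return value & 0x0F


def index_code(value):
    """IX/IY/SP in the alternate table: the same numbers with bit 2 cleared."""
    n = reg_number(value)
    if not is_reg16(value) or n not in (4, 5, 6):
        raise ValueError("not an index register")
    return n - 4
```

For example, index_code(REGISTERS["IY"]) gives 1. Asking for HL raises an error, because HL is not one of the indexable registers.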
My assembler framework is quite small and doesn't use much memory, but it is not quick: each of those passes involves reparsing the source file. All right, I'm going to start with a nice simple ALU instruction. Where are the 8-bit ALU instructions? Here they are. I'm going to go with ADD, because the ALU instructions are the most regular. You can see here ADD A,something. There's always some prefix data, which is normally a prefix byte like this one (or nothing, like this one). That's followed by some optional payload, followed by the opcode itself, and the opcode for all these ADDs is 0x60. This one is an abbreviated form for adding an 8-bit value to A. It's the same opcode 0x60 but ORed with 8, and this pattern does seem to apply everywhere else. So we add the ADD instruction to the table with its opcode and tell it that it's an ALU8. Here is the handler for ALU8 (I need to double-check the symbol callback signature). This should build. If I go to my test file, put in the instruction add a,53 and run it, it fails. Honestly, that's not the error I was expecting. It should have called ALU8. Yes. What's happened is that the callback hasn't read any operands, so it hasn't read the newline at the end of the file. The assembler framework does the thing in the callback, which is nothing, and then it starts reading the next instruction, which is the A. So we are actually going to have to read some operands. On entry to the symbol callback the current instruction is set, and we can extract the value from it. We're going to have to read up to two operands. In general we won't know how many to read until we've read the first one, but in this particular case we know there are always going to be two: a register and the right-hand side. In this instruction encoding the destination is on the left.
So this encoding here adds n to A. Some instruction forms, though very few, take a variable number of parameters, for example RET cc, where cc is a condition code. We're going to have to encode those as registers too, but we'll do that later. The callback for RET will have to look to see whether there's an extra parameter and read it or not. For now, with ADD, we don't have to. The first thing we need to do is look at the left-hand parameter to decide which family we're in. It could be A. It could be a different register, which gets encoded differently. Or it could be one of these, which is a different encoding again: these add a constant 8-bit value to something referred to via an indirection. So what we're going to do here is read an operand. Read operand returns the next token, so this is going to be either a comma or a newline. In fact we expect it to be a comma, so we could check that here, but I won't for now. Read operand, which is stubbed out here, is going to fill out some global variables with the operand just read. Let's take a look at the 6303 version. The assembler framework is a little bit dubious about which subroutines consume a token and return it and which ones leave a token queued. Read operand and read token both consume a token and return it. So here you can see it reads a token and then looks to see what it is. We want to know whether this is a register. Now, how does a register come back? Thinking about it, as a symbol. Do we have an example of this in the 6303? I think we don't. Read register reads the next token and checks that it is in fact a register, but we can't use that here because we need to understand things that aren't registers. So here is our framework, and here are the magic tokens. We're looking for token identifier, and to find out whether it's a register we check whether its callback is regcb.
So if the left-hand side is a register then we want to know whether it's... actually, no, we're not doing any business logic here; we're just reading an operand. So we want to define some addressing modes and somewhere to put the result. Addressing modes can be an 8-bit register, a 16-bit register or a number, and we'll add the rest later. This is where we're going to put the addressing mode. Operand value is going to be the parameter, whatever it is: the register number or the actual number. If the top nibble is set then this is a 16-bit register, otherwise it's an 8-bit register, and the value is the bottom nibble of the number in the table. All right. However, read operand does need to consume the next token, so we need to do this. We're also going to put an error in here so that uncaught cases just halt with "expected an identifier". Right, so this has read the operand, and the token is a comma. We exit this, and it then reads the A. That should be producing the same error as before. Interesting. Anyway, I expect this to fail because we haven't done all the parsing. But we can see that the token is 0x2C, which is a comma, and the operand mode is 0, which is an 8-bit register. So we know that if the left-hand side is A, then it's this particular family of encodings. To test for this, we check that the operand mode is reg8 and the operand value is A. In fact we also know the token must be a comma, or it's a syntax error. I think we've got a helper for this, but it's not that one. Okay. The operand value must be 6, because if it's a register it's A. Therefore we call out to another routine to do the work; otherwise, halt with an error. And we do need to implement that. Still "expected an identifier". Interesting. Oh yes, this will actually bail out. Yep, okay. So in simple_dest_a we know that this instruction is writing to A; therefore we don't need the current operand anymore.
We have read the comma, so now we're going to read the final operand, like so. This will consume the trailing newline, and that fails because we hit this. Here we're actually trying to parse a number, which is easily done because the framework has already done it. So this is going to be token number; there isn't a token value, it's token number. And again, that wasn't what I was expecting. We've read the comma, so we should now read the number. What has it actually read? This parser stuff is the hardest part of the whole thing. The Cowgol project does have a parser generator in it, which is used by the compiler. It would probably be sensible to use it for the assembler as well, but I kind of haven't. There is a trailing newline in the file, so it should have read that. Ah, I'm not returning the next token, that's why. Okay, it's now done four passes, reading the operand each time. So what we're going to do here is look at the operand mode for the right-hand side and do something depending on what it is. For ADD the right-hand side can be one of three things. An 8-bit register gives us this form. An 8-bit constant gives us this form. Or it can be one of these indexed things, which we haven't implemented yet. So if it's an 8-bit register (we're going to be using this a lot)... hang on, this is the right-hand side; we don't care about A for this one. Right. If we're adding a register, the encoding is a prefix byte carrying the register number, followed by the current instruction's opcode. If it's a number then it's this form here: the opcode ORed with 8, I think, followed by the payload. If it's anything else then it's a bad addressing mode. Right, I typed emit byte where it should be emit8. That actually assembled something into 68 2B, which I believe is correct. We can test this because we have a disassembler. I'm not sure where that got written to, so I need to clear up my test files. ADD A,2B: that is the right one.
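Here is a hedged sketch of the two right-hand-side forms handled so far for the ALU A,src encodings. The register form uses the 0xF8 + register prefix pattern that this session uses for register operands. The immediate form is the base opcode ORed with 8 plus a one-byte payload, which gives the 68 2B seen above for ADD. The function name encode_alu_a is hypothetical and not the framework's API.

```python
def encode_alu_a(opcode, mode, value):
    """Sketch of ALU A,<src> for a register or an 8-bit immediate source.

    opcode is the family's base opcode from the instruction table
    (0x60 for ADD); mode and value describe the right-hand operand.
    """
    if mode == "reg8":
        # Long form: register prefix byte (0xF8 + register number), then opcode.
        return [0xF8 | value, opcode]
    if mode == "number":
        if not -128 <= value <= 0xFF:
            raise ValueError("value out of range")
        # Short form: no prefix, opcode ORed with 8, then the 8-bit payload.
        return [opcode | 0x08, value & 0xFF]
    raise ValueError("bad addressing mode")
```

Keeping the base opcode as a parameter means the same routine can later serve ADC, SUB and the rest, which differ only in the payload stored in the table.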
Okay, we're going to be pretty rigorous about the test file, so we're actually going to do all of these. Yep, that looks reasonable. Incidentally, notice that down here there's a generic ADD form that lets you add an 8-bit value to any 8-bit register. This overlaps with the short form. You can encode adding an 8-bit value to A either with the short form in 2 bytes or with the long form in 3 bytes, and of course we always want the short form. All right, that actually seems to work. I'm not going to test all of those, but we do want at least one of each addressing mode. Okay, so we've got this one and this one; now it's time for the indexed operations. The indexed addressing modes are annoying because there are actually two parameters to worry about. Take IX+d: ix can be any of the indexable registers, which are IX, IY and SP, and d is an 8-bit offset. If there is no offset you can use any 16-bit register. There's also a special form, HL+A, which sign-extends A, adds it to HL and uses the result as the address to indirect through. So we have to record both values, the register and the displacement. For the disassembler I treated these three forms as different addressing modes, but here let's just add another global variable. So let's change this. We'll store the register this way, because that lets us keep it as an 8-bit value, which makes this simpler. So why is it complaining now? We need to identify where all the different error messages are coming from. I think it's here... not there... yep, it's up around reg. Okay, that works again. So we want one addressing mode for general indexing and one for the special HL+A form. How are we going to do this? We can identify all these things by whether there is an opening parenthesis.
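Here is a sketch of the effective addresses these indexed forms refer to. It assumes, as described above, that d and A are both treated as signed 8-bit quantities, and that address arithmetic wraps at 16 bits. The function names are made up for illustration; this only models the address calculation, not the encoding.

```python
def sign_extend8(value):
    """Interpret the low 8 bits of value as a signed byte (-128..127)."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def ea_index(base, d):
    """(IX+d), (IY+d), (SP+d): base register plus signed displacement."""
    return (base + sign_extend8(d)) & 0xFFFF


def ea_hl_plus_a(hl, a):
    """(HL+A): A is sign-extended before being added to HL."""
    return (hl + sign_extend8(a)) & 0xFFFF
```

So with HL = 0x1000 and A = 0xFF, (HL+A) refers to 0x0FFF rather than 0x10FF.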
So we now expect the next thing to be either a register or something else. If it's a register then this is an indexed form; if it's something else then it's encoding an address. So read a token. If the token is an identifier whose callback is regcb, this is an indexed form. Otherwise it's an address, so we push the token we just read back onto the queue and parse it as an expression. We expect the token that stopped the expression to be a closing parenthesis (read expression returns the token that terminated the expression). Then the operand mode is going to be an indirect address; let's just call that "address". The value goes wherever read expression puts its value, which is token number. Finally we need to read the next token. So we need another one of these, am_address, and let's put some examples in: ADD A with a register, with a constant, with (HL), with (IX+1), and with constant addresses. Yep, that's exactly what we expected. So then we follow three paths. If it's HL, the next thing must be either the closing parenthesis or +A; there are no other options. If it's any other register, we can have an optional displacement. The displacement must be zero if it's BC or DE. In fact, those really are the only options. Okay, so we now know it's a register, and therefore this is an indexed operation. We store the register number in the field, but we do need to check that it's a 16-bit register. Then read another token, which must be a plus. No, it's not: if it's a closing parenthesis, the displacement is zero and the mode is just a simple index. Actually we can do that here: queue up the next token and exit. If the token is not a plus, that's not allowed at this point. Read another token. If it's a number, this is the displacement, but don't stop yet. If it's a register then it must be A; that's the only allowed option.
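Here is a minimal, self-contained Python sketch of the operand reader walked through above. It uses a token stream with one-token pushback and a parser that returns (mode, register, value). The mode names, the tokenizer and the restriction of addresses to plain numbers are assumptions; the real reader parses a full expression. The sketch also folds in the negative-displacement case that comes up later in the session: (IX-1) is accepted, but (HL-A) is rejected because only (HL+A) exists.

```python
import re

# Register numbers as used in the TLCS90 encoding tables.
REG8 = {"B": 0, "C": 1, "D": 2, "E": 3, "H": 4, "L": 5, "A": 6}
REG16 = {"BC": 0, "DE": 1, "HL": 2, "IX": 4, "IY": 5, "SP": 6}

# Numbers (decimal or 0x-prefixed hex), identifiers, or single punctuation chars.
TOKEN_RE = re.compile(r"\s*(?:(0[xX][0-9a-fA-F]+|\d+)|([A-Za-z_]\w*)|(\S))")


class Tokens:
    """Token stream with one-token pushback (a stand-in for the framework's queue)."""

    def __init__(self, text):
        self.items = []
        text = text.strip()
        pos = 0
        while pos < len(text):
            m = TOKEN_RE.match(text, pos)
            num, ident, punct = m.groups()
            if num is not None:
                self.items.append(("number", int(num, 0)))
            elif ident is not None:
                self.items.append(("ident", ident.upper()))
            else:
                self.items.append(("punct", punct))
            pos = m.end()
        self.items.append(("eol", None))
        self.pos = 0

    def next(self):
        tok = self.items[self.pos]
        self.pos += 1
        return tok

    def push_back(self):
        self.pos -= 1


def expect(toks, char):
    if toks.next() != ("punct", char):
        raise SyntaxError(f"expected {char!r}")


def read_operand(toks):
    """Return (mode, reg, value) for one operand (hypothetical mode names)."""
    kind, val = toks.next()
    if kind == "ident" and val in REG8:
        return ("reg8", REG8[val], 0)
    if kind == "ident" and val in REG16:
        return ("reg16", REG16[val], 0)
    if kind == "number":
        return ("number", 0, val)
    if (kind, val) != ("punct", "("):
        raise SyntaxError("bad operand")

    kind, val = toks.next()
    if kind == "ident" and val in REG8:
        raise SyntaxError("16-bit register expected")
    if kind != "ident":
        # Not a register: push it back and read an address instead.
        toks.push_back()
        kind, addr = toks.next()
        if kind != "number":
            raise SyntaxError("address expected")
        expect(toks, ")")
        return ("address", 0, addr)
    if val not in REG16:
        raise SyntaxError("unknown register")

    reg = REG16[val]
    kind, sym = toks.next()
    if (kind, sym) == ("punct", ")"):
        return ("index", reg, 0)  # (rr): zero displacement
    if (kind, sym) == ("punct", "+"):
        kind, v = toks.next()
        if kind == "ident":
            # The only register allowed after '+' is A, and only with HL.
            if val != "HL" or v != "A":
                raise SyntaxError("only (HL+A) is allowed")
            expect(toks, ")")
            return ("hl_plus_a", reg, 0)
        if kind != "number":
            raise SyntaxError("displacement expected")
        disp = v
    elif (kind, sym) == ("punct", "-"):
        # The minus form must be a number: there is no (HL-A).
        kind, v = toks.next()
        if kind != "number":
            raise SyntaxError("displacement expected")
        disp = -v
    else:
        raise SyntaxError("expected '+', '-' or ')'")
    expect(toks, ")")
    return ("index", reg, disp)
```

Checking that a displacement is only used with IX, IY or SP is left to the encoder, as in the session.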
Then change the mode to the HL+A mode; otherwise this is invalid. The next thing must be a closing parenthesis. Now, what does expect do? Right: expect just reads a token and consumes it, so we need to queue up the next token and exit. "Too many parameters" should just be a syntax error. Okay, what does this do? And what is the addressing mode in simple_dest_a? Yep, this is okay. So if this is an indexed operation, it's going to be one of these two addressing modes. If a displacement is supplied, the register must be IX, IY or SP. Is there more information about the displacement? Here we go: it's an 8-bit displacement, and it is signed. So we now know that the register is valid and the displacement is valid, and it's one of these forms here. Back up at our ADD: subtracting 4 gives us the correct register number, followed by the displacement, followed by the opcode. For this branch we know the displacement is 0, so it follows this form: E0 plus the register, followed by the opcode. In fact we can do it like this. These were the numbers 4, 5 and 6 in this table. Okay, and what does this do? "Displacement out of range" at line 4. The displacement here is an unsigned 16-bit value, but here we are treating it as a signed value, so we need to convert it first. Next: "unsupported addressing mode". Right, it's the address form it's complaining about. The TLCS90 has two address encodings: the full-sized one, and a single-byte form which refers to the direct page. The direct page is a special address range, here we go, FF00 to FFFF, which can be encoded in much shorter instructions. This gives you quick and easy access to hardware registers and other things. So this is straightforward. If the operand value is at least FF00 then (where is our ADD?) you just emit the opcode byte, which is 0x60. No, that's not the opcode byte.
Yes, I've actually got that wrong in a number of places: it should be the current instruction's value, along with the bottom 8 bits of the address. Otherwise we need the longer form, which is this one: E3, followed by a 16-bit address, followed by the opcode byte. Now what does this do? Oh, that'll be the HL+A mode. This takes no actual parameters in the encoding, so it's straightforward: an F3 prefix byte followed by the opcode byte. "Unbalanced parentheses, line 6." This comes from reading and parsing that expression. Does read expression actually terminate...? Okay, right. Read expression does not terminate the expression when it hits a closing parenthesis on its own, but it should. So I'm going to fix this fairly straightforwardly. The reader will now break out of the expression-reading loop when it sees a closing parenthesis and there's nothing on the expression stack. That build takes a while because it's now rebuilding all the assemblers, but the test parsed, which is nice. "Unsupported addressing mode": that's dropped off the bottom of this. And it assembles. Does it disassemble? Oh, what have we got? ADD A,(HL) with the correct encoding. (IX+1) with the correct encoding. (HL+A) is correct. (1234) is correct. This last one is only "correct" because my test was wrong: that should be FF12. Let's try that. Yep, that has disassembled to the right thing, and it's using the short-form encoding. Excellent. So this form of ADD is done. We could just go and add all the other ALU instructions following the same format, but let's actually go ahead and do the other ADD forms first. This is the case where the destination is A; otherwise we just fail. The other situations are adding a constant to any register, like this, and that fails in the normal way.
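The fix described here stops the expression reader at a closing parenthesis that it didn't open itself, and leaves that parenthesis for the operand parser. The session's reader is stack-based; as an alternative illustration, here is a minimal recursive-descent evaluator over a pre-tokenized list (integers and operator strings, a hypothetical format) where that behaviour falls out naturally. An unmatched ")" is simply not an operator, so the loop stops and returns its position.

```python
def read_expression(tokens, pos=0):
    """Evaluate tokens[pos:] as far as possible.

    Returns (value, index of the token that stopped the expression).
    A ')' that doesn't close a '(' opened inside the expression is left
    for the caller, which is what the operand parser needs.
    """
    return _sum(tokens, pos)


def _sum(tokens, pos):
    value, pos = _atom(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        rhs, pos = _atom(tokens, pos + 1)
        value = value + rhs if op == "+" else value - rhs
    return value, pos


def _atom(tokens, pos):
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of expression")
    tok = tokens[pos]
    if isinstance(tok, int):
        return tok, pos + 1
    if tok == "-":
        value, pos = _atom(tokens, pos + 1)  # unary minus
        return -value, pos
    if tok == "(":
        value, pos = _sum(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("unbalanced parentheses")
        return value, pos + 1  # consume our own ')'
    raise SyntaxError(f"unexpected token {tok!r}")
```

For the operand (1234) the parser has already consumed the opening parenthesis. The reader is therefore handed [0x1234, ")"] and stops at index 1, leaving the ")" for the operand code to check.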
All right, so if the left-hand side is a register (that case is silly there), and it's not the A register, then it must be this form; there are no other choices. The right-hand side must be a number, so we expect a last operand and bail if it's not a number. Emitting it is straightforward, but we do need to remember the register number, because expect operand here is going to overwrite it. So this is going to be 0xF8 ORed with the register, followed by the opcode byte, followed by the payload. And that doesn't build, because once again I typed emit byte rather than emit8, and I mixed up pointer and non-pointer access to the current instruction's value. Okay, that assembles, and it disassembles to B,2B. That is what I wanted. Now we're going to add 8-bit constants to all the addressing modes up here, and as expected that dies at the first one. So now we do want to turn this into a case statement. What we're essentially doing here is copying the code up here, but differently. So let's delete a lot of that. Actually, no, we don't want to do it like that, because we still have to read the right-hand side. So if it's a reg8, do the reg8 path; otherwise, copy the left-hand-side operand. No, we can be cleverer than this. It's going to be one of these three addressing modes. So we do the same validation we did when these things were on the right-hand side, and we also emit the first chunk of the instruction. For an index, this one, it's F4 instead of F0, followed by the displacement; otherwise it's E8. If it's an address in direct mode, we emit EF followed by the abbreviated address; otherwise it's EB. If it's HL+A, it's F7. All of these are followed by the opcode. Then we read the right-hand operand, which must be a number, and emit it. And that's produced garbage. Why? EA 60. The first one in the test file was just (HL).
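Putting the memory-operand encodings from the last few steps together, the source and destination forms differ only in their base prefix byte. This is a sketch under stated assumptions. E0/E3, E8/EB, F0/F4, F3/F7 and EF are the values mentioned in this session. E7 for a direct-page source is assumed from the same pattern and should be checked against the TLCS90 opcode map. The function names are hypothetical. The immediate-to-memory form sets bit 3 of the opcode, which is the fix for the EA 60 garbage above.

```python
# (source base, destination base) for each memory addressing form.
MEM_BASES = {
    "index":      (0xE0, 0xE8),  # (rr), no displacement: base + register number
    "index_disp": (0xF0, 0xF4),  # (IX/IY/SP+d): base + (register - 4), then d
    "direct":     (0xE7, 0xEF),  # (n) for FF00-FFFF; source value E7 is assumed
    "absolute":   (0xE3, 0xEB),  # (mn): 16-bit address, low byte first
    "hl_plus_a":  (0xF3, 0xF7),  # (HL+A)
}


def signed8(value):
    """Accept a displacement held as a 16-bit value, return it as one byte."""
    value &= 0xFFFF
    if value >= 0x8000:
        value -= 0x10000
    if not -128 <= value <= 127:
        raise ValueError("displacement out of range")
    return value & 0xFF


def mem_prefix(mode, reg, value, dest):
    """Prefix bytes for a memory operand; dest selects the destination bases."""
    side = 1 if dest else 0
    if mode == "index":
        if value == 0:
            return [MEM_BASES["index"][side] + reg]
        if reg not in (4, 5, 6):
            raise ValueError("a displacement needs IX, IY or SP")
        return [MEM_BASES["index_disp"][side] + reg - 4, signed8(value)]
    if mode == "hl_plus_a":
        return [MEM_BASES["hl_plus_a"][side]]
    if mode == "address":
        addr = value & 0xFFFF
        if addr >= 0xFF00:
            return [MEM_BASES["direct"][side], addr & 0xFF]
        return [MEM_BASES["absolute"][side], addr & 0xFF, addr >> 8]
    raise ValueError("bad addressing mode")


def alu_a_from_mem(opcode, mode, reg, value):
    """ALU A,(mem): source prefix, then the opcode."""
    return mem_prefix(mode, reg, value, dest=False) + [opcode]


def alu_mem_imm(opcode, mode, reg, value, imm):
    """ALU (mem),n: destination prefix, opcode with bit 3 set, 8-bit payload."""
    return mem_prefix(mode, reg, value, dest=True) + [opcode | 0x08, imm & 0xFF]
```

Under these assumptions, ADD (1234),2B comes out at five bytes: EB 34 12 68 2B.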
So that should have hit this code path... no, this one. So the register: E8 plus the 16-bit register number. We did take the 16-bit flag off the register number, so that should be the right number here; HL is going to be 2. Yes, right, okay. What's happened is that we need to set this bit of the opcode byte, just as we did for the immediate form. Right, that's better: (HL), (IX+1), (HL+A), (1234), (FF12). Notice that all these instructions let you do stuff without needing a spare register to hold the value. Well, you read from registers for these three, but these two don't need registers at all, which is quite cool. Okay, so I think we have all the logic for the ALU stuff. We've done all the ADD forms: A,A; A,2B; all of these; B,something; all of these. That was simpler than I was expecting. So let's just duplicate all this and replace it with ADC. This will of course fail. This error is due to a quirk in the assembler syntax. I'm copying the old MAC assembler syntax from the 8080. If a name is seen that is not an instruction, a label is defined for it. So this defines a label called ADC, then tries to interpret the next word as an instruction, which it knows is a register; hence the error message. It's not the greatest, but I can't really change it without breaking compatibility. So ADC is 0x61 (that's the big table). We want ADC, SUB, SBC, AND, XOR, OR and CP, and these should now all work. "Duplicate symbol during init: AND." Right. That's failed because there's already an assembler built-in symbol AND, which is used in Boolean operations. It's an operator, and I suspect OR will fail too. But all the ADCs work: ADC, SBC, yeah. Okay, I know how to get around that, and it's vile. Do I know how to get around that?
See, I could special-case it in the main loop, which I no longer have access to because it's been factored out. It's over here, this. So what I would do is stick some code in here that asks: is it AND or OR? If so, change it to point at something else. Anyway, let's just get rid of that for the time being and make sure this works. Yes, yes. I also want XOR and CP. Interesting: this is actually a bit of a problem, because I want a generic solution to this. I'm out of tea, so I'll take a short break and think about it offline. Okay, I did manage to bodge that into workingness. The assembler framework now calls a "massage current instruction" subroutine before it calls the instruction callback. This lets the back end check whether the current symbol is an operator it cares about, and if so, change what the currently matched instruction is. So here we have two symbols. They don't appear in the symbol table, but that's good enough for the back end to pull the information from the current instruction. That does work: we can assemble it and we can disassemble it, and you see there we get some, and there we get some more. So that's good. Now, one thing that came up is that I forgot to test negative displacements. Of course that doesn't work, because we're explicitly looking for a plus here, so we actually want to do this. However, life becomes even more complicated, because the HL+A form only exists as plus A; there is no HL-A. So we need to do this. This is the plus version, and it's the complicated one. The minus version is actually quite simple, because it has to be a number, like so. And what does that turn into? There we go: IX-1, IX-1.
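Here is a hedged sketch of the two ideas above: a single ALU callback whose table payload is the opcode, and a "massage current instruction" hook that swaps the AND and OR operator symbols for instruction entries kept outside the symbol table. Only ADD = 0x60 and ADC = 0x61 come from the session. The rest of the ordering (SUB, SBC, AND, XOR, OR, CP) mirrors the Z80 ALU order and is an assumption. All names here are hypothetical, and the callback only models the A,n form.

```python
# ADD and ADC opcodes are from the session; the remaining order is assumed.
ALU_MNEMONICS = ["ADD", "ADC", "SUB", "SBC", "AND", "XOR", "OR", "CP"]
ALU_OPCODES = {name: 0x60 + i for i, name in enumerate(ALU_MNEMONICS)}

# The shared symbol table: AND and OR are already claimed as expression
# operators, so they can't be registered as instructions there.
SYMBOLS = {n: ("alu", op) for n, op in ALU_OPCODES.items() if n not in ("AND", "OR")}
SYMBOLS["AND"] = ("operator", "and")
SYMBOLS["OR"] = ("operator", "or")

# Instruction entries that live outside the symbol table.
HIDDEN = {"AND": ("alu", ALU_OPCODES["AND"]), "OR": ("alu", ALU_OPCODES["OR"])}


def massage_current_instruction(name, entry):
    """Called before the instruction callback; swap operators for instructions."""
    if entry[0] == "operator" and name in HIDDEN:
        return HIDDEN[name]
    return entry


def alu_a_immediate(opcode, value):
    """One callback for the whole family: the payload is the opcode."""
    return [opcode | 0x08, value & 0xFF]


CALLBACKS = {"alu": alu_a_immediate}


def assemble_a_imm(mnemonic, value):
    name = mnemonic.upper()
    kind, payload = massage_current_instruction(name, SYMBOLS[name])
    if kind not in CALLBACKS:
        raise SyntaxError(f"{name} is not an instruction")
    return CALLBACKS[kind](payload, value)
```

With this, "and a,0x0f" reaches the same ALU callback as "add", while AND stays an operator everywhere else in expressions.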
Good. I know I said I was going to be rigorous about the testing, but I'm actually going to simplify this a lot, otherwise I'll go mad trying to keep things up to date. To be honest, I need to test ADD and AND, because they go through different code paths. The other ALU operations are all minor variations of these two, so if these two work, I can be confident that everything else works. Okay, and now look: a five-byte instruction. Lovely. The longest I've seen is six, by the way. Right, as far as I can tell this gives us all the 8-bit ALU instructions: all of this row, this row, this row and this row. Next, let's get on to the 16-bit ALU instructions, because the TLCS90 has a 16-bit ALU. I'm going to change this, because this same callback is going to have to handle the 16-bit instructions as well. If the left-hand side is an 8-bit register, follow this code path. This is going to be another case branch. (In fact we could inline simple_dest_a in here, but let's not.) The 16-bit branch is going to follow very nearly the same logic: if the register is HL, do the HL thing; otherwise, do the other thing.
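The shared ALU callback is starting to branch on the left-hand operand. Here is a sketch of that dispatch, assuming the hypothetical mode names and register numbers used in the earlier sketches (A = 6 as an 8-bit register, HL = 2 as a 16-bit register). The index-register branch (IX, IY, SP) anticipates the exceptions that come up a little later in the session. Any other 16-bit destination is rejected, because the 16-bit ALU operations only target HL.

```python
REG_A = 6               # 8-bit register number of A
REG_HL = 2              # 16-bit register number of HL
INDEX_REGS = (4, 5, 6)  # IX, IY, SP


def classify_alu_dest(mode, reg):
    """Pick the encoding family for an ALU instruction from its left operand."""
    if mode == "reg8":
        # A has its own family; other 8-bit registers only take an immediate.
        return "accumulator" if reg == REG_A else "reg8_immediate"
    if mode == "reg16":
        if reg == REG_HL:
            return "hl"
        if reg in INDEX_REGS:
            return "index_register"
        raise ValueError("bad addressing mode")  # BC or DE as destination
    if mode in ("index", "address", "hl_plus_a"):
        return "memory_immediate"
    raise ValueError("bad addressing mode")
```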
This instruction set is nice and orthogonal, so let's find the 16-bit ALU instructions. Hopefully this is orthogonal too. Yep: the destination is HL, it's these, and we see the same opcode throughout, with minor variations of the opcode byte. I wonder if we could actually use the same routine. Not quite all of this logic is common; we would be using different prefixes. So adding a register to HL, this form, is F8... oh, it is the same, it's just a different register. Okay, what about numbers? Yes. Indexes: F0 and E0, F0 and E0; this is the same. Address is just the opcode, or E3. Yep, that's all exactly the same; the only difference is this. Okay, so let's do that. Unfortunately Cowgol doesn't support comma-separated cases in when statements yet. Actually, no, this is the idiomatic way of doing it. Okay, so that is essentially all we need for that. This, I think, is going to be the same as well: register,register is this one, yep, F8. It also occurs to me that these are likely to be the same too, but with different prefix bytes. So I think we want to move this logic up into simple_dest_a. These are going to be different, but yes, this will work. If the operand register is HL or A, follow this path; otherwise follow that path. It will know whether it's a 16-bit or an 8-bit operation from the opcode, which is different for the 16-bit version. So we need to pass the opcode in as a uint8 here. Wherever we're using the current instruction's value, we need to use that opcode instead, because then we can pass in different opcodes at the top level. It's better code anyway. So this is going to be the current instruction's value as a uint8. This one is going to be the current instruction's value as a uint8, ORed with 0x10 to shift the opcode from the 8-bit version to the 16-bit version. I think this is all going to be the same, but let's leave these untouched for the time being. And of course that doesn't work, because we need a zero there, if I remember correctly. Okay, is that
going to assemble? Yes. Is it producing the right results? Looks good to me. So now on to our test. We just duplicate the whole lot, but replace A with HL, except here and here and here and here and here and here. Okay, now what's this going to do? Line 33: bad addressing mode. Oh, that's the one case where we need to know whether the right-hand side is a register or not. That's this code here... no wait, that's not that code at all; it's here. HL,HL: we go through this code path, it's a 16-bit register, it calls the register-operand routine, done. So unless something's wrong with read operand... okay. So the sixes are... this is reg A. Let me get a zero, which is reg HL, followed by a 2. Two's not on that list... oh wait, no, HL is 2. Right, that's better. A bit of tracing, and you see that has actually, I think, worked. So what do we get when we disassemble it? Garbage. Some of it's worked: ADD HL,HL has worked. Oh yes, this constant value is now a 16-bit value. So here we have to know whether to emit a 16-bit or an 8-bit value: if the operand register equals reg A, it's 8-bit. That's better; that looks plausible. This is wrong, this is... anyway, let's look at these for the time being. So yes: ADD HL with a 16-bit value, HL, HL, yes. Right, that's actually done the right thing there; we want this one to be DE. And then we get to these, and these are tricky. The 8-bit ALU instructions let you do things to A and to memory, whereas the 16-bit instructions let you do things to HL but nothing else. You can't do things to memory, and you can't do anything to anything that's not HL. So these do not exist, and these do not exist, except for some exceptions which we'll come to next. So that all looks right. Okay, so yes, this is actually the right sort of code. We don't have to duplicate all this stuff, because this only applies to 8-bit values; well, it
only applies to the... I think. Let's just try this and see what it does. I think this will assemble. Yes. What will it make? Garbage: that's not actually an instruction, so we really want to catch that and disallow it. So the change I made to the ALU register-destination routine needs undoing, because this stuff only applies when the left-hand side is an 8-bit register. If the operand register is reg A, go to the accumulator-destination routine; otherwise, follow this other path. However, if the operand register is not HL, then that's an error. So this should now error out. Yep: line 51, bad addressing mode. Right, yes, all looks right. Good. Now, those exceptions I was talking about: here they are, these two, where you get to add to one of the three index registers. (That was my recorder. I'm using SimpleScreenRecorder for this, which works really rather well, and I can turn it on and off with a hotkey.) So here this is going to need to be another case statement: when it's reg HL, do this. It is only ADD, you don't get anything else... oh no, you do get other things. Are these the same encodings? Add a 16-bit register: F8 plus the register, yes, and the opcode is 0x14. Yes, it looks right. Add a 16-bit constant: it's the opcode ORed with 8. Right, there's a shorthand version for that. Add from memory: that's these, so E0 and F0, yep, and F3, yep. Okay, all right, that's good; we can mostly do this. So this should now let us duplicate this chunk, except only with ADDs of course, and these can all be IX. What does this do? Wow, that's assembled. So IX,HL: yes, that works. This one does not work, because the encoding it's trying to generate is actually a CALL instruction. That should be... the others work, all of them. Oh, this is different: there's no direct-page form of that. You see, it's producing the direct-page encoding, the shorthand version, to add a constant to the register. Okay, so we need to preserve the destination register: if the destination register is
Okay, so we need to preserve the destination register: if the destination register is reg A or reg HL, then use the direct page encoding. All right, so that forces this to FF12, while leaving this intact and this intact. Good. We now need to change this, for when we are adding a number. Right, this is actually wrong: it just happened that read operand did not corrupt operand reg. So if it is reg A, emit this; if it's reg HL, do this; but if it's anything else, it must be an index register, and therefore we wish to just emit this. And yep, that has actually emitted the right thing. This is still correct, this is still correct, this is right, this is right. Okay, so we should now have all the ADDs. Oh, this is one: ADD IX and a register. Hmm, the annoying thing is I'm going to have to duplicate a lot of this stuff somehow for LD. So if the right-hand side is a register... no, I have done that; that's here. Stupid, stupid. Yep, I reckon that's probably done. So have I got all the... I think I've done all the simple ALU instructions. That's actually surprisingly dense. So where to go next? I think let's just start at the top. LD will be a pig. There's a lot of LD, and some of it is very similar to the ALU stuff. There's actually quite a lot of LD opcodes: the base one appears to be 20, but here we see 28 and 30, and 28 is used here, and here's a 37, and there's lots more for the 16-bit ones as well. As well as these, we have these forms, which are LDW. The LDW instructions are for the case where no register is in the instruction, and therefore the assembler doesn't know whether you meant a 16- or an 8-bit operation; these are all 16-bit. In fact, let's just do those first, since there aren't very many of them. These are all similar to this, so we should be able to just duplicate that code. So let's just put some test instructions in. If you're used to the Z80, you'll notice that you get to do stuff with index registers and indirection that's 16 bits wide, which is kind of amazing. Okay, so we now actually need some code to do stuff.
The left-hand side is always going to be an operand, followed by a comma, followed by a number. So I think what we want is this stuff: operand value. Anyway, let's look. You've got the indexed operation, which is F4, 3F. Wait, operand value, shouldn't that be... oh no, the opcode's just down there. Okay, so it's F4 for IX, and E8; yeah, that's normal. We've got a simple address, which is EB. Direct page addressing is different: 3F, and it's a short form, so in fact the encoding is quite different. HL+A is here. So we actually want, let's say, the opcode 3F: that goes there, that goes there, nothing goes there, that goes there... no it doesn't. 3F, and then this is a 16-bit operation. And that seems to have worked, although I notice that my disassembler has actually produced the wrong opcodes here. The 3F is this; that is indeed an LDW, so it should be in one of the other pages. Yeah, there's a missing exception there, and that only actually applies to bank four; 3F should be there. All right, LDW all the way down: (HL),4567; IX minus one; IX plus one; HL plus A; 1234; FF12. And there we have our first six-byte instructions, the longest instructions I have found so far in this instruction set, so that's exciting. Now I wonder how much of this is going to be reused for the other LDs. So here we have these, LD (mem),rr, and this is instruction 4F, and it's different: the source register is encoded into the opcode byte, which wasn't the case for the ALU operations. So I think we're just going to have to duplicate stuff, although we can common out this sort of thing. This is the most complicated bit; the rest of it we could probably just, you know, copy. F4, F0 and E0... oh, F4 and E8. Okay, yeah, there's an asymmetry there. So here's the main opcode table: we've got the various prefix bytes, and they're mainly divided into two different banks, or three: you've got the sources, which are here and here.
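Here is a sketch of the two LDW-with-constant encodings mentioned above, which makes the six-byte case concrete. It rests on stated assumptions. The long form is the absolute-address destination prefix EB, then the address, then opcode 3F, then the 16-bit value. The short form is 3F, then the low address byte, then the value. The direct page is taken to be FF00-FFFF. That range fits the FF12 test operands, but it is still an assumption. The function names are hypothetical.

```python
# Sketch under stated assumptions: LDW with a constant source, in its long
# (absolute address) and short (direct page) forms. Names are hypothetical.


def le16(value: int) -> list[int]:
    """Split a 16-bit value into little-endian bytes (low byte first)."""
    value &= 0xFFFF
    return [value & 0xFF, value >> 8]


def ldw_absolute(addr: int, value: int) -> list[int]:
    """LDW (addr),value: prefix EB, address, opcode 3F, then the value."""
    return [0xEB] + le16(addr) + [0x3F] + le16(value)


def ldw_direct(addr: int, value: int) -> list[int]:
    """LDW (n),value: opcode 3F, the low address byte, then the value.

    Assumes the direct page is FF00-FFFF.
    """
    if not 0xFF00 <= addr <= 0xFFFF:
        raise ValueError("address is not in the direct page")
    return [0x3F, addr & 0xFF] + le16(value)


encoded = ldw_absolute(0x1234, 0xFF12)
print(" ".join(f"{b:02X}" for b in encoded), f"({len(encoded)} bytes)")
# EB 34 12 3F 12 FF (6 bytes)
```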
You've got the dests, which are here and here, and the registers, which are over here. So we're actually going to have to pass two in: this is an IX prefix and this is a qq prefix. So this is going to be E0 for the qqs (that's these; these are qq registers) and F0 for the IX ones. These are going to be the same, but for these it's E8 for the qqs and F4 for the IXs. And what doesn't it like about line 155? Okay, and now we're emitting garbage. I think these values are wrong: E8, F4... no, right, hang on: E8, F4, yes, I wanted these ones. (Abusing undo and redo there.) E8 for these, where the destination is an indirect qq register; F4 for these, where the destination is an indirect index register. So what doesn't it like? E0, F0: those are the right numbers. So this should be this one, ADD to (HL), where the destination is a memory operation. So that will be going down here; it's an index thing, so it's falling through here. This is the left-hand side. Why is it generating an E8? Sorry, why is it generating an E2? This is the wrong prefix byte; it should be one of these. That should have been generating EA 68, but in fact we've got E2 68. I'm confused. For that E2 we must be following the other code path somehow. Okay, let's add some tracing. So: E8, E8, E8, E8 (HL plus A doesn't count, it's just these three), E0, E0, E0, E8, E8. Yes, there was in fact no bug: I just forgot to rerun the assembler after changing the code. So professional. Anyway, that now seems to be working, and we've managed to abstract some stuff out. So let's take another look at those LDs. We just did these LDWs; for the others we're going to have to look to see whether the things on the left or the right are 16-bit registers or not. So this is still going to be pretty much the same as before: as always, if it's an 8-bit register, do a thing. So let's start at the bottom. There is method to my madness: the top is where the simple instructions are, and these are all exceptions.
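The qq/index split in the prefix bytes fits in two small functions. This is a sketch, and the names are hypothetical. The register numbers (BC=0, DE=1, HL=2, IX=4, IY=5, SP=6) follow the register table set up earlier. The base values E0/E8 and F0/F4 are the ones read off the opcode table above.

```python
# Sketch with hypothetical names. Register numbers follow the register table:
# the "qq" registers BC=0, DE=1, HL=2 and the index registers IX=4, IY=5, SP=6.
QQ = {"bc": 0, "de": 1, "hl": 2}
INDEX = {"ix": 4, "iy": 5, "sp": 6}
IX = INDEX["ix"]


def indirect_prefix(reg: str, is_dest: bool) -> int:
    """(BC), (DE), (HL): E0+qq when used as a source, E8+qq as a destination."""
    return (0xE8 if is_dest else 0xE0) + QQ[reg.lower()]


def indexed_prefix(reg: str, is_dest: bool) -> int:
    """(IX+d), (IY+d), (SP+d): F0+n as a source, F4+n as a destination,
    where n counts up from IX."""
    return (0xF4 if is_dest else 0xF0) + (INDEX[reg.lower()] - IX)
```

With these, a destination of (HL) gives EA, which is the prefix the session expected in EA 68.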
So if we go to here, these are actually all kind of the same. Actually, we'll do these ones, because the left-hand sides are going to be 16-bit registers. All right, now this will fail to assemble. Yep. So if there is a 16-bit register on the left, then this is going to be very nearly the same code as in here, but the opcode is going to have the destination register attached to it. So can I just... no, sorry, I'm getting muddled, I'm getting muddled. It is going to be the same as here, where the destination is a register, but rather than being HL, the destination is going to be the register which is encoded into the opcode, if the encodings are all the same, which honestly they look like they are. So for the indexing we've got E8 and F4, E8 and F4; HL+A is F7; an indirect reg16 is these, and these are special... no, yes, we're here. Right: E8, F4; for the address it's EB, yep; direct mode is EF, yep. We can steal all of this code. I was hoping I'd be able to just call it directly, but I don't think I can, because we've got too many special cases; we can probably factor some stuff out, though. And the opcode is going to be the current instruction's value as a uint8. So here in LD I think we can do emit mem dest, and the opcode is going to be 40 ORed with the register, and I think that might be all we need to do. Or possibly not: ah, we haven't read the right-hand side. Wait a minute, I'm getting myself horribly mixed up. Emit mem dest emits the mem dest prefix up to and including the opcode; then you need to emit the source, whatever that may be, and in this case, when we're doing the ALU callback, the source must be a number, so we read the number and we do the thing. Now, in our LD I've in fact got myself backwards again, because this is not a mem dest, this is a mem source. So this should actually be here, and we are going to want to expect an operand: if the operand is not a reg16, then fail.
Right, that's because we've actually implemented the wrong one. We want to implement the version where the destination is a register and the source is in memory, so that's basically this stuff. So what have we got here? Indexing is E0 and F0, yes; HL+A is F3, yes; the address is E3, yes; direct page is not special... Yeah, honestly, for registers this is special. I think I am going to want to make an emit mem source, but I don't want to use this, because this is the special case for the ALU: it wants to know the destination. So I'm going to leave this code here, but I am going to copy it completely, so that I can use it for the 8- and 16-bit stuff. The thing is, the LDs have a different set of special cases from the other things, and I'm very conscious we also have piles more of these further down, like MUL, DIV, the rotates, etc. Okay, we don't have destination registers, so we get rid of that bit of special casing for direct page, like so. So now we have something which doesn't work: line 137. Yeah, if the source is a number, this needs to know how wide it is, and honestly I don't think there's any way to tell, so let's just leave it to error out if we try to call it, for the time being. Right, we want to load a memory source into a register. Well, we need to remember what the register is. We've read the comma, so now we wish to expect the final operand, and the opcode is 48 plus the register, and I think that's all there is. What do we get? That looks okay, actually: BC,(HL); IX minus one; blah blah blah; BC,1234; and then it all goes wrong. Right, this direct page instruction is incorrect: direct page is E7 n. What have we got in mem source? Right, this is actually producing an ALU-style short form, which we can't do here; so it's 0xE7 followed by... yep. In fact, in every single case the opcode is emitted last.
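The sketch below emits the opcode last for every source addressing mode. The `Mem` record, the mode names and the helper names are all hypothetical. The prefix values (E0+qq, F0+index, F3, E3, E7) come from the discussion above, and so does the LD rr,(mem) opcode of 48 plus the register. A direct-page address is assumed to be truncated to its low byte.

```python
from dataclasses import dataclass

# Sketch with hypothetical names; register numbers as in the register table.
QQ = {"bc": 0, "de": 1, "hl": 2}
INDEX = {"ix": 4, "iy": 5, "sp": 6}
RR = {**QQ, **INDEX}
IX = INDEX["ix"]


@dataclass
class Mem:
    """A parsed memory operand."""
    mode: str       # "indirect", "indexed", "hl+a", "address" or "direct"
    reg: str = ""   # register for the indirect and indexed modes
    value: int = 0  # displacement (indexed) or address (address, direct)


def mem_source(op: Mem, opcode: int) -> list[int]:
    """Prefix and operand bytes for a memory source; the opcode always goes last."""
    if op.mode == "indirect":
        out = [0xE0 + QQ[op.reg]]
    elif op.mode == "indexed":
        out = [0xF0 + INDEX[op.reg] - IX, op.value & 0xFF]
    elif op.mode == "hl+a":
        out = [0xF3]
    elif op.mode == "address":
        out = [0xE3, op.value & 0xFF, (op.value >> 8) & 0xFF]
    elif op.mode == "direct":
        out = [0xE7, op.value & 0xFF]  # direct page: keep only the low byte
    else:
        raise ValueError("bad addressing mode")
    return out + [opcode]


def ld_rr_from_mem(dest: str, op: Mem) -> list[int]:
    """LD rr,(mem): the destination register is folded into opcode 48+rr."""
    return mem_source(op, 0x48 + RR[dest.lower()])
```

For LD BC,(FF12), the direct-page case gives E7 12 48. That is the byte sequence the session reports once the opcode is moved to the end.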
So we shall take it out of all of this and put it here. Okay, that's worked: E7 12 48. So I strongly expect that this will also work with a different opcode, so 28. Yep. And you'll notice that I'm using B and BC rather than A and HL, because A and HL have the abbreviated forms, and we're going to have to tackle those next. If I were to just copy this block and replace those BCs with HLs, it would still assemble, but some of these forms can be made shorter. Actually, only one: this one, the direct page form. So if the register is HL, the operand mode is an address, and the value is in the direct page, then we do something special. Here, 47 12 is the short form, which only works with HL, and here we have the long form, which works with any 16-bit register. And we can do that here as well: for A it's 27. In fact, there are other short forms we will need to implement, but let's just do this one for now. That doesn't work: that's still the long form, and that's garbage. Let's try that. There is our short form LD. Let's just skim down to make sure everything still looks reasonably valid, which it does. Okay, so we've done this block, and we've done some, but not all, of this block. There's a short form here we need to implement, where we're writing HL to the direct page. So if the register is HL... wait a minute, where is mem dest getting it from? That's looking at the current operand, but we just read a register. Right, we haven't actually done any mem dests yet, because I don't think they're going to work. Okay, let's correct that. Um, we haven't actually done this one yet. No, I'm looking at the ALU. So we should be here: the left-hand side is an indexed operation and the right-hand side is a register. We have read the left-hand side; okay, that's what this does. So we should be here now. It's one of these.
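Here is one way the short-versus-long choice for loading from the direct page might look. The function name is hypothetical. The short forms come from the encodings seen above: 27 n for A and 47 n for HL. So does the long form: E7, n, then 28 plus r or 48 plus rr. The caller is assumed to have already checked that the address is in the direct page.

```python
# Sketch with hypothetical names: loading a register from a direct-page address.
REGS8 = {"b": 0, "c": 1, "d": 2, "e": 3, "h": 4, "l": 5, "a": 6}
REGS16 = {"bc": 0, "de": 1, "hl": 2, "ix": 4, "iy": 5, "sp": 6}


def ld_reg_from_direct(dest: str, addr: int) -> list[int]:
    """LD r,(n) or LD rr,(n), where addr is already known to be in the direct page."""
    dest = dest.lower()
    n = addr & 0xFF  # only the low byte of a direct-page address is encoded
    if dest == "a":
        return [0x27, n]  # short form, only available for A
    if dest == "hl":
        return [0x47, n]  # short form, only available for HL
    if dest in REGS8:
        return [0xE7, n, 0x28 + REGS8[dest]]   # long form: prefix, n, opcode
    if dest in REGS16:
        return [0xE7, n, 0x48 + REGS16[dest]]
    raise ValueError("bad addressing mode")
```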
So the left-hand side is an indexed thing, and the right-hand side is a plain register. Well, for a start, this is wrong: the right-hand side does need to be a register... no. Oh, I hate this stuff; I keep losing track of what I'm doing. Mem dest here is going to be using the operand data from the right-hand side, and that's just wrong. So to make this work we need to take out the opcode, the way we did for the source, so that we can call expect operand afterwards, like this. This will emit the destination part; then we read the right-hand side, make sure it's a register, and emit the opcode. Oh, it doesn't like line 261. Yep, that's the rather misleading error message you get if you forget a semicolon. Okay, okay: (HL),BC; IX minus one; IY plus one; HL plus A; 1234; FF12. Good, good. And now let's duplicate that for B: LD (mem),rr is 40 and LD (mem),r is 20. They look all right. Okay, so now we duplicate everything, but for A and HL, and as before this should work, but it'll use the expensive forms. The problem is that we've already emitted the destination. You see here, for this direct page form the opcode comes before the address byte, whereas for the long form the address byte comes after the prefix byte. And we won't know whether we wish to use emit mem dest or not until after we've read the right-hand side, at which point we no longer have the information needed in the operand variables to let emit mem dest do the right thing, because everything's being passed around in globals. Is there a way to bodge this? Not really, because we can't rewind the emit pointer after we've actually emitted something: at least in the final pass, the output is written to a buffer which may be flushed to disk at any point. So I think what we're going to have to do is read the right-hand side now, and if it's a reg8, then (this is kind of vile) this will need to be the same. Hang on, is that right? Yeah, that's still right.
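The workaround of reading both sides, then restoring, needs no rewinding of output, and the sketch below expresses it. It uses hypothetical names. `Operand` stands in for the global operand variables. `Parser.read_operand` pops pre-parsed operands and clobbers that shared state, the way a real read-operand call would.

```python
from dataclasses import dataclass, replace


@dataclass
class Operand:
    """Stand-in for the assembler's global operand variables (hypothetical)."""
    mode: str = "none"
    reg: int = 0
    value: int = 0


class Parser:
    def __init__(self, operands):
        self.pending = list(operands)  # pretend input, already parsed
        self.current = Operand()

    def read_operand(self):
        # Like the real parser, this overwrites the shared operand state.
        self.current = self.pending.pop(0)


def read_store_operands(p: Parser):
    """Read 'LD mem,r' operands without emitting anything until both are known."""
    p.read_operand()               # left-hand side: the memory destination
    left = replace(p.current)      # snapshot it before it gets overwritten
    p.read_operand()               # right-hand side
    if p.current.mode != "reg8":
        raise ValueError("bad addressing mode")
    source_reg = p.current.reg
    p.current = left               # put the left-hand side back
    return p.current, source_reg
```

Because nothing is emitted until both operands are known, the caller is free to pick a short or long form afterwards.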
So what we're doing is we put the left-hand side back into the operand variables so that we can call emit mem dest. So: is this a register, are we writing to an address, and is the left-hand side in the direct page? Then use a shortcut. So here is our short form writing A to the direct page, and here is our long form writing B to the direct page. Right, it occurs to me that we're using this construction in so many places that we should add a direct page addressing mode. Here... no, we can't do that. Yes, we can. Okay, so now we can easily split this stuff up. In fact, I won't do this, because we get the truncation for free as part of the cast. So here all we need to do is say: is the destination register reg A or HL? Then use the short form; otherwise use the long form. This then allows us to simplify this. Okay, so all this looks fine. This is a long form; that will be mem dest's fault. See, it did it here for the short form against HL. That's because this is the special ADD, which I can't remember where it is anymore... ah yes, 14: this is the long form, and there is no short-form version of this instruction. Yep, and everything else has worked, as far as I can see from a brief skim. All right, so we have register to memory, and I think I've actually missed this use case: reg equals reg HL and the operand mode is direct page, then it's HL to memory, which is 4F. 4F 12: there is our short form, and here is our long form. Okay, so we've done LDW; we've done all this, including the special case; we've done HL,n and HL from the direct page, yes we have. This is a special form that loads a 16-bit value into a register; I'm going to skip those for now. We've got to the 8-bit ones. That's a typo; that should be an n. So this is right: this is doing stuff with an 8-bit constant to memory, and it's the same as LDW down here. We haven't done them yet. So the left-hand side is memory; we haven't done those yet, so we are down here.
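A direct-page check plus the store short forms could be sketched as follows. The assumptions are these. The direct page is FF00-FFFF. The short forms are 2F n for A and 4F n for HL. The long forms use the EF direct-page prefix or the EB absolute destination prefix, followed by 20 plus r or 40 plus rr. All names are hypothetical. As noted above, truncating to the low byte is just a mask.

```python
# Sketch with hypothetical names. Assumption: the direct page is FF00-FFFF.
REGS8 = {"b": 0, "c": 1, "d": 2, "e": 3, "h": 4, "l": 5, "a": 6}
REGS16 = {"bc": 0, "de": 1, "hl": 2, "ix": 4, "iy": 5, "sp": 6}
DIRECT_PAGE = 0xFF00


def is_direct_page(addr: int) -> bool:
    """Addresses FF00-FFFF can be reached with a one-byte direct-page operand."""
    return (addr & 0xFFFF) >= DIRECT_PAGE


def ld_store(addr: int, src: str) -> list[int]:
    """LD (addr),r or LD (addr),rr, preferring the short form for A and HL."""
    src = src.lower()
    if src in REGS8:
        opcode, short = 0x20 + REGS8[src], (0x2F if src == "a" else None)
    elif src in REGS16:
        opcode, short = 0x40 + REGS16[src], (0x4F if src == "hl" else None)
    else:
        raise ValueError("bad addressing mode")
    lo, hi = addr & 0xFF, (addr >> 8) & 0xFF  # the truncation is just a mask
    if is_direct_page(addr):
        if short is not None:
            return [short, lo]         # short form: opcode, then n
        return [0xEF, lo, opcode]      # long form: dest prefix, n, opcode
    return [0xEB, lo, hi, opcode]      # absolute: dest prefix, address, opcode
```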
If the right-hand side is a number, then there appear to be no special forms, so it is a mem dest followed by a 37. Actually, we've already read that, so we're going to have to do the big copy. And you'll notice that this is the case where we don't know whether it's an 8- or a 16-bit operation, so we assume it's an 8-bit operation. Okay, now what does this do? "Can only supply a displacement with IX, IY or SP." Quite right, you're not allowed to do that; that should be an A. Okay, and it disassembles into the right code. Wow. 34, FF12. Okay, so we've done this block, this block and this block. Direct page comma A: 2F, yep. A comma direct page: 27, yep. Okay, so now we do the moderately simple ones, which are these: register-to-register copies. You'll notice here that it's much cheaper to copy to and from A than between two arbitrary registers, and the same applies to HL. In fact, it costs precisely the same to move, say, B to D directly as it does to do so through A, which is kind of odd, both in code bytes and in size. Anyway: A,B; B,A; A,C. Wait, that worked but produced garbage. Interesting. So the left-hand side is a register, a reg8, and the destination is A... so does that go through mem source? I think it must do. Yes, it does. Okay, that was slightly more exciting than I was expecting. We know that the destination operand must be a register, so if the source operand is an 8-bit register, then we wish to do this: the long form, destination first, source after. Right, and that has worked, except it's produced two-byte versions of these, and also B,A is the wrong way around. A,B; B,A; B,C: good. If the destination register is reg A, then it is 20 ORed with operand reg; if the source is reg A, then it's 28 ORed with operand reg; otherwise it's the long form. A,B; A,A; B,A; B,C: good, good. Right, and now we need to do exactly the same thing for 16-bit, where the special register is HL.
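The register-to-register rules can be written as one function that covers both widths. The names in this sketch are hypothetical. The short forms follow the session: 20 plus the source when the destination is A, 28 plus the destination when the source is A, and 40/48 for HL. The long form's byte layout is my assumption: a register prefix of F8 plus the source, then 30 (or 38) plus the destination. It should be checked against the data sheet or a disassembler.

```python
# Sketch with hypothetical names; register numbers as in the register table.
REGS8 = {"b": 0, "c": 1, "d": 2, "e": 3, "h": 4, "l": 5, "a": 6}
REGS16 = {"bc": 0, "de": 1, "hl": 2, "ix": 4, "iy": 5, "sp": 6}


def ld_reg_reg(dest: str, src: str) -> list[int]:
    """LD r,r or LD rr,rr, using the one-byte A/HL forms where possible."""
    dest, src = dest.lower(), src.lower()
    if dest in REGS8 and src in REGS8:
        regs, special = REGS8, "a"
        to_special, from_special, long_op = 0x20, 0x28, 0x30
    elif dest in REGS16 and src in REGS16:
        regs, special = REGS16, "hl"
        to_special, from_special, long_op = 0x40, 0x48, 0x38
    else:
        raise ValueError("bad addressing mode")
    if dest == special:
        return [to_special + regs[src]]    # e.g. LD A,B is 20+B
    if src == special:
        return [from_special + regs[dest]]  # e.g. LD B,A is 28+B
    # Assumed long form: register prefix F8+source, then opcode+destination.
    return [0xF8 + regs[src], long_op + regs[dest]]
```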
Cowgol doesn't need parentheses around expressions, because it's got the closing then keyword. I like to bracket stuff so you don't have to rely on remembering what the operator precedence is, but not at the outer level. And we're here; it's this. Right, this is exactly the same stuff, except with 40 and 48, yep, and this is 38. HL,BC; BC,HL; BC,DE: yep, that has worked. Good. Where have we got to? We've done this block. Have we done r,n? Or emit mem source? Right, this is the one where we didn't know whether to emit a 16-bit constant or an 8-bit constant. So where are we using emit mem source from? A number of places. So it's 30, and likewise it is 38 for these. A,2B; HL,1234: yep, good. So we've done this block; we've done this one, this one, this one, this one... Right, we've done all the LDs except for this one. I don't recall ever seeing this before. Wow, this is load effective address. Huh. I remember spotting one of these instructions in the disassembly and being really confused by it, but it's actually a thing. Wow, that's really cool. So what this gives you is a sign-extended addition of a constant to one of the index registers (which is less useful, but this one gives you the address of something on the stack frame), or this one gives you a sign-extended add to HL, into a register. Let's do those; they should be simple. In the callback, we are reading a register. Oh, these are not parenthesized, so this isn't going to be as simple as I thought. We're going to need some custom parsing, but it should be relatively simple custom parsing. So this is actually a copy of this code... no it's not, it's a copy of this code, but there's no parenthesis. So if we see a closing parenthesis, we don't want to consume it, but we do want to set operand value to zero. We want this bit as well, actually. So: read the register, make sure it's a 16-bit register, set the operand mode, set the operand register, and read the next token. If it's a closing parenthesis, give up now, but don't consume the parenthesis.
If it's a plus or a minus, set operand value to the appropriate displacement; if it's an A, set the mode to HL+A. So what do we do if it's not one? Then we take the other code path and assume it's a number. Here we assume that we've actually read the register: at this point we know that the current token is a register, so we just need to check to make sure it's the right register, and here we do need to read the next token. Okay, so that means this code gets simplified to read index, which makes all of this stuff go away, and we know it must have terminated with a closing parenthesis, like so. Is this still working? Looks like it: our indexing still works. Okay, so in the LDA callback we determine what the destination register is, then we do... yeah, no, we don't do that; we do this: check that the token is end of line. Okay, we've now read the entire LDA instruction, but we haven't done anything with it, so it should still assemble. Yep. Let's put some in: let's say HL, IX plus one; IY minus one, as usual; BC, HL plus A. Right, so these should just be ignored. No: "expected plus or minus". Ah, we haven't read and checked the symbol yet, so this actually wants to be: if the token is not an identifier, or the token is not a register, then fail. "Garbage after instruction": end of line. Okay, and our LDAs have been completely ignored, because we haven't done anything with them. Right, so if the operand mode is indexed, then it's the first one; if it's HL+A, it's the second one. So we emit the dest prefix for the register minus IX; then the operand value as an int8 gives the displacement; and then 38 ORed with the destination register. When it is HL+A, then it's F7, then 38 ORed with the destination register. Right, and in fact my disassembler doesn't know how to turn these into LDAs; it actually seems to be a bit tricky, now I think about it. But that's right, so I will go with it. Okay, right, I think I want to start working down the list at this point, because we've done the hard ones.
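The unparenthesized LDA operand and its encoding can be sketched with a regular expression. This is a hypothetical helper, not the framework's tokenizer. It accepts `IX+d`, `IY-d`, `SP+d` or `HL+A` and encodes them as described above. An index source becomes F4 plus the register's offset from IX, then the displacement as a signed byte, then 38 plus the destination register. HL+A becomes F7 followed by 38 plus the destination.

```python
import re

# Sketch with hypothetical names; register numbers as in the register table.
REGS16 = {"bc": 0, "de": 1, "hl": 2, "ix": 4, "iy": 5, "sp": 6}
INDEX = {"ix", "iy", "sp"}
IX = REGS16["ix"]

_OPERAND = re.compile(r"\s*([a-z]+)\s*([+-])\s*(\w+)\s*")


def parse_lda_source(text: str):
    """Parse an unparenthesized 'IX+d', 'IY-d', 'SP+d' or 'HL+A'."""
    m = _OPERAND.fullmatch(text.lower())
    if m is None:
        raise ValueError("expected register, then plus or minus")
    reg, sign, rhs = m.groups()
    if reg == "hl" and sign == "+" and rhs == "a":
        return ("hl+a", None, 0)
    if reg not in INDEX:
        raise ValueError("can only supply a displacement with IX, IY or SP")
    disp = int(rhs, 0)
    if sign == "-":
        disp = -disp
    if not -128 <= disp <= 127:
        raise ValueError("displacement out of range")
    return ("indexed", reg, disp)


def encode_lda(dest: str, source: str) -> list[int]:
    """LDA rr,source: the destination register is folded into opcode 38+rr."""
    opcode = 0x38 + REGS16[dest.lower()]
    mode, reg, disp = parse_lda_source(source)
    if mode == "hl+a":
        return [0xF7, opcode]
    return [0xF4 + REGS16[reg] - IX, disp & 0xFF, opcode]
```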
So, time to start scoring things off. We need to implement PUSH and POP, otherwise I'm going to miss instructions, like I missed that LDA. These have the same encoding and they're very simple: 50 and 58. If the operand mode is not reg16, fail. These are the instructions where that special AF register occurs; it's got the same encoding as SP, so I'm just going to let it handle both. I mean, it's easy. So all we want to do is emit the opcode combined with the register. Yep, that worked. Right, in which case we have now completed all of this page. Now we have this page, and most of these are trivial. We've done LDA. Time for EX. EX swaps two registers, and it comes in two forms, which are that and that. Yes, that apostrophe is supposed to be part of a register name, and I've just realized: terrible, terrible syntax. So really the parsing is kind of trivial; it all depends on how much syntax checking we want to do. Register one is operand reg, and two is... never mind, we don't actually need to do that. So will this actually assemble? Yeah, that apostrophe: the assembler framework doesn't expect apostrophes as part of symbol names. I'm going to have to go away and look up assembler syntaxes to see if there's a standard workaround for this. Be right back. So the solution turned out to be simple and horrible. All I did was: if an identifier ends in a single quote character, drop that character on the floor. It's horrible, but it passes. The assembler will now see this as two instances of the AF register. Okay, we do actually need to check that. If operand reg is reg AF, then this is EX AF,AF', which is simply 09; otherwise, if it is this, then... and I need to define reg DE as 1. Okay, that's worked. EXX: this actually gets us onto a whole slew of trivial instructions, trivial because they have no syntax; it's just, you know, emit a constant value.
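The apostrophe workaround and the PUSH/POP encoding are small enough to show together. The function names are hypothetical, and in the real assembler the apostrophe stripping would live in the tokenizer. AF gets the same number as SP (6), as described above. As a result, this sketch would also accept PUSH SP and encode it as PUSH AF.

```python
# Sketch with hypothetical names. AF shares SP's register number (6),
# as in the register table.
REGS16 = {"bc": 0, "de": 1, "hl": 2, "ix": 4, "iy": 5, "sp": 6, "af": 6}


def normalize_identifier(name: str) -> str:
    """Drop one trailing apostrophe, so AF' is read as plain AF."""
    return name[:-1] if name.endswith("'") else name


def encode_push_pop(mnemonic: str, operand: str) -> list[int]:
    """PUSH qq is 50+qq and POP qq is 58+qq, a single byte either way."""
    base = {"push": 0x50, "pop": 0x58}[mnemonic.lower()]
    reg = normalize_identifier(operand).lower()
    if reg not in REGS16:
        raise ValueError("bad addressing mode")
    return [base + REGS16[reg]]
```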
EXX is 0A, simple1cb, meaning a simple one-byte instruction. And while we're at it, we're going to do these as well. LDI is 59FE, which is a simple two-byte instruction. Values are emitted little-endian, so the FE goes out first. And that's not actually right; that should be a 58. So we've got LDIR, LDD, LDDR. Okay, I just realized there's actually quite a lot more EX... never mind. So LDI, LDIR, LDD, LDDR: these are the block instructions. LDI is load and increment: it copies a value from (HL) to (DE), increments them both, and decrements BC; well, you can see it here. LDIR just repeats it until BC is zero. LDD and LDDR are the same but going down, and the CP ones are the same, but they compare against A rather than copying. So 58, 59, 5A, 5B, 5C, 5D, 5E, 5F. Right, however, we do need to check for the end of line, and I wonder if there's actually a helper I implemented for this. Now: expect token NL. And simple2cb is exactly the same, but with an emit16 and no cast. So EXX, LDIR, EXX, LDIR. Right, so we have now scored off all of this page, except for these EX instructions that I completely forgot about. So back up here to EX. We're actually going to need to store all the stuff from the left-hand side: operand mode, operand reg, operand value. So: case test operand mode is, when reg16... The thing on the left-hand side can be anything other than a simple register. Actually, it occurs to me that I can do this, because I know that the destination can never be a constant number: so reg8 does nothing, and everything else goes to... It is not equal to reg16, so the right-hand side of one of these swaps must be a 16-bit register. What this does is a 16-bit swap with the thing pointed to by the left-hand side. Save the register, put the dest values back again, all right, emit mem dest, yep, and emit 50 ORed with the source operand reg. Okay, let's add some tests and see how this works. I'm just going to steal these. Garbage.
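Storing the two-byte block instructions as 16-bit words only works because of the little-endian emission described above. This sketch makes that explicit. The table names are hypothetical, and EXX as 0A is my reading of the session. The CPI/CPIR/CPD/CPDR ordering (5C to 5F) is assumed to mirror the LD group.

```python
# Sketch with hypothetical names: one-byte opcodes, plus two-byte opcodes
# stored as 16-bit words so that little-endian emission puts FE first.
SIMPLE1 = {"exx": 0x0A}
SIMPLE2 = {
    "ldi": 0x58FE, "ldir": 0x59FE, "ldd": 0x5AFE, "lddr": 0x5BFE,
    "cpi": 0x5CFE, "cpir": 0x5DFE, "cpd": 0x5EFE, "cpdr": 0x5FFE,
}


def encode_simple(mnemonic: str) -> bytes:
    """Encode an operand-less instruction from the tables above."""
    m = mnemonic.lower()
    if m in SIMPLE1:
        return bytes([SIMPLE1[m]])
    if m in SIMPLE2:
        return SIMPLE2[m].to_bytes(2, "little")  # low byte (FE) goes out first
    raise ValueError(f"unknown instruction {mnemonic!r}")
```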
But I think they're garbage because I forgot to implement them in the disassembler, and I should have remembered about these: EA 50, so that's bank four. EA 50, yes, missing completely. This is bank four. Right, that's why I didn't implement them: they're not here. 0, 1, 2, 3, 4, 5: they're in 50; they should be here. This data sheet is full of bugs; I notice a lot of typos. So I'm going to make a wild guess that that looks like it's sort of right: (HL),BC; IX minus one; IY plus one; HL plus A; 1234; FF12. Okay, right, so we've done all these; we have marked off this entire page. Now we've got onto this page, which is the 8-bit ALU instructions, which I think we've done all of. Have I done the shortcut forms? Yeah, looks like it. Oh yes, that's because these aren't really shortcut forms; they're just the same as everything else. Okay, I'm going to call that page done, and that page done, and now we're onto these: INC and DEC. These are simple: they take a single parameter, which is either a register or a memory reference. So INC A, and DEC is identical but different. The opcodes here are... yeah, 7 in the bottom nibble indicates a memory reference; otherwise it's a register. Right, in fact it's 7 in the bottom three bits; I remember this from the encoding. So that's actually pretty straightforward. Okay, and I expect that to work but ignore them, which it has. So, the implementation: we have read the operand; now if it's a... hmm, isn't there a 16-bit register version of this? I'm not sure there is. Ah, here it is, yep: INCW and DECW. Can I use the same function for both? Well, technically yes; I just have to read a register and ignore what kind it is. In fact, looking at this, the 16-bit versions of INC and DEC use the same opcodes; it's only the memory versions that use different opcodes. So we do use the same routine for both: all we have to do is, if it's a 16-bit register, modify the opcode.
That means they're both going to behave pretty much identically. Yeah, we just need to set one bit. So: operand mode is... when it's an 8-bit register, take the current instruction value as a byte and unset that bit. The reason for unsetting the bit is that if you use INCW with an 8-bit register, you get an 8-bit increment, and the reason for the parentheses is that I can't remember the relative precedence of AND and OR. If it's the 16-bit version, we make sure the bit is set, and do that. Otherwise this is in fact a mem dest, I believe: E0, F0... mem dest? No, no, it's a mem source. It's different. Is it more like an ALU? Come on. So indexing is E0 and F0, which is the same as mem source; HL+A is F3, again the same; an address is E3, which is the same; direct page, however, is different, because this is a bare opcode whereas this is not. So I think we just want to steal these: address is E3, the address, the opcode; direct page is E7 and the opcode; HL+A is F3. Garbage, lots of garbage. So 86, that should be correct. Yes, it might just be disassembling it wrong again... oh yeah, that is actually correct; that's done the right thing. Right, the next one is E2 87. Ah, and this is wrong: it should be the opcode as a uint8. In fact, for these ones we want to do this. Right, how does that look? IX minus one, IY plus one; it should also say 1234, FF12; and DEC is all the same. Good. Now the same but with a W: INC HL, yep, and then all the INCWs; DEC HL, and then all the DECWs. Excellent. So we've done these. We now have INCX and DECX. I have no idea what these are for. They seem to do something if the X flag is set, but I haven't come up with a good description of what the X flag is. It says it's the expansion carry flag, or when it gets set; I believe it's got something to do with sign extension, but that's a guess. Anyway, they should be simple: there's a single addressing mode, which takes a direct page address.
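The shared INC/DEC routine can be sketched as bit manipulation on the memory-form opcode. This sketch is hypothetical and assumes four things. The memory forms are 87/8F (INC/DEC) and 97/9F (INCW/DECW). Clearing the low three bits gives the register base. Bit 4 selects the 16-bit register form. The direct page form is the bare opcode followed by n.

```python
# Sketch with hypothetical names. Assumed memory-form opcodes; the low three
# bits being 7 is what marks a memory reference.
INC, DEC, INCW, DECW = 0x87, 0x8F, 0x97, 0x9F
REGS8 = {"b": 0, "c": 1, "d": 2, "e": 3, "h": 4, "l": 5, "a": 6}
REGS16 = {"bc": 0, "de": 1, "hl": 2, "ix": 4, "iy": 5, "sp": 6}


def encode_inc_dec(opcode: int, operand: str) -> list[int]:
    """Register forms of INC/DEC/INCW/DECW, sharing one routine."""
    reg = operand.lower()
    base = opcode & 0xF8  # clear the 7 that marks a memory reference
    if reg in REGS8:
        return [(base & ~0x10) | REGS8[reg]]  # 8-bit register: bit 4 clear
    if reg in REGS16:
        return [base | 0x10 | REGS16[reg]]    # 16-bit register: bit 4 set
    raise ValueError("bad addressing mode")


def encode_inc_dec_direct(opcode: int, addr: int) -> list[int]:
    """Direct-page form: the bare opcode followed by the low address byte."""
    return [opcode, addr & 0xFF]
```

INCW applied to an 8-bit register comes out as the plain 8-bit increment, which is the behaviour described above.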
They have a single-byte instruction, so they must be kind of important. However, I have no idea why they only let you do it to the direct page, so maybe it's nothing to do with sign extension; it's something else. The opcodes are 07 and 0F, so this is even simpler: emit the opcode, emit the value. That looks fine. So we're now on to section five. Most of these are trivial instructions; in fact, let me just go and add most of them: DI, EI, SWI, HALT, NOP, RCF, SCF, CCF. SWI is the TLCS-90's version of the Z80's RST; you only get one of them. So the opcodes are 0C, 0D, 0E, 00, 01, 02, 03 and FF. Right. These ones take an A, and only an A. They operate on the accumulator; they are effectively the same as these, with no parameters, but the syntax requires you to specify an A. So DAA A, CPL A, NEG A, and this is going to be a simple A callback: DAA, CPL, NEG. So what we need to do here is read the operand; if the operand mode is not reg8 or the register is not reg A, then fail; otherwise emit the byte. Okay, MUL and DIV: hardware multiplication and division. And they are very much in the form that we know and love, nice and orthogonal: you only get to use HL as the destination, and you only get to use an 8-bit value as the source. So they're not quite the same as a simple ALU instruction; they're one of the few instructions that do mixed modes. So this would have to be a register. Now, we can use the ALU stuff for everything except the constant and the parameter, so I actually wonder whether we can somehow reuse the existing stuff. Let's just try this and see what happens. I know: it all fails. So MUL is 12 and DIV is 13. As expected, the first two forms failed, but the subsequent ones all worked except for direct page, because direct page is doing the same thing that LD did, where that particular opcode is used for a constant, so it's always going to be ALU accumulator dest. Yeah, I've just kind of realized that if you try to write, you know, ADD HL,A, it will produce the wrong instruction.
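MUL and DIV are simple enough to encode directly instead of going through the ALU path. The names in this sketch are hypothetical, and it follows four rules. HL is the only legal destination. A constant source is always one byte after the opcode (12 for MUL, 13 for DIV). An 8-bit register source uses the F8-plus-register prefix before the opcode. A 16-bit register source such as HL is rejected, which avoids the misencoding described above. Memory sources are left out.

```python
# Sketch with hypothetical names; register numbers as in the register table.
REGS8 = {"b": 0, "c": 1, "d": 2, "e": 3, "h": 4, "l": 5, "a": 6}
MULDIV = {"mul": 0x12, "div": 0x13}


def encode_muldiv(mnemonic: str, dest: str, source) -> list[int]:
    """MUL/DIV HL,source where source is an 8-bit register name or an int."""
    opcode = MULDIV[mnemonic.lower()]
    if dest.lower() != "hl":
        raise ValueError("bad addressing mode")  # HL is the only destination
    if isinstance(source, int):
        if not -0x80 <= source <= 0xFF:
            raise ValueError("constant must fit in 8 bits")
        return [opcode, source & 0xFF]  # the constant is always 8 bits here
    src = source.lower()
    if src not in REGS8:
        raise ValueError("bad addressing mode")  # e.g. MUL HL,HL
    return [0xF8 + REGS8[src], opcode]
```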
And here, if you try to write MUL HL,HL, it will produce the wrong instruction: the 16-bit register will be interpreted as an 8-bit register according to the numbering. And the second one is the constant: it's using the destination register to figure out what size the constant is, which of course in this case is wrong. I think I'm just going to need to copy all this; there's actually not much of it in absolute terms. So this is going to be a mul/div callback. The left-hand side can only be HL, which is kind of convenient. This is actually the same code as here, which I should really factor out: if the operand mode is not reg16, or the operand reg is not equal to reg HL, then bad addressing mode. Then read the final operand, which can only be an 8-bit register: the current instruction value... right, F8 plus g, so F8 ORed with the current reg. Yeah, it's the same as just here, actually, and a reg16 is a bad addressing mode. Useful error messages are happening to somebody else in this assembler. If it's a number, we know it must be an 8-bit constant value, so as uint8. If it's indexing, E0 or F0, followed by the opcode; if it's an address, E3, the constant, the opcode; if it's direct page, E7; if it's HL+A, F3 followed by the opcode. How does this work? Looks like it... that constant is wrong. Yeah, it emitted the opcode again. That's better. This test constant was rather poorly chosen, but I suppose any constant would overlap with one of the opcodes eventually. Right, I'm going to assume DIV works, which means that this page is done. 16-bit arithmetic: we've done all this, and all this, including these, I believe. Rotates and shifts: these are likewise fairly orthogonal, even though the mnemonics are a bit weird. You'll also notice that there is a short form and a long form: you can encode A as the short form, so RLCA, versus RLC A or RLC B. And these are otherwise all the same, with different opcodes. Right, so RLC is A0, RRC is A1, then 2, 3, 4, 5, 6. So we've got RLC, RRC, RL, I assume RR, yep, SRA and SLL.
marked RLL, just to be helpful. So these are the rotates: RLC is an 8-bit rotate leftwards, RRC an 8-bit rotate rightwards, RL a 9-bit rotate leftwards through carry, RR a 9-bit rotate rightwards. Then arithmetic or logical shift left, with zero being clocked in; arithmetic shift right, with bit 7 being propagated; logical shift left. Why is this different from this? They look the same, but they have different opcodes. Oh, there's more: shift left arithmetic and shift left logical; these are the same operation. Then shift right logical. Rotate... ah, rotate left digit. This is a weird one; I don't know what it's for. Okay, well, let's get the opcodes set; it just counts up. Oh, and there's actually rotate left digit and rotate right digit. RRC is 1, RL is 2, RR is 3, and so on up to 7, and RLD and RRD are 10 and 11. RLD and RRD do not have the short A form. So now I want to do those. I'm just going to copy this lot completely and change these to a simple unop callback, and that's pretty much all I need to do, other than change the opcodes, A0 onwards. It occurs to me it might be useful for the rotate callback to detect if you're rotating A and use the short form, but I think I would rather have the assembler be a bit less smart. Right, test it. So: a single operand, which can be either a register or a memory source operand, and I think it's the same as these. Yes, it's exactly the same. I think I can probably just do this. I'm not ORing anything with the opcode, so here it just becomes a regular 8-bit unop: current_insn.value, emit8. Yeah, that worked. So here we have the short form and here we have the long form. Right, so that's taken us down to here. Now we get the bit manipulation instructions, which luckily are also very regular. These are different in that they take a bit number as the first parameter, which can be a constant value from 0 to 7, and this gets encoded into the opcode. So there's BIT, which is A8; then B8, B0 and 18.
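As a rough illustration of the opcode arithmetic on this page, here is a minimal Python sketch. The assembler in the video is not written in Python, and every function name here is hypothetical. The sketch makes four assumptions:

- Registers use the numbering set up earlier: B..A are 0..6, and 16-bit registers carry 0x10 in the top nibble.
- Addresses are stored little-endian.
- Rotates use the same register and memory-source prefixes as the bit instructions.
- TSET has no direct-page short form, because the single-byte slots 18 to 1F are already used by other instructions (1A is JP, 1C is CALL and so on).

```python
# Register numbering as in the assembler's register table: B, C, D, E, H, L, A
# are 0..6; 16-bit registers carry 0x10 in the top nibble (3 is skipped).
B, C, D, E, H, L, A = range(7)
BC, DE, HL, IX, IY, SP = 0x10, 0x11, 0x12, 0x14, 0x15, 0x16

ROTATES = {"rlc": 0, "rrc": 1, "rl": 2, "rr": 3, "sla": 4, "sra": 5, "sll": 6, "srl": 7}
BIT_OPS = {"bit": 0xA8, "set": 0xB8, "res": 0xB0, "tset": 0x18}


def src_prefix(mode, reg=0, disp=0, addr=0):
    """Prefix bytes for a register or memory source operand; the opcode follows."""
    if mode == "reg":
        if reg & 0xF0:
            # The MUL HL,HL bug: a 16-bit register must not pass as an 8-bit one.
            raise ValueError("bad addressing mode: need an 8-bit register")
        return bytes([0xF8 + reg])
    if mode == "ind":                        # (gg)
        return bytes([0xE0 + (reg & 0x0F)])
    if mode == "idx":                        # (ix+d): IX/IY/SP are 4/5/6, minus 4
        return bytes([0xF0 + (reg & 0x0F) - 4, disp & 0xFF])
    if mode == "hla":                        # (HL+A)
        return bytes([0xF3])
    if mode == "abs":                        # (mn), assumed little-endian
        return bytes([0xE3, addr & 0xFF, (addr >> 8) & 0xFF])
    if mode == "dpage":                      # (n), direct page
        return bytes([0xE7, addr & 0xFF])
    raise ValueError("bad addressing mode")


def encode_rotate(name, mode=None, **operand):
    """mode=None is the one-byte accumulator short form (RLCA and friends).

    Like the assembler in the video, RLC A is not silently shortened to RLCA."""
    op = 0xA0 + ROTATES[name]
    if mode is None:
        return bytes([op])
    return src_prefix(mode, **operand) + bytes([op])


def bit_opcode(name, bit):
    """The bit number (0..7) is ORed into the low three bits of the opcode."""
    if not 0 <= bit <= 7:
        raise ValueError("bit number must be 0 to 7")
    return BIT_OPS[name] | bit


def encode_bit(name, bit, mode, **operand):
    op = bit_opcode(name, bit)
    if mode == "dpage" and name != "tset":
        # "Type two": the opcode itself followed by the direct-page byte.
        return bytes([op, operand["addr"] & 0xFF])
    return src_prefix(mode, **operand) + bytes([op])


print(encode_rotate("srl", "reg", reg=B).hex())        # f8a7
print(encode_bit("set", 3, "dpage", addr=0x40).hex())  # bb40
```

Keeping the prefix logic in one helper mirrors the "too many slight variations" problem from the video. Only the direct-page case of the bit instructions breaks the pattern.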
We've got BIT, which tests a bit; SET, which sets a bit, that's B8; RES, which resets, that is clears, a bit, B0; and TSET, which also tests a bit but in a more complicated way. Oh, it tests and sets the bit. Interesting. Anyway, that is 18. Sadly, I don't... this is the other form of this, where the direct page uses the single-byte instruction, so I wonder if this is actually a mem source. I think it could be... What does this mean? Oh yeah, there should be a colon there rather than the plus. Okay, so register is F8; indexing is E0 and F0, yep; HL plus A is F3, yep; address is E3; and direct page is the opcode followed by n. So it's not a mem source. Yeah, there are too many slight variations of this, so I'm actually going to rename the regular 8-bit unop as a type-one 8-bit unop, and clone it as type two. This one is different because it's the opcode followed by the operand. So down here in bit_cb it's expect operand. You can't use a number in a type two, assuming there is such a thing as a type two, so I think it's just that. Ah, blast: SET. SET is used by the assembler framework to set a mutable value, that is, an assembler constant which you can then change later. Okay, and now to deal with this. The standard symbols you get with the assembler are these, and what I'm going to do is, instead of just including that file, I'm going to copy it, which is not great, and take out SET. It's not actually that useful. All right, and that's... oh, I never actually got around to adding the test symbols. Okay, so this one should fail again, but not this one; that should be parsable. Right, this is because I have in fact completely failed to parse these correctly. So we need to read an operand, which must be a number, and the number must be 0 to 7; the operand value is unsigned, so that will do it. And we are going to OR that number into the opcode. And that seems to have worked. Yeah, I think that's worked. Okay, so that should have
given us all the bit operations, and oh, this is the last page. We're almost there. That'd be nice; I should get this done in one sitting. Okay, now we have the conditional operations and things like RET and that sort of stuff, and we're also going to need to encode into our table all the different condition codes. I'm just looking to see if there's an order. There are 16 condition codes, one of which you can't actually assemble instructions for, which is kind of weird. Where's the table? Here we go. So here are all the condition codes, left to right. This one here is the always-true version, for which there is no encoding; instead you just write the thing you want to jump to or call. Now this is exciting, because it means that these instructions can take one or two parameters, and how many parameters depends on what the first one is. So yeah, I'm actually going to enter them in this order. I'd hope that none of them overlap with anything else. So, the condition code callback: this is going to be defined like reg_cb. It doesn't actually do anything; it's just there to identify what type this symbol is. And it's 8 that we don't want. So, counting them off: F, LT, LE, ULE, PE, M, Z, C, then GE, GT, UGT, yep, PO, P, NZ and NC. The TLCS-90 has proper unsigned comparisons and proper signed comparisons; unsigned comparisons are easy. So I'm just going to copy this. Okay, now for the actual instructions. JP and CALL are the same, except there are short versions of them that live up here, and even shorter versions of them, and also versions for position-independent code, where you specify a 16-bit displacement relative to the current program counter. Which also reminds me that I don't think I've implemented LDAR. Anyway. Oh, we also have JR down here, which does 8-bit displacements, but that's a different instruction. So this is... yes, C0 and D0, and this is nothing useful; I'm just thinking about how to do that. Oh yeah, also, because the short-form JP exists, you will never actually use this version of the opcode. Okay, let's put the short form in the top byte and the long form in the bottom byte of the 16-bit payload. This will then allow me to use D0 for calls, so I get to use the same code for both. The callback... I'm wondering if I actually just want to put in a T here for this. No, it makes life easier not to have it. Okay: read an operand. If it's a newline, then it's the short form. Actually, the short form is so easy to emit, we'll just do it inline, like so. Otherwise it's the long form, and the long form you get at via the FE prefix byte... no, here we go: conditional return instruction, must be FE. Oh no, no, no, I've got completely confused. Oh, I'd forgotten about this. Yeah, the right-hand side of a JP, you recall, can be basically any addressing mode, so you get to do a computation along with your jump. So yes, this is in fact more complicated than that. You are going to generate the true form; the short form is only helpful if you're doing an unconditional branch to a fixed address. Okay, so let's set the current condition code to 8, which is true. I'd completely forgotten about that. Right. So if the token is not TOKEN_NL, then if the token is not an identifier or it's not a condition code, produce the usual error message; otherwise set the condition code to the value and then read in the second operand. Okay, so if the operand mode is an address and the condition code is true, then use the short form: current_insn.value, emit8, then emit16 the operand value. Right, now you may have noticed that these are not parenthesized. Also there's this ghastly PC-plus-something syntax. I don't think that's real; I don't think they're expecting anyone to actually write PC plus something. I think this is a transcription
error, and they copied it from over here. So this is the short form, a short-range jump. It's the same as JR on the Z80: you take an 8-bit displacement and add it to the program counter. When you're actually writing code, this is going to be just a label or an address. And this is also wrong, because this should be CALR, which is listed elsewhere; it's the 16-bit relative call. So I wonder, am I going to have to redo my addressing modes, because these don't have parentheses? Now, I've got my read_index function, which I can call. That will allow me to identify these two, but it won't handle any of the others. I could look at the next token and decide whether it's a number, a register, whatever. Yeah, that's probably the most sensible thing. No, it's not, because I've actually already read the operand. I think what I need to do is extend read_index to read all the different parameters. That basically involves taking all of this code, sticking it in read_index, and having more addressing modes for the versions that don't have parentheses. The things you can do that have parentheses are index, (HL+A), address and direct page. It's going to be kind of messy. Or I can just make the parentheses optional, so both forms are treated the same way, but that's not right either. Okay, I was hoping to get this done this session, because there's really not a lot left, but I have in fact just run out of time. So this is a good opportunity to go away and think about it offline for a bit, and come back tomorrow for more, at which point hopefully I will figure out how to finish this. All right, see you then. So, yesterday, where I left off, I was working on these jump instructions, and the problem is that these use a different syntax from the other instructions. However, having been looking at this, I've been wondering whether the datasheet is in fact wrong, because if I go over here, this is the Z80 instruction set, which the TLCS-90 should be mostly compatible with, and if you
look at... where is it... look for the JP (HL) instruction. There it is: this HL here is parenthesized, but in this datasheet it's not. That leads me to think that these should actually be parenthesized for the indexed operations, to say "jump to the location described". However, the simple jump to an address shouldn't be parenthesized, for compatibility with the Z80. This is actually a problem with the Z80 instruction set, because what this is actually doing is jumping to the address in HL: it's not dereferencing HL, it's just jumping to HL itself. So I think the TLCS-90 form is more correct, but the parenthesized form is traditional. So what I'm going to do is parse these with the ordinary indexed-operation syntax, so parentheses around these, and special-case jumping to an actual address so that one doesn't need parentheses. That will give compatibility with the Z80. I don't think there's going to be much TLCS-90 source code around, so I think that's more important than being accurate for TLCS-90 stuff. So the first thing I need to do is set up my test file again, because you might notice this is different: this is in fact PowerPC code, because I accidentally overwrote my test file with some PowerPC stuff I've been working on. So I'm going to have to recreate that. If I read in my list file, this is the output of the disassembler, and I should just be able to nuke all that. Get rid of this bottom line; this is because the disassembler generates CP/M-format text files that are terminated with a Control-Z. And just a quick skim to make sure that all the hex numbers have 0x and the decimal numbers don't, and I think that's about right. Okay, and I will add some jump instructions. So we want to say jump never to the address in HL, jump never to IX plus one, jump never to IX minus one, HL plus A, and an actual address. I'm not going to touch JR now; JR should be much simpler. Oh yes, and CALL is the
same, so let's just duplicate this for CALL. And we also need to duplicate the entire block for the other form, where there is no condition code. Okay, the assembler won't actually assemble because of syntax errors down here, but all this is wrong, so I'll just comment it all out. Okay, so now we have the disassembler... mplayer is not the command I was actually looking for. Oh, okay: "duplicate symbol during init". Oh dear: C is a register, and C is a condition code, and you can't have two symbols with the same name. Okay, this can be dealt with. What we're going to have to do is remove this, and then, when we know that we're reading a condition code, we check to see if what we actually read was REG_C, and if so turn it into this. That's kind of awful. "Unsupported addressing mode", as I'd expect, yep. Okay, let's just comment these out for now with the other comment character. Yes, that has disassembled correctly, good. So let me think. At this point we need to potentially read a condition code or a register, so we only care about identifiers. If it is a register and it's register C... what value was the condition code C? 7. And what value was register C? 1. So if what we saw was register C, set the condition code to CC_C; if what we saw was a condition code, then set the condition code to the value recorded; otherwise it's an error. And cc_reg here is the wrong name, so let's just change that; it should be the condition code callback. So this should now have read the... we don't want that. So we've read the condition code; we now want to read the next token, which should be a comma. In fact, I think we can just do that, and then we expect a final operand. I think that will pass. Okay, so let's put these back again. In fact, we want to duplicate these with C, because C is special. Line 913: unsupported addressing mode. Oh, hang on, no, I've done this wrong. This is read an operand; that will be an identifier. Do I want to turn condition
codes into an addressing mode? I think I don't. So we're actually going to do this the old-fashioned way, except that if it's not a condition code, then it is going to be the destination operand, and in that case we can push the token back onto the input stream. So we read a token. If the token we've read is C or a condition code, then we want to read the comma and then the target address; otherwise we want to push it back onto the input stream and parse it as an operand. What's the cleanest way of doing this? We want to go through this route in both of these code paths, this one and this one, and we want to go through the other code path here and also here. There'll probably be a way of doing this. This is the case where block-structured programming doesn't really help very much: when you have complicated conditionals, because we want this and this and this to be true, or this and this to be true, or something else. You know what, I was trying to avoid having to test for TOKEN_IDENTIFIER more than once, but actually it's just so much simpler to do it the hard way. So here is one case; I'm still going to end up with the same logic in two if branches. Yeah, I'll go back to my first instinct. That's still pretty noxious. Now I am going to do this. Yeah, traditional block-structured programming doesn't do a particularly great job of this kind of state machine, where you have multiple targets that you're branching to from different states, but arranging it like this means that the non-cc branch is only called in one place, which is here. We do end up comparing the token with TOKEN_IDENTIFIER twice, but that's a byte comparison, so it should be cheap. So, what we want to do here: we've looked at the next token and it is not one we care about, so we push it back onto the input stream and expect a terminating operand. In the case where we have read a condition code, what we
expect is a comma and an operand, and in fact it's not worth abstracting this out for only two subroutine calls. Is that going to pass? Right, that parses. So we have now set the condition code correctly: if no condition code was specified, it will be 8, meaning true; otherwise it will be set to either C, if it's C, or a user condition code. And the operand itself will be either a number, for this form, because remember that's not parenthesized, or a parenthesized register or index operation, and we're not going to support any other forms. So if it is a number and the condition code is true, then we can use the short form: this 1A, then emit16 the operand value. Otherwise we are going to go for this form, which is the EB prefix, followed by the constant address, followed by C0 with the condition code in it: EB, value, C0 ORed with the condition code. Okay, now if it is an indexed operation, that will be one of these three: E8, F4, F7. Is that a standard form? So for a register, E8... I think that's standard, actually; I think it's a standard dest. There's memdest: E8, F4, yep, and F7 for (HL+A), yep. There is no direct page form. So in fact what we want to do is special-case this one. No, we don't; never mind, ignore that. Okay, if it's indexed we do this; address will never happen; direct page will never happen; (HL+A) will be this form, which is F7, but we do want to emit the condition code. Else, bad addressing mode. I think that might do it. What does that disassemble to? Mostly garbage. Oh, interesting: "?8" here means condition code 8 in my disassembler, which I didn't actually add a name for, because it wasn't on the list here. However, here we go: T. If we're going to use any of these extended forms, then we do actually need the implicit condition code, so let's just edit the disassembler to add that. The condition code table is here; put T in. Okay, I mean, strictly, the disassembler should report that without the condition
code, but life is too short, and if I'm going to do that I'm going to put in the corresponding T here. All right, so that has not reported the parentheses, because I wasn't doing that with this syntax. I could put parentheses in here, but that would also put parentheses around the simple jump-to-address form, this one, which I don't want to do. And the disassembler is not really capable of distinguishing them, because the addressing mode is baked into the prefix byte. I'm going to leave it; it's clear enough. Okay, back to the assembler. These are all jumps, but there should be some calls there. So I'm just going to do current_insn.value, emit8, ORed with the condition code, and now we have our calls, except for that one. Where's our call? That is 1C: 1A for jumps, 1C for calls. Otherwise we go through the cc code path. We don't need this anymore; that is cleaner. Okay, now we have calls. No, these are all wrong now. Ah, I was being clever here, and being clever never works, right? Because this is actually testing both the opcode and the condition code at the same time, but we also need to check the addressing mode, so undo all that. So if it is a jump true, do this; if it is a call true, do this. Right, here we go: short-form call, long form, other calls. Good. And then that is repeated here. We've got all the Cs, we've got the trues... the ordering here is wrong. So we have the short-form trues here, then we've got the falses here, then we've got the Cs here. Okay, that looks right. Back to the datasheet. Right, we've done all these JPs, we've done this JP, we've done all these CALLs. We have not done DJNZ (both forms), RET, RETI, which is a simple one, or JR. Well, let's take a look. I put in RET; I did not put RETI in. RETI is simple: it's return from interrupt, a one-byte instruction, and it has opcode 1F. Put that there. Right, RETI is done. JR is the next one. This is a relative jump with a
displacement, an 8-bit signed displacement. Oh, I did put... I'd forgotten I'd done that. That allows me to simplify things a bit. Anyway, JR is straightforward: it's a single opcode byte with no prefix, followed by the displacement. So it's going to be using a cut-down version of this code. Make sure this works: yep, that still works; we've got the jumps and the calls. Okay, JR. JR has, as before, the condition code in the first place; however, the operand has got to be a simple number or label. So we can calculate the displacement relative to the current program counter plus 2, as we figured out from here. And I can't remember where I put the program counter. The current program counter is stored in here. The reason it's dereferenced like that is that you can have multiple program counters, and the variable actually points at the appropriate one. This is for when you've got multiple sections: you can switch between the text section and the data section, and each of them has its own program counter. So a positive number is going to be jumping forward. So let's do that. Right, if d is out of range, then rather than producing an error, we want to fall back to the long form, which is one of these four-byte ones: that's going to be EB, followed by the destination, followed by the opcode. Now, we haven't finished yet. This is a multi-pass assembler, so as we proceed, the code in our generated program may move around, which means that this will be recomputed, and that in turn means that instructions that used to be four bytes long may now be two bytes long. But we really don't want it to happen the other way around, and the way we ensure that is: if we're on the first pass, then don't bother emitting anything, but advance the program counter by the maximum size for this instruction. Just to check: this is the 6303 assembler, yeah, here we go, this is the equivalent code for a relative branch. If we're on the first pass, then increment the program
counter by five; otherwise actually do the work. So let's go to our test file. Line 35: "opcode" should be "operand mode". Okay, assemble. It doesn't like it: JR T, near. So here we are with JR: we've done our T, we've fallen through to expect operand, and that should be a number. So here's the 6303 read operand. Oh, no, wait: this is actually the first time I've used an identifier in this case. Oh, I completely forgot to do this. Right, what it's doing is going into read operand here and seeing: yes, it's an identifier; no, it's not a register; therefore bail out, unsupported addressing mode. I completely forgot to put in the fallback case, which is that it's some other identifier, and that's actually a little bit more complicated than it looks. So what we're going to do is push the token back onto the stream and parse it as an expression; the value is then in token_number, which is where read_expression puts the result, and that will take care of the different types of identifier you can use. And it still doesn't work. Okay, and what do we get? Well, this has actually expanded them all into far jumps, four bytes long, but they appear to be the right value: true, carry, false, true, true, carry, false, true. Then we're at address 10, which is the one being pointed to here, so that's right. Then true, carry, false, true again for the far jumps; 270 should be the bottom of the file, yep. They've got the addresses right, assuming the disassembler is right. So in jr_cb we want to parse this as a signed integer, not an unsigned one. There we go. And it's wrong, but there you go: it looks like it's jumped backwards, so let's just invert this. Okay, 08 is the right address: true, carry, false, true. This is the wrong address, and that needs to be an int16, not an int8. That's better. So here we have the near jumps, here we have the far jumps, and our target address is 268, which is the right place. Excellent. So we now have JR done. Now, the reason why I've implemented JR to expand branches like this is because this
is intended as the assembler for a compiler, and the compiler cannot be expected to keep track of how long its bits of code are; that's just not its job. So the compiler will always generate JRs, and it will rely on the assembler to expand a JR into a JP if it's out of range. Okay, so we've done all of this block, including this one, JR cc, PC plus d, which is wrong. We've done this. We have not done JRL. We've done all the CALLs; we haven't done this one, which is CALR. We haven't done DJNZ or RET. Right, let's do JRL and CALR. So JRL is 1B, a 16-bit relative branch; CALR is the same, but it's 1D. And these are easy: all we need to do is read a single operand; if it's not a number, bail out. This is always going to be this big, so we don't need to worry about passes. Emit the opcode byte, emit the displacement. JRL far, CALR far. I can't remember the syntax for my own language. All right: assemble, disassemble. JRL 260, CALR 260; that's the same address as here, so I'm going to assume it's right. I expect you'll be using this to implement position-independent code, which you can do on the TLCS-90. It looks a bit of a pain, but you can do it. Which actually reminds me of an instruction I missed: this one, LDA. No, I did do this one. So I think it was LDAR, the relative-addressing-mode load. Here it is. This one's a bit special; that should go up next to LD. This always takes two parameters, but the first one has to be HL: if the operand mode is not AM_R16 or the operand reg is not REG_HL, bad addressing mode. Expect the second operand; this is going to be a number, so if it's not AM_NUMBER, bad addressing mode. And from here we simply steal this code, except this is always going to be seven. Okay, so load the thing at far... should that have parentheses around it? Does that do a dereference or an add? Oh, it doesn't dereference; it loads the effective address. It's the same as LDA, which I was confused by. Right, this gets the address in HL of some
nearby code, or of any code, so that syntax is correct. All right, so back down to the last table here. We've done all of this, all of this, all of this and all of this; we've just got DJNZ to go, which is an annoying one. DJNZ is going to be similar to JR. What DJNZ does is decrement a register, which is going to be either B or BC depending on the form, and if the value is not zero, it makes a short jump to the destination. Now, this is tricky, because you can't actually expand this the way you can with JR. This does a test of B or BC against zero, but it doesn't set any flags (the Z80 is the same), so it cannot be emulated by, you know, a DEC B and a branch-if-not-equal, because that will set the flags as a side effect. So I'm actually not going to have this auto-expand. That also makes it easier to code. So we look at the... this is actually easier. We read the first operand. If it's a 16-bit register and it is BC, then we're going down the second path here, for the second form; otherwise it's going to be the first form. So if it's the BC form, set the opcode to 19. If it's the second form, we then want to read the actual destination; otherwise we've already read the destination, and we just need to make sure that we're at the end of the line, like that. Okay, now we check to make sure that this is a number, and that it's in range. And this is going to be the opcode we computed earlier. Okay, so DJNZ near, DJNZ BC, near, and if I try to do DJNZ far, that should fail with an error. I think REG_BC was zero. Okay: "near jump out of range", exactly what we wanted. Let's remove that. Okay: DJNZ 15, DJNZ BC, 15, and it's the same address that's been used for our near tests here, so that's correct. That one now works. Okay, this means we only have one more instruction to go, which is RET. RET takes a single optional condition code parameter, so let's just steal this piece of
code from DJNZ for ret_cb. Right: set the condition code to 8; then if the token is C, set the condition code to C and read another token, which should be the terminator; if it's a condition code, set the condition code likewise; otherwise do nothing. If the token is now not a newline, produce an error. Okay, so if the condition code is 8, meaning true, then it's a 1E; otherwise it's an FE followed by an extended opcode. So that compiles. Assemble, disassemble: RET, RET F, RET. Good, it works. So now, with luck, we have a fully functioning assembler, and I would be able to test it if I had any code that would run on it, or a simulator, which I don't. But this is good; I think we've finished. There's a lot of cleanup that could be done, you know, stuff like this garbage handling the ends of lines, which we can abstract out to make it clearer. Let's just see how many times we're looking at TOKEN_NL. Got this. Yeah, so there are three different ways I'm checking for the end of the line. In this one, we read an operand and look at the result; in this one, we're using the expect helper to read the next token and check that it's a newline; here we've already read a token and we're checking to make sure it was a newline; and here we're doing the same thing. That could definitely be cleaned up, but anyway, it works. How big is it? So this binary is the Cowgol native 32-bit code. It's eight and a half K of code plus all the tables, and I can just disassemble that for you, and you can see the wonderful code that Cowgol generates. It's really not that wonderful; I need to overhaul the 386 code generator. Lots of big instructions; a lot of these can be... wow, how long is that one? One, two, three, four, five, six, seven: seven bytes. These addresses can probably be simplified by indirecting through BP. Down the bottom, this is data. No, these are data tables; there are lots of data tables in the disassembler. This looks like... no, they're not data tables; it hasn't disassembled the data tables. Good. But here we have, you know, lunacies like this:
huge instructions. Well, I'm looking for a particular one. Hang on, this is ASCII; even Cowgol doesn't generate code this bad. These should be in the data segment. Have I not implemented segments in the 386 back end? I think I haven't. Okay, well, let's stop looking at that. I actually turned off a bunch of the toolchains to speed up compilation, so let's put all these back on again. Okay, so that builds the rest of them. We can look at... well, I can't disassemble the MS-DOS version, as I don't have a disassembler for it, but I can show you the MS-DOS listing file, which is roughly equivalent. The 8086 back end is more recent than the 386 back end, and it uses some better features. It's still not great: it produces instructions that look like this. It would actually be cheaper to load zero into a register and refer to that, rather than use instructions with multiple constants, but I can deal with that easily enough. How big is that binary? It's in the bin directory with all the binaries... eight and a half, nine K; closer to nine K. So it's a bit smaller than the 386 version. We've got the Thumb-2 version, which I can run size on. The Thumb-2 back end now generates rather better code than the 386 one; I did a whole bunch of optimization. I can disassemble that one. Thumb-2 is the version of the ARM instruction set that has 16-bit encodings for most instructions, and even though the Cowgol code isn't brilliant, it compresses down enormously; the code density is great. We've got the CP/M version, which is 8K of code, data and tables. Yeah, this one mixes code and data. We've got the 6303 version; this is the densest of the 8-bit architectures I've got, a little bit smaller than the Z80 version. We've got the BBC Micro version. It's very hard to generate good code for the 6502; this does a reasonable job. It's a bit bigger than the Z80 version, smaller than the 6303 version. This is actually for the 65C02, so there's also a 6502
version that doesn't use the extra instructions, which is a little bit bigger. I've got a PDP-11 version, and we've got 8080 CP/M. But the most horrifying one is the ncgen version. ncgen is the version of the compiler that compiles to C, which is used for bootstrapping the compiler, and this is its output, and it's dreadful. This is a data table; here's some code. So yeah, it will actually compile on both 32-bit and 64-bit systems, and it will work, which I was quite impressed by when I managed to make it do that, but it's dreadful. And if I compare the size of the ncgen executable, compiled with GCC with optimization off, to be honest, that's a lot bigger than the rather poor 386 version. Yeah, anyway, I will actually commit that to the TLCS-90 assembler and push it, so I can't pretend that I didn't do it, and call it a day. Done. Okay, well, I hope you enjoyed this video, however long it turned out to be. Please let me know what you think in the comments.
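For reference, the branch encodings worked out over the two sessions can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the assembler's actual code:

- All function names are hypothetical, and only a handful of registers are modelled.
- Addresses are assumed little-endian.
- The PE/PO condition names, the 18/19 DJNZ opcodes and the FE, D0+cc encoding for a conditional RET are taken from the TLCS-90 opcode tables rather than spelled out in the video, so check them against the datasheet.

```python
# Condition codes in the datasheet's table order; slot 8 is "always true" (T).
CONDITIONS = ["f", "lt", "le", "ule", "pe", "m", "z", "c",
              None, "ge", "gt", "ugt", "po", "p", "nz", "nc"]
CC_C, CC_T = 7, 8
# "c" is also a register, so it is kept out of the condition-code symbols
# and special-cased in read_condition, as in the video.
CC = {name: i for i, name in enumerate(CONDITIONS) if name and name != "c"}
REGISTERS = {"b": 0, "c": 1, "hl": 0x12}   # tiny subset for the example
MAX_BRANCH_SIZE = 4                        # reserve this much on the first pass


def le16(value):
    return bytes([value & 0xFF, (value >> 8) & 0xFF])


def read_condition(tokens):
    """Consume an optional 'cc ,' from a token list; default to T."""
    tok = tokens.pop(0)
    if REGISTERS.get(tok) == REGISTERS["c"]:
        cc = CC_C
    elif tok in CC:
        cc = CC[tok]
    else:
        tokens.insert(0, tok)              # not a condition: push it back
        return CC_T
    if tokens.pop(0) != ",":
        raise SyntaxError("expected ','")
    return cc


def encode_jp_call(is_call, cc, mode, value=0, reg=0, disp=0):
    long_op = (0xD0 if is_call else 0xC0) | cc
    if mode == "addr":                     # bare address, no parentheses
        if cc == CC_T:                     # short form: 1A / 1C, then mn
            return bytes([0x1C if is_call else 0x1A]) + le16(value)
        return bytes([0xEB]) + le16(value) + bytes([long_op])
    if mode == "ind":                      # (gg): reg is 0..6, with 3 skipped
        return bytes([0xE8 + reg, long_op])
    if mode == "idx":                      # (ix+d): reg is 0..2 for IX, IY, SP
        return bytes([0xF4 + reg, disp & 0xFF, long_op])
    if mode == "hla":                      # (HL+A)
        return bytes([0xF7, long_op])
    raise ValueError("bad addressing mode")


def encode_jr(cc, target, pc):
    """Two-byte JR cc if it reaches, otherwise the four-byte JP cc form."""
    d = target - (pc + 2)
    if -128 <= d <= 127:
        return bytes([0xC0 | cc, d & 0xFF])
    return bytes([0xEB]) + le16(target) + bytes([0xC0 | cc])


def encode_djnz(target, pc, use_bc=False):
    """No fallback: DJNZ leaves the flags alone, so DEC + branch can't replace it."""
    d = target - (pc + 2)
    if not -128 <= d <= 127:
        raise ValueError("near jump out of range")
    return bytes([0x19 if use_bc else 0x18, d & 0xFF])


def encode_ret(cc=CC_T):
    if cc == CC_T:
        return bytes([0x1E])
    return bytes([0xFE, 0xD0 | cc])        # assumed extended opcode: D0 + cc


print(encode_jr(CC["nz"], 0x0300, 0x0000).hex())   # eb0003ce
```

The design point from the video carries over: an assembler built around `encode_jr` would reserve `MAX_BRANCH_SIZE` bytes on the first pass. Later passes can then only ever shrink a branch, never grow it.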