Let's write an assembler. Again.

There is some method to the madness here. I have been working on this operating system called CP/M-65. It's a port of the venerable CP/M to 6502-based machines, and here it is running on an emulated BBC Master. Unfortunately, there's not a lot of software for it. In fact, there are these four commands. asm.com I will get into later; it's the thing in the window over here. So there's not really a lot of point having it if you can't do anything with it. There is an asm file on the disk, which you can type, but you cannot actually assemble it. So it desperately needs an assembler.

In addition, when I was designing CP/M-65 I made it use relocatable binaries, because CP/M systems are very inhomogeneous, so having simple memory images that load at a fixed address would mean you'd have to rebuild everything for every platform. And I'd also dramatically underestimated how difficult it was to generate relocatable binaries. So our assembler needs to both run on CP/M-65 and generate CP/M-65 binaries.

Now, I have done assemblers in the past. They've always been very heavyweight disk-to-disk assemblers, which use proper parsers and do multiple passes through the source code, reading it from disk each time. It works, and they're quite memory efficient, but they're also excruciatingly slow. Reading and writing from the disk at the same time on a system with no buffering means you constantly have to seek backwards and forwards across the disk.

So today I am going to prototype a different way to do assemblers. We're going to do a RAM-based assembler. However, it won't be like the traditional CP/M assemblers of the past, which generate a simple memory image at a fixed address.
We're going to have to generate a relocatable file, so this is going to be interesting.

So I've got here a boilerplate program written in C that compiles for CP/M-65. I do that, and then it will build. It's been compiled with the 6502 port of Clang and LLVM. It then generates a CP/M-65 executable and bolts it into the disk images so we can run it through this emulator. It doesn't actually do anything yet. It opens the input and output files, and it just copies stuff from the input to the output before closing it. But we can demonstrate by doing hello.asm out. I don't know if you can hear the disk noises, but you can hear it seek backwards and forwards. That's because I haven't saved the file. Okay.

So you can see that this is showing that we have 22-odd K of memory with no actual program. However, I'm not going to use this emulator to test with, because I have written a user-mode emulator with a debugger in it, so I can just run asm.com like this. In this emulated environment we have loads more memory. And if I do... here's a random file... it should have done it. It has copied this file to this file, and the size is a bit different due to the way CP/M does blocks.

So all we need to do is just fill out the rest of the assembler. What we're going to do, and I've been thinking about this, is read the source file and parse it, and construct in memory a data structure representing the source file. We can then do multiple passes through that data structure before writing the result from memory to disk. So we read the file once and we write the output file once.

We do not have a heap. All we have is an array of bytes from... there we go, from CPM RAM to RAM top. This value is just what you get from the difference between those; that's our available workspace. We're going to have to put into this all the data that our assembler is going to need, including symbol information, information on the instructions we've actually read, and label
information that we're going to use to actually generate the resulting machine code.

We're going to make some simplifying assumptions, so we're not going to support the... where did I put it? So this is the test program, which should be assembled with, I think, llvm-mos. We're actually going to change the syntax a little to make it easier to parse. I think this is actually all perfectly fine. So we're not going to be fully compatible with the existing assembler. It's going to be simpler. It's not going to be a macro assembler. It's not going to support stuff like arbitrary segments and so on. Also, for the purposes of this prototype, we're not going to support full expression parsing. So an expression is going to be either a constant number, or the address of something plus or minus a constant. This should allow us to demonstrate that the thing works.

So the place to start, as always with these things, is the lexer. What this is going to do is read symbols, well, read words, from the disk into a parse buffer. We wish to save memory, so because we know that the parsing and writing the result will happen at different times, we can reuse the output buffer as a parse buffer. So we're going to have a function that reads a single word from disk into the parse buffer.

Okay, so that was the really boring bit. Now we have to start actually thinking about how things are going into memory. The way we're going to do this is: we start at the bottom of memory, and we're going to call this top, and we are going to write a series of records. (I'm just reducing the syntax. Right, this was the bit I was interested in.) So we're going to write a series of records represented by... yes, I do want this to be a struct. This will contain the length of the record, because this will then allow us to walk the list to find specific records. This is basically going to be like a sequence of instructions; basically each instruction will turn into one of these.

So the simplest record is just going to be a sequence
of bytes. In fact, we don't need that at all. So one of these records means: emit these bytes as a literal into the output file. We will be using this for constants like this, and also for instructions that do not contain any interesting parameters, where we know as soon as we see it how this is going to be encoded. We do it like this so that we don't end up storing masses of information trivially. This, for example: we don't know how message is supposed to be encoded until we see it, which happens later. So we're going to have to store this as a reference to something. But this is just bytes.

Now for these: this is going to be an instruction with a variable reference, or rather an instruction with an expression. I said earlier that expressions can be a constant value or a label plus or minus a constant. We're also going to have to add high and low. So what this is going to have in it is the opcode and the expression itself, which is going to be a pointer to the variable (there's a reason for having it as a uint16, which I will explain later) and the offset. Also, I have completely forgotten that to go alongside the length, we are going to have to have what kind of record this is. So in fact we're going to encode the record type and the length into descr here.

All right. So what are we going to do for parsing? This will be a loop that will attempt to parse instructions from the input stream. This is actually fairly straightforward. We need to read a token. Did I make that a char or a...? It's a char. Okay. If it's the end of the file, stop. I got that union all wrong; I'm going to have to go look at that again. But anyway. So the things it can be at the beginning of a line are an instruction, which is going to be an ID, or eventually a dot pseudo-op at this point.
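As a rough sketch of the descriptor idea described above: the record type and the total record length (header included) share one byte, and walking the list just means adding each length to the pointer. The nibble split matches the dumps shown later in the video (0x12 for a two-byte bytes record, 0x01 for EOF), but the names `make_descr`, `descr_type`, `descr_len`, `count_records` and the numbering of `RECORD_EXPR` are assumptions made for illustration, not the code on screen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type numbers; EOF = 0 and BYTES = 1 match the dumps,
 * RECORD_EXPR = 2 is a guess. */
enum { RECORD_EOF = 0, RECORD_BYTES = 1, RECORD_EXPR = 2 };

#define LEN_MASK 0x0f /* low nibble: total record length, header included */

/* Pack a record type and a length into a single descriptor byte. */
static uint8_t make_descr(uint8_t type, uint8_t len)
{
    assert(type <= 0x0f && len <= LEN_MASK);
    return (uint8_t)((type << 4) | len);
}

static uint8_t descr_type(uint8_t descr) { return (uint8_t)(descr >> 4); }
static uint8_t descr_len(uint8_t descr) { return (uint8_t)(descr & LEN_MASK); }

/* Walk a packed record list and count the records, EOF record included. */
static size_t count_records(const uint8_t *p)
{
    size_t n = 1;
    while (descr_type(*p) != RECORD_EOF) {
        assert(descr_len(*p) > 0); /* a zero length would never advance */
        p += descr_len(*p);
        n++;
    }
    return n;
}
```

Because every record starts with its own length, a later pass can skip records it does not care about without knowing their layout.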
We know that... Yes, I know I used a goto. It's because you can't break from the for loop from inside the switch statement. That is perfectly allowed.

Okay, so we know it's an ID. What we're going to have to do now is to go look at the contents of the buffer and try and do something with it. So essentially we need to find out if it's an instruction. If it's not, then we know it's a symbol reference and, you know, just do the appropriate things. Otherwise give up. Those warnings are because... better.

Okay, let's go look at these records again. Now, what I did wrong here was that I made this a union. What this means is that the outside record is going to be the largest of the union members in length, which we do not want, because we want to pack these tightly. That's why I did that. So this is going to be a minimal record. This is going to be our bytes record. Can I do that? This is going to be our expression record. I can do that. Nice. Oh yes, and one more. Once we've finished parsing we want to add an EOF record to the bottom of the list. So this is going to be very simple, like so. So this is demonstrating how to add a record to the buffer.

So if we build that and run it... that's not working. We don't need EOF record; you can just do this. So we get unexpected token here, because it's just read an ID and we don't have any of them. So if we actually copy this to a new file and we just put LDA #9 in it, we get a...

I forgot to do a thing: comments. I'm going to use the backslash character as a comment, which is relatively standard in places. So if we get one of them, we just want to keep reading until we hit a new line or the end of the file. Okay, that should be all we need. So if we take our test.asm and we comment out that line, then nothing should have happened, because it should have read one token, which is EOF. We got a 59, which is a semicolon. Oh, it's the end of line. Okay, that's fine.
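To make the union problem mentioned above concrete, here is a small sketch comparing the two layouts. All type and field names are hypothetical; only the shapes matter. With a union, even a one-byte EOF marker costs as much as the largest member, whereas separate structs sharing a one-byte header can each be as small as they need to be.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Union layout: every record is as big as the biggest member. */
typedef struct { uint8_t descr; uint8_t data[15]; } BytesBody;
typedef struct { uint8_t descr; uint8_t opcode; uint16_t variable; uint16_t offset; } ExprBody;
typedef union { uint8_t descr; BytesBody bytes; ExprBody expr; } UnionRecord;

/* Separate layout: a minimal header, extended by each record type. */
typedef struct { uint8_t descr; } Record;
typedef struct { Record record; uint8_t opcode; uint16_t variable; uint16_t offset; } ExprRecord;

/* Bytes needed to store a bare EOF marker under each scheme. */
static size_t eof_cost_union(void) { return sizeof(UnionRecord); }
static size_t eof_cost_separate(void) { return sizeof(Record); }

static void check_layout(void)
{
    assert(eof_cost_union() >= sizeof(BytesBody));
    assert(eof_cost_separate() < eof_cost_union());
}
```

On a host compiler the exact sizes depend on alignment rules, which is why the checks compare sizes rather than hard-coding them.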
We just go around again if we get one of them. That means we've reached the end of a statement. Good. So we've now successfully parsed our empty file. So our end of file has given us one stored byte in memory.

We are now going to have to start thinking about actual instructions. So let's go for an RTS. Unexpected token, right.

So over here I have a table of the 6502 instruction set. This is not sorted by opcode; it's sorted by instruction encoding. Each opcode byte is divided into three fields, a, b and c. b is used mostly for the addressing mode, while a and c, these two fields here, tend to describe what the opcode actually is. So for example, LDA is c equals 1, a equals 5: this row. You see all these are LDAs, and b changes according to what addressing mode they are.

So in order to parse an instruction we have to read the instruction. For example, in our case that was an LDY... no, that was... yes, of course. In hindsight that's not a brilliant example, because this has no addressing modes, so let's pretend that was an LDA. We read the fact it's an LDA, but we then don't know what opcode it really is until we read the rest of it. This does actually simplify parsing a lot, because you notice that all these instructions, except for STA here, are very similar. They all follow the same model. So we only need one routine for dealing with all of these except for STA, which is nice.

So, instruction matching. I'm going to have tables of instructions. Each one of these is going to contain three characters' worth of opcode name and one byte's worth of payload. So for the simple instructions we've got BRK, which is 00; PHP, which is 08; not that; CLC, which is 18; not that, not that, that. You notice there is a pattern here. I think I've missed 20. Ah, that is not one. SEC is 38.
RTI is 40, PHA is 48, CLI is 58, RTS is 60, PLA is 68, SEI is 78. I missed that one. TYA is 98. What's 88? 88 is DEY. TAY is A8, and the next one will be B8. B8, CLV. Yes. C8, INY. CLD, D8. E8, INX. SED, F8. Okay.

I should add, the reason for these patterns is that the opcodes were very carefully chosen to make it easy to implement in the silicon. All these values will be wired to the various logic gates that basically cause the processor to do the right thing. Oh, there's more. 9A, that's an exception, and there's these as well. But these are actually part of the logic op instructions, so I don't want to implement them right here.

Okay. Then a function to actually match one. Really, we want to return the opcode, but we also want to be able to distinguish between a proper match and a failed match. FF is illegal, so let's return that. We also want to tell it where to stop. So this is going to be a helper that, given an instruction table, will scan the table attempting to match what's currently in the parse buffer. So we just want to make a few slight adjustments. We're going to store the token length in a global. Okay, I don't think we've got tolower. We've got toupper, but not tolower. Fine, let's just use uppercase.

So for each opcode... now for each one of these, I think this will generate decent code. I don't think it's worth unrolling this; I don't think it's worth turning this into a loop. Yeah, this means you've got a match, so we return the opcode value. Otherwise go on to the next one. Otherwise, it's an illegal instruction, like so. Right, now we want to do something with it.
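The table-and-matcher idea described above might look roughly like this. It is a sketch under assumptions: the names `Instruction`, `simple_insns` and `match_insn` are invented here, the token is passed in as an argument rather than read from a global parse buffer, and the token is assumed to be uppercase already. The opcode values are the standard 6502 ones read out above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ILLEGAL 0xff /* 0xFF is not a legal 6502 opcode, so it can signal "no match" */

typedef struct {
    char name[3];   /* three-character mnemonic, not NUL-terminated */
    uint8_t opcode; /* one byte of payload */
} Instruction;

static const Instruction simple_insns[] = {
    {{'B','R','K'}, 0x00}, {{'P','H','P'}, 0x08}, {{'C','L','C'}, 0x18},
    {{'P','L','P'}, 0x28}, {{'S','E','C'}, 0x38}, {{'R','T','I'}, 0x40},
    {{'P','H','A'}, 0x48}, {{'C','L','I'}, 0x58}, {{'R','T','S'}, 0x60},
    {{'P','L','A'}, 0x68}, {{'S','E','I'}, 0x78}, {{'D','E','Y'}, 0x88},
    {{'T','Y','A'}, 0x98}, {{'T','A','Y'}, 0xa8}, {{'C','L','V'}, 0xb8},
    {{'I','N','Y'}, 0xc8}, {{'C','L','D'}, 0xd8}, {{'I','N','X'}, 0xe8},
    {{'S','E','D'}, 0xf8},
};

/* Scan [p, end) for a mnemonic equal to the token; return its opcode or ILLEGAL. */
static uint8_t match_insn(const Instruction *p, const Instruction *end,
                          const char *token, uint8_t token_length)
{
    assert(p <= end);
    if (token_length != 3)
        return ILLEGAL; /* mnemonics are always three characters long */
    for (; p != end; p++)
        if (memcmp(p->name, token, 3) == 0)
            return p->opcode;
    return ILLEGAL;
}
```

Checking the token length first means most identifiers (labels, long symbol names) are rejected without touching the table at all.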
So the reason why I wanted the token length to be public is that if the token length is three, then this might be an opcode. We know that if the token length is not three then it cannot be an opcode. So this is actually going to be the address of the last array member. We're going to have to define length-of: the total size of the type divided by the size of a member of the type. So if the opcode is not illegal, this means we have found an instance of this instruction. So we're just going to print it for now, and then we continue from the... we do not continue from the top. We now want to match a new line, so we're just going to do a break. So here we're going to want to read another token, and then we go around again.

So now if we run it... ah, right. It printed 96, then it printed one byte of tokens. So 96 is hex 60, which is... where's RTS on this table? This is the wrong table. Okay: 60, RTS. Good, it's the right one.

So output byte is going to do the work of adding a bytes record to top. I'm going to want to be reasonably smart about this. So this is going to add a record of the appropriate type. This is going to emit a simple byte. The reason for doing it like this... oh yeah, I know, we need to change this to be record. Add record takes care of advancing top. So right, the reason why we're doing it like that is that here we are going to keep track of whether there is a half-completed bytes object at top. Yeah, we can do better than this.
Okay, so we're going to wipe all our memory from RAM top to... so all our RAM will be initialized to zero. Okay, so when we add a record we are actually going to look to see: is there a bytes object on the top of the heap? If there is, and there is free space in it (each record can be 16 bytes long)... hang on, I'm doing this wrong. This conditional actually wants to be down here. Okay, if there is a partially completed bytes record on the top of the heap, then we take the length, we advance top and we update r. So this will skip past any half-completed record and return a new record.

So when we want to emit the byte: if the thing on the top of the stack is not a bytes object... we have a bytes record, record descr. If it's not a bytes record, then it's going to be nothing, therefore add one which is of zero length. If the length of the record is 15 bytes, then this record is full and we need a new one. So we actually want to do this again. So this is going to be like so: if the thing on the top of the heap is not a bytes record, or it's full, create a new bytes record. Then get the length, write the byte, increment the length. Right.

So what this should do when we do emit byte: it should create a new bytes record, put the byte onto it, and then when we get down here, add record will close the bytes record and append an EOF record.

Bad token 12. 12? See, that's not so great. That suggests that in fact we've corrupted something. It has not read any tokens. If I get rid of this memset, does it get better? Yes, it will get better, because in fact RAM top is the top of memory; you want to start writing here. There we go. Two bytes' worth of tokens. That's not right. That should be three bytes.

So what this should have done is added a record of zero length, so it hasn't advanced top at all.
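Here is a minimal sketch of the coalescing scheme being described: `emit_byte` appends to the bytes record at `top` if there is one with room, and `add_record` closes whatever is at `top` before starting a new record. The static `arena` stands in for the real workspace, and the function and constant names are assumptions. This version already counts the one-byte header in the record length, which is the detail the video goes on to debug.

```c
#include <assert.h>
#include <stdint.h>

enum { RECORD_EOF = 0, RECORD_BYTES = 1 };
#define MAX_RECORD_LEN 15 /* largest value that fits the 4-bit length field */

static uint8_t arena[256];   /* stand-in workspace; statics start zeroed */
static uint8_t *top = arena; /* points at the most recent record */

static uint8_t type_of(uint8_t descr) { return (uint8_t)(descr >> 4); }
static uint8_t len_of(uint8_t descr) { return (uint8_t)(descr & 0x0f); }

/* Skip past the record at top (a no-op on zeroed memory, whose length is 0),
 * then start a new record of the given type and total length. */
static uint8_t *add_record(uint8_t type, uint8_t len)
{
    assert(len >= 1 && len <= MAX_RECORD_LEN);
    top += len_of(*top);
    assert(top + MAX_RECORD_LEN < arena + sizeof(arena));
    *top = (uint8_t)((type << 4) | len);
    return top;
}

/* Append one literal byte, reusing the open bytes record when possible. */
static void emit_byte(uint8_t b)
{
    if (type_of(*top) != RECORD_BYTES || len_of(*top) == MAX_RECORD_LEN)
        add_record(RECORD_BYTES, 1); /* header only; payload follows */
    top[len_of(*top)] = b;           /* length already includes the header */
    (*top)++;                        /* bump the length in the low nibble */
}
```

Two RTS instructions followed by an EOF record come out as `13 60 60 01`, so consecutive literal bytes cost one byte each plus a single shared header.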
It'll advance over any previous record, but there aren't any. It creates a record. This should then write a byte value to the right place. That should be a one; all records must be at least one byte because of the descr field. So this should now give us three bytes. Two bytes. That's not working, because this has actually advanced top by one, which we don't want to do. So in fact we're just going to do that to bump the length correctly. Right. Three bytes of tokens. Good, so I think that is working. How are we doing for size? Yeah. Could be better.

So we are now emitting simple byte operations. We should actually probably dump this. Okay, that's actually just going to dump the records buffer into the output file, so that when we dump it we can see... that's octal. Why is that octal? Here we go. So here is our record: the type is bytes, it's two bytes long, and that's a zero. That should not be a zero; that should be a 60. Well, the 96 is there. This is our end-of-file record, which is 01: one byte long, of type zero, meaning EOF. So this suggests that it's writing into the wrong place. The first time through, we add a record of length zero. We bump the length so it's now one; this will be 11 in memory. We write the value... that's why. I forgot to take into account the length of the record header. There we go: 12 60. And if I change my asm file to put another RTS in and run it: 60 60, which is exactly what I want.

The reason why I'm being so anal when it comes to combining bytes records is that I expect to see quite a lot of these, and we want to make storage of raw data as efficient as possible, because there's going to be a lot of it.

Okay, let's go with full ALU instructions. So these are this column, essentially. I need to find that table again. This one. So that's going to be these, luckily.
There's not very many. ORA is 01... See, STA is a bit special because of course it's write-only, so trying to store to an immediate makes no sense. We're going to ignore that for the time being. LDA is A1, CMP is C1, SBC is E1. Okay. So if it's not a simple instruction...

One thing to think carefully about: I'm passing in the start and end of the table here. Given the extra code needed for passing in the end of the table and doing the comparison here, it may actually be faster and smaller to waste an entry at the bottom as a terminator. So, you know, if the first character of the opcode is zero, then just give up. We'll look at that later.

In fact, this is going to be ALU, because that's going to define how we're going to change the opcode once we have resolved the argument. So here we actually want to pass the parameter. I wonder if there's a way we can get away without it for the time being. So we need to parse the argument, which will put the result of this somewhere; then we're going to add our record and copy the stuff in, whatever it's going to be. We can reuse the same type for lots of things. So this parse argument is going to have to record what type of argument it is, so that it can correctly set b, and also, if it happens to be one of these, or a reference to an absolute address which it knows directly, that it actually wants to emit bytes.

So looking at this table, there are actually some annoying exceptions. For the ALU instructions, immediates are at b equals two; for these, it's b equals zero. So we're going to have to have multiple parse argument routines.
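The a/b/c split and the ALU base opcodes can be sketched as below. The layout `aaabbbcc` and the base values (each mnemonic's opcode with b set to zero) are standard 6502 facts; the helper names `field_a`, `field_b`, `field_c` and `with_mode`, and the `ALU_` prefix, are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* A 6502 opcode viewed as three bit fields: aaabbbcc. */
static uint8_t field_a(uint8_t op) { return (uint8_t)(op >> 5); }
static uint8_t field_b(uint8_t op) { return (uint8_t)((op >> 2) & 7); }
static uint8_t field_c(uint8_t op) { return (uint8_t)(op & 3); }

/* Base opcodes of the c = 1 group, i.e. with the addressing mode field b = 0. */
enum {
    ALU_ORA = 0x01, ALU_AND = 0x21, ALU_EOR = 0x41, ALU_ADC = 0x61,
    ALU_STA = 0x81, ALU_LDA = 0xa1, ALU_CMP = 0xc1, ALU_SBC = 0xe1
};

/* Combine a base opcode with an addressing mode number to get the final opcode. */
static uint8_t with_mode(uint8_t base, uint8_t b)
{
    assert(b < 8 && field_b(base) == 0);
    return (uint8_t)(base | (b << 2));
}
```

For example, LDA with b = 2 (immediate) gives A9, and with b = 3 (absolute) gives AD, so the table only needs one entry per mnemonic and the argument parser supplies b.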
This is actually going to be parse ALU argument. This is going to return b.

So the cases when we actually want to change the addressing mode after we resolve all our symbols are basically turning absolute values into zero page values. Zero page values take one byte fewer to encode. Also, we're going to have to emit relocations for those. We don't need to do it for this, because these are always two bytes, or this, or this. And in terms of instruction encoding, this is identical to this: it's the opcode byte followed by two bytes. Likewise, this is the opcode byte followed by one byte. So in fact we can turn one of these into one of these by just clearing a bit, and it's always the same bit. It's the bit with value 2: b goes from 111 to 101, or 011 to 001. So that is going to update the opcode to be the complete opcode.

Parse ALU argument is also going to have to decide whether the argument that's just been parsed is a simple value, like one of these, or one of the others that uses a constant, or a complex value which requires the full expression record. So we're going to have... that's going to be a reference to a variable, and value. Actually, the value offset can be stored here in token length. So that's going to be variable. So if token variable is set, this is a reference to something that's using a variable. Otherwise it's a simple value and we can do this from bytes. So: emit byte opcode, emit byte value low byte, emit byte value high byte. And let's just set this to be token variable zero, token value equals... Now run it. Four bytes of tokens, and what did we get?
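The bit-clearing trick for turning an absolute form into its zero page twin can be written down in a couple of lines. Within the b field the bit with value 2 is opcode bit 3 (0x08), so b goes from 3 to 1 or from 7 to 5. The helper names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define B_ABS_BIT 0x08 /* the b-field bit with value 2, i.e. opcode bit 3 */

/* An address can use the short form only if it fits in a single byte. */
static bool fits_zero_page(uint16_t address) { return address < 0x100; }

/* Convert abs (b = 3) to zp (b = 1), or abs,X (b = 7) to zp,X (b = 5). */
static uint8_t shrink_to_zero_page(uint8_t opcode)
{
    uint8_t b = (uint8_t)((opcode >> 2) & 7);
    assert(b == 3 || b == 7); /* only these two forms have a zero page twin here */
    return (uint8_t)(opcode & ~B_ABS_BIT);
}
```

So LDA abs (AD) becomes LDA zp (A5), and LDA abs,X (BD) becomes LDA zp,X (B5), without consulting any table.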
Nothing, because I've forgotten to actually put anything here. So that's going to be LDA, and... nothing. We're not actually parsing anything yet. Unexpected token. Break at the end, to indicate that this is the end of a statement. Seven bytes of tokens, and we have a single bytes record containing RTS; A1 is LDA, one of the LDAs, followed by our parameter, followed by another RTS, followed by EOF. Excellent. Okay, this is beginning to make progress, and the size is... yeah.

Right, the next thing to do is to actually start parsing things. Actually, before I do that: I said earlier about using terminators here rather than an end pointer. Let's actually put that to the test. How big are we? 2407 bytes. (I've taken a break and my windows have moved very slightly.) Okay, that should add an empty element, and that should add an empty element. So that should have put us up to 243 30 15. Yep. So this now becomes like that, like that. How big are we? Quite a bit smaller. Yes, interesting. Format everything.

Okay. All right, let's have a look at parsing things. Let's just add another helper here. This is reading a character from the input stream. That should be a char, and that is not a char. Okay. So we're going to do peek byte, which is very simple, because in our parse code we are going to want to figure out what we're looking at next. So if it's a constant value, then we read it, and we read in a constant, and then we return. The b value for constants is 2, and we want to shift it left by two for the addressing mode. So this is going to be read token; c is not a token number... we are going to extend this to deal with variable references later. That will have put the value into token value. So that should now work.

Change this to LDA #23. Now this is actually going to do the wrong thing. Unexpected garbage at end of line. So what is it complaining about? It should have fetched our constant.
And that has put us at the end of the line. Hmm. So I'm actually going to need to change this a bit. Actually, I can do this slightly more cleverly: we only want to emit a single byte of payload as constant bytes if it's this column. We might want to extend this in the future to this column and this column, because I'm assuming these are returning known constant values. In fact, we do, so: if b is 1, 2, 4 (because this has to be in zero page; there should be brackets around it, but this table doesn't show them), or 5. So in fact the only places where we don't are 0, 3, 6 or 7. It's four of each; that's convenient. Let's stick with it like that for now.

So where are we getting this unexpected garbage at end of line? It's actually here. 35, that's a hash sign. So it must have been through here. We have not been through here. It has not found our instruction. Did I get this wrong? Okay, so it is searching for LDA in ALU insns. Should have found it. Huh. That is wrong. That should be insn name zero. There we go. Bad addressing mode. It's not seen the hash sign. I've got a space. Yes, okay. That's because we're peeking a byte rather than an actual token. I'm going to have to do this differently. So this will read and consume a token. If it's a hash sign, it's already been consumed, therefore expect constant will have to read another token. If, however, it's an ID or a number, then we're going to have to deal with it immediately. Anyway, this should solve our immediate problem. There we go. We get A9.

Hex 72. Let's change our test file. DE? That is not the right number. Yeah, that is very not the right number. That's our FE. Okay, our lexer up here... that will probably do better. I'm misconverting the character into a digit. Still wrong.
We are starting with token value as zero, which is correct. We multiply it by the base, which should be 10 or 16, and we add on the digit. This entire expression is completely bogus. Okay, do it this way around, because we're going down in numerical ASCII order: at the top is lowercase, then we have uppercase, then we have digits. 05, right. That's not 5D. So the D is very wrong. So c minus 'a'... yeah, if this is lowercase a, then d minus a will be 4, plus 10 is 14... or 13. 5D. Okay, right, that is moderately incompetent, but now it is kind of working.

All right, so if it's a number it must be this form, this form, or one of the X indexing forms. I've got to go to the other table. This also does not show the correct mnemonics. ADC should show them. Here we go. Right. If we see a natural number it must be zero page; zero page comma X; absolute; absolute X; absolute Y. It cannot be indirect (anything with a parenthesis) or an immediate. So go back to our original table. This is indirect; this should have parentheses. This should have parentheses. But none of the others do. So if we are seeing a number at this point (this will eventually become a parsed expression), then it must be one of these.

So the next thing we want to look for is a comma. I am not sure we can do this without adding lookahead to the parser, that is, being able to cache a token. I don't really want to do that, because it's unwieldy to cache these things as well. But I think we're actually going to have to do that if we want to say: is the next token a comma? If it's not a comma, we don't want to consume it. The thing to be aware of is that peek token will actually update various token variables, so if the next thing is an identifier or a variable then we'll overwrite the number that we've just read here. But in that situation it's going to be invalid anyway. So let's just peek it. If it's a comma, then it's going to be one of these forms.
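The digit conversion that caused the trouble above can be sketched like this, testing the character ranges from the top of the ASCII table downwards (lowercase, then uppercase, then digits) and accumulating value times base plus digit. The function names `digit_value` and `parse_number` are assumptions, and the sketch takes a NUL-terminated string rather than reading from the parse buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Return the numeric value of a hex digit, or -1 if c is not one. */
static int digit_value(char c)
{
    if (c >= 'a' && c <= 'f') return c - 'a' + 10; /* 'd' gives 3 + 10 = 13 */
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= '0' && c <= '9') return c - '0';
    return -1;
}

/* Accumulate a number in the given base: value = value * base + digit. */
static uint16_t parse_number(const char *s, uint8_t base)
{
    uint16_t value = 0;
    assert(base == 10 || base == 16);
    for (; *s; s++) {
        int d = digit_value(*s);
        assert(d >= 0 && d < base); /* reject characters outside the base */
        value = (uint16_t)(value * base + d);
    }
    return value;
}
```

Note that `'d' - 'a'` is 3, not 4, which is the off-by-one the narration stumbles over before arriving at 13.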
So we want to know whether it's comma X or not. If it's not, and the value is small, then this is a zero page form, this one; that's b equals 1. Otherwise it must be the abs form, b equals 3. If it is a comma, then it must be followed by an X or a Y. If it's not, then that's a failure. The token length must be one... or is this not an X or a Y? How to arrange this... let's go for this one: if c is the token ID and the token length is one, then we look to see what the value is. If it's an X, this is this one: abs or zero page comma X. Zero page comma X is b equals 5; abs comma X is b equals 7. If it's a Y, then it must be b equals 6, because there is no zero page comma Y option. Otherwise, it must be bad.

Did we get errors? We did get errors. I should have a return in front of it. Okay, so let's edit our test file. So we have LDA... so that has actually produced A5 55 50... oh yes, we did not modify this stuff down here. So: zero page, zero page... that's one byte, one byte, one byte, two bytes, one byte, one byte, two bytes, two bytes. So let's see: 3, 6 or 7. So: A5 55, AD E5 55. A5 is LDA zero page; AD is LDA abs. Good. That is working.

So let's go for that. Right: expected X or Y. I think we got to here. I know what I did wrong. We have peeked the comma, but we haven't read it. So we do want to read it. Good. So we have LDA E5 55; BD 34 12: BD, abs comma X, that is correct. B9 21 43: B9, yeah, that's abs comma Y, that's correct. B5 12. Don't think that's right. Oh wait, that's the end of a bytes record. So yes, B5, then we start a new record, 15... that is the wrong value. Zero page comma X is this one, so it's got the right opcode; it's got the wrong value. Followed by B9 23 00, which is correct, because this is the 16-bit abs form; there isn't a zero page version of that. Oh, hang on.
No, no. B5 12 is the opcode followed by the operand, the zero page address. Then we get the bytes record. Then we get B9 23 00, RTS, and end of file. That is working.

Okay, right, the next thing: indirect. This is this form, or this form. And the syntax... it's not entirely obvious. Yeah: X comma ind has got the parentheses around the whole thing; ind comma Y does not. So we consume the parenthesis. No, we've already consumed the parenthesis. We now need to read a number. That's read number. It's not a constant; you can use an expression there. That puts the value into token value. We now read the next following token; it can either be a close parenthesis or a comma. So if it's a close parenthesis, then the next thing must be a comma. And here, in fact, if it's not a close parenthesis, then again it has to be a comma. So this should read the X or the Y.

I think actually I'm going to put some helpers in. So this is going to return a value if it is X or if it is Y; otherwise produce an error. So this then gets simplified to expect X or Y, and we know it cannot be anything else. So here we can say c equals expect X or Y; if c is not... This is close parenthesis comma Y, so it's this; this has got to be a Y. So we know that this is now this column, which is b equals 4. Right, this one: the other way round, we have the comma, then we have the X or Y. This has to be an X. Therefore this must be this one. We need to consume this close parenthesis. Oh yeah, and after we have done that we need to consume the close parenthesis there. All these expects can be commoned out with a great deal of ease. Yeah, Clang doesn't know that fatal does not return. Probably... there's probably no attribute for that.

Okay. So where is our print i coming from? We've got some debug tracing. Not there. Here. Right, that looks like it's worked. Let's double check. B1 12: B1 is indirect comma Y. A1 23: A1 is indirect comma X. So let's just put this in and see what it does. Bad addressing mode.
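The addressing-mode decisions worked out so far can be summarized in one place. This is a sketch, not the parser from the video: the enum names and the functions `direct_mode` and `operand_size` are hypothetical, and the parsing itself is left out, so the caller passes in the operand value and the index register it saw. The b numbers are the ones quoted in the narration for the ALU group.

```c
#include <assert.h>
#include <stdint.h>

/* Values of the b field for the ALU (c = 1) instruction group. */
enum {
    B_IND_X = 0, /* (zp,X) */
    B_ZP    = 1, /* zp     */
    B_IMM   = 2, /* #imm   */
    B_ABS   = 3, /* abs    */
    B_IND_Y = 4, /* (zp),Y */
    B_ZP_X  = 5, /* zp,X   */
    B_ABS_Y = 6, /* abs,Y  */
    B_ABS_X = 7  /* abs,X  */
};

/* Pick b for an unparenthesized operand. index is 0 for none, or 'X' or 'Y'. */
static uint8_t direct_mode(uint16_t value, char index)
{
    assert(index == 0 || index == 'X' || index == 'Y');
    if (index == 'Y')
        return B_ABS_Y; /* this group has no zp,Y form */
    if (value < 0x100)
        return index ? B_ZP_X : B_ZP;
    return index ? B_ABS_X : B_ABS;
}

/* Number of operand bytes that follow the opcode for a given b. */
static uint8_t operand_size(uint8_t b)
{
    return (b == B_ABS || b == B_ABS_Y || b == B_ABS_X) ? 2 : 1;
}
```

Only b values 3, 6 and 7 take a two-byte operand, which agrees with the byte counts read off the table while debugging.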
Yep. And let's change this to a number. Expected X or Y. Good. Yeah, we're going to put line number information in eventually. Right. So this gives us parsing constants in ALU operations.

We're also going to want to put labels in, and symbols. So let's go down to here, parse. If it's an ID then it's an instruction, and we go down the instruction code path. If we reach here, then we know it's not an instruction, therefore the only thing it can be is a symbol definition. Currently the parse buffer contains the name, so we need to add a symbol. We haven't done anything to the symbol table yet, so that's just going to add a symbol. That does actually assemble. Okay, so: record management and symbol table management.

Our symbol table is going to be really simple. What we're going to have, let me actually just define that, is... hmm, actually, I might change my mind about what I was going to do there. So what I was going to do was have a record which contains the symbol type; the variable value of it, that is, if it's referring to another symbol; the offset; and the next symbol in the chain. This allows us to do a fast backward search starting with the most recently defined symbol. This will be important when we want to implement scopes; that will happen a lot later, and probably not in this video. The other thing we could do is to walk the records forwards, starting from the bottom of memory. This means that we don't need a next variable. It means that looking up symbols is slower, because we have to look at every single record. And we would be walking the chain in that direction.
We can't implement scopes by simply moving last symbol here to the last symbol of the previous scope, thus making all the symbols defined since then inaccessible. Okay, and of course the name. Let's leave it like this, and let's make these all packed. On the 6502 you shouldn't need any alignment, but I have observed that LLVM-MOS will occasionally try to align things, which is kind of strange.

So we take a look. If there are no symbols defined, then you don't find anything. Okay. We start with the last symbol and get a pointer to the symbol.

Uh, second thought, second thought. Our records are limited to 16 bytes. The record header contains one, two, three, four, five, six, seven, eight bytes, so that means we get eight-byte symbol names, which is not enough. I can change this easily enough. We don't have very many types; three bits will do. So we then also need to go through and change a whole bunch of 0x0f. So that actually wants to be an e one. Okay, that means that a record can be 32 bytes, so 32 minus 8 is enough name.

So this should get the length of the... do we have an offsetof? We should say pointer. So this gets the length of the string that we'll want to look up. If this is different from the token length, then it's obviously not the right symbol, so we only need to actually do any kind of comparison if they match. And to do the comparison, we are just going to compare the buffer and the name over that length. Okay, this means that we have found our symbol: return it. Otherwise, go to the next one on the chain, just like that. So in fact we're going to change this to: while s is not null, iterate like that.

Right, adding a symbol. If there is already a matching symbol, don't add it.
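The backward chain walk can be sketched roughly as follows. This is an illustration under stated assumptions, not the assembler's actual code: it uses byte offsets into a fixed array instead of the packed struct from the video, it assumes the header byte holds the type in the top three bits and the total record length in the low five bits, and every name and constant (`push_symbol`, `find_symbol`, `TYPE_SYMBOL`, and so on) is hypothetical. `TYPE_SYMBOL` is set to 3 only so that a five-letter name yields a 0x6D header byte, like the symbol record in the dump read out later on.

```c
#include <stdint.h>
#include <string.h>

/* Assumed record layout as byte offsets; the real encoding may differ. */
enum {
    REC_HEADER = 0,    /* top 3 bits: type, low 5 bits: total record length */
    SYM_FLAGS = 1,
    SYM_VARIABLE = 2,  /* 16-bit reference to another symbol */
    SYM_VALUE = 4,     /* 16-bit offset or value */
    SYM_NEXT = 6,      /* 16-bit workspace offset of the previous symbol */
    SYM_NAME = 8,      /* name bytes, not NUL-terminated */
    TYPE_SYMBOL = 3,   /* hypothetical type code */
    LENGTH_MASK = 0x1f,
    NO_SYMBOL = 0      /* offset 0 is reserved to mean "end of chain" */
};

static uint8_t workspace[1024];
static uint16_t top = 1;               /* next free byte; 0 is reserved */
static uint16_t last_symbol = NO_SYMBOL;

static uint16_t read16(uint16_t at) {
    return (uint16_t)(workspace[at] | (workspace[at + 1] << 8));
}

static void write16(uint16_t at, uint16_t v) {
    workspace[at] = (uint8_t)v;
    workspace[at + 1] = (uint8_t)(v >> 8);
}

/* Append a symbol record and make it the head of the chain.
 * Returns its offset, or NO_SYMBOL if the record would not fit. */
static uint16_t push_symbol(const char *name, uint8_t length, uint16_t value) {
    uint16_t size = (uint16_t)(SYM_NAME + length);
    if (size > LENGTH_MASK || top + size > sizeof(workspace))
        return NO_SYMBOL;
    uint16_t at = top;
    memset(&workspace[at], 0, size);   /* unset fields start as zero */
    workspace[at + REC_HEADER] = (uint8_t)((TYPE_SYMBOL << 5) | size);
    write16((uint16_t)(at + SYM_VALUE), value);
    write16((uint16_t)(at + SYM_NEXT), last_symbol);
    memcpy(&workspace[at + SYM_NAME], name, length);
    last_symbol = at;
    top = (uint16_t)(top + size);
    return at;
}

/* Walk backwards from the most recently defined symbol. */
static uint16_t find_symbol(const char *name, uint8_t length) {
    uint16_t at = last_symbol;
    while (at != NO_SYMBOL) {
        uint8_t name_length =
            (uint8_t)((workspace[at + REC_HEADER] & LENGTH_MASK) - SYM_NAME);
        /* Only compare the bytes when the lengths already match. */
        if (name_length == length &&
            memcmp(&workspace[at + SYM_NAME], name, length) == 0)
            return at;
        at = read16((uint16_t)(at + SYM_NEXT));
    }
    return NO_SYMBOL;
}
```

Because every lookup starts at `last_symbol`, resetting that one variable to an older record hides everything defined after it, which is the scoping trick the backward chain is meant to allow.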
There's also going to be a separate maybe-add. The length is going to be the length of the token, plus... actually, the length of the token plus the overhead. We can then add a record, which is going to be the record symbol, with the length, and we copy the name in. The fields inside the symbol will automatically be initialized to null. So we have now correctly created a new symbol.

We now need to know what we're going to do with it. To do that, we need to look at the next token. If the next token is a colon, we're defining a label, and the symbol needs to be initialized to the current program counter. If it's an equals sign, this is a normal variable, so we're actually going to read a number and initialize the... this needs to change to expect value, actually. Like so. And in fact, I've forgotten a bit in add symbol, which is: we need to add it to our linked list.

So how's this going to work? Let's get rid of our instructions. We're just going to keep an RTS in place, and we are going to define a value. Ah, right, I forgot to put equals signs into our lexer. Unexpected garbage at end of line. So we should have read the number. We have read and consumed the token, which is the ID. We then read and consume the thing after, and then the number. Oh, end of file. That is a valid expression terminator.

Okay, 18 bytes of tokens. So 22 60 is our bytes record, two bytes in it, followed by 6D, which is a symbol record. 00 is the variable reference. 01, it's easier to see, is the value, which is one. Followed by... wait a minute, wait a minute. Flags. Yeah, okay: flags, variable reference, value, next symbol, name F N O R D. Byte record, RTS, finish. That looks good.

So the next thing is... well, we'll have various different types of symbol, such as label definitions that will be defined to the current program counter, label definitions where we have not seen the label yet, and constant values like this one.
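The colon-versus-equals decision can be illustrated with a small sketch. To keep it short it uses a plain fixed-size array rather than the record chain from the video, it checks the syntax before adding the symbol (the video adds the symbol first and then looks at the next token), and it reads the constant with `strtoul` instead of an expression parser. `define_symbol`, the `SYM_*` kinds and the `DEF_*` result codes are all invented for this example.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { SYM_LABEL, SYM_CONSTANT };                 /* hypothetical kinds */
enum { DEF_OK, DEF_DUPLICATE, DEF_SYNTAX, DEF_FULL };

typedef struct {
    char name[24];      /* not NUL-terminated */
    uint8_t length;
    uint8_t kind;
    uint16_t value;
} Symbol;

static Symbol symbols[16];
static int symbol_count;

static Symbol *find_symbol(const char *name, size_t length) {
    for (int i = symbol_count - 1; i >= 0; i--)   /* newest first */
        if (symbols[i].length == length &&
            memcmp(symbols[i].name, name, length) == 0)
            return &symbols[i];
    return NULL;
}

/* Handle a line whose first token is an identifier that is not an
 * instruction: "name:" defines a label, "name = number" a constant. */
static int define_symbol(const char *line, uint16_t pc) {
    const char *p = line;
    while (isalnum((unsigned char)*p) || *p == '_')
        p++;
    size_t length = (size_t)(p - line);
    if (length == 0 || length > sizeof(symbols[0].name))
        return DEF_SYNTAX;
    if (find_symbol(line, length))
        return DEF_DUPLICATE;        /* already defined: don't add it */
    if (symbol_count == (int)(sizeof(symbols) / sizeof(symbols[0])))
        return DEF_FULL;

    uint8_t kind;
    uint16_t value;
    while (*p == ' ')
        p++;
    if (*p == ':') {
        kind = SYM_LABEL;            /* label: current program counter */
        value = pc;
    } else if (*p == '=') {
        char *end;
        kind = SYM_CONSTANT;         /* constant: read the number */
        value = (uint16_t)strtoul(p + 1, &end, 0);
        if (end == p + 1)
            return DEF_SYNTAX;       /* nothing numeric after '=' */
    } else {
        return DEF_SYNTAX;
    }

    Symbol *s = &symbols[symbol_count++];
    memcpy(s->name, line, length);
    s->length = (uint8_t)length;
    s->kind = kind;
    s->value = value;
    return DEF_OK;
}
```

With this sketch, `define_symbol("fnord = 1", 0)` records a constant with value 1, and `define_symbol("loop:", pc)` records a label at whatever program counter is passed in.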
I don't think there's anything else, actually. That's nice. Oh, and we also need to keep track of what memory area the variable is referring to, so it can be in zero page, or in main memory, or be an untyped constant. We need those in order to generate relocations, and that will be done there.

The other thing is we're actually going to need to start parsing these, so these expect numbers should actually turn into an expect value. So if it's a number, then it's a number; if it's an ID, then we need to look the symbol up and return a variable reference. But I think I'm going to do this next time, because this is a multi-part series and I don't want to make these too ridiculously long. See you next time.