Greetings and welcome to part 9 of implementing a 6800 processor on an FPGA using nMigen. So, as we've seen from the previous videos, we have implemented all of these instructions here, and so the plan is to implement the rest of the instructions with the exception of RTI, wait for interrupt, and software interrupt, since those deal with interrupts, which we have not even had a signal for. So, we're going to leave those until later. We are going to implement subtract B from A, compare B to A, transfer A to B, transfer B to A, decimal adjust accumulator (that's going to be an interesting one), and add B to A. We're also going to implement the stacky type things, which are transfer the stack pointer to X, increment the stack pointer, pull A and B (otherwise known as pop A and B from the stack), decrement the stack pointer, transfer X to the stack pointer, and push A and B onto the stack. We will also be implementing return from subroutine, and return from subroutine will have its associated instructions that we will implement: branch to subroutine, which takes a relative value, so these are short branches, and jump to subroutine. These are actually call subroutines; one is indexed and one is extended. Then we will implement the load S and load X instructions, and finally the store S and store X instructions. Notice that these instructions right over here, STS and STX, are undocumented, and I'm not actually sure if they accomplish any sort of storing whatsoever. In any case, those are the instructions we're going to start writing, so let's get down to it. So subtract B from A and compare B to A: these basically do the exact same thing as the sub instruction and the test instruction, except they just subtract B from A, and in the case of compare B to A, it doesn't store the result. And here we can see the implementation of subtract B from A and compare B to A.
It's basically just feeding A and B to the ALU and performing a subtract action, and if the low bit of the instruction is zero, it means we're doing a subtract rather than a compare, which means that we have to store the output back into the A register. That's done, and formal verification is pretty much as expected. We have two instructions here that we are going to be formally verifying at the same time. There is input 1 and input 2, and all of this code is pretty much a copy of what was in the formal verification for subtract. Obviously we don't want B to change or X to change or the stack pointer, and we don't want to read or write anything because this is just based on registers. We want the PC to have incremented by 1, and in addition, if we are doing a store, then we want that store to happen, otherwise we do not want it to happen, and finally we make sure that the flags are correct. And so as we can see in the output here, SBA and CBA have been formally verified correctly, so that is done and dusted. Let's quickly implement the transfer A to B and transfer B to A instructions. Here's our implementation right here, so we're just combining the two instructions. If it's transferring A to B, then put A on source bus 2 so that the ALU has access to it, because we're going to be doing the LD operation on the ALU, and the reason is that the flags are set in the same way that LDA would set them, or LDB. So that would be the zero flag, the negative flag, and the overflow flag, so we can do that. And depending on whether it's transferring A to B or B to A, we simply perform that operation right here. Formal verification is no big deal, we just make sure that nothing that isn't supposed to change did change. The PC incremented by 1, this is the input, it's either A or B, depending on whether we're doing a TAB or TBA. This is the output, again it depends on the instruction, and we just want to make sure that the output is now equal to the input.
And of course that the other register that we retrieved from didn't actually change, and we want to check our flags. And it should come as no surprise that formal verification did work. Let's quickly polish off the ABA instruction, add B to A. Interestingly, there is no AAB instruction. It's a pretty straightforward implementation, we're just going to use the ALU to add A and B and store the result back into A. And again it is a two cycle instruction, just like all the others are. Formal verification is exactly the same as the formal add, except of course we don't have all those modes to deal with. So we have our two inputs A and B, and we want to make sure that nothing changes except A. The PC has to increment by one, and the addition has to be correct. And finally we want to make sure that all the flags which are set during addition are set correctly. And again formal verification works right away. Well hey, it looks like we've got just one instruction left on this row, Decimal Adjust Accumulator. So let's take a look at implementing that instruction. But first, a bit about binary coded decimal. We represent the digits 0 through 9 in binary the usual way. However, incrementing a BCD counter from 9 results in going back to 0 with a carry out. But if we use our ordinary 4-bit binary adder to add 1 to 9, we end up with A. That's not a BCD number. This is where decimal adjustment comes in. For the value A, if we add 6, we get 1 0 in hex or 10 in BCD. This will work for all the decimal additions you'd want to do to one digit. We just look at the result and its carry to determine if we need to add 6 or not. And don't forget that the original addition could have a carry in. So now we know how to adjust one digit after addition. If the value is A through F, add 6. And if there was a carry from the original addition, add 6 also. Now we have 8 bits or 2 BCD digits in every byte. 
However, notice that if we add 6 to the low digit and it's A through F, we will get an additional carry into the high digit. So effectively, the adjustment to the high digit must also take place for 9 in that case. And that's how decimal adjustment works. Let's try an example. 99 plus 1 becomes 9A. A in the low digit means add 06, and 9 in the high digit with A in the low digit means add 60. The result is correctly 100 in BCD. Now try 99 plus 99. This is 32 in hex with half carry and carry set. Half carry set immediately means add 06, and carry set immediately means add 60. Note that we keep the carry in the output, and the result is correctly 198 in BCD. Now, does this also work after subtraction? No. And in fact, decimal adjust accumulator on the 6800 isn't meant to be used after subtraction. There is no decimal adjust instruction for after subtraction. So here I've basically implemented what I showed in the animation, where I have an adjustment signal that starts off as 0, and then I can load it with hex 06, or 60, or 66. I look at the low nibble and the high nibble, and I determine whether either of them is 10 or more. And then this just carries out the logic of the adjustment. And one final note here: originally I didn't actually have this last part, and it was failing formal verification. I finally realized what the problem was: when you have an original carry, you still need to propagate it, because if the addition already gave you, say, 100, well, the output should be 0 with a carry, so that's 100 as well. So in order to formally verify that, I sort of hacked up something pretty... I don't know, not exactly Rube Goldberg-esque, but in any case I have this extra signal called TestDAA, and if it's high, that means that we go into this else, which will test the specific case of a decimal adjust accumulator, where you're actually testing over two cycles.
The first cycle you're doing an add with carry, and the second cycle you're doing a DAA. So I set up all the assumptions here, where basically the past function has to be ADC, and the current function has to be DAA. Also, the current input has to be the same as the past output. Also, the inputs which are past, that's over here, the inputs all have to be valid BCD digits, and what we're asserting is that the input decimal, which is over here, it's just 10 times high plus low, the inputs added together have to be equal to the output in decimal, and of course the N and Z flag have to be correct as well. And out decimal also includes, right over here, you can see it includes the carry as the 100's place. Also, I made a cover statement over here just for fun to see what it would come up with in order to have an output of 142. And the answer seems to be right over here. So the input was 62 and the output was 80, which of course we know in decimal is 142. So the hexadecimal output was E2, and there are the flags over there. They get propagated over to the next cycle, and the input is E2. The second input doesn't matter because we don't pay attention to it, and the output is 42, and that one at the very end means that there was a carry, so 142. So it decided to do 62 plus 80. You can see that the adjust over here was 6-0. So anyway, that pretty much shows that any addition of any number will work. Now, one interesting thing is the way the programming manual for the 6800 describes DAA. It's got this huge table in it, and as we've seen and as some other people have pointed out online, this huge table is overly complicated and not really needed. 
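As a rough sketch of the simplified adjustment rule and the two-cycle ADC-then-DAA check described above, here is a plain-Python model. It is not the nMigen code from the video; the function names `adc`, `daa`, `bcd`, `decimal` and `check_all` are made up for illustration, and the only assumptions baked in are the ones stated in the narration: add 06 on a half carry or a low digit of A through F, add 60 on a carry, a high digit of A through F, or a high digit of 9 with a low digit of A through F, and keep the carry as the hundreds place.

```python
def adc(a, b, carry_in):
    """Model an 8-bit add with carry: returns (result, half_carry, carry)."""
    total = a + b + carry_in
    half = (a & 0x0F) + (b & 0x0F) + carry_in > 0x0F
    return total & 0xFF, int(half), int(total > 0xFF)


def daa(a, half_carry, carry):
    """Simplified decimal adjust: returns (adjusted result, carry out)."""
    lo, hi = a & 0x0F, a >> 4
    adjust = 0
    if half_carry or lo > 9:
        adjust |= 0x06
    # A high digit of 9 also needs adjusting when the low digit will carry into it.
    if carry or hi > 9 or (hi == 9 and lo > 9):
        adjust |= 0x60
    # The carry out is set exactly when the high digit was adjusted.
    return (a + adjust) & 0xFF, int(adjust >= 0x60)


def bcd(n):
    """Encode a decimal number 0..99 as two BCD digits in one byte."""
    return ((n // 10) << 4) | (n % 10)


def decimal(byte, carry):
    """Decode a BCD byte, treating the carry as the hundreds place."""
    return 100 * carry + 10 * (byte >> 4) + (byte & 0x0F)


def check_all():
    """Exhaustively check ADC followed by DAA for every pair of valid BCD inputs."""
    for x in range(100):
        for y in range(100):
            for cin in (0, 1):
                result, carry = daa(*adc(bcd(x), bcd(y), cin))
                if decimal(result, carry) != x + y + cin:
                    return False
    return True
```

Calling `daa(*adc(0x62, 0x80, 0))` reproduces the cover trace from the video: the add gives E2 with no carries, the adjustment is 60, and the result is 42 with the carry set, so 142.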
The table is certainly equivalent, and it also is a little more strict, in that it only takes into account a lower digit of 0 through 3 when there's a half carry, where we just say, well, you know, if there's a half carry, it's actually impossible to have a lower digit of 4 and so on, unless you've improperly used the DAA instruction, in which case the outputs may in fact differ between my implementation and this table. But in any case, the other interesting thing you can see is that the V flag, the overflow flag, is actually not defined. It basically means that they don't want you to rely on the output of the overflow flag. It's essentially random. Now, of course, it's probably not random. It probably has very specific outputs for very specific inputs. They just didn't feel the need to document it, and hopefully nobody actually relies on the output of V after a decimal-adjust instruction. So that is pretty much what I've implemented, modulo the stricter form of this table. And also the other thing is that they sort of gave themselves an out for abuse of the DAA instruction, where they basically said here that if the contents are the result of applying any of the operators, add B to A, add, or ADC, to binary coded decimal operands with or without an initial carry, the DAA operation will function as follows. So again, they're basically saying if you applied DAA and it wasn't as a result of adding two binary coded decimal operands, then all bets are off. Well, that's the DAA instruction right there. There is not much to it. Okay, and here's the formal verification for DAA. So first I define a whole bunch of variables just to make things convenient. So I have the previous value of the half carry, the previous value of the carry, and the output value of the carry. And I have the low nibble or digit of the input, the high digit of the input, and the output. Now, there are some conditions that I've restricted formal verification to.
These are coming in the form of assumptions. And that is that according to this table here, if you take a look at the half carry, whenever it's 1, the lower input digit is always between 0 and 3. So same thing here and same thing here. And there are a couple of other restrictions on the upper digit when there are certain things happening on the lower digit. So basically what this boils down to is that if the half carry is set, then the low digit has to have been less than or equal to 3. In addition, if the carry bit was also set, then the high digit must also be less than or equal to 3. So these are the assumptions under which DAA is supposed to work. And if those assumptions are violated, then there's no requirement on DAA to do anything meaningful. Okay, so again, here we're just checking that nothing that shouldn't have changed did change, and that the program counter was incremented by 1. Now, these are the conditions and the results in sort of array form, and they directly implement this table from the programming manual. So basically the first thing is what the carry input has to be, then what the condition on the high digit is, what the input half carry is, and what the condition on the low digit is. And the outputs are the adjustment to add and what the output carry is required to be. And then I basically go through a for loop and check: if condition 0 is true, then check that the results for 0 are true. If condition 1 is true, then check that the results for 1 are true, and so on. Then finally I just check the Z flag, the N flag, and the C flag. The C flag I just put here because, well, you do have to check the carry flag, and all the other flags should not change. And again, it should come as no surprise if you look down at the bottom that formal verification on DAA works just fine. Implementing TSX and TXS is pretty straightforward.
In fact, it's pretty much the same as TAB and TBA, except that no flags are affected, so it's just a straight copy. And the same thing with formal verification, it's basically the same file as TAB and TBA, except again, no flags are affected. And that works too, it's formally verified. So how about we move on to increment and decrement the stack pointer. And my bad, TSX and TXS are actually four-cycle instructions, so I just sit around and do nothing for the other cycles. Same thing with INS and DES, so apparently there was no 16-bit single-cycle increment or decrement unit, at least not for the stack pointer. So those also take four cycles. Okay, INS and DES formal verification, there is very little that goes on here. Basically, we just check that the output is plus one or minus one of the input depending on whether the low bit of the instruction is one. And if you look at the instructions, you can see that INS and DES differ not only in bit two, but also in bit zero. So I just use that to check, since I'm only looking at those two instructions. So that's really the only thing that matters. And the other thing that matters is that formal verification works. So that's done. Okay, next instructions are pull A and pull B. So for the first time, we are actually going to be using the stack. We're going to be popping off the stack, so let's see what that looks like. These are also four-cycle instructions. So one of the things that surprised me about the pull instruction is that the flags actually are not affected, which is odd because with the LDA instruction, the flags actually are affected, namely the negative flag and the zero flag. So pull is kind of like a load except you're just pulling it from the stack. So why it wouldn't affect the flags, I'm not really sure. Now if we look at the cycle by cycle breakdown for pull, we can see that this is a four-cycle instruction.
And it looks like during the third cycle, the memory address is not actually valid, because we're doing the actual read on the fourth cycle. So there it is, stack pointer plus one. And if we look at the poster that I've been making, you can see the stack layout here: there's the original SP right over here. And this is what happens when you push a value. So you push the value and then you decrement the stack pointer. So that's the current stack pointer, which means that to do the opposite, to pull from the stack, you have to increment the stack pointer and then retrieve the value from there. Okay, here's the implementation for pull. Now before I get into that, I just want to show you the modification that I made to read byte. So read byte originally took a combinatorial destination. And what would happen is you would set up the address lines to output whatever address you wanted. And then on the next cycle, data in would at some point have the value that was being read. And you could copy that into a combinatorial destination. So what I've also done is I've given you the option of storing data in into a synchronous destination, like for example, a register or maybe temp eight or something like that. So you can specify them by name, by keyword argument. So you can either specify a combinatorial destination or a synchronous destination. And I've cleaned up the comments a little bit here because it wasn't exactly clear to me what was going on. But now I make it clear that you start from the given cycle. And the address lines are output during cycle plus one, at which point data in will contain whatever data you're reading. And if the combinatorial destination isn't none, then that gets copied in. And if the synchronous destination is not none, then that also gets copied in at the end of cycle plus one. So you can actually do both if you wanted to. So I've leveraged that in the implementation for pull.
So if we're doing a pull A, then we're going to do a read byte starting from cycle number two, which is actually cycle number three if it's one-based, using the address SP plus one. So I'm not really happy with the implementation of this. And first of all, the reason is that I've got this if statement. And basically I'm copying the read byte code twice in this instruction, because you cannot have a destination or a left-hand side that is a multiplexer. So I have to use an if statement. So that's one thing that I don't really like about this instruction. The second thing that I don't like about this instruction is that I'm passing SP plus one to read byte. And I'm also computing SP plus one on the last cycle in order to increment the stack pointer at the end of the instruction. So, this works. I've done the formal verification. In fact, here it is. Let me close this. The formal verification is just basically make sure that, you know, SP has been incremented by one and that the address read is actually SP plus one and that the data read is put into the correct register, A or B. Let's see, right here is the output. So if pull A is true, then the output is post A, otherwise the output is post B. And I'm doing that check right over here, and none of the flags change. So it works fine formally. If you look over here, on the first run I actually had an error in formal verification. When I fixed that, formal verification worked just fine. So I'm really not very pleased with the way that I've implemented this instruction. I should be able to compute SP plus one just once. And looking at the cycle by cycle breakdown right over here, you can see that the address lines are actually loaded with the stack pointer before you increment it. And then they're loaded with the stack pointer plus one. So that would be the increment.
So I'm wondering if what they did was they loaded the address lines with the stack pointer and then they incremented the address lines and then stored whatever was in the address lines into the stack pointer. That's one way of just having a single increment in there. So, you know, maybe it's possible that the address lines can be both written and read, which would be interesting. So that's something that I will probably look at. In fact, I may actually make an alternate version of this implementation to sort of exploit the idea that the address lines can also be read back. So let me try that. Okay, so here's my alternate version. So during cycle one, I set up the address lines to output the stack pointer. Read-write is going to be one, but valid memory address is going to be zero. And this is basically the same thing as what happens during this cycle right over here, where the stack pointer is output on the address lines and VMA is zero. Now on the next cycle over here, what we do is we increment the address lines and we set VMA to one. And that means that by the time the next cycle rolls around, the address lines are now set up properly and valid memory address is set up. And then we just wait for data input to come in and we store it in either A or B depending on the instruction that we're doing. And of course, we have to insert our little formal verification hook and we end the instruction. And in addition, at the end of the instruction, we load whatever we have in the address lines into the stack pointer. So I just plugged this in and I ran formal verification again. And as you can see, it worked just fine right here in the end. So it's a little wordier. But I think I like this implementation better than this other implementation, which kind of seems a little hacky because it's really, really trying to use read byte. So I think I'm going to keep the old version around just as a reminder to maybe, you know, make read byte a little more flexible or clean it up a little bit.
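The alternate pull sequence just described, where the address lines double as the incrementer, can be modeled in a few lines of plain Python. This is only a behavioral sketch under the assumptions stated in the narration, not the nMigen implementation; the function name `pull`, the dictionary used as memory, and the `trace` list of (address, VMA) pairs are all illustrative.

```python
def pull(sp, memory):
    """Model PULA/PULB using the address lines as the only incrementer.

    Returns (value, new_sp, trace), where trace records the (address, VMA)
    pair driven on each of the two cycles after the opcode fetch.
    """
    addr = sp & 0xFFFF            # address lines <- SP, VMA = 0 (no real access)
    trace = [(addr, 0)]
    addr = (addr + 1) & 0xFFFF    # increment the address lines, VMA = 1
    trace.append((addr, 1))
    value = memory[addr]          # data in now holds the byte at SP + 1
    new_sp = addr                 # address lines are copied back into SP
    return value, new_sp, trace
```

For example, with `memory = {0x0100: 0xAB}`, `pull(0x00FF, memory)` reads 0xAB from address 0x0100 and leaves the stack pointer at 0x0100, so SP plus one is computed exactly once.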
And this is really what I want to do. So that's the pull instruction. Well, now that we've done the pull instruction, we may as well do the push instructions as well. So now we're actually going to write to the stack. These are also four-cycle instructions. And if we look at the cycle by cycle breakdown, we can see something interesting here: we're doing the write over here, where we're outputting the stack pointer to the address lines. And again, it looks like we're using the address lines to be able to decrement the stack pointer. And that's where VMA is zero. So let's go ahead and implement that. It should basically be the opposite of pull. Okay, well, here's the implementation for push. It is fairly straightforward. So just like before, I load the address lines with the stack pointer, except this time I do want to do a write, and the data out is going to be either the A register or the B register depending on whether bit zero is set. If bit zero is clear, then it's a push A; if it's set, then it's a push B. So on the next cycle, I'm just decrementing the address lines. The valid memory address of course should be zero because we don't want to read or write that address. And we set up our formal verification hook. And then on the next cycle, we simply store the address lines back into the stack pointer so that it's been properly decremented. And we end the instruction there. It's pretty straightforward. I didn't use the write function that I wrote earlier. So I think I kind of like this mechanism. In fact, now I'm actually wondering whether read byte or write byte is even useful, considering that it does hide a lot of the things that you're doing. So I might just change that. In any case, here's formal verification. There's not a whole lot because we're not doing any read. So A and B do not change. We're just making sure that we write to the correct address, which is the stack pointer, and that we have properly decremented the stack pointer.
And of course, that we have written the correct data either A or B. And it should again come as no surprise that formal verification worked properly. Now the only instruction left in this row that I would want to implement during this video is return from subroutine. But it seems kind of weird to implement return from subroutine without first implementing at least branch subroutine or jump subroutine. So let me go ahead and implement. How about branch subroutine and then I can implement return subroutine and then I'll follow it up with jump subroutine. So here is the cycle by cycle breakdown of BSR, which we are going to implement. This is branch subroutine relative. So it takes one operand, which is a signed offset from the address of the instruction following BSR. So here we can see, so we are going to have to store the return address, which is of course the address of the instruction following BSR onto the stack. So here we can see the same thing that we did with push where we put the stack pointer onto the address lines and write data. And then we decrement the address lines, write data, decrement the address lines again. And at that point we can store the address back into the stack pointer. And then we just jump to the subroutine address, which I assume that at this point we are doing the calculation of the offset. So let's go ahead and implement the instruction according to this cycle breakdown. So here it is, the implementation of BSR. So the first thing that I do is I read the immediate byte. And this is basically just like the beginning of branch, just branch relative, because of course I need a relative target. So this gives me that. So one interesting thing is that during cycle one, okay, so here's the thing. First of all, my cycles start from zero and these cycle by cycle breakdowns start from one. That's issue number one. 
Issue number two is that during a cycle, say during cycle one, if I assign something for phase one, that means that this assignment actually happens on the next cycle, or during the next cycle. So in other words, VMA is not set to zero in this example until cycle one clocks over to cycle two, and then it remains at zero during cycle two. So I've done a little mental translation where I say that if I'm writing phase one code that has a cycle of, say, one, mentally I just add two and then I look at the table like this to make sure that I'm actually doing the right thing. So two plus one is three, and there's VMA right there. It is indeed zero. So that little mental translation actually helps me, you know, get past the whole chain of logic about how, oh, this is the cycle number and it's zero-based versus one-based, and this is phase one versus combinatorial. So I just mentally add two to my cycle number. So we can see here that that's during cycle one, mentally cycle three. So that's there. Now cycle two, mentally cycle four in the datasheet. So what I'm doing is I'm computing the target right over here. So this is the PC at this point, plus the relative offset, going into temp 16, and I'm not going to use it until all the way at the end of the instruction when I jump to there. So in any case, I set the address to the stack pointer. The memory location is valid. I'm going to do a write and I'm going to output the current PC, which is the return address, the low byte of the PC. Again, mentally add two. So that's four. So during cycle four, VMA is one. I output the stack pointer, the read-write signal is zero, and I write the return address low-order byte. So same thing for the next cycle, except I'm decrementing the address lines, and this formal verification hook, of course, is for the previous write. Same thing here: this is for the previous write up here. During cycle four, datasheet cycle six, again, I decrement the address.
Valid memory address is now zero because it's not valid. And read-write is one. So cycle six: stack pointer minus two, the address is not valid, and we're doing a quote-unquote read. And there's irrelevant data going out, and coming back in, I guess. So during this cycle, we still don't have a valid memory address. And we can now set the stack pointer equal to the address after we've decremented it. During the next cycle, again, the address is not valid, but we are outputting the target address here. And then finally, during the last cycle, we just end the instruction and jump to the target address. So here's the formal verification. And by the way, I should note that formal verification passed, though not without a few debugging stages in between, mainly because I did stupid things like transposing a plus with a minus or using the wrong index on something. But in all of those cases, formal verification actually caught the fact that I did something inconsistent. So formal verification is basically this. So I calculate the return address, which is always the PC before the instruction plus two. So in other words, that's the address of the instruction after the BSR. The offset is just a signed signal, which I'm going to assign. Where do I assign it? Over here. So it's the first data that I read. And the target is simply the return address plus the offset, in signed arithmetic, of course, because offset is signed. So that's the target address. And I make sure that A, B and X do not change, that the PC after the instruction is the target, and that the SP, the stack pointer, has been decremented by two. I read one address, which is the relative offset, and I write two addresses: to the initial stack pointer, I write the low byte of the return address, and then to stack pointer minus one, I write the high byte of the return address. And that's it. The flags don't change. So that's pretty much that.
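The properties that the BSR verification checks can be written down as a small plain-Python model. This is a hedged sketch of the checks as narrated, not the actual verification file; `sign_extend8` and `bsr` are hypothetical helper names, and `pc` is assumed to be the address of the BSR opcode itself.

```python
def sign_extend8(byte):
    """Interpret an 8-bit value as a signed two's-complement offset."""
    return byte - 0x100 if byte & 0x80 else byte


def bsr(pc, sp, offset_byte):
    """Model BSR: returns (target, new_sp, writes).

    writes is the ordered list of (address, data) pairs pushed onto the stack.
    """
    ret = (pc + 2) & 0xFFFF                           # instruction after the BSR
    target = (ret + sign_extend8(offset_byte)) & 0xFFFF
    writes = [
        (sp & 0xFFFF, ret & 0xFF),                    # low byte at the initial SP
        ((sp - 1) & 0xFFFF, ret >> 8),                # high byte at SP - 1
    ]
    new_sp = (sp - 2) & 0xFFFF                        # SP ends up decremented by two
    return target, new_sp, writes
```

For instance, `bsr(0x1000, 0x01FF, 0xFE)` has an offset of minus two, so the target is 0x1000 again, the return address 0x1002 is written as 0x02 at 0x01FF and 0x10 at 0x01FE, and the stack pointer ends at 0x01FD.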
And again, as I mentioned, not without a few iterations, formal verification passed. So great, we've got BSR. So now let's do the opposite thing, which is RTS. So here's RTS. Really, it's just the same thing as a pull, except in this case, we're pulling two bytes and we're sticking them into the PC and then going from there. So let's go ahead and implement this. Here's the implementation of return from subroutine. And again, I basically just followed the cycle by cycle breakdown over here. You can see that it's a five cycle instruction where we just read the return address off of the stack. And if we see it ends at cycle four, which is of course five cycles, so that's correct. And basically I just output whatever is supposed to be output on the address lines and VMA. So again, we're using address to increment. We're loading it up with the stack pointer initially and we're storing the stack pointer at the end. So the stack pointer ends up being incremented by two. And the first value that we read is the high order byte of the return address. And I'm just storing it in temp eight. Over here, during this cycle, DIN contains the low order byte of the return value. So you can see that when I do my end instruction, the instruction that I want to go to is equal to the data in is the low order byte and temp eight is the high order byte. Again, I could have done LCAT over here to make it a little bit more, I don't know, easier to see. The interesting thing is that I did that in formal verification. So what I added in my basic verification file is I added that LCAT function that I had in the ALU because it is actually quite useful. And I use it down here to make sure that the program counter after the instruction is equal to the left concatenation of the first piece of data that I read, which is the high byte and the second piece of data that I read, which is the low byte, high, low. Everything else is pretty ordinary. A, B, and X don't change. 
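The RTS check on the program counter boils down to a left concatenation of the two bytes read from the stack. Here is a minimal plain-Python sketch of that idea; `lcat` is only a stand-in for the LCAT helper mentioned above (its real signature isn't shown here), and `rts` with a dictionary as memory is an illustrative model, not the nMigen code.

```python
def lcat(high, low):
    """Left-concatenate two bytes into a 16-bit value, high byte first."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


def rts(sp, memory):
    """Model RTS: returns (new_pc, new_sp)."""
    high = memory[(sp + 1) & 0xFFFF]   # first byte read: high byte of return address
    low = memory[(sp + 2) & 0xFFFF]    # second byte read: low byte of return address
    return lcat(high, low), (sp + 2) & 0xFFFF
```

With `memory = {0x01FE: 0x10, 0x01FF: 0x02}` and the stack pointer at 0x01FD, `rts` returns a program counter of 0x1002 and a stack pointer of 0x01FF, which undoes a BSR that pushed the return address 0x1002 starting at 0x01FF.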
The stack pointer has to increment by two. The addresses that I read have to be the stack pointer plus one and the stack pointer plus two before it gets incremented, of course, and the flags do not change. And again, modulo, a mistake that I made, which you can see here during my failure. It works just fine. So that is a return from subroutine taken care of. So these instructions and branch subroutine can all turn green. Next, let's implement jump to subroutine. Here's the implementation for JSR. I basically just followed the same advice that I did for the other instructions with the exception that I did call mode extended because why not? And then I just implemented whatever the signals have to be. Same for indexed. So now I did notice one thing about indexed. So let me split this and let me go to indexed. Now you'll notice that cycle two, which in the datasheet will be cycle four, I had set VMA to zero all the time. And if we look at the datasheet, we can see that during cycle four, VMA has to be set to one because we are actually going to be writing the stack with the return address. So in every other indexed instruction, VMA is set to zero during datasheet cycle four, except for JSR. I may have mentioned that before when I wrote this. So the thing is what I'm going to do is I'm just going to comment it out. I'm not going to delete it entirely because that comment is going to be a reminder to me that I've done something weird. And that will remind me exactly what I've done, kind of like tying a string around your finger to remind you of something. The thing is that I thought that always during datasheet cycle four, VMA is going to be zero. It turns out that that's not the case. Now it also turns out not to matter because I'm not doing any formal verification of the VMA line. I'm not even doing any formal verification of the read-write line or the number of cycles. And I promised that I was going to do that. 
And when I do that and I comment this out, it's very likely that formal verification will fail for a whole bunch of instructions because the default for VMA is one. So this is sort of a reminder to me that if I do want to set it to zero, then I should probably go into the JSR code right over here and explicitly set VMA to one, which I've done over here. So, yeah, in fact, I guess, okay, let me go and uncomment this because I think it's going to work, right? Because again, with nMigen, when you're elaborating code, it's the code that comes last that takes precedence over the code that comes first. So here, if it says during cycle two, set VMA to zero, and then back in JSR, where is it? Right here, during cycle two, set VMA to one, well, this code is elaborated after this code, which means that VMA will get set to one. So it's going to be an override. So that's going to remain true for every instruction that overrides VMA. And since none of the other indexed instructions override VMA for cycle two, it should work. So again, should work. That's a very bold statement, and only formal verification will prove or disprove that. So speaking of formal verification, here is the formal verification function. So the return address: if it's a JSR extended instruction, that means that there are two operand bytes, which are the target. Otherwise, if it's indexed, then there is only one operand byte, which is the offset from X. So that explains the return address. A, B, and X do not change. The stack pointer, of course, just like BSR, gets decremented by two, regardless of whether this is an extended mode or an indexed mode instruction. Same thing with the data that we write to the stack. We always write the return address in the same order. Finally, if we're doing an extended mode JSR, then we have read two bytes, namely the two byte operand, and the post PC is just going to be the left concatenation of read data zero and read data one.
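To make the JSR checks concrete, here is a small plain-Python reference model. It is not nMigen and not the code shown in the video; the names `lcat` and `jsr_model` and the dictionary used for the stack writes are made up for illustration. It assumes the usual 6800 convention that a push writes at SP and then decrements, so the low byte of the return address lands at SP and the high byte at SP minus one. It covers both modes: extended takes the target from the two operand bytes, and indexed adds the single unsigned operand byte to X.

```python
def lcat(high: int, low: int) -> int:
    """Left-concatenate two bytes into a 16-bit value (high byte first)."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


def jsr_model(pc, sp, x, operands, extended):
    """Hypothetical reference model of JSR, for illustration only.

    pc       -- address of the JSR opcode
    operands -- bytes following the opcode (2 for extended, 1 for indexed)
    Returns (post_pc, post_sp, writes), where writes maps address -> byte.
    """
    # The return address is the instruction after JSR: opcode plus operands.
    ret_addr = (pc + 1 + (2 if extended else 1)) & 0xFFFF

    # Assumed push order: low byte at SP, then high byte at SP - 1.
    writes = {
        sp & 0xFFFF: ret_addr & 0xFF,
        (sp - 1) & 0xFFFF: ret_addr >> 8,
    }
    post_sp = (sp - 2) & 0xFFFF

    if extended:
        # Target is the 16-bit operand: high byte first, then low byte.
        post_pc = lcat(operands[0], operands[1])
    else:
        # Target is X plus the unsigned 8-bit offset.
        post_pc = (x + operands[0]) & 0xFFFF
    return post_pc, post_sp, writes
```

A model like this is only a cross-check on paper: the properties it encodes (A, B, X untouched, SP down by two, same write order in both modes) are the same ones the formal verification function asserts.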
Otherwise, this was an indexed mode instruction. So we've only read one byte, namely the operand, which is the offset from X. So, of course, the post PC is going to be X plus that bit of data. And the flags don't change. And again, modulo a few mistakes, formal verification works just fine for JSR. So now I can turn those instructions green. All right, it looks like the only instructions that we have left aside from the interrupt instructions are load S, store S, load X, and store X. So let's get to load S and load X. This is the implementation of load stack pointer and load X. They only differ by one bit, and they handle all four modes. So I'm not going to go in detail through everything. It's just really long, as you can see, because in addition to executing the mode at the beginning of the instruction in order to get the data or address out of the instruction, I had to increment the address lines. So, of course, it requires very carefully determining what has to happen on every cycle. The interesting thing that I want to talk about is this new function for the ALU that I've called LD chain or load chain. Now, the thing is that load stack pointer and load X are 16-bit loads. So if it's an immediate instruction, then you have two bytes that come after it, the high byte and the low byte, and that's your 16-bit value. If the mode determines an address, then you have to go to that address and then read two bytes, the high byte and the low byte. So, for example, you can see here that we're loading the high byte of SP, and here we're loading the low byte of SP. Now, when we load the high byte of SP, we do an LD. And as you know from before, what LD does in the ALU is it looks at the data to see if it's zero, and if it is, it sets the zero flag. And it also looks at the high bit to see if it's set. And if it is, it sets the N flag, the negative flag. So that's what it does with the high byte.
Now, the thing is that the flags have to be set such that the zero flag is only set if all 16 bits are zero. And the N flag has to be set if, for the 16-bit value, bit 15 is set. So we've already set the negative flag. That's for bit 7 of the high byte, which is bit 15 of the entire 16-bit value. Which means that we can't use LD for the low byte, because then if the low byte happens to have its bit 7 set, then the negative flag will be set, which may not be correct. In addition, LD doesn't pay attention to whether the high byte was zero. So LD chain basically chains an LD across 16 bits. So here's what it looks like. It's fairly straightforward. It's basically the same as LD, except that the Z flag is ANDed with the previous Z flag, so that only if both bytes are zero will the Z flag be set. We don't set the N flag, because it's already been set, and we always set the V flag to zero. So the assumption is that you've already done an LD, and now you're going to do an LD chain. So formal verification, it actually looks a little neater than the actual implementation. Basically you extract the mode, you extract whether you're doing an LDS or an LDX. You look at the output, there's a read address that depends on the mode that you're performing, and basically that's about it. We just make sure that everything works out correctly. And again, you know, modulo some errors that I made, we have a successful formal verification. So that's LDS and LDX taken care of. So one thing that I did forget is we will have to implement CPX, compare X. So we'll do that after we handle store S. Now, notice that I've put in the undocumented instructions store S immediate and store X immediate. That was a bit premature. I'm not actually certain if opcode 8F actually stores S anywhere. If it's anything like the store A immediate undocumented instruction, if there's any justice in the world, STS for immediate would actually store the stack pointer in the two immediate bytes of the instruction.
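Going back to LD chain for a moment, the flag chaining can be sketched in plain Python. This is a behavioural model only, not the nMigen ALU from the video; the function names and the dictionary of flags are assumptions made for illustration. It shows why the low byte needs LD chain rather than a second LD.

```python
def ld(value, flags):
    """Model of 8-bit LD flags: Z and N come from the byte, V is cleared."""
    value &= 0xFF
    out = dict(flags)
    out["Z"] = int(value == 0)
    out["N"] = (value >> 7) & 1
    out["V"] = 0
    return out


def ld_chain(value, flags):
    """Model of LD chain: Z is ANDed with the previous Z, N is left alone."""
    value &= 0xFF
    out = dict(flags)
    out["Z"] = flags["Z"] & int(value == 0)
    out["V"] = 0
    return out


def load16_flags(value16, flags):
    """Flags after a 16-bit load: LD on the high byte, LD chain on the low."""
    high = (value16 >> 8) & 0xFF
    low = value16 & 0xFF
    return ld_chain(low, ld(high, flags))
```

For example, loading 0x0080 with this model leaves both Z and N clear, whereas running plain `ld` on the low byte as well would set N from bit 7 of the low byte, which is not bit 15 of the value.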
But I haven't actually gone to the simulator to determine whether that is the case. So instead, we're just going to implement store S and store X for direct, indexed, and extended. So again, store S and store X, not that interesting. In fact, it's pretty much the same as load S and load X except instead of loading, you're just storing the 16-bit value. And of course, you're not implementing an immediate mode. And formal verification, again, is pretty much the same except you're doing some writes and you're always writing two bytes to the write address. And for once, formal verification worked right away. Look at that formal verification time. It's now up to four minutes for this set of instructions. So I'm not sure if that would carry over to most instructions now, simply because there are so many other instructions that it could execute before the store to get things working. I guess one other thing to point out is that the store has the same flag semantics as the load in that if you store zero, then the zero flag is going to be set. And if you store a negative value, then the N flag is going to be set, for whatever reason. So all that we have left to do is compare X. So the thing about the compare X instruction is that it doesn't do a 16-bit subtraction. Its only real use is to just compare whether the two 16-bit values are equal. And it basically says that in the programming reference manual, that the Z bit is set or reset according to the results of these comparisons. The N and V bits, although they're modified, are not intended for conditional branching because they don't tell you anything useful about the comparison. In fact, they basically say here that the N bit is set if the most significant bit of the result of the subtraction of the high bytes would be set, and cleared otherwise. So in other words, it does that for the high byte, and for the low byte it basically ignores the negative flag. Same thing with the V flag.
If the subtraction of the most significant bytes results in an overflow, then the V flag is set. And the low byte subtraction has absolutely no effect on V. So we're going to have to figure out how to implement this using the ALU that we have. We could always modify the ALU for something extra like, you know, something similar to LD chain, but let's see what we can come up with first. So CPX, it's basically the same as LDX except instead of storing the data into X, I'm just throwing the data onto input number two of the ALU and I'm throwing X onto input number one of the ALU. And yes, I had to define two new functions, compare X high and compare X low, but it does make things kind of easy. So compare X high and compare X low basically do the same thing as sub with the exception of the flags. So there's just a little bit of complication with the flags, especially down here when you're doing a compare X low. The only flag you're going to change is the Z flag, and the Z flag is basically chained with the previous Z flag. The carry flag is never set for compare X low or compare X high. It's only set for the actual subtraction operations, and N, Z, and V are set as usual based on compare X high. Formal verification, again, no real big deal. I did use this interesting equation that I found in the programming manual. You can see this equation down here. Oops, now you got to see my YouTube view. Okay, so you can see down here that the V flag has this particular formula. Now I didn't use that formula in the ALU, but I did use it in formal verification. And in fact, it seems to work just fine. So all righty then. I guess that is probably the way it is actually implemented on the 6800 itself. So that's pretty much that: compare X does formally verify properly. And this means that we can set all these other instructions as green. Excellent. So the only instructions that we have now to implement are return from interrupt, wait for interrupt, and software interrupt.
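As a cross-check on the compare X flag rules described above, here is a plain-Python sketch of the two steps. It is a behavioural model, not the nMigen ALU code; the function names and flag dictionary are hypothetical. For V it uses the standard two's-complement subtraction overflow expression on the high bytes, V = X7·¬M7·¬R7 + ¬X7·M7·R7, which I am assuming is the formula referred to from the programming manual.

```python
def cpx_hi(x_hi, m_hi, flags):
    """Compare X high: N, Z, V from the high-byte subtraction; C untouched."""
    x_hi &= 0xFF
    m_hi &= 0xFF
    r = (x_hi - m_hi) & 0xFF
    x7, m7, r7 = x_hi >> 7, m_hi >> 7, r >> 7
    out = dict(flags)
    out["N"] = r7
    out["Z"] = int(r == 0)
    # Assumed overflow formula for a subtraction of the high bytes.
    out["V"] = (x7 & (1 - m7) & (1 - r7)) | ((1 - x7) & m7 & r7)
    return out


def cpx_lo(x_lo, m_lo, flags):
    """Compare X low: only Z changes, chained with the previous Z."""
    r = ((x_lo & 0xFF) - (m_lo & 0xFF)) & 0xFF
    out = dict(flags)
    out["Z"] = flags["Z"] & int(r == 0)
    return out


def cpx(x, m, flags):
    """Flags after comparing 16-bit X with 16-bit M: high byte, then low."""
    after_hi = cpx_hi(x >> 8, m >> 8, flags)
    return cpx_lo(x, m, after_hi)
```

With this model, comparing 0x1234 with 0x1235 clears Z because the low bytes differ, while N and V still reflect only the high-byte subtraction, which is why they are not much use for branching.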
And because that involves the interrupt line, I've left it for last because I suspect there's going to be some difficulties involved. So all that's left to do now is see how many LUTs we've used. And let's see how many LUTs we've used. I'm betting we're way over 2000. And the answer is 2700 with 3100 cells. Look at all those carry units thanks to additions and subtractions all over the place. But I don't care about optimization. I just don't care. How's that cat?