So, what we're going to try now is to handle all of these instructions. These are the branch relative instructions, and I think it's going to be a fairly easy win. There are 15 instructions, 16 if you count the undocumented branch never instruction.

First of all, as I think I mentioned before, they come in opposite pairs. This is branch if carry is clear; this is branch if carry is set. This is branch if zero is clear, and this is branch if zero is set. Overflow clear, overflow set, and then plus and minus, where zero is included in plus. Greater than or equal to versus less than, and greater than versus less than or equal to, are actually combinations of flags. Then there's branch always versus branch never, and branch if higher versus branch if lower or same. So the point is that we really only have eight conditions to implement, plus an invert bit. And I think the eight conditions can basically be considered as combinations of flags, or maybe no flag at all. So I think this will be an easy win. Let's go ahead and try it.

We can see in this chart here what conditions are checked for each of the different branch instructions. For example, you can see that branch if equal to zero checks if the zero flag is one. Branch if plus only checks if the negative flag is zero, so of course this includes a zero value, because the high bit wouldn't be set. On the other hand, something like branch if higher checks that the carry flag ORed with the zero flag equals zero, so they both have to be zero. So plus is different from higher, which is different from greater than: plus is a logical check on the sign bit, higher is an unsigned check, and greater than is a signed check. In any case, the branch tests are laid out quite nicely for us, so we can just code that up.

All right, so I've started by creating the decoder for the branch instructions.
Of course, there are 16 of them, and they're all going to be handled by the same function, BR. So looking at BR, for branch, the first thing I do is leverage the existing function that I wrote to get the next byte out of the instruction. Because that operand is actually unsigned, and I want nMigen to generate signed arithmetic logic, I have to create this intermediate signal, which is a signed 8-bit value, and then assign the operand to it. Now this should get optimized out, because when you get right down to it, relative is just a copy of this operand signal. The bits are identical; it's just that all operations on relative are signed operations, so the logic generated is different.

The next thing is that on cycle two, all we do is compute the target, the address we go to if the branch is actually taken. It's just the PC plus the relative value, because by the time we get to cycle two, the PC is the address of the instruction plus two, since it's a two-byte instruction. What this means is that if you want to spin in place, you just do a branch always minus two, because minus two will always go back to the beginning of the instruction.

And then on the third cycle (and this is a four-cycle instruction), I have a bit of logic that I called branch check. If the signal returned by branch check is true, then you go to the target; otherwise you just go to the PC, which is currently pointing at the next instruction anyway. And that's it.

So let's take a look at branch check. First of all, I've taken the bottom bit of the instruction and called that invert because, as I've said before, the opcodes come in pairs where one is basically the inverse of the other. Then cond is just going to be the condition that is being checked.
So for example here, I'm checking the next three bits of the instruction. If they're zero, zero, zero, then we're handling the branch always and branch never instructions, where branch never is the inverse of branch always. So the condition here is going to be one. Then down here at the bottom of the function, I just invert that signal if invert is set. So branch never is going to have invert set to true, and the condition is going to be one. Of course, one XOR one is zero: branch never. Same thing with all of the others. Basically, I just took this straight from the table. So here I'm taking the carry flag, ORing it with the Z flag, and then inverting that, to say that the condition is that C OR Z equals zero. And that's branch if higher. Same thing with all these other ones, and that's really all there is to it. So take branch is actually a signal that always exists in the processor. It's just that the only time it's looked at is during a branch instruction.

Now for formal verification, in keeping with the philosophy of doing it again but in a different way, to make sure that at least it's consistent, I defined this branch enumeration. It turns out that I don't actually need it, but it was kind of useful for me to refer to. So here is the instruction matcher; it's basically the same thing as before. Here is the standard block for things that shouldn't change: none of the registers should change except for the PC. We're not writing any addresses, and we're only reading one address, which is the address after the beginning of the instruction. Here I've just got some convenience variables for N, Z, V, and C. Another convenience variable for the offset, which again is going to be signed, and down here I assign to the offset. Then I have another convenience variable called BR, which is the branch that we're going to do, and that consists of the bottom four bits. And here's an array of the conditionals.
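The array itself is only shown on screen, so here is a rough plain-Python sketch of what a table like that could look like, built from the standard 6800 branch tests. It is not the nMigen code from the video: the names BRANCH_TAKEN, branch_check, and formulations_agree are made up for illustration, and the flags are passed as plain 0/1 integers. The second function is a stand-in for the invert-bit formulation, so the two can be compared against each other.

```python
from itertools import product

# Hypothetical reference table, indexed by the low four bits of the opcode
# (0x20 through 0x2F). Flags n, z, v, c are 0/1 integers.
BRANCH_TAKEN = [
    lambda n, z, v, c: True,                 # 0x20 BRA, branch always
    lambda n, z, v, c: False,                # 0x21 BRN, branch never (undocumented)
    lambda n, z, v, c: (c | z) == 0,         # 0x22 BHI, higher (unsigned)
    lambda n, z, v, c: (c | z) == 1,         # 0x23 BLS, lower or same (unsigned)
    lambda n, z, v, c: c == 0,               # 0x24 BCC, carry clear
    lambda n, z, v, c: c == 1,               # 0x25 BCS, carry set
    lambda n, z, v, c: z == 0,               # 0x26 BNE, not equal to zero
    lambda n, z, v, c: z == 1,               # 0x27 BEQ, equal to zero
    lambda n, z, v, c: v == 0,               # 0x28 BVC, overflow clear
    lambda n, z, v, c: v == 1,               # 0x29 BVS, overflow set
    lambda n, z, v, c: n == 0,               # 0x2A BPL, plus
    lambda n, z, v, c: n == 1,               # 0x2B BMI, minus
    lambda n, z, v, c: (n ^ v) == 0,         # 0x2C BGE, greater or equal (signed)
    lambda n, z, v, c: (n ^ v) == 1,         # 0x2D BLT, less than (signed)
    lambda n, z, v, c: (z | (n ^ v)) == 0,   # 0x2E BGT, greater than (signed)
    lambda n, z, v, c: (z | (n ^ v)) == 1,   # 0x2F BLE, less or equal (signed)
]


def branch_check(opcode, n, z, v, c):
    """Invert-bit formulation: eight base conditions XORed with opcode bit 0."""
    invert = opcode & 1
    base = [
        1,                    # 000: always
        1 - (c | z),          # 001: higher
        1 - c,                # 010: carry clear
        1 - z,                # 011: not equal
        1 - v,                # 100: overflow clear
        1 - n,                # 101: plus
        1 - (n ^ v),          # 110: greater or equal
        1 - (z | (n ^ v)),    # 111: greater than
    ]
    cond = base[(opcode >> 1) & 0b111]
    return bool(cond ^ invert)


def formulations_agree():
    """Exhaustively compare both formulations over all opcodes and flag values."""
    for opcode in range(0x20, 0x30):
        for n, z, v, c in product((0, 1), repeat=4):
            if BRANCH_TAKEN[opcode & 0xF](n, z, v, c) != branch_check(opcode, n, z, v, c):
                return False
    return True
```

With only 16 opcodes and 16 flag combinations, checking every case is cheap, which is the same "do it twice, in two different ways" idea as the formal verification, just in miniature.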
So basically all I have to do to find out whether I should take a branch is take BR, the four bottom bits, and use that as the index into this array. That tells me whether I should take the branch or not. And again, this is taken straight from the table in the documentation. So the final assertion is about the PC after executing the instruction: if we're taking the branch, then it's equal to the PC of the instruction plus two plus the offset. Or if we're not taking the branch, it's just the PC plus two. And that's really that. And formal verification of course worked.

Now it would be interesting to simulate this with a branch always minus two, just to make sure that signed arithmetic is working and everything looks the way it should. I know that we've got this signed signal in here, and because we're using it in the addition over here, nMigen should generate signed arithmetic. But it's nice to check, just to make sure that it's doing the right thing. So here's the simulation section of core.py that we haven't seen in a long, long time. Basically I've just programmed it with 0x20, which is branch always, and 0xFE, which is negative two. So in theory, this should go from 0x1234 to 0x1235 and back to 0x1234. Let's run this and see if that actually does it.

Okay, and here's the result of the simulation. Reset state goes from zero to one to two to three, so now we're actually starting. We can see that the PC goes from 0x1234 to 0x1235 and then to 0x1236, and this is expected because the instruction is not yet over. Then at the end of the instruction, we go back to 0x1234, 0x1235, 0x1234, 0x1235. And you can see that these bits are take branch and conditional, so take branch in this case is always one. We can see the instruction here starts at 0x20.
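The arithmetic behind that simulation can be sketched in a few lines of plain Python. This is only a behavioral model of the final PC assertion, not the nMigen code; the names to_signed8 and expected_pc are made up for illustration.

```python
def to_signed8(byte):
    """Interpret an unsigned 8-bit operand as a two's-complement offset."""
    return byte - 0x100 if byte & 0x80 else byte


def expected_pc(instr_pc, operand, taken):
    """PC after a two-byte relative branch whose opcode sits at instr_pc."""
    next_pc = (instr_pc + 2) & 0xFFFF          # address of the following instruction
    if taken:
        return (next_pc + to_signed8(operand)) & 0xFFFF
    return next_pc


# Branch always with operand 0xFE (minus two) at 0x1234 lands back on itself.
spin = expected_pc(0x1234, 0xFE, taken=True)
# If that same branch were not taken, execution would fall through to 0x1236.
fall_through = expected_pc(0x1234, 0xFE, taken=False)
print(hex(spin), hex(fall_through))  # prints: 0x1234 0x1236
```

The masking with 0xFFFF is there because the PC is a 16-bit register, so a branch near the top of memory wraps around instead of overflowing.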
And because I have a NOP at 0x1236, if that branch were not taken, then we would see the instruction change to 0x01, which is NOP. But it never changes, so it's basically doing a branch minus two properly. So now that we've fully implemented this entire row of branch instructions, we can just turn that green. Excellent.

Let's take care of some other easy instructions. How about CLV, that's clear V, plus set V, clear carry, and set carry. I'm not going to handle the interrupts yet, because we are not going to deal with interrupts until we have completed all of the instructions except those that are about interrupts. So these should be fairly easy to do.

Okay, so the implementation is pretty straightforward. I decided to have two functions, one for the clear overflow and set overflow instructions, and one for the clear carry and set carry instructions. If we go to, for example, the overflow version, really all this does is in cycle one, because this is a two-cycle instruction: we have to tell the ALU to either set or clear the overflow flag. So obviously this required a modification to the ALU, where I've included at the top four more functions just to clear and set these flags. Remember that the ALU basically controls the flags, so you have to tell the ALU what to do, and that's why we had to add these extra functions. All the functions do is clear or set the carry or overflow flag. So that's really all there is to those instructions right over here.

And then for verification, okay, so I did things slightly differently: I used these constants instead of actually declaring the bit patterns over here, because why not? I think that eventually I'll want to put in clear interrupt and set interrupt here, so it would be nice to be able to add those later. And then everything remains the same, with the exception of the PC and of course the flags.
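As a rough plain-Python picture of what is being checked for these four instructions, here is a small model. The opcode values are the ones from the 6800 instruction set; the constant names and the function expected_flags are made up for illustration and are not the constants used in the video.

```python
# Opcodes for the four flag instructions (constant names are illustrative).
CLV, SEV, CLC, SEC = 0x0A, 0x0B, 0x0C, 0x0D


def expected_flags(opcode, v, c):
    """Return (v, c) after one of the four flag instructions executes.

    Each instruction touches exactly one flag and leaves the other alone.
    """
    if opcode == CLV:
        return 0, c
    if opcode == SEV:
        return 1, c
    if opcode == CLC:
        return v, 0
    if opcode == SEC:
        return v, 1
    raise ValueError(f"not a flag instruction: {opcode:#04x}")
```

For example, expected_flags(SEC, 0, 0) gives (0, 1): the carry is set and the overflow flag is left as it was.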
So basically what I do here is say, okay, normally the carry and the overflow flags should not change, except that, depending on the instruction, they do change to either zero or one. And formal verification does work, great. So now we can turn these four instructions green.

Now there are two other flag-related instructions: the TAP and the TPA instructions. TAP means transfer accumulator A to, well, I don't know what P means, but it's the flags register, or the CCR. And TPA is transfer the flags register into accumulator A. So why don't we just tackle those? We can see in the Motorola 6800 programming reference manual that P apparently stands for processor condition codes register. So processor, okay. Now here's the TPA instruction, which is transfer from processor condition codes to accumulator A. We can see that the top two bits get one, because if the condition code register were an 8-bit register, the top two bits would always be one, since they're just not used.

Okay, and again the implementation is pretty straightforward. Aside from the decode, I've decided to just implement the instructions as two separate functions. And again, we have to give the ALU two new functions. For the first function, TAP, if we go to the ALU, what TAP does is take whatever input one is, set the two high bits, and store that in the condition codes register, or the flags register. For TPA, transfer the flags to the accumulator, we take whatever is in the flags register and, just in case, set the upper two bits. They should always be set, but let's just do it anyway. Then we set the output of the ALU to that. And of course in the instruction, when we do a TPA, we set up the ALU to do a TPA, and then we take the output of the ALU and stick it in the A accumulator. These are two-cycle instructions, so we've only got cycle equals one. And again, formal verification is pretty straightforward.
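Here is a minimal plain-Python sketch of those two transfers, assuming the usual 6800 condition code layout of 1 1 H I N Z V C from bit 7 down to bit 0. The function names and the (a, ccr) tuple convention are made up for illustration; this is not the ALU code from the video.

```python
UNUSED_BITS = 0xC0  # top two bits of the condition code register always read as 1


def tap(a, ccr):
    """TAP: copy accumulator A into the flags. A itself is unchanged."""
    return a, (a | UNUSED_BITS) & 0xFF


def tpa(a, ccr):
    """TPA: copy the flags into accumulator A. The flags are unchanged."""
    return (ccr | UNUSED_BITS) & 0xFF, ccr


# A = 0x05 sets Z (bit 2) and C (bit 0); the two high bits are forced to one.
a, ccr = tap(0x05, 0xC0)
print(hex(a), hex(ccr))  # prints: 0x5 0xc5
```

Note that TAP writes bit 4 as well, which is the interrupt mask, so it changes more than just the arithmetic flags.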
Essentially nothing changes except the PC and, in the case of (which one is this?) TAP, the flags. So we make sure that we have set the flags to the correct values. Same thing with TPA: we make sure that we have set the accumulator to the correct value, and of course the flags don't change. So that works.

Now speaking of this, I told a lie. I did say that I didn't want to handle clearing and setting the interrupt flag yet. But the thing is that when you're doing a TAP, transfer accumulator to the flags, you are actually setting the interrupt flag. So why not just implement SEI and CLI? And of course the implementation is pretty straightforward. It's pretty much the same as before: define the decode, define the function, define the functions in the ALU, and call those functions. Then add it to the formal verification, and it works. So let's celebrate by setting these four instructions green.

Okay, so it looks like we've almost completed the very first row. We would just have to do increment X and decrement X. So why don't we close out the video by implementing those two instructions? Now, X is a 16-bit register, so we will have a 16-bit increment or a 16-bit decrement. But again, I'm not going to define that in hardware. I'm just going to write it in code and let nMigen create the logic. In terms of the flags, we can see up here, in this section of the poster that I'm putting together, that the effect of decrement and increment X is that the zero flag changes according to whether the result of the decrement or increment is zero. So that's a slight complication, because we will have to implement something in the ALU to do that. Now we can see that incrementing the X register is a four-cycle instruction. So it's not like, for example, incrementing the program counter, which you can do on every single cycle.
So that sort of implies that the 6800 doesn't actually have a special or a general 16-bit increment/decrement unit, because if it did, then this execution time would just be two cycles instead of four. It probably goes through the ALU in order to do a 16-bit increment or decrement, and it probably does it eight bits at a time, which is probably why it takes so long. Nevertheless, I went ahead and implemented it using just a 16-bit plus or minus, and I had to basically sit and do nothing for one of the cycles. I could have, in fact, sat and done nothing for two cycles, but I chose to use the final cycle to actually do the comparison to zero and then set or clear the Z flag using the ALU. So obviously this isn't exactly how the 6800 implements it.

The other interesting thing is that if we look at the cycle-by-cycle breakdown provided by the datasheet for increment or decrement X, we can see that during cycles three and four, valid memory address is set to zero. So basically what the processor is saying is, well, I'm not doing anything useful with the memory, so just set valid memory address to zero; don't pay attention to the address lines. It does say that the address lines contain the previous register contents and the new register contents. I don't know. That's actually kind of interesting, because it implies that X is put onto the address lines for those two cycles, and we can do that just for completeness' sake. So why don't we do that?

So there's the implementation, including the side effect on valid memory address. Remember that it defaults to one. So we get it ready to be set to zero during cycle one, which means that it will be set on cycle two, and this will be set on cycle three. Same thing with the address lines: we're getting them ready to output the previous version of X, and this is the current version of X.
And then of course during cycle three, that actually happens.

Okay, and formal verification. Well, I'm just defining increment X and decrement X. It's a one-byte instruction, so that's the PC check. If the instruction is increment, then we increment; otherwise we do a decrement. And of course, we truncate to 16 bits. Finally, we make sure that the Z flag was set properly. And that works just fine. So let's go ahead and fill those out. Okay, so here we go. We can now take these boxes and make them green. And now we've got two entire rows that we've fully implemented, minus the undocumented instructions, which we don't know what they do yet.

And so as we bring this video to a close, let's take a look at how many cells and LUTs we've used with these new instructions. So did we break 1000 LUTs? Let's find out. We did. We definitely sailed right through it. And again, a lot of this is because we're not doing any hardware optimization whatsoever. There's probably a lot going on with the 16-bit increments and decrements. So again, we're not going to worry about this until we have to, which would be at the very end of implementing all of the instructions. Let's take a look at the timing. The timing says that we are now down to 51 megahertz. And again, hardware optimization: ignoring that for now. How's that?
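Going back to the increment and decrement X check from earlier: as a rough plain-Python model of the expected result, assuming only what was described (a 16-bit wraparound and a Z flag that reflects the new value), it could look like this. The opcode values are the 6800 ones; the name step_x is made up for illustration.

```python
INX, DEX = 0x08, 0x09  # 6800 opcodes for increment X and decrement X


def step_x(opcode, x):
    """Return (new_x, z) for INX or DEX, with X truncated to 16 bits."""
    delta = 1 if opcode == INX else -1
    new_x = (x + delta) & 0xFFFF       # wrap around like a 16-bit register
    z = 1 if new_x == 0 else 0         # Z reflects the result, nothing else changes
    return new_x, z
```

The two edge cases worth keeping in mind are incrementing 0xFFFF, which wraps to zero and sets Z, and decrementing zero, which wraps to 0xFFFF and clears Z.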