Greetings, nMigen friends. This is part six of programming an FPGA using nMigen for a 6800 processor. In this video we're going to look at some of the other addressing modes. So far we've been implementing instructions in extended mode, which means that the address operand is a 16-bit address. Now we'll look at direct mode, immediate mode, and indexed mode, and we're going to implement all of our instructions with those additional modes. So let's check it out.

We've implemented these green instructions here. The only difference between them and the other instructions down in this section is the mode: immediate, direct, or indexed. So I think it's worth the time to simply implement all of these instructions and see where we end up. What I've chosen to do first is implement the direct version of the instructions. The direct version differs from the extended version only in bit 5, so in the matcher for the instructions I can put a dash, meaning don't care, in the bit 5 position. I haven't done that for store A yet.

If we look at the modification to the general ALU instruction, we can see that I've extracted the mode bits. Because we have four rows, that is, four different modes, the mode is encoded in instruction bits 4 and 5. So I can extract the mode and then simply switch on it. If the mode is 1, we're in direct mode, and we get the operand using a function called mode direct. It's basically the same as mode extended, except that it only looks at the first byte after the opcode as the address. This is the equivalent of zero page addressing on the 6502. It's probably no coincidence that the 6502 has a zero page mode just as the 6800 has a direct mode, since the 6502 team had worked on the 6800. The next thing, of course, is to read the byte contained at that address.
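Here is a rough illustration of the decoding just described. It is a plain-Python model, not nMigen code, of an opcode matcher with don't-care bits and of the mode field in bits 5 and 4. The helper names `matches` and `mode_of` are hypothetical. ADDA (0x9B direct, 0xBB extended) serves only as an example pattern.

```python
def matches(pattern: str, opcode: int) -> bool:
    """Return True if an 8-bit opcode matches a pattern such as "10-11011".

    The pattern is written MSB first (bit 7 on the left); '-' means don't care.
    """
    bits = format(opcode, "08b")
    return all(p in ("-", b) for p, b in zip(pattern, bits))


def mode_of(opcode: int) -> int:
    """Extract the addressing-mode field from instruction bits 5..4."""
    return (opcode >> 4) & 0b11


# ADDA direct (0x9B) and extended (0xBB) differ only in bit 5,
# so a don't care in the bit 5 position covers both.
ADDA_DIR_OR_EXT = "10-11011"

if __name__ == "__main__":
    for op in (0x8B, 0x9B, 0xAB, 0xBB):
        print(hex(op), matches(ADDA_DIR_OR_EXT, op), mode_of(op))
```

Only 0x9B (mode 1) and 0xBB (mode 3) match. Immediate 0x8B and indexed 0xAB are rejected because their bit 4 is 0.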
However, because we only need one byte of operand to get the address, we do all of that during cycles 1 and 2. Again, the only difference between extended mode and direct mode here is the cycle on which you actually do the operation: cycle 2 for direct mode and cycle 3 for extended mode. Easy enough.

Let's take a quick look at mode direct. Here it is, with mode extended below it. You can see that they're pretty much the same, except for this multiplexing. Because we only have one byte, we can use Din directly and don't need the concatenation. We also store the address in temp16. Because the high byte of the address is always zero, we can set it during the first cycle, which we couldn't do in extended mode. Extended mode, of course, needs one extra cycle to read the second byte of the address. Mode direct doesn't.

Formal verification works pretty much the same way. Here's the modification I made to the formal check. Again, I extract the mode. I've moved input 2 into a separate signal rather than defining it down here, and then we just switch on the mode, direct or extended. For direct, only two addresses are read: the one-byte address and then the contents of that address. For extended, three are read: two bytes of address and then one byte of content.

When we synthesize the code, the number of cells is now 658. That's more than we had before, but nowhere near double. One thing I'd like to do is make this mode more globally available, because I'm also going to need it for store A. So let's make it a signal and put it in the combinatorial domain so that we have access to it.
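The following small Python model makes the cycle and read-count difference concrete. It assumes a flat 64 KiB memory and uses a hypothetical `fetch_operand` helper. It only mimics what the mode direct and mode extended logic read, not the actual nMigen implementation. Direct mode reads two addresses after the opcode and extended mode reads three, matching the counts the formal check asserts.

```python
from typing import List, Sequence, Tuple


def fetch_operand(mem: Sequence[int], pc: int, mode: int) -> Tuple[int, List[int]]:
    """Model the operand fetch for direct (mode 1) or extended (mode 3).

    pc is the address of the opcode. Returns the data byte and the list of
    addresses read after the opcode fetch, in order.
    """
    reads: List[int] = []

    def read(addr: int) -> int:
        reads.append(addr)
        return mem[addr]

    if mode == 1:
        # Direct: one address byte (cycle 1); the high byte is implicitly 0x00.
        addr = read(pc + 1)
    elif mode == 3:
        # Extended: high byte (cycle 1), then low byte (cycle 2).
        hi = read(pc + 1)
        lo = read(pc + 2)
        addr = (hi << 8) | lo
    else:
        raise ValueError("this sketch only models direct and extended modes")
    # Data read: cycle 2 for direct, cycle 3 for extended.
    return read(addr), reads
```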
The first thing I've done is define our signal, which is two bits wide. I've also defined an enumeration that decodes those two bits. Then I extract those two bits from the instruction and put them in mode. This doesn't apply to every instruction, but for the instructions it does apply to, I can now use self.mode. So instead of a local mode variable, we now use self.mode, and we can use the mode bits enumeration to make things a little more readable. If we synthesize and look at the number of LUTs we've used, we're at 658, so the change didn't make much of a difference, which is fine.

For the store instruction, we can handle direct mode in much the same way. We simply copy what we had for extended mode, use mode direct instead of mode extended, and decrease the cycle counts by one.

One thing that's really annoying is having to explicitly set the next cycle. It makes more sense for the cycle to advance automatically unless the instruction ends, so let's make that simplification. First, I set up the increment as a default: if you don't set the cycle to something, it automatically increases. The only time we set the cycle back to zero instead of incrementing it is when the end instruction flag is set. Notice when this function is called: after we handle fetching, executing, and formal verification. In nMigen, a later assignment to the same signal takes precedence over an earlier one. Because this code comes after the default, it overrides the increment, which is exactly what we want. Now I can search for self.cycle.eq and remove those lines, like so. Then I run formal verification to make sure I didn't break anything.
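This Python sketch illustrates both ideas. The first is an enumeration that names the two mode bits. The second is the default-increment rule for the cycle counter, where the end-instruction reset overrides the increment. `ModeBits`, `decode_mode`, `next_cycle`, and `end_instr` are hypothetical names. The "later assignment wins" ordering is modeled with plain sequential assignments, standing in for later statements in the same nMigen domain overriding earlier ones.

```python
from enum import IntEnum


class ModeBits(IntEnum):
    """Addressing mode encoded in opcode bits 5..4."""
    IMMEDIATE = 0b00
    DIRECT = 0b01
    INDEXED = 0b10
    EXTENDED = 0b11


def decode_mode(opcode: int) -> ModeBits:
    return ModeBits((opcode >> 4) & 0b11)


def next_cycle(cycle: int, end_instr: bool) -> int:
    # Default: always advance to the next cycle.
    nxt = cycle + 1
    # Applied later, so it wins: ending the instruction resets the counter.
    if end_instr:
        nxt = 0
    return nxt
```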
Okay, now that we've added store for direct mode and modified the cycling code, let's check in on the number of LUTs and see where we are: 724. I've also cleaned up formal verification quite a bit by abstracting out all the common checks I do for ALU instructions, because everything else is going to be the same. Here's the common check. It takes the mode from the instruction and the B bit, which says whether we're using accumulator A or B. It then does the same thing formal verification did previously. The difference is that we now depend on the mode to assert which addresses we've read, how many we've read, and so on. This abstracts the mode away from the actual checking: once we're past the mode, we know what the inputs and the output of the ALU are going to be, so we can concentrate on just those.

Immediate mode is pretty simple, because the byte that comes after the opcode is the actual data to use, rather than an address where you then have to go look for the data. So for immediate mode, we just use the byte following the opcode as the operand. The function is mode immediate 8. I use 8 to indicate an 8-bit operand, because some immediate mode instructions load a 16-bit register and so take two bytes. During cycle 1, we take the data in and store it in the temp8 register, so that we can use it in cycle 2. That's really all there is; otherwise it's pretty much the same.

Indexed mode takes the byte following the opcode and uses it as an offset to add to X, the index register. The offset is always positive (unsigned), so you can address anywhere from X + 0 to X + 255.
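The common ALU check reduces to one question: given the mode, which addresses are read and what reaches the ALU's second input? A Python reference model of that expectation might look like the following. `expected_alu_input` is a hypothetical name. Memory is assumed to be a flat 64 KiB array, and the indexed offset is treated as unsigned, as described above.

```python
from typing import List, Sequence, Tuple


def expected_alu_input(
    mem: Sequence[int], pc: int, x: int, mode: int
) -> Tuple[List[int], int]:
    """Return (addresses read after the opcode, ALU input 2) for one ALU op.

    mode is opcode bits 5..4: 0 immediate, 1 direct, 2 indexed, 3 extended.
    """
    a1 = (pc + 1) & 0xFFFF
    b1 = mem[a1]
    if mode == 0:
        # Immediate: the byte after the opcode is the operand itself.
        return [a1], b1
    if mode == 1:
        # Direct: one-byte address with an implicit 0x00 high byte.
        return [a1, b1], mem[b1]
    if mode == 2:
        # Indexed: unsigned offset 0..255 added to X, truncated to 16 bits.
        addr = (x + b1) & 0xFFFF
        return [a1, addr], mem[addr]
    # Extended: two address bytes, high byte first.
    a2 = (pc + 2) & 0xFFFF
    addr = (b1 << 8) | mem[a2]
    return [a1, a2, addr], mem[addr]
```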
What I've done here is store the data in byte into the low byte of temp16, with zeros in the upper byte. On the very next cycle I add temp16 to X and store the result back into temp16. So on cycle 3, in other words after cycle 2, temp16 contains the correct address to look at. That's why this comment says the address is not valid until after cycle 2.

If we look at the ALU type instructions, I've added immediate mode. Here we take the operand from mode immediate, and we don't have to read a data byte because we've already done that. During cycle 2 we simply put the operand on source bus 2. Read byte would ordinarily have done that for us, but now we have to do it ourselves. Everything else is the same. Indexed is basically the same: you take the operand, which is now the address, from mode indexed. You do a read byte of that address on cycle 3, because the address isn't valid until cycle 3 and beyond, and put the result on source bus 2. On cycle 4 we carry out the operation.

I've also added immediate mode and indexed mode to the common verification file for ALU type operations. Here we can see that the second address read is expected to be X plus whatever we read from the first address. Remember, though, that we're equating two signals, and the sum of two 16-bit signals is a 17-bit signal, so we have to truncate it to 16 bits. Our formal verification now only has to change this one bit to a don't care, because we're handling all four modes; remember that the mode bits are bits 4 and 5. Then we make the same change in the core, and formal verification passes.
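The two-step address computation, and the reason for the truncation, can be sketched in Python as follows. `indexed_address` is a hypothetical helper that mirrors the temp16 steps described above. In nMigen the equivalent truncation would typically be a slice such as `[:16]` on the widened sum.

```python
def indexed_address(x: int, offset: int) -> int:
    """Model temp16 for indexed mode; the result is valid from cycle 3 on."""
    # Cycle 1: offset byte into the low byte of temp16, zeros in the high byte.
    temp16 = offset & 0x00FF
    # Cycle 2: temp16 <- X + temp16. The sum of two 16-bit values may need
    # 17 bits, so only the low 16 bits are kept.
    total = (x & 0xFFFF) + temp16
    return total & 0xFFFF
```

The wraparound case (X = 0xFFF0 with offset 0x20, giving 0x0010) is where comparing against an untruncated 17-bit sum would fail.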
So all we have to do now to upgrade all of these instructions is change bit 4 to a don't care. We'll skip store A for a moment. Then we can run formal verification on all the instructions to make sure it works.

Store A is a bit of a special case. If we look at the table, we see that store A immediate doesn't actually have an implementation; it's not defined. From what I've seen in the transistor-level simulation that visual6502.org provides, it takes whatever is in A and stores it to the byte immediately after the opcode. So it's kind of like LDA, except that instead of loading from the byte after the opcode, it stores to that byte. I'm not going to implement that instruction at this point. I'll leave the undocumented instructions until after I've done all the documented ones. That means for store A we just have to add the one case for store A indexed: 1-100111. And that's how we implement store A.

We also have to modify the store A function to handle indexed. Here I basically copied extended into indexed. Because the operand is only valid after cycle 2, all we have to do is increase the cycle numbers for the operations. The operations themselves are the same once we're past calculating the address. That should pass formal verification as well. So we go into the store A formal verification and add our new instruction, 1-100111. Unfortunately, the ALU verification assumes that we're not doing any writes, and this is a store, so I have to do a bit of a copy-paste job.
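As a quick sanity check on the store A patterns, the following Python sketch reuses a hypothetical don't-care matcher. It enumerates every opcode accepted by the direct/extended pattern and by the new indexed pattern 1-100111. The direct/extended pattern `1--10111` is an assumption derived from the documented STAA/STAB opcodes 0x97, 0xB7, 0xD7, and 0xF7, not copied from the video.

```python
def matches(pattern: str, opcode: int) -> bool:
    """MSB-first pattern match where '-' is a don't care."""
    bits = format(opcode, "08b")
    return all(p in ("-", b) for p, b in zip(pattern, bits))


# Bit 6 selects A or B. Assumed direct/extended pattern (bit 5 don't care),
# plus the indexed pattern added in this step.
STA_DIR_EXT = "1--10111"
STA_IDX = "1-100111"


def decodes_as_sta(opcode: int) -> bool:
    return matches(STA_DIR_EXT, opcode) or matches(STA_IDX, opcode)


if __name__ == "__main__":
    print([hex(op) for op in range(256) if decodes_as_sta(op)])
```

The immediate slots 0x87 and 0xC7 are not matched, which leaves the undocumented store-immediate behavior for later.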
One nice thing I've written is a makefile that formally verifies all of the instructions. As long as you create a formal_<instruction>.py file in the formal directory, this makefile goes through all of those files and runs formal verification on them. You can see here that I'm invoking the makefile with make -s formal. The -s means run silently, that is, don't echo the commands onto the terminal. I've also specified -j6 to run six independent formal verifications simultaneously. This lets you run as many formal verifications in parallel as you have processors or cores, which can save a lot of time. It's certainly better than running them one by one. It's definitely better than generating the design with the --insn flag and then typing the formal verification command in yourself.

Since we've basically filled out this bottom part of the chart, we can go ahead and turn these boxes green. That's pretty good.

Let's also fill out the jump instructions. You'll notice that there's a jump indexed but no jump A or jump B. It's not clear what should really happen if you execute those. I would have to go to the visual6502.org simulation and actually code that up to see what happens. Maybe jump A jumps to the zero page address held in A, or maybe it doesn't do anything at all. In any case, I went ahead and implemented jump indexed, and that pretty much worked straight out of the box. I just updated the jump instruction so that it handles indexed addresses in addition to extended addresses. And of course, it takes one more cycle to execute.
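The makefile itself isn't shown, so here is a rough Python analogue of the same idea, offered only as a sketch. It discovers every formal_<instruction>.py file and runs one verification per instruction with a bounded number of parallel jobs, like make -j6. `targets_from_names`, `find_formal_targets`, `verify_all`, and the `run_one` callback are hypothetical. `run_one` stands in for whatever command generates and proves a single instruction.

```python
import concurrent.futures
from pathlib import Path
from typing import Callable, Dict, Iterable, List, TypeVar

T = TypeVar("T")

PREFIX, SUFFIX = "formal_", ".py"


def targets_from_names(filenames: Iterable[str]) -> List[str]:
    """Turn formal_<insn>.py file names into sorted instruction names."""
    return sorted(
        name[len(PREFIX):-len(SUFFIX)]
        for name in filenames
        if name.startswith(PREFIX) and name.endswith(SUFFIX)
    )


def find_formal_targets(formal_dir: str = "formal") -> List[str]:
    """List instruction names for every formal_*.py file in formal_dir."""
    return targets_from_names(p.name for p in Path(formal_dir).iterdir())


def verify_all(
    insns: List[str], run_one: Callable[[str], T], jobs: int = 6
) -> Dict[str, T]:
    """Run run_one(insn) for every instruction, at most `jobs` at a time."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=jobs) as pool:
        return dict(zip(insns, pool.map(run_one, insns)))
```

Threads are enough here because each job would mostly wait on an external solver process rather than compute in Python.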
Not forgetting, of course, to update the decoder so that that one bit gives us our mode, and to formally verify it. I just added the indexed mode, and that's pretty much all there was to it. So now we can update this box.

Now that we've implemented all those instructions, let's see how many LUTs and cells we've used: 900 cells, with 736 LUTs. Again, this is fine. We're not concentrating on optimizing for the hardware yet; we're going for refactored Python that's nice and readable. Once we're all done, we'll get to hardware optimizations. How's that cat?