Now that we have some really basic hardware done, let's talk about actually implementing one of the Z-machine opcodes. I've chosen to implement the OR opcode. Why? If we look at the table of opcodes, we can see that 0 is an illegal opcode, so we're obviously not going to work on that one. The next bunch of opcodes are jumps to locations, which I don't want to deal with at the moment. The next opcode, which seems fairly simple, is OR: it ORs two numbers together and stores the result somewhere.

This is a spreadsheet I've put together that maps out various bits of dedicated hardware and the commands that get sent to that hardware in order to implement the opcode. At the beginning, we're just doing our fetch-execute cycle. Remember that earlier I talked about a micro-address counter. The very first thing we need is some memory hardware that reads a byte from memory at the current instruction pointer. We pass it an address and give it a destination register, which I'm just calling M for now (M is an 8-bit register), and the operation is read. We also have an ALU, which in this case simply increments the instruction pointer: we've just read a byte, so the next read should fetch the next byte. You'll notice that we have two ALUs, and I'll get to that in a moment. Then there's this var hardware, which I'll also get to shortly. Once we've read memory into the register, we increment the micro-address and go to the next location in the micro-program.

The next thing we do is an operation I'm calling op-jump. It looks at the value in M and goes to a location in the micro-program based on M.
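To make these first two micro-steps concrete, here is a small C++ behavioral model (a sketch, not the author's RTL) of reading the opcode byte into M while bumping the instruction pointer, and then op-jumping through a 256-entry table. The names `FetchState`, `fetch_opcode`, and `op_jump` are hypothetical, and the model assumes a 17-bit instruction pointer and 11-bit micro-addresses, the widths discussed later in this section.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical state for this sketch: just the instruction pointer and M.
struct FetchState {
    uint32_t ip = 0;  // instruction pointer; 17 bits are used for versions 1-3
    uint8_t m = 0;    // the 8-bit M register
};

// Micro-step 1: the memory hardware reads the byte at IP into M while the
// ALU increments IP in the same step.
void fetch_opcode(FetchState& s, const std::vector<uint8_t>& mem) {
    s.m = mem.at(s.ip);
    s.ip = (s.ip + 1) & 0x1FFFF;  // keep IP within 17 bits (assumption)
}

// Micro-step 2: op-jump indexes the 256-entry table with M to choose the
// next micro-address.
uint16_t op_jump(const FetchState& s, const std::array<uint16_t, 256>& table) {
    return static_cast<uint16_t>(table[s.m] & 0x7FF);  // 11-bit micro-addresses
}
```

Because the table has exactly one entry per possible value of M, op-jump never needs a bounds check: every byte value maps to some micro-address, even if that address is an "illegal opcode" handler.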
Since M can hold any of 256 values, op-jump is essentially an indirect jump through a table. Here we jump to the location for opcode 0x08. Remember that the two-operand instructions essentially come in five different configurations; for OR they're 0x08, 0x28, 0x48, 0x68, and 0xC8. The configuration we're choosing, 0x08, is the one that takes two small constants. Because they're small constants, each fits in 8 bits in memory. So we read one byte, increment the instruction pointer, read the next byte, and increment the instruction pointer again. We also compute, using the 16-bit ALU rather than the address ALU, the logical OR of op0 and op1, which is where we stored the two operands, and put the result in op0. Because each of these jobs uses its own piece of hardware, we can do all of this in parallel.

This is where the two separate ALUs come in: one is specifically for addresses and the other is for 16-bit values. Remember that the Z-machine is a 16-bit machine, but it can address more than 16 bits of memory. Typically you have maybe 128K, which is 17 bits. Rather than stuff all that functionality into a single ALU, I've decided to have two: one just for computing addresses, and the other for more general computation on 16-bit numbers. The address ALU is 17 bits wide, or 18 bits if you happen to have 256K of memory, and so on. Because versions 1 through 3 of the Z-machine can only address up to 128K, the address ALU is going to be 17 bits, at least for those versions.

Okay, so we've read the two small constants into op0 and op1. The next thing we do is read where we're going to store the result. I read it into an 8-bit register that I'm calling V, for var, because this is the store variable. Remember that if the store variable is 0, we push the value onto the stack.
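As a sketch of the decoding involved, the following C++ helpers show why 0x08, 0x28, 0x48, and 0x68 are all OR with different operand types, why 0xC8 is the variable-form OR, and how the store byte read into V selects the stack, a local, or a global. The helper names are hypothetical; the bit layout follows the Z-machine's long-form and variable-form encoding rather than anything shown in the video.

```cpp
#include <cassert>
#include <cstdint>

enum class OperandType { SmallConstant, Variable };

// Long form (top bit clear): bit 6 gives the first operand's type and bit 5
// the second's, where 0 means small constant and 1 means variable.
struct LongForm {
    uint8_t opcode;  // 2OP opcode number, bits 4..0
    OperandType first;
    OperandType second;
};

bool is_long_form(uint8_t b) { return (b & 0x80) == 0; }

// Variable form with bit 5 clear (0xC0..0xDF) is also a 2OP instruction;
// its operand types come from a separate types byte that follows it.
bool is_variable_2op(uint8_t b) { return (b & 0xE0) == 0xC0; }

// In both forms the 2OP opcode number is the low five bits.
uint8_t opcode_number(uint8_t b) { return static_cast<uint8_t>(b & 0x1F); }

LongForm decode_long(uint8_t b) {
    return LongForm{opcode_number(b),
                    (b & 0x40) ? OperandType::Variable : OperandType::SmallConstant,
                    (b & 0x20) ? OperandType::Variable : OperandType::SmallConstant};
}

// Where the store byte (the V register) says the result should go.
enum class StoreKind { Stack, Local, Global };

StoreKind classify_store(uint8_t v) {
    if (v == 0x00) return StoreKind::Stack;  // push onto the stack
    if (v <= 0x0F) return StoreKind::Local;  // locals 0x01..0x0F
    return StoreKind::Global;                // globals 0x10..0xFF
}
```

For 0x08 both type bits are clear, so both operands are small constants, which is exactly why the micro-program for this variant only needs two single-byte reads.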
If the store variable is 0x01 through 0x0F, it's a local variable, and if it's 0x10 through 0xFF, it's a global variable. We have to deal with that somehow, and here's how I've done it. I've given the micro-address counter a call capability, with a limited stack of return addresses (maybe four), so we can call subroutines that are common to the whole micro-program. In this case the subroutine is called store var.

First, we compare V to 0. Notice that I'm doing that with the 16-bit ALU, so we zero-extend V, since it's only 8 bits, and do a subtract. A compare is basically a subtract that doesn't store the result anywhere, which is allowed. The subtract sets a Z flag, and the Z flag tells the micro-address counter whether to jump to an address or just continue on. For this we have a new micro-instruction, jnz: jump if not zero.

Take the case where V actually is 0, so we want to store the result on the stack. In that case jnz doesn't jump; we just increment the micro-address, which takes us to a call to the push subroutine. Push simply writes the high byte of op0 to wherever SP, the stack pointer, is pointing, and uses the address ALU to increment the stack pointer. Then it does the same for the low byte of op0. Push returns to store var, store var returns to the opcode's micro-routine, and that jumps back to fetch, which starts the fetch-execute cycle once again.

Similarly, if V is not 0, we jump to store var 1, which is here. This is where we exercise the var hardware.
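Here is a hedged C++ model of the stack branch of store var: zero-extend V, subtract 0 to set Z, and, when jnz falls through, push op0 as two bytes, high byte first. The `Machine` struct, its 128K RAM vector, and the function names are assumptions made for this sketch; the real micro-program spreads these steps over several micro-instructions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kAddrMask17 = 0x1FFFF;  // 17-bit addresses (versions 1-3)

// Hypothetical machine state, only what store var and push touch.
struct Machine {
    std::vector<uint8_t> ram = std::vector<uint8_t>(1u << 17);  // 128K bytes
    uint32_t sp = 0;    // user stack pointer (byte address)
    uint16_t op0 = 0;   // value to store
    uint8_t v = 0;      // store variable
    bool z = false;     // Z flag from the 16-bit ALU
};

// push: write op0's high byte at SP, bump SP, then the low byte, bump SP.
void push_op0(Machine& m) {
    m.ram.at(m.sp) = static_cast<uint8_t>(m.op0 >> 8);
    m.sp = (m.sp + 1) & kAddrMask17;
    m.ram.at(m.sp) = static_cast<uint8_t>(m.op0 & 0xFF);
    m.sp = (m.sp + 1) & kAddrMask17;
}

// store var: returns true if the value was pushed; false means jnz was taken
// and control would continue at store var 1 (the var-hardware path).
bool store_var(Machine& m) {
    uint16_t extended = m.v;                               // zero-extend V
    uint16_t diff = static_cast<uint16_t>(extended - 0u);  // compare = subtract, result discarded
    m.z = (diff == 0);                                     // only the Z flag survives
    if (!m.z) return false;                                // jnz store var 1
    push_op0(m);                                           // fall through: call push
    return true;
}
```

Writing the high byte first keeps values on the stack in the same big-endian order the Z-machine uses for words in memory.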
The only thing the var hardware does is compute the address we want to access, based on V and on either the frame pointer, which is where the locals are stored, or the global pointer, which is where the globals are stored. The result goes into a register I've simply called X, which is an address-type register. Once we have that, all we have to do is write op0 high and op0 low to wherever X points, incrementing X in between, of course, and then return, after which we jump back to our fetch-execute cycle. That's really how it works. So, aside from the micro-address counter, this tells us there are four other bits of hardware we need to implement.

Before we get to that, let me make a brief correction to the earlier discussion about frames. Remember that local variables are stored on the stack in something called a frame. Earlier, when I talked about the frame, I said we'd just put the return address in the frame as well. That's the problem: if the stack is 16 bits wide and addresses are 17 bits, we can't fit a return address in a stack entry unless we use, say, three bytes per entry, which gets a little wasteful. So instead I'm going to have two stacks: a user stack, which is 16 bits wide, and an address stack, which is as wide as necessary to store a single address. The user stack is stored in RAM. The address stack is also stored in RAM, but only in RAM on the FPGA: FPGAs have RAM blocks inside them, and we'll use one for a smallish address stack. The address stack stores addresses. The frame pointer is an address, and the return address is an address.
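The var hardware's address computation for store var 1 might be modeled as below. This is only a sketch: the Z-machine stores globals as consecutive 16-bit words starting at the globals table, so global 0x10 + k sits at GP + 2k, but the local layout used here (local n at FP + 2(n - 1)) is an assumption about how frames are arranged in this design, and `var_address` and `store_via_x` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kAddrMask = 0x1FFFF;  // 17-bit addresses (versions 1-3)

// Hypothetical var hardware: byte address of the 16-bit slot for store
// variable v. v == 0 means the stack and never reaches this path.
uint32_t var_address(uint8_t v, uint32_t fp, uint32_t gp) {
    assert(v != 0);
    if (v <= 0x0F) return (fp + 2u * (v - 1u)) & kAddrMask;  // locals (assumed layout)
    return (gp + 2u * (v - 0x10u)) & kAddrMask;              // globals table
}

// store var 1: X <- var address, then write op0 high and low at X,
// incrementing X between the two writes.
void store_via_x(std::vector<uint8_t>& ram, uint8_t v, uint16_t op0,
                 uint32_t fp, uint32_t gp) {
    uint32_t x = var_address(v, fp, gp);
    ram.at(x) = static_cast<uint8_t>(op0 >> 8);
    x = (x + 1) & kAddrMask;
    ram.at(x) = static_cast<uint8_t>(op0 & 0xFF);
}
```

Keeping this as a separate unit means the address arithmetic happens once per store, and the write itself looks exactly like a push, just through X instead of SP.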
The pointer to the address stack isn't itself an address, but we're going to store it on the address stack as well. The user stack is reserved for the locals and whatever values you push onto the stack. There's a procedure for pushing a frame and one for popping a frame, but the basic idea is that the two stacks are maintained in parallel.

So that's the micro-program. Let's get to implementing something simple, starting with where we're going to store all of these registers. First, though, the micro-address counter, which I've enhanced from the original. Let's look at the micro-address types package first. I've named it micro-address types to differentiate it from the actual micro-address counter and to be very explicit that this package contains types only. Also, it's good practice to put _t after the name of a type, not only here but also in C and C++, so that you always know you're dealing with a type.

I've expanded the number of commands the micro-address counter can take. I've renamed load to jump, and we now also have jump if not zero, call, return, and the op-jump command. I've also added a typedef for how wide our micro-addresses are: they go from 0 to 2K minus 1, so we have 2K addresses. That's useful because we can now use the type everywhere instead of explicitly writing [10:0] each time.

Here's the micro-address counter module itself. I'm importing the micro-address type from the micro-address types package so that I can refer to it simply by its _t type name, whereas I didn't bother importing the command type.
I didn't import the command type for two reasons. First, I'm only using it once, right here. Second, all of the other modules are also going to have commands, so I'm just fully qualifying it here.

Compared with the previous micro-address counter, there's now a Z flag input, which is needed for the jump-if-not-zero command, and an 8-bit M input, which is used by op-jump. You might also notice that I've changed pretty much all of the input and output types: instead of logic, I'm now using bit and byte. These differ from logic in that logic has four states: the ordinary 0 and 1, high impedance (Z), and unknown (X). There's nowhere I'm really going to use high impedance, except maybe one or two special cases when dealing with external hardware, and I'm pretty sure we're never going to need unknowns. So instead I'm using the two-valued types. Bit is the two-valued counterpart of logic, and byte is also two-valued and consists of eight bits. Byte is signed by default, so I'm explicitly declaring this one as unsigned. The micro-address counter never actually uses the signedness of M, but it's good to be explicit about it.

The rest is pretty much the same. Here I've specified a parameter, which I'll get to in a moment. I've also declared an array, and this notation is known as an unpacked dimension. A packed dimension, like this one, is a number of bits. An unpacked dimension is not a number of bits: this one doesn't mean four bits, it means four elements. So what I've declared is a call stack of micro-addresses with a depth of four. That's internal.
I'm not exposing it outside the module because it's strictly internal. And because this is essentially a state machine, there's a corresponding next call stack. The same goes for the call stack pointer, for which I have to specify the number of bits properly, and the next call stack pointer.

Next is the op-jump table we talked about, which of course holds 256 micro-addresses. I'm using an initial block here, which Vivado can actually synthesize for memory initialization. You can put a begin and end around its body, but since it's one statement I don't. $readmemh is a system task defined in SystemVerilog: it reads a file formatted in hexadecimal, which looks like this. This is my test op-jump table: 000 for the first element, 1AB for the second, and so on for all 256 elements. The numbers on the left are Sublime Text's line numbers, so line 1 is element 0, line 2 is element 1, and so on. I'm specifying the file name with a variable, which is a module parameter, declared like this. Its default value is empty, which obviously would never work on its own, but when we instantiate the micro-address counter we get to specify the op-jump table file. The second argument to $readmemh is where to put the resulting data, which is obviously the op-jump table.

The rest is more of the same. By default, the next call stack pointer is the current call stack pointer, and every entry of the next call stack equals the current call stack. For jump, we set the address to the load address. For jump if not zero, we check the Z flag and take the load address if Z is 0; otherwise we just go to the next address.
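Collecting the next-state rules for all six commands, a C++ reference model of the micro-address counter could look like this. It's a sketch under stated assumptions: the enum and member names are hypothetical stand-ins for the SystemVerilog types, reset clears both the address and the call stack pointer, and the 2-bit call stack pointer simply wraps on overflow or underflow with no error detection.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the micro-address counter commands.
enum class UCmd { Incr, Jump, Jnz, Call, Ret, OpJump };

constexpr unsigned kUAddrMask = 0x7FF;  // 2K micro-addresses -> bits [10:0]

struct MicroAddressCounter {
    uint16_t addr = 0;
    std::array<uint16_t, 4> call_stack{};  // unpacked: four 11-bit entries
    uint8_t csp = 0;                       // 2-bit call stack pointer (assumed to wrap)
    std::array<uint16_t, 256> op_jump_table{};

    static uint16_t mask(unsigned v) { return static_cast<uint16_t>(v & kUAddrMask); }

    void reset() { addr = 0; csp = 0; }

    // One rising clock edge: compute the next state from the command and inputs.
    void clock(UCmd cmd, uint16_t load_addr, bool z, uint8_t m) {
        const uint16_t incremented = mask(addr + 1u);
        uint16_t next = incremented;  // default: go to the next address
        switch (cmd) {
        case UCmd::Incr: break;
        case UCmd::Jump: next = mask(load_addr); break;
        case UCmd::Jnz:
            if (!z) next = mask(load_addr);  // jump only when Z is 0
            break;
        case UCmd::Call:
            call_stack[csp] = incremented;   // save the return address
            csp = static_cast<uint8_t>((csp + 1) & 3);
            next = mask(load_addr);
            break;
        case UCmd::Ret:
            csp = static_cast<uint8_t>((csp + 3) & 3);  // decrement modulo 4
            next = call_stack[csp];
            break;
        case UCmd::OpJump: next = mask(op_jump_table[m]); break;
        }
        addr = next;
    }
};
```

A model like this can serve as the expected-value generator for the Verilator test, so the test cases don't have to hard-code every next address by hand.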
For call, we push the next address onto the call stack, and the next address becomes the call target. Return is pretty much the opposite. Op-jump simply indexes the op-jump table with M, and that's the next address. For reset, the call stack pointer is of course set to 0. Then, on the next clock edge, we update the address, the call stack pointer, and the call stack to whatever we set up. That's all there is to it.

Now let's look at the test program, which has been extended somewhat. For each test case we now also specify the Z flag and the M value. I'm also using a C++ using alias so that I don't have to keep typing the long, fully qualified name of the command type every time. I've modified the test cases: I test increment and reset, then jump, then another jump. That second jump test makes sure the address has only as many bits as I think it should. In C++ the micro-address is a 16-bit type, but in the SystemVerilog source it's only 11 bits, so I want to make sure only those 11 bits got set and that I didn't make a mistake in the address width. Then I do a reset and test call, return, and two nested calls, a call inside another call, to make sure that works. I test jump if not zero with both Z = 1 and Z = 0. Finally, I test op-jump with M = 1, 2, and 3. From the test table, M = 1 goes to 1AB, M = 2 goes to 012, and M = 3 goes to 0. If those jumps work, it's pretty safe to assume all the jumps work and the hardware is working.

Next, the structure of the test: main looks very similar to before.
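The op-jump tests depend on the contents of the table file, so it can help to read that file from the C++ side too. Here is a simplified reader for the $readmemh hex format; it is a hypothetical helper, not something Verilator provides. It loads whitespace-separated hex words into consecutive entries and strips `//` comments, but unlike the real `$readmemh` it does not support `@address` directives or `/* */` comments.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Simplified $readmemh-style loader (hypothetical helper). Entries the file
// does not cover stay 0 in this sketch.
std::array<uint16_t, 256> load_op_jump_table(std::istream& in) {
    std::array<uint16_t, 256> table{};
    std::string line;
    std::size_t index = 0;
    while (index < table.size() && std::getline(in, line)) {
        line = line.substr(0, line.find("//"));  // drop // comments
        std::istringstream words(line);
        std::string word;
        while (index < table.size() && words >> word) {
            // One hex value per word, masked to the 11-bit micro-address width.
            table[index++] = static_cast<uint16_t>(std::stoul(word, nullptr, 16) & 0x7FF);
        }
    }
    return table;
}
```

With the table loaded this way, a test can compare the counter's op-jump result against `table[m]` instead of hard-coding 1AB, 012, and 0.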
In main, I initialize all the values and set up the test cases. Before the first eval, I set all of the counter's inputs from the test case, then strobe the clock, evaluating on each change. Then I call a function that tests one aspect of the result. Here I'm only testing one output: that the module's output equals the expected output. But other modules may have several outputs, and I want to be able to print a nice message for a failure on any of them, so I've extracted this function to print out anything that goes wrong. Finally, I count the number of passes and print how many tests passed out of how many, because once all of the tests have scrolled off the screen, it's nice to have a summary at the end.

I've also created a Makefile, because it's a pain to type the Verilator command every single time. With some GNU Make magic, you can simply run make with the module name, and it tells you the target is up to date. If I touch one of the sources and run make again, it runs the entire thing, all the way through to an executable. Now I can run the executable, and 22 out of 22 tests passed. That's good enough for me.

I may have to change the Makefile a little, because right now it's generic for any module. An important thing to realize is that you have to run a separate Verilator pass for every module, because Verilator only lets you specify one top module, and all the submodules are essentially hidden. Through a single top module, you can effectively only do end-to-end tests.
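A minimal version of the per-output check and pass tally described above might look like this. `check_output` and `TestTally` are hypothetical names for this sketch; the format string prints each value zero-padded to a caller-chosen number of hex digits.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Compares one output; on mismatch prints a readable message.
bool check_output(const std::string& test_name, const char* output_name,
                  uint32_t expected, uint32_t actual, int digits) {
    if (expected == actual) return true;
    std::printf("FAIL %s: %s expected %0*X, got %0*X\n", test_name.c_str(),
                output_name, digits, static_cast<unsigned>(expected),
                digits, static_cast<unsigned>(actual));
    return false;
}

// Counts passing test cases for the summary line at the end.
struct TestTally {
    int passed = 0;
    int total = 0;
    void record(bool ok) {
        ++total;
        if (ok) ++passed;
    }
    void print_summary() const {
        std::printf("%d out of %d tests passed\n", passed, total);
    }
};
```

For a test case with several outputs, evaluate every check before combining, for example `bool ok = check_output(...); ok = check_output(...) && ok;`, so a failure in one output doesn't short-circuit the messages for the others, and then call `record(ok)` once per test case.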
To do unit tests, you have to run each module through Verilator as its own top module. That's why the Makefile has an entry for every module, and make all builds all of the modules and all of their tests. You list the modules on the first line up here, and you also have to list the type files in the right order so that Verilator knows what is being used where. Anyway, as I was saying, I may have to change this Makefile a little, because right now I'm using Verilator's -G flag, which lets you set parameters on the top module. Apparently there's no way to set a parameter from the C++ code, because Verilator compiles the module with its parameters fixed. So you could do it this way. Alternatively, if you wanted to test that two instances of a module with different parameters worked, you could write a top-level module that instantiates the module twice with those parameters set to different values, then access and test the first instance, and then access and test the second. For now I'll stick with the -G approach and see whether it works with everything else.

So that's the micro-address counter. Now let's talk about the registers. I've set up a module holding the registers we're going to need. Here are the M and V registers, which are bytes, so they're 8 bits. Here are the op0 and op1 registers, which are 16 bits; shortint is how you say 16 bits in two-valued logic, that is, logic without X and Z states. Then there's the X register and the various pointer registers: the stack pointer, frame pointer, global variable pointer, instruction pointer, and address stack pointer. Those are all address types. Now, how do we write to the registers?
Reading is easy: this module outputs all of the registers at all times, so we can read them whenever we want. Writing, though, requires specifying which register to write and what value to write. You can see I have four sections here, corresponding to our little bits of dedicated hardware: the memory hardware, the ALU, the address ALU, and the var hardware. Each one can select one register to write and supply the data to write. For example, we have the memory destination select, whose type is the register name type. If you look at the register types, I've listed all of the registers that can be written. There are a few extra entries, because sometimes we want to write an 8-bit value to a 16-bit register, sometimes we want to write just the low 8 bits, and sometimes just the high 8 bits. I've done that for op0 and op1, and those are the destinations we can write. So we have the memory destination select and memory data, the ALU destination select and ALU data, and so on. The memory hardware can only output 8 bits, so the memory data is just a byte. The ALU is 16 bits, so its data is a shortint. The address ALU and the var hardware both output address types.

Here's our usual state-machine arrangement with next variables: a combinational section up here, and a sequential section down here where, on the positive edge of the clock, we write the next value into each register. The question then becomes: what happens if two bits of hardware want to write the same register at the same time? We use a kind of priority. For M and V, we don't allow any of the dedicated hardware to write them except the memory hardware.
For op0 and op1, however, either the memory hardware or the ALU can write. So if the memory hardware and the ALU both want to write op0 at the same time, we choose the memory output. In practice you'd write your micro-programs so that never happens, so we don't really have to worry about it, but we did have to pick some ordering.

This statement is a bit complicated. If memory wants to write op0 high, then the next op0 high, that is, the high 8 bits of op0, gets the memory data. Otherwise, if memory is writing the op0 register itself, we set the high 8 bits of op0 to 0, because what we're saying is: I have 8 bits from memory and I want to write them to op0, zero-extended. That's the difference between op0 high and op0. For the ALU, if it's writing op0, the high 8 bits of the 16-bit ALU result go into the high 8 bits of op0. If none of those are selected, we keep the current value. The same goes for op0 low, op1 high, and op1 low.

Then there's X. (I've tidied up the indentation of this part so it looks more regular.) X can be written by either the address ALU or the var hardware. For the other pointers we only use the address ALU. That could change in the future: maybe we'll decide we want to write the stack pointer with something from the var hardware, or we'll add some other bit of hardware, in which case we'll simply change the logic. That's how the register file works. In the register types, you can see that we've defined an address type, which is 17 bits.
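The byte-wise priority rules for op0 can be captured in a small C++ function. This is a behavioral sketch, not the SystemVerilog: the `Dest` enum is a hypothetical subset of the register name type, and priority is resolved per byte, so memory wins whenever it targets a byte of op0, and the ALU's full 16-bit write only lands on the bytes memory leaves alone.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of the register name type.
enum class Dest { None, Op0, Op0Hi, Op0Lo, Op1, Op1Hi, Op1Lo };

// Next value of op0 from the 8-bit memory port and the 16-bit ALU port.
uint16_t next_op0(uint16_t op0, Dest mem_sel, uint8_t mem_data,
                  Dest alu_sel, uint16_t alu_data) {
    uint8_t hi = static_cast<uint8_t>(op0 >> 8);    // default: keep current value
    uint8_t lo = static_cast<uint8_t>(op0 & 0xFF);

    // High byte: memory high write, else memory zero-extend, else ALU.
    if (mem_sel == Dest::Op0Hi)    hi = mem_data;
    else if (mem_sel == Dest::Op0) hi = 0;
    else if (alu_sel == Dest::Op0) hi = static_cast<uint8_t>(alu_data >> 8);

    // Low byte: memory (low or whole-register write) first, then ALU.
    if (mem_sel == Dest::Op0Lo || mem_sel == Dest::Op0) lo = mem_data;
    else if (alu_sel == Dest::Op0) lo = static_cast<uint8_t>(alu_data & 0xFF);

    return static_cast<uint16_t>((hi << 8) | lo);
}
```

The same function, with the enum values swapped, describes op1; the last test below shows that writes aimed at op1 leave op0 untouched.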
That 17-bit width matches the outputs of the address ALU and the var hardware. Now let's look at the test. It's fairly straightforward, even though it looks quite large, and it follows pretty much the same template as before. I'm shortening the name of the register name type with an alias. Each test case has a name, the inputs, which are the destination selects and the data, and the expected outputs, which are all of the registers. Note that M and V are 8 bits, op0 and op1 are 16 bits, and X through AP (the address stack pointer) are 17 bits; since C++ has no 17-bit type, we use the next larger size, 32 bits. Then come the test cases. I've named each one after the hardware and the register it writes, and set up the values. Note the none destination, which means don't write any register. We write each register in turn and collect the values: here's the address ALU, collecting more values, then the ALU, and here's the var hardware.

Remember that I wanted to split each test into checks of one expected value each, so that we can print what we expected and what we actually got in a nicer way. For example, the M register only gets two hex digits, op0 gets four, and so on. For every test case, we set up the data, run eval, strobe the clock, and then check each expected value. This way we can easily determine whether the entire test passed, and then we tally up the results.

In the Makefile I've made a slight change: I've added an extra flags variable with the module name appended. If a module doesn't need these parameters set, we simply don't define extra flags for it, and that's what this special incantation handles.
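Since Verilator hands 17-bit ports to C++ as 32-bit integers, a couple of tiny helpers can keep the test output tidy and catch stray high bits, in the spirit of the 11-bit check in the micro-address counter test. Both `hex_digits` and `fits_in_width` are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hex digits needed to print a value of the given bit width (rounds up):
// 8 bits -> 2, 16 bits -> 4, 17 bits -> 5.
constexpr int hex_digits(int bits) { return (bits + 3) / 4; }

// True if no bit at or above position `bits` is set, i.e. the value
// really fits in a register of that width.
constexpr bool fits_in_width(uint32_t value, int bits) {
    return bits >= 32 || (value >> bits) == 0;
}
```

A register-file check could then print X with `hex_digits(17)` digits and assert `fits_in_width(actual, 17)` alongside the equality test.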
So let's go ahead and build it. I'll remove the old one and run make for the register file, and it builds. Running the executable, 18 out of 18 tests passed. Great, so that worked. In the next video I'm going to move on to the next bits of hardware: the ALU, the address ALU, and the var hardware. After that we'll get to something a little more complicated, which is storing the actual micro-program.