Greetings, RISC-V friends. Look at this crazy stuff. I'm building a RISC-V processor, not on an FPGA. So far we've gone through the register card, the shifter card, and the ALU card, and now we're going to go through the sequencer, which is the thing that orchestrates everything. It pulls instruction after instruction from memory and executes them, flipping signals back and forth to make the cards do its bidding, and then it moves on to the next instruction, and so on.
So let's take a look at what the sequencer might do. What I've got here is a spreadsheet of the data paths that the sequencer has to activate in order to execute an instruction. Let's start with a simple example. We've already seen the OP and OP-IMM instructions; these are the ALU-type instructions. Suppose we wanted to add two registers. During machine cycle zero (a machine cycle is a read cycle followed by a write cycle), the first thing we want to do is put the contents of RS1 onto the X bus and the contents of RS2 onto the Y bus. We set up the ALU to do whatever operation the instruction says to do, and put the result of the ALU onto the Z bus. We also want to write that result into the destination register, so the Z bus has to go to the destination register for writing. Because that's the only thing the instruction does, the next thing we want to do is add four to the program counter (remember, instructions are 32 bits, or four bytes) and put the result into both the program counter and the external memory address, because we're going to read the next instruction. That's why the next memory operation is a memory read. Up here I put a little note that says that at the beginning of every instruction, whatever data is coming out of memory goes into an instruction latch.
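To make that single-cycle path concrete, here is a minimal Python behavioral sketch of an ADD going through the buses. It is not the sequencer's actual code: the function name `op_add_cycle`, the list-of-ints register file, and modeling x0 by simply discarding writes to it are assumptions made for illustration.
```python
MASK32 = 0xFFFF_FFFF  # keep every bus value to 32 bits

def op_add_cycle(regs, pc, rd, rs1, rs2):
    """Hypothetical model of machine cycle 0 of an OP (ADD) instruction.

    regs is a list of 32 register values (x0..x31), updated in place.
    Returns (new_pc, next_memory_address).
    """
    x_bus = regs[rs1]                  # RS1 -> X bus
    y_bus = regs[rs2]                  # RS2 -> Y bus
    z_bus = (x_bus + y_bus) & MASK32   # ALU adds; result -> Z bus
    if rd != 0:                        # Z bus -> RD (assumption: x0 writes are dropped)
        regs[rd] = z_bus
    next_pc = (pc + 4) & MASK32        # instructions are four bytes long
    return next_pc, next_pc            # PC + 4 is also the next read address

regs = [0] * 32
regs[1], regs[2] = 5, 7
pc, addr = op_add_cycle(regs, 0x100, rd=3, rs1=1, rs2=2)
print(hex(regs[3]), hex(pc), hex(addr))  # 0xc 0x104 0x104
```
OP-IMM would differ only in what drives the Y bus: the decoded immediate instead of RS2.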
Latching the instruction lets us decode it and figure out what to do for the rest of the instruction. So that's OP. OP-IMM is basically the same everywhere, except that the immediate value that comes out of the instruction is put onto the Y bus. So obviously we need some sort of latch for the instruction, and we need something that pulls rs1, rs2, or the immediate value out of the instruction, not to mention the opcode itself, so that we know what to do next.
Let's take a look at another simple, one-machine-cycle instruction: LUI, load upper immediate. The idea is that there's an immediate value, but only the upper part of it is encoded in the instruction, and we want to put that into the destination register. What we're actually going to do is take the value of register zero, which remember is always zero, and put that onto the X bus. We take the immediate value and put it onto the Y bus, do an add, and put the result on the Z bus, which then gets written into the destination register. This has a sort of pleasing symmetry. You might ask why I didn't just put the immediate value on the Z bus so that we could write it directly to the destination register without using the adder. The answer is that all the instructions basically do the same sort of thing, so we may as well make them look the same. And because it's a one-machine-cycle instruction, right then and there we add four to the program counter and put the result into the program counter and the next memory address, and then we read that address to get the next instruction. Let's take a look at another single-cycle instruction: AUIPC, which means add upper immediate to program counter.
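Here is a small Python sketch of LUI done the "symmetric" way described above, as x0 plus the immediate through the adder. The U-type field layout (immediate in bits 31..12, rd in bits 11..7) is standard RV32I; the function names and the list-based register file are hypothetical.
```python
MASK32 = 0xFFFF_FFFF

def u_immediate(instr):
    # U-type: imm[31:12] lives in instruction bits 31..12; the low 12 bits are zero
    return instr & 0xFFFF_F000

def lui(regs, instr):
    """Sketch of LUI as x0 + immediate through the ALU (one machine cycle)."""
    rd = (instr >> 7) & 0x1F           # rd is instruction bits 11..7
    x_bus = regs[0]                    # register zero -> X bus (always 0)
    y_bus = u_immediate(instr)         # immediate -> Y bus
    z_bus = (x_bus + y_bus) & MASK32   # ALU add -> Z bus
    if rd != 0:
        regs[rd] = z_bus               # Z bus -> RD
    return z_bus

regs = [0] * 32
lui(regs, 0x1234_52B7)   # encoding of lui x5, 0x12345
print(hex(regs[5]))      # 0x12345000
```
In this sketch, AUIPC would be the same function with the program counter, rather than register zero, on the X bus.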
With AUIPC, the idea is that we take the program counter, add an upper immediate value to it, and put the result in the destination register. And of course we add four to the program counter and put that back into the program counter along with the next memory address. That takes care of the single-cycle instructions.
Some instructions have to take more than one machine cycle, simply because we don't have the resources to do everything in parallel. For example, take JAL, which means jump and link. The idea is that you want to jump somewhere else in memory, but you want to remember where you came from so that you can go back when you're done. This is essentially a call and a return, except that this is the way RISC-V does it. In the very first cycle, we take the program counter and the immediate value, add them, put the sum on the Z bus, and put that in the destination register. Let's write that down: the destination register gets PC plus the immediate value. This immediate is a sign-extended offset, so we can go forwards or backwards in the program. Then we do two things at the same time. First, we take whatever's on the X bus and put it into the program counter, and what we put on the X bus is the destination register, which now holds the address we're going to jump to, PC plus the immediate value. Second, we replace what's in the destination register with PC plus 4, which is our return address. So here we are in the program; this is our jump-and-link instruction, and let's suppose we want to go to, I don't know, plus a thousand, so here is PC plus a thousand.
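The two JAL cycles can be sketched in Python as follows. This is an illustration of the plan described, not the sequencer's code: the immediate is assumed to be already decoded and sign-extended, and the sketch assumes rd is not x0, since register zero can't hold the temporary jump target.
```python
MASK32 = 0xFFFF_FFFF

def jal_two_cycles(regs, pc, rd, imm):
    """Hypothetical model of the two-machine-cycle JAL plan.

    imm: already decoded, sign-extended offset (a Python int).
    Returns (new_pc, next_memory_address).
    """
    assert rd != 0, "this sketch does not model JAL with rd = x0"
    # Machine cycle 0: PC -> X, imm -> Y, ALU add -> Z, Z -> RD (temporary target)
    regs[rd] = (pc + imm) & MASK32
    # Machine cycle 1: RD -> X, X -> PC and memory address;
    # at the same time PC + 4 -> Z, Z -> RD (the return address)
    x_bus = regs[rd]
    regs[rd] = (pc + 4) & MASK32
    return x_bus, x_bus

regs = [0] * 32
new_pc, addr = jal_two_cycles(regs, 0x100, rd=1, imm=1000)
print(hex(new_pc), hex(regs[1]))  # 0x4e8 0x104
```
JALR, as described next, would change only machine cycle 0, putting RS1 rather than the PC on the X bus.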
So PC plus a thousand is the next place we jump to, but when we're done we can go back to whatever's in the destination register, which is just where we left off. That's essentially a call. This has to take two cycles because there aren't enough resources to do everything in parallel, so we break it up into two cycles. We could do it in one cycle if we were willing to build a lot more hardware, but we're deciding where the line is between the time it takes to execute an instruction and the amount of hardware we're willing to invest to speed things up, and this is where I drew my line. The JALR instruction is basically the same thing. It also takes two machine cycles, except that we take what's in source register one and add the immediate value, and that's the address we jump to.
Now let's talk about the final two-machine-cycle instruction: the branch instruction. The idea is that we look at the two source registers, subtract them, and then do a comparison. Maybe it's less than, maybe it's greater than or equal to, whatever; we call that the condition, and we save it. Remember, we had these special bits in the ALU: when we do a subtraction, we get equal, less than, and less than unsigned. We can use those to execute the various branch opcodes, such as BGEU, branch if greater than or equal to unsigned, which is true exactly when the less-than-unsigned bit is not set. Once we have the condition, the next step depends on it: if the condition is true, we take the program counter, add the immediate value to it, and put that back into the program counter. That's our branch. If the condition is false, however, we take the program counter plus four and put that into the program counter.
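The way the three ALU flags cover all six RV32I branches can be sketched in Python like this. The funct3 values are the standard RV32I encodings; the helper names are hypothetical, and the flags are computed directly with Python comparisons rather than from an actual subtractor.
```python
MASK32 = 0xFFFF_FFFF

def to_signed(v):
    # Reinterpret a 32-bit pattern as a two's-complement integer
    return v - (1 << 32) if v & 0x8000_0000 else v

def alu_compare_flags(a, b):
    # The equal / less than / less than unsigned bits from comparing a with b
    return a == b, to_signed(a) < to_signed(b), a < b

BRANCH_CONDITIONS = {                      # keyed by funct3
    0b000: lambda eq, lt, ltu: eq,         # BEQ
    0b001: lambda eq, lt, ltu: not eq,     # BNE
    0b100: lambda eq, lt, ltu: lt,         # BLT
    0b101: lambda eq, lt, ltu: not lt,     # BGE
    0b110: lambda eq, lt, ltu: ltu,        # BLTU
    0b111: lambda eq, lt, ltu: not ltu,    # BGEU
}

def branch_next_pc(pc, rs1_val, rs2_val, funct3, imm):
    # Machine cycle 0: compare and save the condition
    cond = BRANCH_CONDITIONS[funct3](*alu_compare_flags(rs1_val, rs2_val))
    # Machine cycle 1: PC + imm if taken, otherwise PC + 4
    return ((pc + imm) & MASK32) if cond else ((pc + 4) & MASK32)

print(hex(branch_next_pc(0x100, 0xFFFF_FFFF, 1, 0b111, 16)))  # BGEU taken: 0x110
```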
Going to PC plus four is basically just moving on to the next instruction, because the condition is not true. Again, this takes two machine cycles because we have to do two things with the ALU: first the subtraction, and second, if the condition is true, adding the immediate value to the program counter. We don't have a separate adder to do that.
Okay, now let's take a look at external memory accesses: the load and store instructions. These are a little harder to get your head around. Let's take load first. There are several load instructions: load byte and load byte unsigned, load half word and load half word unsigned, and then load word. This is a 32-bit processor because it's an RV32. If it were an RV64, you would also have load word unsigned and load double word, but we don't, so let's get rid of those. So what does a load do? Let's look at load byte unsigned first, because it's the easiest to wrap your head around. The memory is 32 bits wide, so you can picture every four bytes of memory like this, where this is an address and these are the address plus zero, plus one, plus two, and plus three, and plus four would be the next row down. If we load byte unsigned from the address, say this byte, then the result, zero-extended to 32 bits, should be that byte. If we load from address plus one, that would be this byte over here, and we would stick it into the least significant byte of the result, and that's all there is to it. Same thing with plus two and plus three. There's obviously some shifting you have to do, because remember we're operating with a 32-bit-wide memory, which means that when you read an address you're essentially reading all four bytes at the same time.
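The byte-selection rule for load byte unsigned can be written as a shift and a mask. This Python sketch assumes little-endian byte order (which RISC-V uses), so the byte at address plus zero sits in bits 7..0 of the word; it only states the rule and says nothing about how the cards implement it. The function name is hypothetical.
```python
def load_byte_unsigned(word, addr):
    """Pick one byte out of the 32-bit word read from memory (little-endian lanes).

    word: the 32 bits read at the aligned address (addr with its low two bits ignored).
    """
    offset = addr & 0b11                  # low two address bits choose the byte lane
    return (word >> (8 * offset)) & 0xFF  # move that lane down; upper bits become zero

word = 0x4433_2211   # bytes at +0, +1, +2, +3 are 0x11, 0x22, 0x33, 0x44
print([hex(load_byte_unsigned(word, 0x1000 + i)) for i in range(4)])
# ['0x11', '0x22', '0x33', '0x44']
```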
Which byte you have to select depends on the address you want to access. So that's load byte unsigned. With load byte (signed), you do exactly the same thing, except now you sign extend the rest of the result. If you took the byte at address plus one and put it in the least significant byte, and the most significant bit of that byte was one, then you fill in the rest with ones. For example, if you want to load a byte holding negative one, you're actually loading FF out of memory, and because you're sign extending it, that turns into all Fs, which is again negative one in 32 bits.
It's the same thing with load half word and load half word unsigned, except now you're loading two bytes at a time. The thing is that half words have to be loaded from 16-bit-aligned addresses. For example, if you want to load the half word at address plus zero, you're actually taking the first two bytes, plus zero and plus one, and putting them into the first two bytes of the result. You cannot load the half word made of address plus one and address plus two, because that is not 16-bit aligned; the next half word you can load is at plus two and plus three. This alignment requirement makes sense if you think about it: if you wanted to load the half word at address plus three, you would have to do two 32-bit reads, one to get plus three and the other to get plus four, which is the plus zero of the next 32-bit section. Load word is the same: because it's a 32-bit value, you're only allowed to load it from 32-bit-aligned addresses. That way, when you read all 32 bits from memory, they go right into the destination, and there's none of this reading two bytes out of one 32-bit section and then the next two bytes out of the next one.
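Sign extension and the alignment rule are both small enough to state in a few lines of Python. This is a sketch for illustration; the helper names are hypothetical, and `is_aligned` encodes this design's rule that half words and words must sit on naturally aligned addresses.
```python
MASK32 = 0xFFFF_FFFF

def sign_extend_to_32(value, bits):
    """Sign-extend the low `bits` bits of value into a 32-bit result."""
    field_mask = (1 << bits) - 1
    value &= field_mask
    if value & (1 << (bits - 1)):       # top bit of the field set: negative number
        value |= MASK32 & ~field_mask   # fill all the upper bits with ones
    return value

def is_aligned(addr, size):
    """True if an access of `size` bytes (1, 2 or 4) at addr is naturally aligned."""
    return addr % size == 0

print(hex(sign_extend_to_32(0xFF, 8)))  # 0xffffffff, i.e. negative one
print(is_aligned(0x1003, 2))            # False: would straddle two 32-bit words
```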
Okay, so how do we actually implement this if all we can do is read all 32 bits? Suppose that here is our memory again, plus zero, plus one, plus two, and plus three, and let's say we want to load a signed byte from address plus one. Let's use the instruction plan that we've got right over here. The load instruction loads from the memory address RS1 plus an immediate value. So there's RS1, there's the immediate value, we add them, and we put the result into the memory address. Of course, we're reading 32 bits at a time, which means the read essentially ignores the two least significant bits of the address, and those two bits are what tell you which of the four 8-bit segments of the 32-bit word we actually want. We read the word into a temporary register that we just call data, since it's the data we read out of memory. The next thing we do is shift it logical left by some shift amount. That shift amount may not be obvious at first. If we're doing a signed byte load from plus one, we want to shift left by 16 so that the byte ends up as the most significant byte of the result. Then all we have to do is shift right arithmetic by 24, which automatically sign extends and puts the byte in the correct place. For example, suppose we were to read, oh I don't know, FA out of this. After the read, FA would be over here, and the other three bytes would hold whatever values happen to be there, because we read all 32 bits in.
Then we shift it left by 16 bits, which means we get FA over here, some random junk over here, and zeros in the bottom two bytes. Then we shift it right arithmetically by 24 bits, which means the FA ends up over here, and because its high bit is set, the result is all Fs followed by FA, that is FFFFFFFA, which is exactly what we want. We could do all of this with dedicated hardware, but then we'd have to build a whole bunch of shifters just to handle this case. That's why I was saying this is where I'm drawing the line between speed and amount of hardware: I'm willing to say that an external memory load will require three machine cycles.
Now what about loading a half word? The same process applies. Suppose the half word we want, at address plus zero, is FACE. First we load it, so we get FACE in the lower half and some random junk in the upper half. Then we shift it left 16 bits, so now we have FACE 0000, and then we shift it right arithmetically by 16 bits, so that we have FFFFFACE. That's a 16-bit right shift and not a 24-bit one because we're doing a half word load instead of a byte load.
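The read, shift-left, shift-right plan can be sketched in Python. The shift-amount formulas here are a reconstruction that reproduces the worked examples (16 then 24 for a byte at plus one, 16 then 16 for a half word at plus zero, 0 then 16 for a half word at plus two, 0 and 0 for a word); they may not be how the sequencer actually computes them. The function names are hypothetical, and the address is assumed to be properly aligned.
```python
MASK32 = 0xFFFF_FFFF

def sra32(value, amount):
    """Arithmetic right shift of a 32-bit value (bit 31 is copied in from the left)."""
    if value & 0x8000_0000:
        value -= 1 << 32                 # reinterpret as a negative Python int
    return (value >> amount) & MASK32    # >> on Python ints is arithmetic

def load_via_shifts(word, addr, size, signed):
    """Sketch of the load plan: read the word, SLL, then SRA (signed) or SRL.

    word: the 32 bits read from memory; size: 1, 2 or 4 bytes.
    """
    offset = addr & 0b11
    shift_left = 8 * (4 - size - offset)  # push the wanted bytes to the top
    shift_right = 32 - 8 * size           # bring them back down to the bottom
    data = (word << shift_left) & MASK32  # first shift: SLL
    if signed:
        return sra32(data, shift_right)   # second shift: SRA sign extends
    return data >> shift_right            # ... or SRL zero extends

print(hex(load_via_shifts(0x1234_FA56, 0x1001, 1, True)))  # 0xfffffffa
print(hex(load_via_shifts(0x1234_FACE, 0x1000, 2, True)))  # 0xffffface
```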
Now if, on the other hand, we wanted to read from address plus two, let's suppose this is FACE and this is some random junk. Again, we load all 32 bits of memory into our temporary data register. Now we shift it left by zero. We could skip that step, but we want to use the same plan, and just deciding whether to skip the left shift would add more hardware. So we shift left by zero (that's the SLL), then we do an SRA, shift right arithmetic, by 16 bits, and again we end up with FFFFFACE just like before, except that this time we loaded it from address plus two. For load word, it turns out that we shift left by zero and then shift right by zero. You could say, well, then you only need two cycles instead of three: the first cycle is where you read the data and the second cycle is where you write it into the final destination, so that's a minimum of two cycles, so why can't we just skip the third cycle? Again, the reason is that if we're going to save on hardware, we have to do things in a more consistent way. So that's all there is to it.
So that's load. Let's go over it once more. We take source register one and add it to the immediate value; that goes onto the Z bus, and the Z bus goes into the address. Great, now we have the address. Next, we do a memory read. Next, we take the data we got and shift it logical left by a shift amount that depends on the size of the load and the offset into the 32-bit word; the result goes onto the Z bus, and that goes into the destination register. Now we need to shift it right, either logically or arithmetically depending on whether we're doing a signed or an unsigned load, so we take the destination register and shift it by some amount that depends
on the size of the load. The shift is either a shift right logical or a shift right arithmetic; the result is on Z, we put it in the destination register, and we go to the next address for the next instruction.
Store looks pretty much the same. A store instruction is written as store rs2, immediate(rs1): we take rs1 and add the immediate value, that's the address, and then we take rs2 and put it at that address. Again, you can store a byte, a half word, or a word. There's no signed store, because you don't have to store the sign; it's already in there. For example, if your register contained negative one, it's just all Fs, and if you wanted to store a byte, you store FF, which is its least significant byte. So what we do here is take rs1, add the immediate value to it, and put the result in the address, and since we're doing a memory write, there's a write mask which tells us which bytes we're actually writing. Again, if this is our 32-bit memory word at the address we're going to write, let's suppose we want to write a byte into address plus two; in other words, the offset into the 32-bit word is two bytes. Then we essentially mask off the rest, and we can do this with the 32-bit memories we're going to be using, so that we only write into this 8-bit segment. That's storing a byte. Storing a half word is the same, except the mask is either 1100 or 0011 (one bit per byte, in the same plus zero to plus three order as the diagram), depending on whether we're writing to address plus zero or address plus two, and again you cannot do unaligned writes. The second cycle is taken up by shifting rs2 left to get it into the correct location and putting the result in our write data register, and then the next cycle is taken up with
actually writing that data and going to the next instruction.
That pretty much covers the whole RISC-V instruction set, with the exception of two opcodes: MISC-MEM, miscellaneous memory, and SYSTEM. In MISC-MEM there is one instruction, called FENCE. The idea behind FENCE is this: if you have a cache, you're loading a whole bunch of memory into a fast location in your processor, where memory accesses take a lot less than, say, three machine cycles, let's say one, so you can speed up your processor or computer that way. The thing is that when you write into the cache, the data you wrote is still in the cache; it's not fully committed to external memory. So if some external thing wants to read the external memory, maybe another CPU, maybe a peripheral, the problem is that you have not yet written that data to external memory, and FENCE tells us to make sure it actually gets written out so that the next read sees a consistent value. Because we don't have a cache in our system, we don't have to implement FENCE; it basically does nothing.
SYSTEM has two instructions, ECALL and EBREAK. The definition of EBREAK is break to the debugger. What does that mean? I don't know. Maybe it means just temporarily halt the processor, and then you have to press a button to get it to continue. That seems like a reasonable implementation, so maybe that's what we'll do with EBREAK. ECALL performs a system call, which really means you can program your operating system to have, say, call number one output a data byte to the monitor, and call number two turn on Bluetooth, I don't know. These are sort of like peripheral calls, if your peripherals aren't memory mapped, or particular system calls that
might do something else. We're not going to implement any of that, at least not now, so an ECALL will just do nothing and move on to the next instruction. That pretty much covers all of the RV32I instructions.
Now let's take a look at a diagram of all of these data paths. If you look at the columns here, we can see, for example, that the destination register always gets its input from the Z bus. That makes sense, and we did this cleverly: the register card only reads from the Z bus to put a value in a destination register, so that worked out great. We can also see that the X bus gets its value from a few places. It could come from source register one, and again, that's the way we built our register card: source register one always goes onto the X bus. It could also come from the destination register. That's just a register; it doesn't have to be the rs1 named in the instruction, it can be any register, so in the JALR and JAL cases we're actually stuffing the destination register number into the register select for the X bus. We could also put the program counter onto the X bus, register zero, which again is just another register, or the data we read from memory. So really, the things we want to put on the X bus are, first, a register, second, the program counter, and third, the memory data. Wherever these come from, we just have to set up the multiplexer so that we know exactly what we're putting onto the X bus. Notice that there are some places where we don't put anything onto the X bus: machine cycle two of store, and machine cycle one of branch when the condition isn't true. If, again, we want to be consistent, we can just
put anything we want there, because we're not going to use the X bus for anything. We could just say, oh, I don't know, rs2 and PC; why not? It's pretty much what we were already doing. It's the same with the Y bus: we can put a register onto it, the immediate value, or the shift amount, and again, in the blank spots in the table we may as well put something we were already putting onto the Y bus, because we're not using it.
If you look at the Z bus, you can see that most often we're putting the ALU result onto it, except that sometimes we put the program counter plus four onto it. You might say, well, we've already got an adder, so why aren't we just using it to add four? The reason is that whenever we want to put PC plus four onto the Z bus, using the ALU would mean using the X and Y buses to hold the program counter and four, but that resource is already being used by rd, which needs to go into the program counter. We don't have a direct path between a register and the program counter, because remember, the registers are stuck on the register card. If we want to put a register into the program counter, we have to put the register onto one of the buses first and then read that bus into the program counter, and because that bus is in use, we can't use the ALU for anything. This is why JAL and JALR take two cycles: we don't want to add another 32-bit bus to the CPU bus. Again, in the cycles where we don't use the ALU, we may as well have it keep doing the same thing, because we're not actually going to use the result. The result always goes onto the Z bus, but in those cycles we don't do anything with the Z bus; we're not writing it into the destination register. So it looks like, for almost all machine cycles with the exception of these, we're not actually
putting anything onto the destination register. Anyway, what do we load into the program counter? Most often we're going to the next instruction in memory, but sometimes we're going to a different instruction, especially for the jumps. That's why sometimes we load the X bus into the program counter, and sometimes we load the Z bus into it, when we have the address of the next instruction on the Z bus. Same thing with the memory address: sometimes we put in the program counter plus four, sometimes X, and sometimes Z. These are all the data paths we have to multiplex back and forth.
So we can look at a, I hesitate to say simplified, diagram of what the sequencer is supposed to do. Down below are the X, Y, and Z buses. If you remember, there's a program counter; that's this block right over here. Let me make it a little more obvious by putting a blue line around it. Okay, so that's the program counter. Sometimes it gets the Z bus, which is this data path right over here; sometimes it gets the X bus; and sometimes it gets PC plus four. You'll notice I put a little plus-four block in the sequencer. This is an adder whose only purpose is to add four to its input. We can do this because it's special circuitry, not a more generic ALU; it's just an adder, so it will live on the sequencer card. What I have here is a multiplexer which shows that we load the program counter with Z, with X, or with PC plus four; it's up to this multiplexer to decide what goes into the program counter. And who decides that? This huge "now what" block. The idea is that here's our instruction, and here is our memory data read latch, so this
is a latch, and this, the instruction, is also a latch. When we read an instruction from memory, it goes through the memory data register into the instruction latch, and then it gets broken up depending on the opcode, and all that data gets fed into the "now what" block. The "now what" block looks at the instruction phase, which is the machine cycle within the instruction, and says, oh, phase zero, opcode OP, so it's one of those single-cycle ALU operations, and it tells this multiplexer over here that the next value of the program counter should be the program counter plus four, and that way we get to the next instruction. That's what the "now what" block does, and I've put signals out of it to indicate exactly what it's going to do. It also orchestrates these multiplexers and these buffers over here, deciding whether they're on or off or which input they select, and it outputs signals onto the CPU bus. Here we have the signals we saw before: register to X, register to Y, Z to register, the ALU operation we're going to do, and so on. Here's the shift amount for loads and stores, which feeds onto the Y bus if we're doing a load or store and we're on the right machine cycle. Here are the inputs for computing conditionals, and here are the inputs that tell us which opcode we're doing and which function it is; if it's an OP, then funct3, for example, tells us which operation it is. And that's almost all of it. The other thing I want to point out is that I added two extra output signals: one says instruction complete, and the other says illegal. The idea is that if we encounter an illegal instruction, we simply halt the processor, and that's it; you can't do anything else. This is known as a fatal exception. The other thing is
instruction complete. I just felt it would be nice to be able to signal every time an instruction completes. If you look at this diagram over here, you can see our system clock ticking away; these are our phase one and phase two clocks, and this is a special signal called machine cycle end, which only goes high during the last phase of a machine cycle. If the instruction is complete during that last phase of a machine cycle, we say the instruction is complete. So the instruction complete signal looks like machine cycle end, except it's delayed until the end of the instruction: for a two-machine-cycle instruction, the instruction is not complete during the first machine cycle, and the instruction complete signal goes high during the second.
So that's pretty much it. Now here's the thing: I could show you the code, and in fact I'm going to show you the code, but I can't really explain every single line of it, because it's not finished. In order to formally verify that code, I would pretty much have to hook it up to the rest of the system: there would have to be a register card, an ALU card, and a shifter card. That's what I'm going to do next: hook it all up with buses and a clock generator and then run formal verification on every single instruction. There's a very specific way of doing that, which I'm not going to get into in this video. So let's take a look at the code, and we'll wait until the next video to actually start figuring out how to formally verify everything together.
So here's the code. Again, I won't go through it in great detail, because for all I know this isn't the final version, since I haven't done any formal verification on it yet. The first thing I did was set up a few constants: what are the opcodes,
what do the branches look like, what do the various loads and stores look like. Here's a little diagram of phase one and phase two, and here are all the signals: machine cycle end, the ALU stuff, all the control signals, everything we saw in the diagram. We also have a whole bunch of internal signals. Some of them, like this section over here, are that little part where you saw the instruction broken apart into the opcode, the immediate value, funct3, funct7, the various registers, that sort of thing. These are some of the registers: you can see the instruction register in there and the PC register. Here's PC plus four, which is automatically just whatever the PC is plus four. These are the registers that store the condition from the ALU, and so on. Then we've got a lot of other control lines; some of them are external, and some actually go to the multiplexers inside.
What I usually do to start things off is set up defaults for all of the combinatorial signals, and basically the defaults are all zero. I also say that the illegal instruction signal is zero by default, and we check the illegal signal on every phase two, so that's basically twice in every machine cycle. I also set up the PC plus four signal; it's always PC plus four. Next, if the instruction phase is zero, so I'm at the very beginning of an instruction, then I know I want to latch the instruction being read from memory. Then, if I'm on the last machine cycle of the instruction, I set the instruction complete signal to be the same as the machine cycle end signal, that little pulse at the end. And then, if I'm not actually setting the program counter from either the X or the Z bus, then I by
default just set it to PC plus four, so I'm going to the next instruction. Now, I just want to warn you that originally this code did not have these if statements, and in a little while I will show you what happens when those if statements aren't there. This next part just sets up the instruction latch, so that I latch the instruction when I read or load an instruction. The next section is the updates to these registers. These registers always update on phase 1 going high, so on the positive edge, or in other words at the very beginning of the machine cycle. So I update all of the registers: the instruction phase, the stored conditions, the program counter, the external memory address, and, if we're doing a write, the data that we're going to write to memory. Now, the multiplexers are not registers; they're just multiplexers, so they're combinatorial. Strictly speaking, these are also multiplexers, but they're multiplexers with a register on the other side, and the idea is that if none of these conditions are true, the register retains its value. In other words, I don't clock the register. These multiplexers, on the other hand, go to the buses: this one goes to the X bus, this one goes to the Y bus, and this one goes to the Z bus. Again, this is just following that big crazy diagram. This is the instruction decoder. The opcode, RS1, RS2, RD, funct3, and funct7 are always in the same locations in the instruction, which is why I can do this. The next thing is to decode the immediate value. The immediate value can be in different places in the instruction depending on the opcode, so let's actually take a look at that function. Here's decode immediate. It works from the immediate format, and where did I get the immediate format from? Well, it's based on the opcode. So if the immediate format is I, then I take the immediate from these bits, and so on. They're also
always going to be signed, so this is just shuffling the bits around. And then finally, based on the opcode, I write some code that handles each opcode, so let's take a look. Oh, and of course, at the very end, if the opcode is none of the ones that I know about, I set the illegal flag. So let's take a look at handle op. This is just straight combinatorial stuff. First of all, I set the immediate format to R, because that's what it is for the op opcode, and then I basically just set the switches on the multiplexers. We're going to load a register onto the X bus, and the X register is RS1. We're going to load a register onto the Y bus, and the Y register is RS2. The operation that we're going to perform comes out of funct3 and funct7. We're going to send the result of the ALU to the Z bus, and we're going to send the Z bus to a register, and the register that we're going to send it to is RD. So this is basically how you set all the little bits, control signals, and multiplexer signals to perform an ALU operation, taking the operands from RS1 and RS2 and putting the result in RD. It's basically just following the little table of instructions that I had. Let's take a look at op-imm. It's basically the same, except we send the immediate value to the Y bus instead of reading it from a register. Here is AUIPC. It's basically the same thing, except we're sending the program counter to the X bus and the immediate value to the Y bus, doing an add operation, and then storing the result in the destination register. I don't actually need that line, because again, at the end of every instruction that's what we do by default. Here is LUI. It's basically the same thing: we take register zero, which is always going to give us zero, and put that on the X bus, we put the immediate on Y, we add the two, and we store the result in RD, and that's it. Again, we don't need to do that. Now we can take a slightly
more complex example. This is the JALR instruction, jump and link register. It reminds me of a French word which, at least according to the web, means "I go", and JALR is kind of like an "I go" instruction, because I'm going to another... yeah, okay. Anyway, this is a two-phase instruction, that is, a two-memory-cycle instruction. If we're on the first memory cycle of the instruction, so phase zero, we take RS1 and put it on X, take the immediate value and put it on Y, add the two, and store the result in RD. Then we bump the instruction phase. Notice that in the other instructions we didn't bump the instruction phase; we just left it at zero, which means that by default it's just going to load up the next instruction. Here we're going to the next instruction phase, so no memory is read, the instruction latch doesn't change, and the decode doesn't change; we're just bumping the instruction phase. In the next instruction phase, we load the RD register onto the X bus, we put the X bus into the program counter, and we take the program counter plus four and put that onto the Z bus. And what do we do with Z? Nothing, apparently. I just moved this line down because I wanted all the X's together. So we load the X bus with RD, and we put that in the program counter and in the memory address; that's the jump part of it. We also take the program counter plus four, which is the address of the next instruction, and send that to Z. Now, what are we going to do with Z? Well, we should have written it to RD, so I'll just do that. And this is the reason why I can't really fully vet this code: until I actually execute an instruction and make sure that the CPU did what the instruction was supposed to do, I can't really test this, so I'm just sort of eyeballing it. So that's the jump and link register instruction, and I probably made the
same mistake in the JAL instruction, so let's go ahead and fix that really quick. Okay, so that's what I think is supposed to happen, and in fact it's in the summary that Z should go to RD; I just didn't actually do that. Now we can look at an even more complex instruction. This is the load instruction, where I put in a whole bunch of comments about how it's supposed to work, and this is a table of all those shift values that we talked about. The first shift is always going to be a shift left, and the next shift is either going to be an arithmetic shift or a logical shift. Basically, this is the template that I'm following, and I'm not going to bother to go through it line by line. Yes, I may have made a mistake; I will probably catch it during formal verification. So this is instruction phase zero, this is instruction phase one, and then, based on the kind of load that I'm doing and the width of the load, I set up the shift amount. Then, during the final phase, I set up the shift amount and either shift right arithmetic or shift right logical. So that's basically that. There's a store instruction, and that's pretty much that. Now, the only thing that I could really do for formal verification was set up a couple of cover statements to make sure that I could actually get something to happen, and I also wrote a bunch of assumes. These assumptions are about all the inputs to the sequencer. Remember, there are six phases in a machine cycle. What I'm saying is that during the very first phase of a machine cycle things can change, but for the rest of the machine cycle we're assuming, because this is the way we're using the machine, that none of these inputs change. And what I'm asserting is that none of the outputs change except during that very first phase of a machine cycle. So by running that, I can sort of make sure that the thing isn't randomly changing the outputs, and that everything is nice and
stable throughout the entire machine cycle. At first glance everything seems okay: we can see that BMC passed, so obviously all of our asserts held. But if we scroll up to look at the cover statements, we can see that one cover statement was reached, up there on line 887, but the other cover statement, on line 886, was not reached. So let's take a look at that. The cover statement that wasn't reached, on line 886, is this line right over here. So we can't actually get the program counter to go to 100, which is a little odd. I know we started at zero, and if we execute one instruction that doesn't go anywhere, it should increment by four, but we do have things like branch instructions and jump instructions, so why weren't those exercised? Let's see if we can find out. Here is the part of the program that handles jump and link, and we can see in the summary that it should take whatever is in the X register and put it into the program counter. Now remember, because we don't have this hooked up to the rest of the machine, we don't actually have a register file, so formal verification is free to put whatever it wants onto the X bus at this point. It should have figured out that it could put 100 onto the X bus, which would be the equivalent of jumping to 100, and that should satisfy the cover statement. But that didn't happen. If we go down to here, we can see that if the instruction phase is zero, it does its work and goes to the next instruction phase, and here, in the next instruction phase, we load the X bus into the program counter. So there are a few things that could possibly be the cause. The first is that the instruction phase never actually goes to one, for whatever reason. I can write a cover statement to show whether or not that actually happens, so let's do that. Okay, so here is my simple cover statement: I just want to know, does the instruction phase ever get greater than zero? So let's find
out. And indeed, this statement right over here shows that line 886, our new cover statement, was reached, so we know that the instruction phase can actually become greater than zero. So that's okay. The next thing is: how do we change the program counter? Well, we've got three ways to change the program counter and one way to not change it: PC plus four can go to PC, Z can go to PC, or X can go to PC. So let's write another cover statement that says that at some point either Z or X goes to PC. Again, here is our simple cover statement; we just want to know if either X or Z can go to the program counter. Let's run it and see. We look for our cover statement, which was on line 886, and it is indeed reached, so we know that at some point either the X bus or the Z bus gets loaded into the program counter. So how about the program counter itself: does it actually get loaded? With that in mind, I started looking at the code, and I found a bug. What this says is that if we're in the last instruction cycle, I kind of assumed we were sending PC plus four to the program counter. Well, that's true, but later on, in the jump and branch handlers, we also set X to PC or Z to PC. As you can see here, what actually happens is that this is an if/else statement, and since PC plus four to PC is actually one and it comes first, that is what takes precedence. So that was my bug. What I have to do is make sure that I don't automatically load PC plus four into the PC if I wanted to send X or Z to the program counter, so let's fix that. What I've done is add a couple of if statements to make sure that if we decided not to load the PC with anything else and we're on the last phase of the instruction, then we go ahead and load PC plus four into the PC. The same goes for the memory address: if we're not loading anything else into the memory address, then we just load PC plus four into it, because that's the address of the next instruction.
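To make that precedence problem concrete, here is a minimal Python model of the PC register's input multiplexer. It's a sketch under stated assumptions, not the sequencer's actual code: the names `PcSelect`, `pc_mux`, `default_buggy`, `default_fixed`, and `final_jump_phase` are hypothetical, and plain Python stands in for the HDL. The mux is modelled as an if/elif chain, so whichever select is tested first wins when several are high. Because Python runs top to bottom, the jump's request is made before the default logic runs, so the fixed default can see it.

```python
from dataclasses import dataclass


@dataclass
class PcSelect:
    """Hypothetical select lines for the PC register's input multiplexer."""
    pc_plus_4_to_pc: bool = False
    x_to_pc: bool = False
    z_to_pc: bool = False


def pc_mux(sel: PcSelect, pc: int, x_bus: int, z_bus: int) -> int:
    """Next PC value. The if/elif chain means the first true select wins."""
    if sel.pc_plus_4_to_pc:
        return (pc + 4) & 0xFFFF_FFFF
    elif sel.x_to_pc:
        return x_bus
    elif sel.z_to_pc:
        return z_bus
    return pc  # nothing selected: the register keeps its value


def default_buggy(sel: PcSelect, last_cycle: bool) -> None:
    # Always request PC+4 on the last instruction cycle.
    if last_cycle:
        sel.pc_plus_4_to_pc = True


def default_fixed(sel: PcSelect, last_cycle: bool) -> None:
    # Fall back to PC+4 only if nothing else is being loaded into the PC.
    if last_cycle and not (sel.x_to_pc or sel.z_to_pc):
        sel.pc_plus_4_to_pc = True


def final_jump_phase(default_fn, target: int) -> int:
    """Last cycle of a jump: the handler asks for X -> PC, then defaults apply."""
    sel = PcSelect()
    sel.x_to_pc = True  # the jump handler's request
    default_fn(sel, last_cycle=True)
    return pc_mux(sel, pc=0, x_bus=target, z_bus=0)


print(hex(final_jump_phase(default_buggy, 0x100)))  # 0x4: PC+4 wins, jump lost
print(hex(final_jump_phase(default_fixed, 0x100)))  # 0x100: jump taken
```

The same guard would apply to the memory-address mux, which should only fall back to PC plus four when nothing else is being loaded into it.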
So let's run this and see that our cover statements are actually hit. Okay, it was definitely hit. Let's also make sure that bounded model checking still passes, and there's no reason it shouldn't. Good. Let's go up to the cover statement; it was line 888, right here. Let's pull up that trace and see exactly how it was able to get the program counter above 100. All right, so here is the trace. Let's go into the sequencer, and first off, let's take a look at the phase counters. Okay, looks reasonable. Let's take a look at instruction complete: okay, we got one instruction complete, fine. Let's take a look at the program counter and see what it did. Let's zoom in a little bit and move over a little bit. We start with the program counter at zero, and we end up with the program counter at 200. Well, how did it do that? Let's take a look at the opcode. The opcode was 63 hex, so I'm just going to quickly look up what 63 actually means. 63 is a branch. So let's take a look at some things that are important to the branch. The condition is important, so there's the condition, and you can see that it was set to true. And let's see, data X in and data Z in; Z would be the destination address that was calculated. So we can see that when the instruction ends, right over here, the condition is true, and we jump to whatever is on the Z bus. That's pretty much that. Okay, what I've got here is another simple cover statement. I want to see whether we can get a store opcode to complete, and see what the waveforms look like. The cover statement basically requires that the instruction is complete, which is that little pulse at the end of the instruction, and also that the opcode is store. Now, I do have to admit that I made a mistake in an earlier edit of this video, where I did not have the parentheses around
this expression inside, and the reason that didn't work is that the logical AND, or actually the bitwise AND, takes precedence over the equality right over here. So without these parentheses, what I was actually doing was taking instruction complete, ANDing it with the opcode, and then checking whether that was equal to the store opcode, which of course it never would be: instruction complete is one bit, so the result could only be zero or one, which is not what the store opcode is. So anyway, let's take a look at how the cover statement was reached. The cover statement was on line 888, so we're going to be looking at trace number two. All right, here is trace number two. As usual, let's pull up the two clock phases, the phase 1 clock and the phase 2 clock. Let's take a look at the instruction complete pulse: there it is, at the end. Let's take a look at the opcode: there it is, 23 hex, which is store. Let's take a look at the program counter. Sure, it doesn't change, because there's just that one instruction being executed. What we're actually interested in are the memory signals, so let's take a look at the memory read and memory write signals. There's memory read, and there's memory write. So we're doing a read first, then a write, and then nothing, because remember, store is a three-cycle instruction. During the first cycle, the instruction is read and things are computed. During the second cycle, we actually do the write, and this is the write section right over here. During the third cycle, we just compute the address of the next instruction. Let's take a look at the memory address. Okay, it was stuck on zero; that's fine. And let's see the memory data to write: it was also zero, which is also perfectly fine. We could add some things to the cover statement, like saying that the memory data to write should be non-zero, and check that, but for
the moment this seems pretty reasonable to me. Up until now, all I've really been doing is running a couple of spot checks: I have an idea that I want to check out, and the tool goes ahead and checks it for me. Obviously I can't check every single condition that way, and that's the entire purpose of formal verification; that's what the asserts are for, and you don't really get that with cover statements. In order to actually formally verify that all of the instructions are working, what I have to do is hook everything together and then write some assertions that say: here's a snapshot of the registers before the instruction, here's a snapshot of the registers after the instruction, and here's a list of all the memory accesses that occurred during the instruction. Then, given the definition of a particular instruction, if that instruction was executed, make sure that the snapshots and the memory traces correspond to what the instruction was supposed to do. That's really the way you formally verify the processor. That is a longer process, which of course I won't be going over right now, because this video is getting pretty long, but that's what we'll be doing next week: putting together a formal verification framework for the entire processor to make sure that it works. So until then, thanks for watching, and I hope you join me next week. See ya!
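The before/after snapshot idea can be sketched in plain Python as a reference model: decode the instruction, compute what the registers and PC should be afterward, and compare that with the snapshot taken from the hardware. This is a minimal sketch, not the framework from the next video. It assumes the standard RV32I field positions and opcodes, models only ADD, ADDI, LUI, and AUIPC, leaves out the memory-access trace entirely, and uses hypothetical names (`decode`, `reference_step`, `check`).

```python
MASK = 0xFFFF_FFFF

# RV32I base opcodes (instruction bits 6:0) for the instructions modelled here.
OP_IMM, AUIPC, OP, LUI = 0x13, 0x17, 0x33, 0x37


def sext(value: int, bits: int) -> int:
    """Sign-extend a `bits`-wide field, returned as an unsigned 32-bit value."""
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & MASK


def decode(insn: int) -> dict:
    """Fixed-position fields, plus the I-type and U-type immediates."""
    return {
        "opcode": insn & 0x7F,
        "rd": (insn >> 7) & 0x1F,
        "funct3": (insn >> 12) & 0x7,
        "rs1": (insn >> 15) & 0x1F,
        "rs2": (insn >> 20) & 0x1F,
        "funct7": (insn >> 25) & 0x7F,
        "imm_i": sext((insn >> 20) & 0xFFF, 12),
        "imm_u": insn & 0xFFFF_F000,
    }


def reference_step(regs: list, pc: int, insn: int) -> tuple:
    """Expected (registers, pc) after one instruction. ADD, ADDI, LUI, AUIPC only."""
    f = decode(insn)
    if f["opcode"] == OP and f["funct3"] == 0 and f["funct7"] == 0:  # ADD
        result = regs[f["rs1"]] + regs[f["rs2"]]
    elif f["opcode"] == OP_IMM and f["funct3"] == 0:  # ADDI
        result = regs[f["rs1"]] + f["imm_i"]
    elif f["opcode"] == LUI:
        result = f["imm_u"]
    elif f["opcode"] == AUIPC:
        result = pc + f["imm_u"]
    else:
        raise NotImplementedError(f"opcode {f['opcode']:#04x} not modelled")
    after = list(regs)
    if f["rd"] != 0:  # x0 is hardwired to zero
        after[f["rd"]] = result & MASK
    return after, (pc + 4) & MASK


def check(regs_before, pc_before, insn, regs_after, pc_after) -> bool:
    """Does a before/after snapshot match what the instruction should do?"""
    expected_regs, expected_pc = reference_step(regs_before, pc_before, insn)
    return regs_after == expected_regs and pc_after == expected_pc


regs = [0] * 32
after, pc = reference_step(regs, 0x100, 0xFFF00093)  # addi x1, x0, -1
print(hex(after[1]), hex(pc))  # 0xffffffff 0x104
print(check(regs, 0x100, 0xFFF00093, regs, 0x104))  # False: x1 never written
```

In a real framework the "after" snapshot would come from the hooked-up register card and program counter rather than being written by hand. Branches, jumps, loads, and stores would also need the memory-access list to be compared, which this sketch does not attempt.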