Greetings, RISC-V processor friends. With the register card built and somewhat tested (but not entirely tested), I thought it would be nice to move on to the next phase of the project, which is the arithmetic and logical instructions. So let's set the registers aside for now and talk about the register-register instructions. I'm going to draw out the R instruction format. For the register-register instructions, the opcode is always 0110011. The destination register, RD, is five bits, and so are the two source registers, RS1 and RS2. So we've got five bits here and five bits there. Then there's funct7: its lowest five bits are always zeros, the second-to-last bit (instruction bit 30) is a little bit special, and the last bit is zero. funct3 tells us what operation we're going to perform. 000 is either an add or a subtract, depending on the value of that special bit: if it's a zero, the operation is an add, and if it's a one, the operation is a subtract. Subtract means source register 1 minus source register 2, not the other way around. 001 is shift left logical. We shift to the left, shifting in zeros, and the bits at the most significant end just fall off. Unlike in some other processor architectures, they don't get stored anywhere (in a carry flag, for example). The number of positions to shift is determined by RS2, so this is RS1 shifted left by RS2. Of course it's not all of RS2: you can only shift by 0 to 31 positions, so we only pay attention to the low five bits of RS2. The next operation, 010, is set if less than. We do a signed comparison to check whether RS1 is less than RS2. If it is, we put a one in the destination register; otherwise we put a zero in the destination register. There's also 011, which is set if less than unsigned. Again we check whether RS1 is less than RS2, but this is an unsigned comparison.
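To make the field layout concrete, here is a minimal Python sketch of decoding the R format. It assumes RV32I and only the base register-register ALU instructions; `decode_r` and `R_OPS` are illustrative names, not from any library. It covers all eight funct3 codes and uses instruction bit 30 to choose between ADD/SUB and SRL/SRA.

```python
# Minimal RV32I R-format decoder sketch (register-register ALU ops only).
# Field layout: funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]

OP_REG = 0b0110011  # opcode shared by every register-register ALU instruction

# funct3 -> (name when instruction bit 30 is 0, name when it is 1)
R_OPS = {
    0b000: ("add", "sub"),
    0b001: ("sll", None),
    0b010: ("slt", None),
    0b011: ("sltu", None),
    0b100: ("xor", None),
    0b101: ("srl", "sra"),
    0b110: ("or", None),
    0b111: ("and", None),
}


def decode_r(word: int) -> tuple[str, int, int, int]:
    """Return (mnemonic, rd, rs1, rs2) for a 32-bit register-register instruction."""
    if (word & 0x7F) != OP_REG:
        raise ValueError("opcode is not 0110011")
    rd = (word >> 7) & 0x1F
    funct3 = (word >> 12) & 0x7
    rs1 = (word >> 15) & 0x1F
    rs2 = (word >> 20) & 0x1F
    funct7 = (word >> 25) & 0x7F
    if funct7 & 0b1011111:
        # Only funct7 bit 5 (instruction bit 30) may be set for these ops.
        raise ValueError("funct7 must be 0000000 or 0100000")
    alternate = (word >> 30) & 1
    name = R_OPS[funct3][alternate]
    if name is None:
        raise ValueError("bit 30 set on an op with no alternate form")
    return name, rd, rs1, rs2
```

For example, `decode_r(0x002081B3)` is `add x3, x1, x2`, and setting bit 30 (`0x402081B3`) turns it into `sub x3, x1, x2`.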
The difference between signed and unsigned, of course, is that signed uses two's complement notation, so you get both negative and positive numbers. For example, with set if less than unsigned, if you ask whether hex FFFFFFFF is less than zero, the answer is no, because that's asking whether 4,294,967,295 is less than zero. However, with a signed comparison, FFFFFFFF in two's complement is negative one, and negative one is less than zero. So with the same registers, SLT and SLTU can give you different results. Okay, then we get to a logical operation: 100 is just XOR. We take RS1, XOR it with RS2, and put the result in the destination. For completeness, I'll also do 101, which is shift right logical: RS1 shifted right logically by RS2. I'm going to use the Java notation here, >>>. This shifts the bits over to the right, and again we only pay attention to the lowest five bits of RS2. Shift right logical means that we shift zeros into the most significant bit, and at the least significant end the bits just fall off and disappear. There's also shift right arithmetic, which I'm going to symbolize with Java's >>. The difference between shift right arithmetic and shift right logical is that the sign bit, which is the most significant bit, is retained. So we're not shifting zeros into the most significant bit; we're shifting the most significant bit into itself. If you have a one followed by a whole bunch of zeros and you shift it right arithmetically, you end up with one, one, and then a whole bunch of zeros. That's signed division by a power of two, rounding toward negative infinity, because it treats the value as a signed number. Shift right arithmetic is encoded the same way as shift right logical, and the difference is that special bit again: a zero means shift right logical and a one means shift right arithmetic.
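The signed/unsigned distinction and the two right shifts can be pinned down in a few lines of Python. This is a sketch that assumes registers are held as unsigned 32-bit patterns in Python ints. The helper names are hypothetical.

```python
# Sketch of RV32I semantics for SLT/SLTU and the three register shifts.
# Registers are modelled as Python ints holding unsigned 32-bit patterns.

MASK32 = 0xFFFF_FFFF


def to_signed(x: int) -> int:
    """Read a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def slt(rs1: int, rs2: int) -> int:
    return 1 if to_signed(rs1) < to_signed(rs2) else 0


def sltu(rs1: int, rs2: int) -> int:
    return 1 if (rs1 & MASK32) < (rs2 & MASK32) else 0


def sll(rs1: int, rs2: int) -> int:
    # Only the low five bits of rs2 matter; bits shifted past bit 31 are dropped.
    return (rs1 << (rs2 & 0x1F)) & MASK32


def srl(rs1: int, rs2: int) -> int:
    # Zeros come in at the top (Java's >>>).
    return (rs1 & MASK32) >> (rs2 & 0x1F)


def sra(rs1: int, rs2: int) -> int:
    # Copies of the sign bit come in at the top (Java's >>).
    # Python's >> on a negative int is already arithmetic (it floors).
    return (to_signed(rs1) >> (rs2 & 0x1F)) & MASK32
```

`slt(0xFFFFFFFF, 0)` is 1 while `sltu(0xFFFFFFFF, 0)` is 0. `sra(0xFFFFFFF9, 1)` gives 0xFFFFFFFC: -7 shifted right arithmetically by one is -4, not -3, so the division rounds toward negative infinity rather than toward zero.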
Then we have 110, which is OR: RS1 bitwise ORed with RS2. And 111 is, you probably guessed it, AND, which is bitwise AND. Okay, so those are the register-register instructions: they take two registers and put the result in the destination register. One of the interesting things you might have noticed is that there's no register-to-register move instruction. In other words, I can't just take RS1 and move it into the destination, or take RS2 and move it into the destination. But you might remember that register zero (x0) is always zero. So you can move RS1 into RD by adding register zero to it, that is, by choosing register zero as RS2. That's the equivalent of a move. This is one of the things that makes RISC-V a very RISC processor: the designers basically said that if you can do something with one of these instructions, then we don't need to provide a separate instruction for it. So we don't need to provide a move instruction; you can simply do it with add. If you wanted to, you could even add register zero to register zero and put the result in register zero. Writing to register zero does absolutely nothing, so that single instruction is a no-op. RISC-V doesn't provide a dedicated no-op instruction because you can get one from other instructions. Okay, there's also the other format of instruction, the register-immediate instruction, which is the I format. You can see that it's pretty much identical to the R format, except that there's no RS2 and no funct7. Instead, we replace those with a 12-bit immediate value. This is always treated as a signed 12-bit value, so its most significant bit is the sign bit. The opcode for the register-immediate instructions is 0010011. If you compare it with the register-register opcode, you'll see that it differs only in bit five. The functions are pretty much the same. We do have an add instruction.
This one is called ADDI, for add immediate, and it takes RS1 and adds the signed immediate value. There's no subtract immediate, because why would you need a separate subtract instruction when you can negate the immediate value yourself? If you wanted to subtract one, you would just add negative one. 001 is again shift left logical, here shift left logical immediate (SLLI): RS1 shifted left by the immediate value, and again we only pay attention to its lowest five bits. 010 is set if less than immediate (SLTI), and again this is a signed comparison between RS1 and the immediate. Then there's 011, set if less than immediate unsigned (SLTIU). This one is a little strange, because now we're dealing with unsigned numbers. What happens is that you take the immediate as a signed number and sign-extend it out to 32 bits, which just means you take its most significant bit and replicate it all the way up to bit 31. Then you do an unsigned comparison. We also have XORI, which is RS1 XORed with the immediate value. Again the immediate is sign-extended, so its most significant bit is replicated all the way up. 101 is shift right logical immediate (SRLI), and again you only pay attention to the low five bits of the immediate. Then there's shift right arithmetic immediate (SRAI), which is RS1 shifted right arithmetically by the lowest five bits of the immediate value. Here the funct3 encoding is the same as for SRLI. Remember that in the register-register instructions, the second-to-last bit of funct7 determined which shift it was. Here, too, the same instruction bit (bit 30, the second most significant bit of the immediate) determines the operation: if it's a zero it's a logical shift, and if it's a one it's an arithmetic shift. We can do that because a shift only needs the low five bits of the immediate value, so the remaining immediate bits are free to encode something else.
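Two of the I-format details are easy to get wrong: SLTIU sign-extends its immediate before doing an unsigned comparison, and SRLI/SRAI share funct3 101, with instruction bit 30 choosing the shift. Here is a small Python sketch, assuming RV32I. The function names are hypothetical.

```python
# Sketch of two I-format details: SLTIU's sign-extend-then-compare-unsigned,
# and SRLI/SRAI sharing funct3 = 101 with instruction bit 30 picking the shift.

MASK32 = 0xFFFF_FFFF


def sign_extend_12(imm12: int) -> int:
    """Replicate bit 11 of a 12-bit field upward; returns a 32-bit pattern."""
    imm12 &= 0xFFF
    return (imm12 | 0xFFFF_F000) if imm12 & 0x800 else imm12


def sltiu(rs1_value: int, imm12: int) -> int:
    # Sign-extend first, then compare as unsigned 32-bit numbers.
    return 1 if (rs1_value & MASK32) < sign_extend_12(imm12) else 0


def shift_right_imm(word: int, rs1_value: int) -> int:
    """Execute SRLI or SRAI from a full instruction word (opcode 0010011, funct3 101)."""
    if (word & 0x7F) != 0b0010011 or ((word >> 12) & 0x7) != 0b101:
        raise ValueError("not SRLI/SRAI")
    if (word >> 25) & 0b1011111:
        raise ValueError("only instruction bit 30 may be set in bits 31:25")
    shamt = (word >> 20) & 0x1F  # low five bits of the immediate
    x = rs1_value & MASK32
    if (word >> 30) & 1:  # SRAI: shift in copies of the sign bit
        signed = x - (1 << 32) if x & 0x8000_0000 else x
        return (signed >> shamt) & MASK32
    return x >> shamt  # SRLI: shift in zeros
```

As a usage note, `sltiu(x, 1)` returns 1 only when x is zero. That is why the standard assembler's `seqz rd, rs` pseudo-instruction expands to `sltiu rd, rs, 1`.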
And finally there's 110, which is OR immediate (ORI): RS1 ORed with the immediate, again sign-extended. And 111 is AND immediate (ANDI): RS1 bitwise ANDed with the sign-extended immediate. As for the sign extension: the immediate in ADDI is sign-extended too, because you're really adding a 32-bit number to another 32-bit number. Instead of the 12-bit number, the ALU sees a 32-bit number; it's just the sign extension of the 12-bit one. So those are the register-register instructions, and those are the register-immediate instructions. With these, I can certainly load some small values into a register. If I want to load an immediate number into the destination register, I just set RS1 to register zero, which adds zero plus the immediate and puts the result in the destination. So again, there's no specific move-a-constant-into-a-register instruction. That only works for constants that fit in the 12-bit signed immediate, from -2048 to 2047. We have all of these instructions to implement somehow, and the traditional way to do that is with an ALU, an arithmetic logic unit. I want to talk about the shifts separately, so I'm going to leave those aside for the moment and just talk about add, subtract, the less-than operations, and the logical XOR, OR, and AND operations. I'm going to draw the ALU as a box and call it a 32-bit ALU. It has a destination bus, which of course can go to the destination register, and that's a 32-bit bus. We also have source 1, which can come from source register 1; I'll call that input X, and it's also a 32-bit bus. Then we have source 2, which could come from source register 2. But since both instruction formats take their first operand from source 1, the immediate could instead be put on the source 2 bus.
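Here is a Python sketch of how a hardwired-zero register 0 gives RISC-V a move, a no-op, and a load of a small constant with no dedicated instructions. It uses the source 1 / source 2 bus picture, with the sign-extended immediate driven onto source 2 for the I format. It's an assumption-laden model: only ADD and ADDI are handled, and register values are unsigned 32-bit Python ints. `step`, `encode_add`, `encode_addi`, and `immediate_bus` are hypothetical names.

```python
# Sketch: ADD/ADDI through the bus model, showing that a hardwired-zero x0
# gives a move, a no-op, and a load-small-constant for free.

MASK32 = 0xFFFF_FFFF
OP_REG, OP_IMM = 0b0110011, 0b0010011


def immediate_bus(instr: int) -> int:
    """Immediate buffer: imm[11:0] = instr[31:20], imm[31:12] = copies of instr[31]."""
    imm = (instr >> 20) & 0xFFF
    if (instr >> 31) & 1:
        imm |= 0xFFFF_F000
    return imm


def encode_add(rd: int, rs1: int, rs2: int) -> int:
    return (rs2 << 20) | (rs1 << 15) | (rd << 7) | OP_REG


def encode_addi(rd: int, rs1: int, imm: int) -> int:
    if not -2048 <= imm <= 2047:
        raise ValueError("ADDI immediate must fit in 12 signed bits")
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (rd << 7) | OP_IMM


def step(instr: int, regs: list[int]) -> None:
    """Execute one ADD or ADDI; regs holds 32 values and regs[0] is never written."""
    opcode = instr & 0x7F
    rd = (instr >> 7) & 0x1F
    rs1 = (instr >> 15) & 0x1F
    rs2 = (instr >> 20) & 0x1F
    if ((instr >> 12) & 0x7) != 0 or opcode not in (OP_REG, OP_IMM):
        raise ValueError("this sketch only handles ADD and ADDI")
    if opcode == OP_REG and (instr >> 25) != 0:
        raise ValueError("SUB and other funct7 values are not modelled here")
    x = regs[rs1]                                                # source 1 bus
    y = immediate_bus(instr) if opcode == OP_IMM else regs[rs2]  # source 2 bus
    if rd != 0:                                                  # x0 ignores writes
        regs[rd] = (x + y) & MASK32
```

Running `encode_addi(5, 0, -7)` through `step` loads 0xFFFFFFF9 into register 5. `encode_add(6, 5, 0)` copies it to register 6, and `encode_add(0, 0, 0)` leaves every register unchanged. The standard RISC-V assembler pseudo-instructions lean on the same trick, though with ADDI: `mv rd, rs` is `addi rd, rs, 0` and `nop` is `addi x0, x0, 0`.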
So first of all, we want source 2 to go into the other 32-bit input of the ALU, which I'll call Y. We also want the ability to put the immediate on source 2, so there's going to be a buffer, which we'll just call the immediate buffer, that can drive its value onto source 2. Where you get the immediate value from is, of course, the instruction. Looking at the I format, we have 12 bits for the immediate value, so we take instruction bits 31 through 20, and those become immediate bits 11 through 0. But remember that we're sign-extending it, so we take bit 11 of the immediate and replicate it into immediate bits 31 through 12. That's your immediate value. Then of course the buffer has an output enable, which is enabled when the instruction is in the I format, and that depends on the opcode. Okay, the next thing an ALU traditionally needs is a carry in. Even though there's no flags register in RISC-V, we're still going to want a carry in, and I'll get to why in a moment. We'll also want a carry out, and again we'll get to why in a moment. We also want a function input, which I'll call F. The function is basically funct3, except that here add and subtract look like the same function. Since we're ignoring the shifts, maybe we could make subtract become 001. In any case, these are the operations the ALU performs: add, subtract, set if less than, set if less than unsigned, XOR, OR, and AND. That's seven possible operations, which can be encoded in three bits, so F is three bits. We could expand it later if we want other operations that aren't necessarily encoded in these instructions and that do somewhat more complex things. Anyway, what does this add up to? Carry in is one bit, carry out is also one bit, and we have 32
inputs, another 32 inputs, three function inputs, and the one carry input. Great. So the question becomes how we're going to build this, because as far as I know there's no 32-bit ALU chip that I can just throw in and be done with it. What we're going to do is travel back in time. If I had my Lego DeLorean, I would do the little Lego DeLorean sweep and sing a little song, but since I don't, I'll just bring this up. Way back in April 1982, Digital Equipment Corporation came out with the VAX-11/730 minicomputer, whose processor was built entirely out of TTL chips. Here's the central processing unit technical description, and I'm going to skip to the section that describes the ALU so we can see how they built it. I should mention that the VAX-11 was a 32-bit machine, although, interestingly, this one could take a minimum of one megabyte of memory and a maximum of only five megabytes. So the 32 bits referred to the data path and also the address path. The VAX-11 had virtual addressing (that's what the VA stands for), which meant that a virtual address could be anywhere in that 32-bit address space, but you only really had five megabytes of physical memory. Okay, anyway, this is what the ALU could do: it could add, it could subtract, it could subtract the other way around, and it could do an OR, an AND, and an XOR. That's actually pretty good, because that's what we need, with the exception of our less-than tests. It could also complement one of the inputs, but we're not doing that. And here's a diagram. What you should note is this ALU right here, which they call a microprocessor slice. A bit-sliced architecture is one where you take your data path, say 32 bits, and split it into equal portions. In this case they split it into portions of four bits each, so you have eight slices, and the logic of each slice is identical, and
there may be some combining logic later on to produce the final 32-bit result, but in general you consider only four bits at a time. Here's the ALU. You can basically ignore everything else and just concentrate on the ALU. We've got four bits going in and another four bits going in; those are our two inputs. We have a carry in and a carry out. We have a function input, and you can see that it has three bits of function, which corresponds to our three function bits. We have a four-bit output (ignore all this other stuff), an overflow bit, and generate and propagate bits. I'll get to generate and propagate in a moment. So what I'm going to do is take this 32-bit ALU block and expand it into a four-bit ALU that's repeated eight times. The inputs to the first one are X3 down to X0 and Y3 down to Y0. These of course come from RS1 and RS2, but we'll just call them X and Y. It also gets the function, a carry in, and a carry out. Because the logic is identical, all of these ALUs have to be the same. The next ALU also has a carry in and a carry out, and it takes X7 down to X4 and Y7 down to Y4. In other words, it takes the next four bits, but it also takes the whole function, because it needs to know what to do. And so on down the line: the final one takes X31 down to X28, Y31 down to Y28, and the function. So all the F inputs are tied together, source register 1 is split among all the ALUs, and source 2 is also split among all the ALUs. Now, we could just leave it like this, and this might be familiar to those of you who know digital design: we could connect the carry out of each slice to the carry in of the next slice and just carry it through. That would be a ripple-carry adder. You would add X and Y and
the carry in, and output the result here. We'll call the output Z. For the first slice it goes to destination bits 3 to 0, for the next slice destination bits 7 to 4, and so on up to destination bits 31 to 28. The carry ins and carry outs just chain together, so the ALUs together form a 32-bit adder. They also form a 32-bit subtractor, because a subtractor is nothing but an adder where you've negated the Y input: in two's complement, you invert every bit of Y and set the carry in to one. We can do that inversion inside each slice. The logical operations are even simpler: AND, OR, and XOR don't depend on neighboring bits at all, so each slice just feeds its result through and we're done. Add and subtract require the ripple carry, and so does comparison, because a comparison ends up doing a subtraction. The problem is that if you implement it this way and each ALU has a delay, which I'll call delta, then from the time you apply the inputs it takes delta to get the first carry out. Then you have to wait another delta from the carry in of the next ALU to its carry out, and so on, so by the end you've waited eight delta. There's a faster way to do this: instead of chaining the carry ins, you use what's called a carry look-ahead unit, which I'll draw as a large box here. Each ALU generates two signals, called propagate and generate, and they're created strictly from the X and Y inputs; in other words, they don't rely on the carry in. What do propagate and generate actually mean? If we look at any addition of two four-bit numbers, X plus Y, we say that the slice propagates its carry in if X plus Y is such that a carry in of one also produces a carry out. Now, that can
only happen if X plus Y is exactly 15. Remember that X and Y are four-bit digits. If X plus Y is less than 15, it doesn't matter what the carry in is, because the total can never be greater than 15, which means the carry out will always be zero. So if X plus Y is less than 15 we don't propagate the carry in, and if it's exactly 15 we do. Generate means that no matter what the carry in is, X plus Y will definitely produce a carry out. That's the case when X plus Y is greater than 15, because then you'll generate a carry out whatever the carry in is. So that's what propagate and generate mean. The interesting thing is that the carry out is just this: if the slice generates a carry, that's your carry out; otherwise, if it propagates, the carry out is the carry in. As a formula, Cout = G + P·Cin, where + is OR and · is AND. From this you can see that we can easily compute the carry out from X, Y, and the carry in. You might think that's no big deal, because of course I could compute this carry out from X, Y, and the carry in anyway. But here's the trick. The propagate and generate bits again depend only on X and Y, not on the carry in. Call the first slice's bits P1 and G1 and its carry out C1, and the second slice's bits P2 and G2. Then the second slice's carry out is C2 = G2 + P2·C1. If we expand C1 = G1 + P1·Cin, we get C2 = G2 + P2·(G1 + P1·Cin) = G2 + G1·P2 + P1·P2·Cin. Going back to our block diagram, that means this carry out depends only on Cin, the first slice's propagate and generate bits, and the second slice's propagate and generate bits, and those bits depend only on X and Y. So there's no chaining going on. Right now there really isn't any advantage between a carry look-ahead unit and
ripple carry, because to get this second carry out I'd have two delta with ripple carry. With the look-ahead unit, let's see: we have one delta to generate the propagate and generate bits, in parallel in all the slices, and then the look-ahead unit adds another delay of delta, so that's still two delta. However, that's it: all of the outputs of the carry look-ahead unit, all the way down to the very last carry out, have only that single delta delay on top of the one delta for the propagate and generate bits. In other words, the very last carry out is ready after two delta. And the sums? In the first delta we get the propagate and generate bits (and a sum, though we don't pay attention to that one yet). Then we get another delta for all the carry inputs to be generated, and then a third delta to produce the actual sums. So each result is ready after three delta instead of eight delta, which makes the carry look-ahead unit a faster way of doing an addition or a subtraction. Let's go back to the VAX architecture, because in fact that's what they did in the VAX. They split the ALU into eight four-bit slices, with a carry in to the ALU and a carry out from it. They also had intermediate carry outs, because you could operate on nibbles, bytes, half words, or whole words; we of course don't need that. And they have what they call a carry skipper unit, which is a carry look-ahead unit. They actually had two, but we're going to have one. So here's our carry look-ahead unit, and from each of the ALUs it gets two inputs, which is already 16 inputs, plus the carry in, so that's 17 inputs. How many outputs do we have? Eight carry outs, so we have eight outputs. So
we would have some circuit with 17 inputs and eight outputs that generates all of the carries at once. So that's what our ALU is going to look like. Now we have to talk about how to implement this four-bit ALU and this carry look-ahead unit. There is a four-bit ALU chip, the 74181, and there is a carry look-ahead chip, the 74182. There are a few disadvantages, though. First, the 74181 is a TTL chip, and it's only available as five-volt TTL; same thing with the 74182. Second, the 74181 has a propagation delay of several tens of nanoseconds, which is slower than the RAMs I'll show in a moment, and that would really slow down our RISC-V processor. It's slow enough already, because of course it's not on an FPGA. I'm building a RISC-V processor not on an FPGA. And as far as I can tell, there's no four-bit ALU in 3.3-volt logic; same thing with the carry look-ahead unit. What we could do is crack open the four-bit ALU, lay down its logic gates, and be done with it, but that's really complicated, and I have a one-chip solution. First of all, how many inputs and outputs does each of these ALUs have? There are four inputs here and four here, that's eight, plus three function inputs and the carry in, for a total of 12 inputs each. How many outputs? Four result bits plus carry out, propagate, and generate makes seven, and I'm also going to add one more that I'll call the less-than flag. I'll show how we use that later, but in any case, each of these ALUs has 12 inputs and eight outputs. That can easily fit in a 4K by 8 RAM or ROM (2^12 = 4096 addresses of eight bits each), and RAMs are actually faster. The carry look-ahead unit has 17 inputs, so it would be a 128K by 8 RAM (2^17 = 131,072 addresses). We can easily do that; these are pretty cheap. Here, for example, is a 32K by 8 RAM: it's a dollar, and it's 10 nanoseconds, so that's pretty cool. Here's another one, a 64K by 16-bit RAM, which is also about a dollar and also 10 nanoseconds. So these RAMs are cheap, and we would
need eight of the small ones and one of the large ones. Even though the large one is 128K, let's suppose it's two dollars; then we're talking about a total of ten dollars for a full 32-bit ALU with a delay of 30 nanoseconds (three delta at 10 nanoseconds each). That's pretty good, so that's what I'm going to do. The way to do it is to write a Python program ("simply", I say) that goes through all of the inputs, computes what the outputs are supposed to be, and generates a file. And then you load that into... well, what do you load it into? You can't program a RAM ahead of time, because it loses its contents when the power goes off. That brings me to bootstrapping. Bootstrapping is something they did with the VAX. The VAX was a microcoded architecture, which meant that every instruction was carried out as a sequence of microinstructions, and that microcode was loaded into the VAX when it booted up. That was the bootstrapping mechanism. So here we have our four-bit ALU, which is a 4K by 8 RAM with 12 inputs and eight outputs. What we want is a 4K by 8 ROM sitting up here, with its data lines connected to the RAM's data lines, probably through a buffer or an output enable, and with its address lines connected to the ALU's inputs, also through tri-state buffers. That means we could basically put the ROM in, let it do whatever it needs to, and then take the ROM out of the picture. Then we have what I'll call a loader. All it is is a counter, and it also controls these output enables and the write-enable pin of each RAM. The idea is that when we boot up, the loader is enabled. It turns on all of these tri-state buffers, drives the write enables, and reads out the contents of the ROM, writing them into all of these RAMs. For the ALUs you can do that simultaneously, so we load this one, and this one, and so on down to the last ALU, all at once. The reason I have buffers between each ALU is that during operation
of course you want all of the outputs to be separate. However, because the logic is identical, you want them all loaded simultaneously, so during loading all of these buffers are on and this ROM writes to all of the RAMs at once. It's not that fast, though. Even though each RAM is 10 nanoseconds, you can't read the ROM out with a cycle time of 10 nanoseconds, and besides, there's the delay of all these buffers, so the bootstrap would run a little slowly. During operation, all of these buffers are turned off, so the RAMs are basically separate. You'd do the same thing for the carry look-ahead unit, which is 128K by 8: you'd have a 128K by 8 ROM whose data lines and address lines are buffered onto the look-ahead RAM. On the look-ahead RAM, the address lines are connected to the ALUs' propagate and generate outputs plus the carry in, and the data lines are connected to the carry ins of the ALUs. That's pretty much all there is to it, so that's my plan. For these ROMs I'm going to use EEPROMs, which are electrically erasable programmable read-only memories. They'll also be parallel ones, because a lot of EEPROMs today are serial, since they're meant to, for example, configure an FPGA. I'm building a RISC-V processor not on an FPGA. If I were to use serial EEPROMs, which are actually cheaper than parallel ones, I would have to add some shift registers here, and that would slow the bootstrap down immensely. So instead I'm just going to use parallel ones, and that's really the plan. I think the next video is going to be about actually building one of these: probably building the ALU, building two of them, maybe in their ROM versions, breadboarding them, and making sure they do what I think they do. Then, of course, building the cards for the ALU and the carry look-ahead unit, and that will take care of the arithmetic and logical operations. There's still shifting to be done, and that will certainly be the subject of
another video as well. Until then, I hope you enjoyed that, and I'll see you in the next video. Bye! I'm building a RISC-V processor not on an FPGA.
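The bit-sliced ALU plan can be checked in software before any RAM gets programmed. What follows is a rough Python sketch of the kind of table-generating program described above. It builds the 4K by 8 slice table and the 128K by 8 look-ahead table, then runs 32-bit operations through them three lookups deep (propagate/generate, carries, results). Everything here is an assumption for illustration: the address and data layouts, the function codes (funct3 with subtract moved to 001), the raw-binary file format, and all names. The less-than flag bit is left at zero because its logic hasn't been described yet, so for SLT and SLTU the model just returns the subtraction.

```python
# Sketch of a table generator for the slice RAM and the carry look-ahead RAM,
# plus a 32-bit ALU model that uses only those two tables.
# Hypothetical layouts (not fixed anywhere in the plan):
#   Slice address (12 bits):  x[3:0] | y[3:0] << 4 | f[2:0] << 8 | cin << 11
#   Slice data (8 bits):      z[3:0] | cout << 4 | P << 5 | G << 6 | lt << 7
#   Look-ahead address (17):  P0..P7 in bits 0-7, G0..G7 in bits 8-15, cin in bit 16
#   Look-ahead data (8 bits): bit i = carry out of slice i (bit 7 = ALU carry out)

ADD, SUB, SLT, SLTU, XOR, OR, AND = 0b000, 0b001, 0b010, 0b011, 0b100, 0b110, 0b111
SUBTRACTING = (SUB, SLT, SLTU)  # comparisons are done as subtractions


def slice_entry(address: int) -> int:
    x = address & 0xF
    y = (address >> 4) & 0xF
    f = (address >> 8) & 0x7
    cin = (address >> 11) & 1
    if f == ADD or f in SUBTRACTING:
        if f in SUBTRACTING:
            y = ~y & 0xF  # x - y = x + (NOT y) + 1; the +1 is the ALU's carry in
        total = x + y + cin
        z, cout = total & 0xF, total >> 4
        p = int(x + y == 15)  # carry in passes straight through
        g = int(x + y >= 16)  # carry out regardless of carry in
    else:
        z = {XOR: x ^ y, OR: x | y, AND: x & y}.get(f, 0)  # code 101 is unused
        cout = p = g = 0
    lt = 0  # less-than flag: logic not described yet, so left as 0
    return z | (cout << 4) | (p << 5) | (g << 6) | (lt << 7)


def lookahead_entry(address: int) -> int:
    carry = (address >> 16) & 1
    out = 0
    for i in range(8):
        p = (address >> i) & 1
        g = (address >> (8 + i)) & 1
        carry = g | (p & carry)  # sequential here, but computed once, offline
        out |= carry << i
    return out


def build_tables() -> tuple[bytes, bytes]:
    slice_rom = bytes(slice_entry(a) for a in range(1 << 12))          # 4K x 8
    lookahead_rom = bytes(lookahead_entry(a) for a in range(1 << 17))  # 128K x 8
    return slice_rom, lookahead_rom


def write_image(path: str, data: bytes) -> None:
    """Dump one table as a raw binary file (the programmer's format is an assumption)."""
    with open(path, "wb") as fh:
        fh.write(data)


def alu32(x: int, y: int, f: int, slice_rom: bytes, lookahead_rom: bytes) -> tuple[int, int]:
    """Return (32-bit result, carry out) using only table lookups."""
    cin0 = 1 if f in SUBTRACTING else 0
    nibbles = [((x >> (4 * i)) & 0xF, (y >> (4 * i)) & 0xF) for i in range(8)]
    # First delta: P and G depend only on x and y, so any carry in gives the same bits.
    pg = 0
    for i, (xi, yi) in enumerate(nibbles):
        d = slice_rom[xi | (yi << 4) | (f << 8)]
        pg |= (((d >> 5) & 1) << i) | (((d >> 6) & 1) << (8 + i))
    # Second delta: the look-ahead RAM returns every carry at once.
    carries = lookahead_rom[pg | (cin0 << 16)]
    # Third delta: each slice sees its real carry in and produces four result bits.
    z = 0
    for i, (xi, yi) in enumerate(nibbles):
        ci = cin0 if i == 0 else (carries >> (i - 1)) & 1
        z |= (slice_rom[xi | (yi << 4) | (f << 8) | (ci << 11)] & 0xF) << (4 * i)
    return z, (carries >> 7) & 1
```

Generating the look-ahead table with a sequential loop is fine because it runs once, offline. In hardware, the RAM returns all eight carries in a single access time. With this definition of propagate and generate, P and G are never both 1 for the same slice, so many look-ahead addresses never occur, but filling them costs nothing.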