Greetings, microprocessing unit friends! I'm back with another apparently pointless project. Well, it's a learning opportunity. I can't learn something unless I work on a project involving what I want to learn, so here we are, hopefully learning something together. We're going to be learning about a new HDL. That just means a hardware description language that isn't VHDL or Verilog. A new HDL project typically starts when someone gets sick of having to remember how VHDL or Verilog works, or how hard or ugly it is to work with, and wonders why it can't be like a modern language they already use a lot, like Python or Scala or Haskell, and then they go and invent a new HDL. Anyway, in my head, that's how it happens.

Now, check this out. This is my Space Invaders pinball machine. It's the pinball machine that I grew up playing, back in high school, and I really enjoyed it. The machine was made by Bally in 1980, and you can see the sort of infinity-mirror effect they have going on there. You can see the alien, and it kind of looks like an H.R. Giger alien, and that is not a coincidence, because the movie Alien came out in 1979, just one year before this pinball machine did. The alien from Alien was designed by H.R. Giger, who had this characteristic biological-slash-machine interconnected-creature sort of vibe going on. And Bally actually copied it. Maybe not the entire thing, but they copied the style. In fact, Giger ended up suing Bally over this design. Bally nevertheless produced about 11,000 of these machines, and they did end up settling with Fox, the movie company behind Alien, and Fox received royalties for every one of these pinball machines that was made. In fact, Fox ended up getting three of these machines in order to promote the sequel to Alien, Aliens. And here's a bit of the gameplay going on. Back in high school, when I played this machine, I actually managed to turn the score over.
The score is six digits, so you can go up to a million, and then it turns over. I actually managed to turn it over a hundred times, because I had figured out exactly what I needed to do to make all of the bonuses multiply together. So the credit counter actually ended up at 99. Some of my classmates came in and couldn't believe that I had actually turned it over a hundred times; they thought I had just stuffed quarters in it, which of course would have been $25 at the time, because it was 25 cents per play.

Let's take a look inside. That's what it looks like with the back glass taken off, and this is the inside of the back box. On the upper left, that is the CPU board. It's not actually the original CPU board, because the original had a battery backup on it, and that battery always leaked, so most of those boards are pretty much dead now. This is the power supply board over here. This is the sound board. This over here, I believe, is for the solenoids, and this over here is for the lamps; it's the lamp driver. So this is actually a replacement board, made by Alltek Systems, and I have one fresh out of the box right here. It contains a 6800 microprocessor, which is the original processor used on these machines. In fact, Bally made many different pinball machines using the same microprocessor board, so the only thing that actually changed was the firmware. Now, this particular board has the firmware for all of those games, and you select the game you're configuring the board for using these DIP switches down here. And these are for game options such as the number of coins per credit, the number of balls per play, three or five, whether you're on free play or not, that sort of thing.
Now, the 6800 CPU entered production at Motorola in 1974, but I couldn't find any Motorola data books past 1988 with a reference to the processor, and I've seen 6802 processors with a manufacture date of 1989. The 6802 is just a 6800 with an internal two-phase clock generator and some RAM. By 1990, it seems that Motorola's 8-bit CPU days were gone, since a backgrounder produced by the research firm Gartner Dataquest was all about the 8-bit 6805 microcontroller, which was a completely different, incompatible animal, and their 32-bit 68000 family of microprocessors. However, you can still buy 6800 processors from, for example, Jameco, a reputable mail-order electronics supplier in California, and on eBay from perhaps less reputable sellers, where you can get 6800s with date codes from 2012. And if you're willing to spend $360 apiece, Rochester Electronics in Massachusetts remanufactures 6802s from the original die masks, most likely to maintain obsolete military equipment. Nevertheless, there may come a day when these processors are no longer available, and then where will we be? In the dark ages, that's where!

So the idea here is to implement a 6800 on an FPGA using nMigen, and then plug it into the pinball machine and see if it works. I've chosen a Lattice iCE40 FPGA because they're on the less expensive side, around $4 to $6 for the smaller ones, but also because there is a really good open-source toolchain for them. This is a development board for the largest iCE40, which has about 8,000 lookup tables, or LUTs. Generally, the size of an implementation on a given FPGA is measured in LUTs, so we can size the chip to the implementation. One problem, though, is that most modern FPGAs will go only up to 3.3 volts, and the 6800 is a 5-volt part. That normally wouldn't be a problem, except that the iCE40's inputs are not 5-volt tolerant. So I want to try some of these high-speed bidirectional logic-level shifters instead of fooling around with resistive voltage dividers.
I'm a bit suspicious about voltage dividers because of their effect on signal rise and fall times, and drive currents, and that sort of thing. So we'll see how that turns out.

Well, let's get started. Now, we see from the datasheet that inside the 6800 CPU we have two 8-bit accumulators, A and B; a 16-bit index register, X; a 16-bit stack pointer, SP, which allows the processor to maintain a stack; a 16-bit program counter, PC, which tells us where in memory we're executing instructions from; and six single-bit flags. Let's look at the signals for the CPU. To simplify things, we're only going to pay attention to the signals important to executing instructions one after the other, and not things like peripheral signals or interrupt signals or status signals. We can see that there are two clocks, called phase 1 and phase 2. Here's a little diagram of them showing that they do have a relationship to each other; they're not completely random, and we'll get into exactly what each phase does a little later. We also have a 16-bit address bus that the processor uses to address locations in memory or other peripherals, and we have a bidirectional 8-bit data bus, which is used to read data from memory or other peripherals, or to write data to them. And that's really all we need to get started. The rest of the signals either help to coordinate what happens outside the core instruction executor of the CPU, or put the CPU into special states, which we won't deal with now for simplicity. So: a bunch of registers, two clock signals, an address bus, and a data bus. That is our perfectly round, massless, frictionless CPU.

Okay, here's a diagram of the CPU doing a read from memory, and it's our first clue to what the CPU does with the clocks. So here are the two clock phases, 1 and 2. Notice that in this diagram the edges don't coincide.
In fact, the negative edge of phase 2 can go right up to the positive edge of phase 1, but it is not allowed to go past. Anyway, here we see the address lines and data lines, and we can see what happens during a read. First, the CPU changes its address lines to the address it wants to read. The diagram shows that this change is referenced to the positive edge of phase 1. tAD here is the time from the positive edge to the address lines becoming stable. In the 6800, this time is typically 220 nanoseconds, but can be as high as 300 nanoseconds. For our FPGA implementation, of course, this time will be on the order of nanoseconds. So we know that the address lines change as a result of the positive edge of phase 1. Great.

Next, the memory responds with the data at that address. This is shown by tacc, the access time of the memory. Then we have tDSR. This is the setup time for the data lines, which is the amount of time the data lines must be stable before the latching edge for the CPU to read them correctly. For the real thing, this is a minimum of 100 nanoseconds, but our FPGA has setup times of around half a nanosecond, in the negative, which just represents the delay from the pins changing to the internal flip-flop inputs. Anyway, notice that the setup time is referenced to the negative edge of phase 2. This means that the CPU latches the data in on this edge. tH here is the hold time for the data lines, which is the amount of time the data lines must remain stable after that edge for the data to be latched in correctly. Again, for the real thing, this is a minimum of 10 nanoseconds, but our FPGA has a hold time of around 2.5 nanoseconds, which is plenty.

Okay, so we have two important pieces of information here. One is that the CPU latches its read address onto the address lines on the positive edge of phase 1, and the other is that on the next negative edge of phase 2, the CPU latches the data being read. Okay, let's take a look at writes.
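Before we get to writes, the read-cycle edge ordering just described can be condensed into a toy Python model. This is only a sketch of the sequence of events, not a timing simulation; the ROM contents, the sample address, and the function name are all illustrative:

```python
# Toy model of the 6800 read cycle: address driven after the phase-1
# rising edge, data latched on the phase-2 falling edge.

ROM = {0x0100: 0x89}  # sample memory: ADCA-immediate opcode at 0x0100

def cpu_read(address):
    # Phase 1 rising edge: CPU drives the address lines (tAD later in HW).
    address_bus = address
    # Memory access time (tacc): the memory drives the data bus.
    data_bus = ROM[address_bus]
    # Phase 2 falling edge: CPU latches the data bus (tDSR/tH satisfied).
    return data_bus

assert cpu_read(0x0100) == 0x89
```

Each comment line corresponds to one of the timing parameters in the diagram, in the order they occur within a single cycle.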
We see the same thing for the address, where the address lines get set on the positive edge of phase 1. However, it seems that the data to write doesn't come out of the CPU until the positive edge of phase 2? Well, no. There's a signal, controlled by the memory or the peripheral being written to, called DBE, or Data Bus Enable. This simply tells the CPU when it can drive the data it wants to write onto the data bus. The actual data in the CPU is ready way back here, at the positive edge of phase 1. So: output signals are changed by the CPU on the positive edge of phase 1, and signals are read by the CPU on the negative edge of phase 2. Good enough for the core CPU functionality that we want.

Now, let's look at executing an instruction. We know the CPU will do a read from memory for the instruction. This is the fetch cycle. All opcodes are one byte long, so once we have the first byte, our destiny is set. The CPU can latch the data into an instruction register, and that tells the CPU what it should be doing on each cycle, that is, from positive edge to positive edge of phase 1. In fact, the data sheet has a handy table of what the CPU does for each instruction. Notice that the minimum number of cycles for an instruction is 2.

So, let's look at the first instruction listed in the data sheet, ADC, or add with carry. The address mode is immediate, which means the byte to be added, the operand, is in the next location in memory. This instruction adds that operand, plus the carry flag, to one of the accumulators, and stores the result in the same accumulator. Let's see how that would work. We start by latching the contents of the program counter register onto the address lines. We'll latch the instruction to execute near or at the end of the cycle, but remember that the data is available on the data bus even before then. This means that the moment the instruction stabilizes, we know what we must do on the next positive edge of phase 1.
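As a sketch of what ADC actually computes, here's the arithmetic and flag logic in plain Python. This is a toy model of the usual two's-complement rules, not the nMigen implementation; I've included the half-carry flag H since the 6800 has one, and the function name is just illustrative:

```python
def adc(acc, operand, carry_in):
    """Toy model of ADC (add with carry): acc + operand + C.
    Returns the new 8-bit accumulator and the H, N, Z, V, C flags."""
    result = acc + operand + carry_in
    h = 1 if ((acc & 0x0F) + (operand & 0x0F) + carry_in) > 0x0F else 0
    c = 1 if result > 0xFF else 0
    result &= 0xFF
    n = (result >> 7) & 1                # sign bit of the result
    z = 1 if result == 0 else 0
    # Two's-complement overflow: operands share a sign, result's differs.
    v = ((acc ^ result) & (operand ^ result) & 0x80) >> 7
    return result, dict(H=h, N=n, Z=z, V=v, C=c)

assert adc(0x7F, 0x01, 0) == (0x80, dict(H=1, N=1, Z=0, V=1, C=0))
assert adc(0xFF, 0x01, 0) == (0x00, dict(H=1, N=0, Z=1, V=0, C=1))
```

The two asserts show the classic corner cases: 0x7F + 1 sets overflow (a positive result wrapping negative), while 0xFF + 1 wraps to zero and sets carry.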
One thing that we always do is add 1 to the program counter and use that as the next address to read. So we do that during cycle 1. Now, on the next positive edge of phase 1, we transfer the result to the address lines. We can also transfer the result to the program counter. Okay, during cycle 2, we read the operand. Again, we know we are doing an add with carry, and we know the source accumulator, so we can immediately add whatever is on the data lines to the accumulator. Whatever this result is, it stabilizes when the data lines stabilize, which means that on the positive edge of phase 1, we can transfer the result to the accumulator. As for the address, we can transfer the program counter to the address lines even though we didn't change it, and now we can execute the next instruction.

Okay, so here's what I want to build. We have a bunch of 8-bit registers. The 16-bit registers are also divided in two, because 16-bit addition will take place using an 8-bit arithmetic logic unit. I'm also including a temporary 16-bit register to hold address calculations and a temporary 8-bit register for any other temporary data. We also have a transparent latch for data in and a register for data out. You typically don't have bidirectional or tri-state logic in the middle of your FPGA logic; that's left for the pin I/Os to handle, so that's why we have two registers for the data lines, one for data in and one for data out. We also have a transparent latch to hold the instruction we're executing, and we have an address register for the address lines. I mentioned an arithmetic logic unit, or ALU, so here it is. It has two 8-bit inputs, and it also includes the flags. To feed the ALU, I'm going to have two 8-bit data buses. Any of the 8-bit registers can be connected to either bus.
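The register-to-bus routing just described amounts to a pair of multiplexers feeding the ALU inputs. Here's a toy Python model of that idea; the register names and the `alu_add` helper are illustrative, not the actual design:

```python
# Each 8-bit register is a potential source for either ALU input bus.
regs = {"A": 0x12, "B": 0x34, "TMP8": 0x00, "DIN": 0x56}

def alu_add(bus1_sel, bus2_sel):
    """Route two registers onto the buses and add them (carry ignored here)."""
    bus1 = regs[bus1_sel]   # first 8-bit bus: decoder-selected source
    bus2 = regs[bus2_sel]   # second 8-bit bus: decoder-selected source
    return (bus1 + bus2) & 0xFF

assert alu_add("A", "DIN") == 0x68   # 0x12 + 0x56
```

In the real design, the bus selects come from the instruction decoder, and the result isn't "returned" anywhere; it sits on the ALU output until the phase-1 edge latches it into whichever register the decoder chose.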
The function that the ALU performs depends on an input that will come from the instruction decoder and sequencer, and its output can go to any of the registers, to be latched on the positive edge of phase 1. We also need a way to add one to the program counter and store it back in one cycle, so we have a 16-bit increment/decrement unit. The decrement part will be useful for the stack pointer. To feed this unit, I have a 16-bit bus. I'm not using the two 8-bit buses for that, because I will need to use the incrementer while doing other things with the ALU, so those buses will be occupied. The output of the inc/dec unit can go to any of the 16-bit registers and also to the output address latch, again to be latched on the positive edge of phase 1. And the function that the unit performs is also controlled by an input that will come from the instruction decoder and sequencer. Finally, we have a cycle counter that works with the instruction decoder to control all of these data paths and functions.

Let's see how this would work with the ADC instruction we looked at earlier. We're going to assume that just before the first cycle, we've routed the program counter over the 16-bit bus to the address register. Now, when the positive edge of phase 1 happens, the address register gets loaded with the program counter, and that gets output to the address lines. We're also ready to receive the instruction from memory, so we also enable transparency on the instruction register. This way, whatever appears on the data lines goes right into the decoder. At some point, the ADC opcode comes back from memory. Since the instruction register is in transparent mode, the opcode gets decoded for cycle 1. The decoder says: okay, for this cycle, for ADC, we want to read the next memory location for the operand, so we need to route the program counter through the inc/dec unit, set up the unit for increment, and route the output to the address register and to the PC.
This gets us ready to output the next address, which is the address of the operand. Also, we set up the cycle counter to increment. Note that what we've done is perform all the calculations we want, but we have not yet stored anything. Everything needs some time to settle to its final value. At the end of the cycle, we'll get the positive edge of phase 1, which clocks all the final values into their final destinations. Now the positive edge of phase 1 comes in, which commits everything we've done: we latch the new program counter into the address register and back into the program counter, and we load the next cycle number, 2, into the cycle counter. We also turn the data-in register transparent, since we're about to receive the operand.

Okay, let's set up everything we want to happen during this cycle, for committing at the end of the cycle. Since we want to do an add to, say, accumulator A, we route A to the first 8-bit bus and the data-in register to the second 8-bit bus, tell the ALU to do an add with carry, and route the output back to accumulator A. We'll also do the same thing we did before to the program counter. Finally, we'll set up the cycle counter to reset. And then the positive edge of phase 1 comes in. All the data that we set up gets latched: A gets its new value, the address register gets the next address, the program counter gets the next address, we turn the instruction register transparent again, and we're ready to proceed with the next instruction.

And that's really all there is to it. I know I say that facetiously, but it really is just a matter of, on a given cycle, for a given instruction, setting up the routing for the operations that you want to do. Then, on the positive edge of phase 1, at the end of the cycle, latch everything in, and start again. So those are the basics of what I'm going to be programming. Now, I'm not going to be doing any programming in this video.
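To tie the two cycles together, here's the whole set-up-then-commit flow for ADC immediate as a toy Python model. The memory contents, addresses, and function names are illustrative, and the 16-bit inc/dec unit is just a masked add here; the real design computes everything combinationally and commits it on the phase-1 edge:

```python
MEM = {0x0200: 0x89, 0x0201: 0x05}   # ADCA #$05 (0x89 is ADCA immediate)

def inc_dec_16(value, func):
    """Toy model of the 16-bit inc/dec unit; func comes from the decoder."""
    delta = {"inc": 1, "dec": -1}[func]
    return (value + delta) & 0xFFFF   # wraps at the 16-bit boundary

def run_adc_immediate(pc, a, carry):
    # Cycle 1 (fetch): the address register already holds PC; the opcode
    # flows through the transparent instruction latch into the decoder.
    assert MEM[pc] == 0x89
    pc = inc_dec_16(pc, "inc")        # PC+1 committed on the phase-1 edge
    # Cycle 2 (execute): the operand arrives through the transparent
    # data-in latch; the ALU result commits to A on the next phase-1 edge.
    result = a + MEM[pc] + carry
    a, carry = result & 0xFF, 1 if result > 0xFF else 0
    pc = inc_dec_16(pc, "inc")        # PC now points at the next opcode
    return pc, a, carry

assert run_adc_immediate(0x0200, 0x10, 1) == (0x0202, 0x16, 0)
```

Each "commit" comment marks a phase-1 positive edge; everything between two of them is the combinational routing and settling the decoder sets up during the cycle.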
I think it's sort of gone on long enough, and if I actually start doing any programming, the video is going to last forever, so we'll wait until Part 2 to see that. But in the meantime, if you do want to follow along, I have written sort of a tutorial for nMigen. It's not quite a tutorial; it's really more a series of comprehensive notes that I wrote to myself, which explain how to install it, the various concepts that it uses, that sort of thing. So that will probably be useful to me to get started, and to you if you actually want to install nMigen and all the things it requires so you can start playing with it yourself. It is fairly easy to use if you know Python. So until then, I think we'll draw this video to a close. Check down in the description for a link to my tutorial-slash-comprehensive-notes, and I will see you in the next video.