Programming the 4-Bit CPU | Big Mess o' Wires (2024)

Instructions for the Nibbler 4-bit CPU come in two types: immediate and addressed. Both types begin with 4 bits of instruction opcode, to identify the specific instruction. Immediate instructions like ADDI include a 4-bit operand value embedded in the instruction, and so are 8 bits in total size. Addressed instructions like JMP include a 12-bit address in data space (RAM) or program space (ROM), and are 16 bits in total size. The instruction encoding is very simple:

The four instruction opcode bits i[0..3] are combined with the ALU carry flag C and equal flag E, as well as the Phase bit, in order to form a 7-bit address for the two microcode ROMs. The ROMs’ outputs form the 16 control signals needed to orchestrate the behavior of all the other chips. The contents of the microcode ROMs are shown in the table below.

Microcode

The first line of the table shows that phase 0 is the same for all instructions. That’s good, because at this point the instruction hasn’t been fetched from memory yet, so the CPU doesn’t even know what instruction it’s executing! Nothing interesting happens as far as the microcode is concerned, while the Fetch register is loaded with the program ROM byte, and the program counter is advanced to the next byte.

Phase 1 is where all the interesting work happens. ADDI is a good example of a typical instruction. Its instruction opcode is 1010 binary, or $A hex. For this instruction, the same control signals will be asserted regardless of the C or E flags. /oeOprnd will be 0, enabling the bus driver to drive the immediate operand value onto the data bus, where it connects to one of the ALU inputs. The other ALU input is always connected to the accumulator register. The /carryIn, M, and S[0..3] control signals will be set to put the ALU into addition mode, with no carry-in. /loadA will be 0, so the ALU result will be stored to the accumulator. /loadFlags will also be 0, so the carry and equal flags will be updated with the results from the addition.

JE (jump if equal) is another good example. Its instruction opcode is 1000 binary, or $8 hex. If the E flag is 1, then incPC will be 0 (PC will not be incremented) and /loadPC will be 0 (PC will be loaded with the new address). So if E is 1, the jump will be taken. If the E flag is 0, then incPC will be 1 (PC will be incremented) and /loadPC will be 1 (PC will not be loaded). So if the E flag is 1, the CPU just skips over the destination address byte and advances to the next instruction, without taking the jump. Jump instructions don’t involve the ALU, so the ALU control signals are unimportant and are shown in gray.

Notice that some gray control signals are labeled “dc” for don’t care, but others have a 0 or 1 value. This is because I’ve chosen specific values for those dc’s in order to accomplish another goal. Look carefully, and you’ll see that /carryIn is everywhere equal to i3, and M is everywhere equal to i2. That means /carryIn and M don’t really need to be in the microcode at all: anything that needs them can just connect to i3 and i2 instead. This idea could probably be extended to more bits, but I stopped at two. If you look at the circuit schematic I posted previously, you’ll see that I’m not currently taking advantage of this, but it could be helpful later if I find a need to add another control signal or two.

Instruction Set

With only 4 bits for the instruction opcode, there’s a maximum of 16 different instructions, so the choice of which instructions to implement is challenging. It would be possible to have more than 16 instructions, but only by shrinking the address size or immediate operand size. Since this is a 4-bit CPU, a 4-bit instruction size just makes sense.

There are five types of jump instructions:

JMP – unconditional jump
JC – jump if carry
JNC – jump if no carry
JE – jump if equal
JNE – jump if not equal

This is a good set of jump instructions for convenient programming. Although they’re convenient, the negative jumps JNC and JNE aren’t absolutely necessary, because you can always rewrite them with a positive jump and an unconditional jump:

JNC noCarrycarry:do stuffnoCarry:do other stuff

becomes

JC carryJMP noCarrycarry:do stuffnoCarry:do other stuff

LD and ST load a nibble from data space (RAM) to the accumulator, or store a nibble from the accumulator to data space.

LIT loads a literal immediate nibble value to the accumulator. It might also have been named LDI.

OUT stores a nibble from the accumulator to one of the eight output ports, while IN loads a nibble from one of the eight input ports to the accumulator. These instructions are intended to be used for I/O.

ADDI adds a literal immediate nibble value to the accumulator, while ADDM adds a nibble in data space to the accumulator. In either case, the carry-in for the addition is 0, and the carry-out and equal flags are set by the results of the addition.

CMPI compares a literal immediate nibble value to the accumulator (subtracts the value from the accumulator), while CMPM compares a nibble in data space to the accumulator. The accumulator value is not modified, but the carry-out and equal flags are set by the results of the subtraction. Because the ALU is performing a subtraction, the carry-in is always 1 in order to make the math work out correctly.

NORI performs a NOR with an immediate value, NORM performs a NOR with a data space value. The carry-out and equal flags are set by the results of the NOR.

Synthetic Instructions

What about the instructions that aren’t here, like subtraction, AND, OR, NOT? Happily, all of these can be synthesized from the existing instructions. The assembler I’m working on includes some of these as macros, so they can be used as if they were built-in instructions.

NOT: NORI #0
ORI x: NORI x; NORI #0
ORM x: NORM x; NORI #0
ANDI x: NORI #0; NORI ~x
SUBI: ADDI (~x+1)

The only common instructions that can’t be trivially synthesized are ANDM and SUBM, because they require the use of a temporary register to hold the complement of the data space value. A dedicated location in data space could be used for this purpose, since Nibbler only has one register.

In many cases, with some thought you can eliminate the need for these synthetic instructions entirely. Take the common case of checking a specific bit in an input. On a CPU with an ANDI instruction, this might look like:

#define BIT2 $4 ; $4 = 0100IN #0 ; load accumulator with nibble from port 0ANDI BIT2JNE bit2is1

With Nibbler, you might think to rewrite this using the synthetic ANDI mentioned above:

#define NOT_BIT2 $B ; $B = 1011IN #0 ; load accumulator with nibble from port 0NORI #0NORI NOT_BIT2JNE bit2is1

But this can be shortened by simplifying the two negatives NORI and JNE into a single positive JE:

#define NOT_BIT2 $B ; $B = 1011IN #0 ; load accumulator with nibble from port 0NORI NOT_BIT2JE bit2is1

More to Come

Next time I’ll post more about the software tools I’ve written for simulating Nibbler and assembling programs. Until then, questions and comments are always welcome!

Read 14 comments and join the conversation