Programming a 2000 Year Old Sandstone Computer

[The eleventh in a series of posts on the evolution of TransForth]

If you think coding with punch cards is old school, you should try using sandstone tablets! Legend has it that members of the Cult of the Bound Variable designed and may have even constructed a city-sized machine powered by falling sand more than two millennia ago; dwarfing our ENIAC of the last century and beating us to an understanding of computation by thousands of years.

What the heck am I talking about?! Learn more here. Really, go check it out right now! But try not to get so sucked into unlocking the Cult’s mysteries that you spend days without sleep and forget to come back here.

So far the TransForth project has taken a top-down approach; starting with high-level F# and stripping it away until we have a VM and a meta-circular interpreter. Now I want to completely switch gears and build things bottom-up, starting with raw machine code. No kidding. We’re going to build our own assembler, use it to write a Forth inner interpreter, manually pack a dictionary to bootstrap an outer interpreter, and finally again reach the nirvana of meta-circularity. Should be fun!

Bring Your Own VM

The Universal Machine is an extremely simple 32-bit RISC architecture, yet powerful enough to run a UNIX-like operating system. Emulating it on modern hardware (still made of sand by the way) is pretty straightforward. Luke Hoban has posted a very nice 40-line implementation in F# and has a VM running in Silverlight. It’s also been done in tons of other languages. The package from CMU includes a SML compiler (um\humlock) and an assembler (um\whistleock) for it and VMs in Awk, C, C#, Haskell, Java, OCaml, Perl, PostScript, Python, Scheme, SML, Tweif, x86 assembly and in UM machine code itself. I’ve also seen it in JavaScript and Ruby.

I’ve been using Luke’s F# version for my exploration. I won’t repost his code here. You can go grab it from his blog. I did have to make a one-line fix to bring it up to date with current F#:

<! let mem = ResizeArray.create 1 p0AsUInts

!> let mem = new ResizeArray<uint32[]>([|p0AsUInts|])

Bring Your Own Assembler

Running the codex and uncovering all the astoundingly wonderful mysteries within is worth a weekend of your life I assure you, but perhaps more fun will be to write our own codex! The eventual goal is a codex image that, when loaded, is a full Forth.

What we’re going to do is build an assembler for ourselves in TransForth, but targeting the new machine. We’ll assemble in memory and spit out to a file from which the Universal Machine can boot. First, let’s add a couple of utility words to TransForth. We’re going to need left/right bit shifts and also add a word to dump a range of memory to a file with the correct endianness for the UM:

let um () =

use file = File.Open("", FileMode.Create)

for i in pop () .. pop () do

let m = mem.[i]

m >>> 24 |> byte |> file.WriteByte

m >>> 16 |> byte |> file.WriteByte

m >>> 8 |> byte |> file.WriteByte

m |> byte |> file.WriteByte and execute () =

match instruction with

| 35 -> dyadic (<<<)

| 36 -> dyadic (>>>)

| 37 -> um ()

primitive "LSH" LSH

primitive "RSH" RSH

primitive "UM" UM

That’s all the F# we’re going to need. Let’s start putting together our little assembler. What we’ll be doing is appending UM machine code in memory and then dumping it out.

\ UM-32 Assembler : ORIGIN 32768 ;


ORIGIN target !

: m, target @ !target ++! ;

: msavetarget @ 1- ORIGIN UM ;

So we can use m, to append as we go and msave to spit out the final file.

Understanding the UM-32

Between Luke’s F# code and the (purposely somewhat obfuscated) spec, you may have already pieced together how the machine works for yourself.

It’s a 32-bit RISC machine with only 14 instructions, 8 registers, dynamically allocated memory and single byte I/O. Upon booting, it loads memory block 0 from a “codex” file and begins executing at address 0.

Instructions may refer to up to three registers (e.g. the addition instruction, a = b + c). You can see in the spec how they’re packed into 32-bit “sandstone platters”.

For example, 00110000000000000000000001010011 adds (0011) registers 3 (011) and 2 (010), placing the sum in register 1 (001). We can build these by shifting and ORing a set of register numbers (a, b, and c) and an instruction:

: instruction,( cbai-m )22 LSH OR3 LSH OR3 LSH ORm, ;

I’m going to start using lowercase names for UM-related words and the trailing comma in the name signifies words that emit instructions to the target area.

The first thirteen instructions follow this format. We can make mnemonic words for these:

: cmove,( abc-m )0 instruction, ;\c = b if a

: fetch,( abc-m )1 instruction, ;\c = b[a]

: store,( abc-m )2 instruction, ;\c[b] = a

: add,( abc-m )3 instruction, ;\c = b + a

: mult,( abc-m )4 instruction, ;\c = b * a

: div,( abc-m )5 instruction, ;\c = b / a

: nand,( abc-m )6 instruction, ;\c = b ~& a

: halt,(-m )0 0 07 instruction, ;

: alloc,(ab-m )08 instruction, ;\new(b) -> a

: free,(a-m )0 09 instruction, ;

: echo,(a-m )0 010 instruction, ;

: key,(a-m )0 011 instruction, ;

: loadjump,(ab-m )012 instruction, ;\load(b), jump(a)

The only operations that index through registers are load and store. All others operate on registers only. Notice however that not all of the instructions make use of all the registers. For unused registers we can embed 0s.

The 14th instruction is for loading literal values; packing a 25-bit value with a register number. For example, 00110010000000000000000000101010 places 42 (101010) into register 1 (001). We can construct like so:

: literal,( va -- m )13 3 LSH OR25 LSH ORm, ;

An interesting aspect of the UM is the dynamic memory system with alloc, and free, . These dynamically manipulate the “array of platters.” The loadjump, instruction seems to be meant for loading programs from one array into the zero array and jumping to a particular instruction within. In fact, this instruction is used extremely frequently in the codex to do regular jumps within the zero array. The VM doesn’t clone anything when copying from zero to zero; only updates the “finger.” As the spec says, the zero “shall be the most sublime choice for loading, and shall be handled with the utmost velocity.” We can make a macro for direct jumps within the zero array:

: jump,( a-m )0 loadjump, ;\jump(a)

Hello World!

Just to make sure everything is functioning, let’s assemble our first UM-32 program. To print “Hello World!” to the console, we’ll have to load a literal, for each character into a register and echo, it. For example, we can print “Hi” by loading and echoing the literal ASCII values (using registers 0 and 1):

720 literal,\'H'

105 1 literal,\'i'

0 echo,

1 echo,


Hosting our assembler in Forth means that we can use its full power at assembly-time to simplify our lives. How about a helper for dealing with ASCII:


This works by parsing the next word and taking the first character (presumably a single character word).

Another advantage of hosting the assembler in Forth is that we can use it as a sort of “macro assembler”; creating words that simplify the generation of common instruction sequences. For example, loading and echoing a literal.

: output,( a-m )0 literal,0 echo, ;

Now we can simply push a value (perhaps an ASCII character with chr) and echo it with a single word:

chr H output,

chr i output,


Or we can further factor to:

: cout, ( '-m )chr output, ;


\ Hello World!

cout, H

cout, e

cout, l

cout, l

cout, o

SP output,

cout, W

cout, o

cout, r

cout, l

cout, d

cout, !



Go ahead and run this in TransForth and you end up with a boot image file for the Universal Machine that prints “Hello World!”. Nice, we have the beginnings of a working assembler!

Missing Instructions

The Universal Machine seems to take RISC to a pretty ridiculous extreme. There is no conditional jump, no subtraction, etc. I’m encouraged by the fact that UMIX runs on this thing. Certainly these limitations can be overcome.

Let’s add macros for a few instructions that would be expected. To start with, I think it will be helpful to assign purposes to a few registers. Having zero constant value always handy will simplify things (other constants such as 1 and –1 would be nice too, but I’m afraid we’ll run out of registers when we try to build the inner interpreter later). Let’s just use the 0 register and since all registers are initialized to zero we don’t need to do anything other than to remember not to touch register 0. We’ll define a convent name for it:

: z 0 ; \ Zero constant register
: t 2 ; \ Temp register

There is a conditional move, between registers but no unconditional move. We can build this by embedding our new constant:

: move,(ab-m )1 t literal, t -ROT cmove, ;\b = a

There’s also no increment instruction. We can build this again by embedding our new constant in an add, .

: inc,(a-m )DUP 1 t literal, t SWAP add, ;\a++

Other notable things missing include bit shifting and bitwise operations other than nand. Everything can be defined in terms of nand just as we did some time ago, but I’d rather do this in Forth-world later. For now, we’ll just need a not, instruction:

: not,(ab-m )SWAP DUP ROT nand, ;\b = ~a

Subtraction doesn’t exist either. In fact, the UM-32 works only with unsigned numbers. After brushing up on our understanding of 2s compliment though we realize that we can represent negative numbers in this unsigned world by flipping the bits and incrementing. We can then define subtraction in terms of addition (with carry overflow) of negatives:

: neg, (ab-m )DUP -ROT not, inc, ;\b = -a

: sub, ( abc-m )2 PICK DUP neg, -ROT add, ;\c = b - a

Decrement can be defined in terms of sub, of course. It will be helpful here and elsewhere to have one of the registers set aside as a temporary:

: t 2 ;\Temp register

: dec, ( a-m )1 t literal, t SWAP DUP sub, ;\a--

Notice that this expands to four instructions at assembly-time. For example, x dec, expands to:

1 t literal,\ t = 1

t t t nand,

1 t t add,\ t = -1

t x x add,\ x += -1

If we wanted to use up another register to maintain a constant -1 value then we could reduce dec, to a single instruction. I wanted to do this but I believe we’ll later run out of registers when we start building the Forth inner interpreter. Still, if you’d like (assuming say, n contains -1):

: dec,( a-m )DUP n SWAP add, ;\a--

This way the same x dec, for example expands to simply n x x add, .

Thinking more about this, we may realize that there’s no need to treat decrementing as generally as subtracting 1. Rather than four instructions, it can be done in three! This is the kind of micro-optimizations that we can have fun with when working close to the hardware (well, the VM). Negative 1 is just all bits on in two’s compliment which we can produce in two instructions, then add that:

0 t literal,\ t = 0

t t t nand, \ t = -1

t x x add,\ x += -1


The jump, macro jumps to an address in a given register. If the address is known at assembly-time then it needs to be loaded into a register first. Let’s make a simple macro that jumps to an address from the assembly-time stack:

: branch,( a-m )t literal, t jump, ;\jump to a

We still need a conditional branch with which to build control flow primitives. Having only conditional move at our disposal, we can do this by essentially unconditionally branching, but either to a given address or just to the next address:

: 0branch,( ab-m )t literal, address 1+ t cmove, t jump, ;\if a = 0, jump to b

That reminds me! We don’t have a no-op instruction. We can build this effectively as “branch never”. An old assembler I used to use for the 6809E actually did have a BNE instruction as a silly no-op. We could do it as a 0 0 0 add, or something else but whatever:

: noop,( -m )address 1+ t literal, t jump, ;\skip

Unconditional branch, and conditional 0branch, can be used to create all types of control flow as we built back when we did IF, ELSE and THEN and later when we built BEGIN/UNTIL, WHILE/REPEAT, LEAVE, DO/LOOP.


So far our assembler give us mnemonics for the UM instruction set, it gives us extremely powerful macros in the form of full Forth at assembly-time, and we’ve built enough up to really get started coding for this machine. The one thing missing that assemblers provide is labeling. All of our branching is to addresses on the assembly-time stack but how do they get there?

We can calculate the current address relative to the origin of the image very easily:

: addresstarget @ ORIGIN - ;

Given this, we can mark positions in the code by pushing the current address to the stack just before emitting the next instruction. We can then proceed with the code; leaving the address for much later use by a branch, or 0branch, . For example infinite looping words could easily be defined:

: loopaddress ;

: again,branch, ;

If you’d rather, you can store the address in a variable: VARIABLE my-label address my-label !

This all works great for branches back to previous addresses. But what about branches that make forward references to future addresses? We can handle this similar to a single-pass assembler by tracking the forward branch points and later patching the address once it’s known:

: forwardtarget @ 0 ;\leave target address on stack for later patching

: tohereDUP@ address ORSWAP ! ;\patch previous forward branch,

Remember that the branch, and 0branch, macros begin with a t literal, representing the address. The phrase forward branch, for example will push the current target address to the stack (notice, not the UM address relative to the image origin, but the actual location of the literal, instruction being assembled) and then will emit a branch (temporarily to zero). The zero is later patched to the real address with tohere which goes back and ORs in the current address at this point.

That’s it; single-pass label resolution for both forward and backward references.


In 15 lines of F# and 40 lines of Forth we’ve built a pretty much complete single-pass macro assembler for the UM-32. It is very Forth-like; using the stack for anonymous labels (or named variables if you like) and using the full facilities of Forth for assembly-time macros.

Have fun with it!

More to come...

We’ll be using it to build a Forth inner interpreter so that we can move away from this low-level assembly. However, we will always have the ability to define new primitive Forth words in terms of raw assembly. This ability is one of the things that give Forth amazing breadth; abstracting far from the machine while never losing the option of dipping back down to bare metal.


Next >