# proj-plbook-plChMinLangs

and esoteric langs

## LOOP

http://web.archive.org/web/20120311032544/http://loopgotowhile.eugenkiss.com/

## GOTO

http://web.archive.org/web/20120311032544/http://loopgotowhile.eugenkiss.com/

## WHILE

http://web.archive.org/web/20120311032544/http://loopgotowhile.eugenkiss.com/

## BASIC

Because of its simplicity and historical popularity, BASIC gets its own chapter.

See chapter on BASIC.

Tutorials:

## Hypercard

#### Logo

Some notes on the history of Logo are in the middle of:

http://web.augsburg.edu/~crockett/210/210%20Lab%20Set%204/Reed_OOP_Epistemology.pdf

#### Jones' A Minimal CISC

http://david.carybros.com/html/minimal_instruction_set.html

8 instructions:

• NOP: No operation.
• DUP: Duplicate the stack top. This is the only way to allocate stack space.
• ONE: Shift the stack top left one bit, shifting one into the least significant bit.
• ZERO: Shift the stack top left one bit, shifting zero into the least significant bit.
• LOAD: Use the value on the stack top as a memory address; replace it with the contents of the referenced location.
• POP: Store the value from the top of the stack in the memory location referenced by the second word on the stack; pop both.
• SUB: Subtract the top value on the stack from the value below it, pop both and push the result.
• JPOS: If the word below the stack top is positive, jump to the word pointed to by the stack top. In any case, pop both.
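The eight instructions above are enough to sketch a working interpreter. This is a minimal sketch, not Jones' implementation: it assumes the stack is a Python list seeded with a single 0 (so constants can be built with DUP followed by ONE/ZERO shifts), memory is a dict of word addresses, and programs are lists of mnemonic strings; the real design keeps the stack in main memory.

```python
def run_cisc(program, memory=None, max_steps=10_000):
    """Interpret a list of mnemonics for Jones' 8-instruction machine."""
    memory = dict(memory or {})
    stack = [0]            # seed zero: DUP + ONE/ZERO shifts build any constant
    pc = steps = 0
    while pc < len(program) and steps < max_steps:
        steps += 1
        op = program[pc]
        pc += 1
        if op == "NOP":                       # no operation
            pass
        elif op == "DUP":                     # duplicate the stack top
            stack.append(stack[-1])
        elif op == "ONE":                     # shift top left, shifting in a 1
            stack[-1] = (stack[-1] << 1) | 1
        elif op == "ZERO":                    # shift top left, shifting in a 0
            stack[-1] = stack[-1] << 1
        elif op == "LOAD":                    # TOS := memory[TOS]
            stack[-1] = memory.get(stack[-1], 0)
        elif op == "POP":                     # memory[second] := top; pop both
            value = stack.pop()
            addr = stack.pop()
            memory[addr] = value
        elif op == "SUB":                     # push (below - top), popping both
            top = stack.pop()
            stack[-1] = stack[-1] - top
        elif op == "JPOS":                    # if below > 0, jump to top; pop both
            target = stack.pop()
            cond = stack.pop()
            if cond > 0:
                pc = target
        else:
            raise ValueError(f"unknown op {op!r}")
    return stack, memory
```

For example, `run_cisc(["DUP", "ONE", "ZERO"])` duplicates the 0 seed and shifts in the bits `10`, leaving 2 on top of the stack.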

#### Jones' The Ultimate RISC

http://homepage.cs.uiowa.edu/~jones/arch/risc/

## Minimal Forths

http://stackoverflow.com/questions/407987/what-are-the-primitive-forth-operators?rq=1

http://newsgroups.derkeiler.com/Archive/Comp/comp.lang.forth/2005-09/msg00337.html

" A long time ago, I had a book called "Threaded Interpretive Languages", published I think by Byte, that discussed how to implement a Forth-like language (I don't think they ever called it Forth) in Z80 assembly.

You may not have a Z80 handy, or want one, but the book might be instructive.

+1 for Loeliger's book. The book NEVER said the language was FORTH. Between that book and the Ritter & Walker paper in Byte around the same time, I learned enough to build a reentrant direct-threaded code interpreter for a test set project a few years later at General Dynamics. – John R. Strohm Dec 13 '09 at 4:41

+1 for that book which I love, it was the book which hooked me on the idea of inventing and studying languages – Andy Dent Apr 18 '10 at 10:16

I picked up that book used based on this recommendation. It's great! – Barry Brown

"

## Var'aq

A 'Klingon' language. Stack-based.

4 data types:

• number
• function
• list
• string

Interesting stack operations:

• REMEMBER: place an opaque 'flag' on the stack
• FORGET: Clears the stack down to the flag and pops the flag.

Interesting control flow and variable operations:

• by default, names are looked up if defined, as in LISP. There is a quote operator that allows the name itself to be specified. 'variables and procedures live in a common namespace, since the act of pushing the content of a variable is essentially the same as executing the variable's name.'
• SET: assign to a name
• EVAL
• IFYES
• IFNO
• RETURN
• REPEAT

Interesting list operations:

• SHATTER 'Reduces a list to its component elements and pushes them on the stack in order'
• CONSUME 'Pops all objects on the stack down to mark and returns them in a list.'
• empty? 'Examines a list on the stack and returns 1 if its value is null (pagh), a 0 if it contains anything.'
• split 'Pops a list off the stack and returns the first item and the rest of the list.'
• cons 'Takes an object and adds it to the head of a list. Equivalent to the LISP cons operator. '

Interesting string operations:

• strtie 'Concatenates the top two strings on the stack into one.'
• compose 'Pops objects (executing proc objects if necessary) off the stack until a marker (placed by qaw) is hit and combines them into one string. '
• streq 'Pops the top two strings on the stack, compares them, and returns 1 if identical, 0 if not. '
• strcut 'Pops two values and a string and returns the section of the string between character startval and character endval.'
• strmeasure 'Pops a string off the stack and returns its length in characters. '
• explode 'Separates individual "words" in a string by whitespace.'
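The mark-based operations above (REMEMBER/FORGET on the stack, SHATTER/CONSUME on lists) can be sketched directly. Assumptions in this sketch: a plain Python list models the stack, a unique sentinel object models the opaque 'flag', and the function names simply follow the descriptions quoted above, not any actual Var'aq implementation.

```python
MARK = object()  # the opaque 'flag'

def remember(stack):
    """Place an opaque flag on the stack."""
    stack.append(MARK)

def forget(stack):
    """Clear the stack down to the flag and pop the flag."""
    while stack and stack.pop() is not MARK:
        pass

def shatter(stack):
    """Reduce a list to its elements and push them on the stack in order."""
    items = stack.pop()
    stack.extend(items)

def consume(stack):
    """Pop all objects down to the mark and push them as one list."""
    items = []
    while stack and stack[-1] is not MARK:
        items.append(stack.pop())
    if stack:
        stack.pop()      # drop the mark itself
    items.reverse()      # restore original stack order
    stack.append(items)
```

Note that SHATTER and CONSUME are near-inverses, except that CONSUME needs a mark placed beforehand.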

---

## BF

The real name, abbreviated here as 'BF', contains a curse word.

There are 8 commands (from http://web.archive.org/web/20060202010638/http://www.iwriteiam.nl/Ha_BF.html ):

• '>' : move the memory pointer to the next cell,
• '<' : move the memory pointer to the previous cell,
• '+' : increment the memory cell under the memory pointer,
• '-' : decrement the memory cell under the memory pointer,
• ',' : fills the memory cell under the memory pointer with the ASCII value of next character from the input,
• '.' : writes the contents of the memory cell under the memory pointer as a character with the corresponding ASCII value,
• '[' : moves to the command following the matching ']', if the memory cell under the memory pointer is zero, and
• ']' : moves to the command following the matching '[', if the memory cell under the memory pointer is not zero.
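The eight commands translate almost line-for-line into an interpreter. A minimal sketch, assuming an unbounded tape of integer cells growing to the right, no cell wrap-around, and input/output as Python strings:

```python
def bf(code, inp=""):
    """Interpret the 8 BF commands listed above."""
    # precompute matching-bracket positions for '[' and ']'
    jumps, opens = {}, []
    for i, c in enumerate(code):
        if c == "[":
            opens.append(i)
        elif c == "]":
            j = opens.pop()
            jumps[i], jumps[j] = j, i
    tape = [0]
    ptr = pc = read = 0
    out = []
    while pc < len(code):
        c = code[pc]
        if c == ">":                      # move pointer right, growing the tape
            ptr += 1
            if ptr == len(tape):
                tape.append(0)
        elif c == "<":                    # move pointer left
            ptr -= 1
        elif c == "+":                    # increment current cell
            tape[ptr] += 1
        elif c == "-":                    # decrement current cell
            tape[ptr] -= 1
        elif c == ",":                    # read one input character (0 at EOF)
            tape[ptr] = ord(inp[read]) if read < len(inp) else 0
            read += 1
        elif c == ".":                    # write current cell as a character
            out.append(chr(tape[ptr]))
        elif c == "[" and tape[ptr] == 0:   # jump past matching ']'
            pc = jumps[pc]
        elif c == "]" and tape[ptr] != 0:   # jump back past matching '['
            pc = jumps[pc]
        pc += 1
    return "".join(out)
```

For example, `bf("++++++[>++++++++++<-]>+++++.")` builds 6*10+5 = 65 in the second cell and prints it as 'A'.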

## Thue

http://esolangs.org/wiki/Thue

## MicroMini VM

Stack machine.

https://github.com/jarcane/MicroMini/blob/master/docs/MicroMini.txt

Instructions:

• NOP - No OPeration
• HLT - HaLT?
• DATA num
• AND, OR, XOR, NOT
• EQ?, LES?, GRT?
• PUSH num
• PUCA - PUsh CArry
• PUTI - PUsh TImer
• POP
• JIF address - Jump IF
• RET - Return
• TRMI, TRMO - TeRMinal? Input/Output

---

## related to the York Reduceron

### F-lite

F-lite is a subset of Haskell.

It is lazy. It is untyped. Constructor expressions like 'Cons x xs' can be used, but Cons does not have to be defined first.

It supports:

• function application and (non-anonymous) function definition
• uniform pattern matching. "Uniform" means that the order of equations doesn't matter. Patterns can be:
  • expressed in a partial function definition style or a case expression style
  • nested
  • incomplete
  • note: "case patterns containing more than one constructor or wild-card matches" are sugar for nested case expressions, "each dealing with a single constructor component" [1]
• 'let' expressions (but with only variables, not patterns, on the LHS); multiple bindings within a let expression are permitted
• cyclic data structures
• finite precision integer and string literals (strings are Nil-terminated lists of characters)
• the binary arithmetic functions + and -, and the binary boolean functions <=, ==, and /= (/= is inequality). These operators must be written in prefix form and cannot be partially applied.
• the function 'emit' to print a character to the console and 'emitInt' to print an integer
• 'if' syntactic sugar for a 'case' expression with True and False cases

Semicolons are required to separate equations. The function 'main', which returns an integer, is run.

Code examples:

```
append Nil ys = ys;
append (Cons x xs) ys = Cons x (append xs ys);

zipWith f Nil ys = Nil;
zipWith f (Cons x xs) Nil = Nil;
zipWith f (Cons x xs) (Cons y ys) = Cons (f x y) (zipWith f xs ys);

init (Cons x Nil) = Nil;
init (Cons x (Cons y ys)) = Cons x (init (Cons y ys));

pow Nil = Cons Nil Nil;
pow (Cons x xs) = let { rest = pow xs } in
append rest (map (Cons x) rest);

repeat x = let { xs = Cons x xs } in xs;

negate n = (-) 0 n;

fromTo n m = case (<=) n m of {
True -> Cons n (fromTo ((+) n 1) m);
False -> Nil;
};

sayHi k = emit 'h' (emit 'i' (emit '!' k))
```

Full F-lite program to display the 10th fibonacci number:

```
fib n = case (<=) n 1 of {
True  -> 1;
False -> (+) (fib ((-) n 2)) (fib ((-) n 1));
};

emitStr Nil k = k;
emitStr (Cons x xs) k = emit x (emitStr xs k);

main = emitStr "fib(10) = " (emitInt (fib 10) (emit '\n' 0));
```

F-lite has been extended with generics to form G-lite.

### The reduceron's "template code" (its ISA)

[2] section 2.7 page 8

Example from [3] page 33 PDF page 47:

Flite append function translated from Flite to Template code

Figure 3.3:

```
append xs ys = case (xs) of {
Cons x xs -> Cons x (append xs ys);
Nil       -> ys; }
main = append Nil Nil;

(0,[FUN 2 1,CON 0 1, PTR 0],[[CON 0 1]])
(2,[ARG 0, TAB 2, ARG 1],[])
(2,[ARG 1],[])
(4,[CON 2 0, ARG 0, PTR 0],[[FUN 2 1, ARG 1, ARG 3]])
```

" The template number zero, (0,[FUN 2 1, CON 0 1, PTR 0],[[CON 0 1]]), is the main program, of arity zero. The atom FUN 2 1 indicates a function of arity two and index one; FUN 2 1 is applied to two arguments, the first being CON 0 1, which is the encoding for Nil. The atom PTR 0 is a reference to the off-spine application number zero, which contains the second argument of the function append.

Now, the template number one is the function append of arity two: (2,[ARG 0, TAB 2, ARG 1],[]). In the spine application, ARG 0 is the case subject xs; the TAB 2 atom is a reference to the case alternatives: for example, if the case subject ARG 0 evaluates to Nil then the second template will be evaluated, otherwise the third template will be evaluated ((i don't understand this; why shouldn't 'CON 2 0' (cons) cause the first one to be evaluated, and 'CON 0 1' (nil) cause the second one to be evaluated?)). The atom ARG 1 is the free variable ((ys)) used in all case alternatives.

The template number three is the recursive case alternative: (4,[CON 2 0, ARG 0, PTR 0],[[FUN 2 1, ARG 1, ARG 3]]). The arity of this template is four; the reason is the special encoding, which includes the two arguments ARG 0 and ARG 1 representing x and xs, an argument for the case table, and one free variable. The first atom CON 2 0 is a constructor which expects two arguments, and the index of this constructor is zero. The first argument is ARG 0 and the second is a pointer to the application [FUN 2 1, ARG 1, ARG 3]. This application corresponds to the recursive application at the high level, (append xs ys). "

Types of atoms:

```
data Atom
  = FUN Arity Int   -- Function with arity a and address i
  | ARG Int         -- Reference to the i-th function argument
  | PTR Int         -- Pointer to the i-th application
  | CON Arity Int   -- Constructor of arity a and index i
  | INT Int         -- Integer literal
  | PRI String      -- Primitive function (named)
  | TAB Int         -- Case table
```

[4] and [5]

"Each template can be viewed as a low-level encoding of a let expression. The let-bindings are encoded as off-spine applications and the body of the let as the spine-application . Each template ( arity;app;apps ) in Reduceron template code is a tuple of 'arity', spine application 'app' and off-spine applications 'apps'. Instead of names the results of the off-spine applications are referred to by position (eg. PTR 0 for the first off-spine application)." [6]

relating to the CON (constructor) atom, "A variant of Scott/Jansen encoding combines data construction and case selection. A constructor of datatype d translates to a function that in addition to components takes as arguments alternative continuations for each possible d-construction." [7].

The rules for how a computation proceeds in this template language are given in The Reduceron Reconfigured section 3 page 4, or The Reduceron Reconfigured talk slides at ICFP 2010, slides starting with 'Reduceron machine state', or Static Methods to Check Low-Level Code for a Graph Reduction Machine by Marco Polo Perez Cervantes (thesis) section 3.4 page 34 PDF page 48, or The Reduceron reconfigured and re-evaluated by Matthew Naylor and Colin Runciman section 3 page 11 (also at https://github.com/tommythorn/Reduceron/blob/666e78d1ec4a9fd42438a6f4f45bd861a5d2c1b0/semantics/Reduceron.lhs#L127).

To summarize, the machine has a program code memory (the templates), a heap (storing various expressions; PTRs are indices into the heap), a reduction stack (the current expression being reduced), and an update stack (stating where on the heap to write back the current expression after it has been reduced). When the TOS (top of stack) of the reduction stack is a PTR, the PTR is dereferenced and the corresponding expression is copied from the heap onto the TOS of the reduction stack, and an entry is pushed onto the update stack to remember to save the result back to that heap location after this expression has been reduced. At each step, we check to see if the TOS of the reduction stack has been reduced, and if so, we write it back to the heap as directed by the update stack. Otherwise, we evaluate primitives, apply functions, and turn case tables into functions to be applied.

---

## Simpl/Com

" The syntax of Simpl is defined by the polymorphic data type ('s, 'p, 'f) com, where 's is the state space type, 'p the type of procedure names and 'f the type of faults. The constructors are listed in the following table, where c, c1 and c2 are Simpl commands of type ('s, 'p, 'f) com and the Boolean condition b and the guard g are modelled as state sets of type 's set.

• Skip: Do nothing
• Basic f: Basic command, where f is a state-update: 's ⇒ 's
• Seq c1 c2: Sequential composition, also written as c1; c2
• Cond b c1 c2: Conditional statement (if-then-else)
• While b c: Loop
• Call p: Procedure call, p is of type 'p
• Guard f g c: Guarded command, fault f is of type 'f
• Throw: Initiate abrupt termination
• Catch c1 c2: Handle abrupt termination
• Spec r: Basic (nondeterministic) command defined by the relation r of type ('s × 's) set
• DynCom cs: Dynamic (state dependent) command, where cs has type 's ⇒ ('s, 'p, 'f) com

... Runtime faults are modelled by the guarded command Guard f g c , where g is the guard that watches for runtime faults that may occur in c . If g is violated the fault f is signalled and the computation stops.

The core procedure call Call p is parameterless. Parameter passing is implemented in section 2.4.

Throw and Catch are the basic building blocks to model various kinds of abrupt termination, like break, continue, return and exceptions. In Catch c1 c2 the command c2 can be seen as a handler. It is only executed if c1 terminates abruptly.

Command Spec r defines the potential next states via the relation r. It can be used to model nondeterministic choice or, in the context of refinement [72, 28], it can represent a specification of a piece of code rather than an implementation.

The dynamic command DynCom cs allows one to abstract a command over the state space. It can be used for different purposes: to implement scoping, parameter passing, expressions with side-effects, or "real" dynamic constructs like pointers to procedures or dynamic method invocations. Details follow in section 2.4.

The set of Simpl commands is not minimal. A Skip can be implemented by Basic (λs. s), the dynamic command DynCom can be used to implement the conditional, and the Spec command can be used to implement the Basic command. This separation reflects the different concepts behind the commands."

...

2.4.2 Concrete Syntax for Simpl. To improve readability of Simpl programs I introduce pseudo-code syntax. First of all let me address the assignment. With the state represented as a record, an assignment m = m - 1 is mapped to a Basic command that performs a record update: Basic (λs. s(|m := m s − 1|)).

...

concrete syntax → abstract syntax:

• SKIP → Skip
• m := e → Basic (λs. s(|m := e|))
• c1; c2 → Seq c1 c2
• IF b THEN c1 ELSE c2 FI → Cond {s. b} c1 c2
• IF b THEN c1 FI → Cond {s. b} c1 Skip
• WHILE b DO c OD → While {s. b} c
• TRY c1 CATCH c2 END → Catch c1 c2
• g → c → Guard False g c
• {P} → {s. P}

" Basic models the atomic transitions in the embedded language’s state space, e.g. variable assignment or heap update.

Seq provides sequential composition,

Cond and While conditional and iterative statements respectively. It should be observed that the condition expressions are side-effect free.

Spec can be used to describe non-deterministic transitions, but we do not make any use of this in the following.

Function call semantics map to Call and DynCom statements.

Throw and Catch give an exception mechanism with which one can model abrupt termination in language statements like return .

Finally, Guards lead to failure semantics if guard expressions do not hold prior to execution of parameter programs. We drop the χ parameter when describing Guard programs in the following, as from a soundness point of view the cause of failure semantics is irrelevant " -- [8]
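The deterministic core of the command syntax above can be sketched as a deep embedding with a big-step evaluator. This is a rough sketch under several assumptions not in the source: the state is a Python dict rather than an abstract state space, conditions are predicates rather than state sets, and Call, Spec, Guard and fault tracking are left out; abrupt termination is modelled as a boolean flag.

```python
from dataclasses import dataclass
from typing import Callable

class Com: pass

@dataclass
class Skip(Com): pass                       # do nothing

@dataclass
class Basic(Com):                           # atomic state update s -> s
    f: Callable

@dataclass
class Seq(Com):                             # sequential composition
    c1: Com; c2: Com

@dataclass
class Cond(Com):                            # if-then-else on predicate b
    b: Callable; c1: Com; c2: Com

@dataclass
class While(Com):                           # loop while b holds
    b: Callable; c: Com

@dataclass
class Throw(Com): pass                      # initiate abrupt termination

@dataclass
class Catch(Com):                           # run c2 only if c1 ends abruptly
    c1: Com; c2: Com

def execute(c, s):
    """Big-step execution; returns (final state, terminated abruptly?)."""
    if isinstance(c, Skip):
        return s, False
    if isinstance(c, Basic):
        return c.f(s), False
    if isinstance(c, Seq):
        s, abrupt = execute(c.c1, s)
        return (s, True) if abrupt else execute(c.c2, s)
    if isinstance(c, Cond):
        return execute(c.c1 if c.b(s) else c.c2, s)
    if isinstance(c, While):
        while c.b(s):
            s, abrupt = execute(c.c, s)
            if abrupt:                      # abrupt termination exits the loop
                return s, True
        return s, False
    if isinstance(c, Throw):
        return s, True
    if isinstance(c, Catch):
        s, abrupt = execute(c.c1, s)
        return execute(c.c2, s) if abrupt else (s, False)
    raise TypeError(c)
```

A countdown loop, for instance, is `While(lambda s: s["m"] > 0, Basic(lambda s: {**s, "m": s["m"] - 1}))`.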

---

## HITECO

A pedagogical self-compiling compiler.

Chapter 21, Self-compiling Compilers, of Symbolic Processing In Pascal, by Professor Manfred von Thun.

---

## Bcompiler

A pedagogical self-compiling compiler by Edmund Grimley Evans.

https://github.com/certik/bcompiler

There were a number of bootstrapping stages, divided into 5 larger phases labeled HEX1 through HEX5.

In HEX4, there are labels, 'push constant', 'push label address', 'call label address', and the following predefined functions:

• Stack manipulation: drop dup rot pick swap
• Arithmetic: + - * / % 1 log
• Comparisons: < <= == != >= >
• Bitwise logic: & ^ ~
• Memory access: @ = c@ c=
• Flow of control, using immediate relative address: branch call
• Flow of control, using stored absolute address: call
• Array support: [] []& []= c[] c[]& c[]=
• Access of arguments and variables: arg arg& arg= var var& var=
• Function support: enter vars xreturnx xreturn0 xreturn1
• Dynamic memory: wsize sbrk / malloc free realloc
• System calls: exit in out

(and jump, which doesn't appear to be in that table; and presumably other stuff)

HEX5 contains (presumably in addition to HEX4): string constants, named variables, if/then/else, while loops with break and continue, and probably other stuff too.

---

## Lox

Pedagogical language from Crafting Interpreters

http://craftinginterpreters.com/the-lox-language.html

C-style syntax. Dynamic. Automatic memory management.

Types: booleans, numbers, strings, nil

Statements and expressions.

Expressions:

+, -, *, /, also - (as arithmetic negation)

< <= > >= == !=

! and or ('and' and 'or' are short-circuiting and so are also control flow operators)

statements:

'print'

blocks {}

grouping by parens ()

variable declaration with 'var'

variable assignment with '='

if, while, for

functions. Define functions with 'fun' keyword. Functions are first-class. Closures.

classes with (single inheritance) subclasses, fields, methods, 'this', 'init' methods, 'super'. The syntax for class declaration is the 'class' keyword. The syntax for construction is just use the classname, eg "var breakfast = Breakfast();" The syntax for subclasses is '<', eg:

" class Brunch < Breakfast { drink() { print "How about a Bloody Mary?"; } } "

stdlib: 'print', 'clock'

token types:

"
```
enum TokenType {
  // Single-character tokens.
  LEFT_PAREN, RIGHT_PAREN, LEFT_BRACE, RIGHT_BRACE,
  COMMA, DOT, MINUS, PLUS, SEMICOLON, SLASH, STAR,

  // One or two character tokens.
  BANG, BANG_EQUAL,
  EQUAL, EQUAL_EQUAL,
  GREATER, GREATER_EQUAL,
  LESS, LESS_EQUAL,

  // Literals.
  IDENTIFIER, STRING, NUMBER,

  // Keywords.
  AND, CLASS, ELSE, FALSE, FUN, FOR, IF, NIL, OR,
  PRINT, RETURN, SUPER, THIS, TRUE, VAR, WHILE,

  EOF
}
```
" -- [9]

---

## Lithp

"a basic, tiny LISP implementation. It was created to be a configuration/logic file format for a game I am working on."

• # # foo comment - skipped during parsing
• ; ; foo comment - skipped during parsing
• A (A) returns the value

Variables, Functions:

• set (set A B ... ) sets A to B, evaluating both. Returns B.
• setf (setf A B ... ) sets A to B, evaluating only B. Returns B.
• setq (setq x 4 y 3) same as 'setf'
• defun (defun A B C) defines a function called A with parameter list B and code block C. Returns A.
• (A X Y) Calling the above (if there were two parameters.) Wrong number of parameters returns NIL w/o evaluating

Numbers:

• + (+ A B C) add a list
• - (- A B C) subtract a list
• * (* A B C) multiply a list
• / (/ A B C) divide a list
• % (% A B) After A is divided by B, what's the leftover?
• 1+ (1+ A) returns the number plus one
• 1- (1- A) returns the number minus one

Comparisons:

• < (< A B) returns T if A < B, otherwise NIL
• <= (<= A B) returns T if A <= B, otherwise NIL
• > (> A B) returns T if A > B, otherwise NIL
• >= (>= A B) returns T if A >= B, otherwise NIL
• = (= A B) returns T if A == B, otherwise NIL
• and (and A B) eval's the arguments until it hits a NIL
• or (or A B) eval's the arguments while args are NIL
• not (not A) returns the opposite of A (T->NIL),(NIL->T)
• null (null A) same as 'not'
• if (if A B C) if A is true, then B else C. if there's no C, return NIL
• unless (unless A B C) unless A is true, do B, C and any others.
• when (when A B C) when A is true, do B, C and any others.
• cond (cond (A B C)) if A is true then do B,C... otherwise, try the next set

Evaluations:

• eval (eval (A B)) evaluates (A B) as if it were directly input
• prog1 (prog1 A B C) evaluates all parts, returns the first's return value
• prog2 (prog2 A B C) evaluates all parts, returns the second's return value
• progn (progn A B C) evaluates all parts, returns the last's return value

Lists:

• quote (quote (A B)) returns the element instead of evaluating it
• ' '(A B) same as 'quote'
• atom (atom E) returns T if E evaluates to an atom, not a list
• equal (equal A B) returns T if A and B have the same structure and atoms
• car (car E) returns the head of list E
• cdr (cdr E) returns all but the head of list E
• cons (cons A B) returns A added to the head of list B
• list (list A B) returns a list of the elements as passed in

Output:

• princ (princ A B) print out the list entries and atoms
• terpri (terpri) print out a new line (terminate printing)
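A handful of the forms above fit in a tiny evaluator. This is a hedged sketch of a Lithp-like core, not the actual implementation: it assumes programs are already parsed into Python lists/ints/strings, symbols are Python strings, NIL is modelled as None, and only a few forms (quote, setq, if, +, car, cdr, cons, progn) are covered.

```python
def ev(x, env):
    """Evaluate a parsed Lithp-like expression in environment dict env."""
    if isinstance(x, str):                 # symbol: look up its value
        return env[x]
    if not isinstance(x, list):            # number literal
        return x
    op, *args = x
    if op == "quote":                      # return the element unevaluated
        return args[0]
    if op == "setq":                       # bind name to evaluated value
        env[args[0]] = ev(args[1], env)
        return env[args[0]]
    if op == "if":                         # NIL (None) counts as false
        if ev(args[0], env) is not None:
            return ev(args[1], env)
        return ev(args[2], env) if len(args) > 2 else None
    if op == "+":                          # add a list of arguments
        return sum(ev(a, env) for a in args)
    if op == "car":                        # head of a list
        return ev(args[0], env)[0]
    if op == "cdr":                        # all but the head
        return ev(args[0], env)[1:]
    if op == "cons":                       # prepend element to list
        return [ev(args[0], env)] + ev(args[1], env)
    if op == "progn":                      # evaluate all, return the last
        result = None
        for a in args:
            result = ev(a, env)
        return result
    raise ValueError(f"unknown form {op!r}")
```

For example, `ev(["setq", "x", 4], {})` returns 4 and leaves x bound in the environment.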

---

## META II 'order codes'

the following is quoted from [10], following figures 6.1 and 6.2 of [11] D. V. Schorre, "META II: A Syntax-Oriented Compiler Writing Language," Proceedings of the 19th ACM National Conference (1964), ACM Press, New York, NY, 41.301-41.3011. http://doi.acm.org/10.1145/800257.808896

Below is a list of all of the operators of the META II parsing machine as given in [Schorre64]. These are called order codes just as the assembly language operations of a computer of that era were called. This is the language that appears in the Code window. These are the op codes of the META II virtual machine. These are the objects and operations of the domain of META II compiling. They are complete in themselves. To move Meta II to a different language these and only these are the objects and operations that must be implemented in the target language.

Mnemonic Purpose Actions

• TST 'string': Test for string in input. After skipping initial whitespace in the input string compare it to the string given as argument. If the comparison is met, skip over the string in the input and set switch. If not met, reset switch.
• ID: Identifier token. After skipping initial whitespace in the input string, test if it begins with an identifier, i.e., a letter followed by a sequence of letters and/or digits. If so, copy the identifier to the token buffer; skip over it in the input; and set switch. If not, reset switch.
• NUM: Number token. After deleting initial whitespace in the input string, test if it begins with a number, i.e., a sequence of digits. If so, copy the number to the token buffer; skip over it in the input; and set switch. If not, reset switch.
• SR: String token. After deleting initial whitespace in the input string, test if it begins with a string, i.e., a single quote followed by a sequence of any characters other than a single quote, followed by another single quote. If so, copy the string (including enclosing quotes) to the token buffer; skip over it in the input; and set switch. If not, reset switch.
• CLL AAA: Call subroutine. Enter the subroutine beginning at label AAA. Push a stackframe of three cells on the stack containing:
```    label 1 cell, initialized to blank
label 2 cell, initialized to blank
location cell, set to the return from call location```
• R: Return from subroutine. Return from CLL call to location on the top of the stack and pop the stackframe of three cells.
• SET: Set switch. Set the switch to true.
• B AAA: Unconditional branch. Branch unconditionally to the label AAA.
• BT AAA: Branch if true. If the switch is true, branch to label AAA.
• BF AAA: Branch if false. If the switch is false, branch to label AAA.
• BE: Branch to error if false. If the switch is false, report error status and halt.
• CL 'string': Copy literal. Copy the variable length string (without enclosing quotes) given as argument to the output buffer.
• CI: Copy input. Copy the token buffer to the output buffer.
• GN1: Generate label 1. If the label 1 cell in the top stackframe is blank, then generate a unique label and save it in the label 1 cell. In either case output the label.
• GN2: Generate label 2. Same as for GN1 except acting on the label 2 cell.
• LB: Move to label field. Set the output buffer column to the first column.
• OUT: Output record. Output the output buffer with line terminator; clear it; and set the output buffer column to the eighth column.

META II Pseudo Operations (following figure 6.3 of [Schorre64]):

• ADR AAA: Starting location. Pseudo operation that specifies the starting label to call.
• END: End of source. Pseudo operation that specifies the end of input.
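The token-level order codes (TST, ID, NUM, SR) can be sketched as methods on a small machine state. Assumptions in this sketch, which the source does not spell out: the state is a class holding the input string, a cursor, the boolean switch, and the token buffer; branching, the call stack, and output codes are omitted.

```python
import re

class M2:
    """Token-level primitives of the META II parsing machine (sketch)."""

    def __init__(self, text):
        self.text = text
        self.pos = 0          # input cursor
        self.switch = False   # the machine's single boolean flag
        self.token = ""       # the token buffer

    def _skip_ws(self):
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1

    def _match(self, pattern):
        """Skip whitespace, then try a regex at the cursor; set the switch."""
        self._skip_ws()
        m = re.match(pattern, self.text[self.pos:])
        self.switch = m is not None
        if m:
            self.token = m.group(0)
            self.pos += m.end()

    def TST(self, s):
        """Test for a literal string; skip over it and set switch if found."""
        self._skip_ws()
        self.switch = self.text.startswith(s, self.pos)
        if self.switch:
            self.pos += len(s)

    def ID(self):
        """Identifier: a letter followed by letters and/or digits."""
        self._match(r"[A-Za-z][A-Za-z0-9]*")

    def NUM(self):
        """Number: a sequence of digits."""
        self._match(r"[0-9]+")

    def SR(self):
        """String: 'non-quote characters' including the enclosing quotes."""
        self._match(r"'[^']*'")
```

A rule like `.ID '=' .NUM` then becomes the call sequence `m.ID(); m.TST("="); m.NUM()`, checking `m.switch` after each step.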

---

## VALGOL II 'order codes'

I think VALGOL II is implemented with META II in the paper [12] D. V. Schorre, "META II: A Syntax-Oriented Compiler Writing Language," Proceedings of the 19th ACM National Conference (1964), ACM Press, New York, NY, 41.301-41.3011. http://doi.acm.org/10.1145/800257.808896 .

NOTE 1: If the top item in the stack is an address, it is replaced by its contents before beginning this operation

NOTE 2: Same as note 1 but applies to the top two items

• LD AAA: LOAD. Put the address AAA on top of the stack.
• LDL NUMBER: LOAD LITERAL. Put the given number on top of the stack.
• SET: SET. Put the integer 1 on top of the stack.
• RST: RESET. Put the integer 0 on top of the stack.
• ST: STORE. Store the contents of the register, STACK1, in the address which is on top of the stack, then pop up the stack.
• ADS: ADD TO STORAGE; NOTE 1. Add the number on top of the stack to the number whose address is next to the top, and place the sum in the register STACK1, then store the contents of that register in that address and pop the top two items out of the stack.
• SST: SAVE AND STORE; NOTE 1. Put the number on top of the stack into the register STACK1, then store the contents of that register in the address which is the next to top term of the stack. The top two items are popped out of the stack.
• RSR: RESTORE. Put the contents of the register STACK1 on top of the stack.
• ADD: ADD; NOTE 2. Replace the two numbers which are on top of the stack with their sum.
• SUB: SUBTRACT; NOTE 2. Subtract the number which is on top of the stack from the number which is next to the top, then replace them by this difference.
• MLT: MULTIPLY; NOTE 2. Replace the two numbers which are on top of the stack with their product.
• DIV: DIVIDE; NOTE 2. Divide the number which is next to the top of the stack by the number which is on top of the stack, then replace them by this quotient.
• NEG: NEGATE; NOTE 1. Change the sign of the number on top of the stack.
• WHL: WHOLE. Truncate the number which is on top of the stack.
• NOT: NOT. If the top term in the stack is the integer 0, then replace it with the integer 1 otherwise, replace it with the integer 0.
• LEQ: LESS THAN OR EQUAL; NOTE 2. If the number which is next to the top of the stack is less than or equal to the number on top of the stack, then replace them with the integer 1. Otherwise, replace them with the integer 0.
• LES: LESS THAN; NOTE 2. If the number which is next to the top of the stack is less than the number on top of the stack, then replace them with the integer 1, otherwise, replace them with the integer 0.
• EQU: EQUAL; NOTE 2. Compare the two numbers on top of the stack. Replace them by the integer 1 if they are equal, or by the integer 0 if they are unequal.
• B AAA: BRANCH. Branch to the address AAA.
• BT AAA: BRANCH TRUE. Branch to the address AAA if the top term in the stack is not the integer 0, otherwise, continue in sequence. Do not pop up the stack.
• BF AAA: BRANCH FALSE. Branch to the address AAA if the top term in the stack is the integer 0, otherwise, continue in sequence. Do not pop up the stack.
• BTP AAA: BRANCH TRUE AND POP. Branch to the address AAA if the top term in the stack is not the integer 0, otherwise, continue in sequence. In either case, pop up the stack.
• BFP AAA: BRANCH FALSE AND POP. Branch to the address AAA if the top term in the stack is the integer 0, otherwise, continue in sequence. In either case, pop up the stack.
• CLL: CALL. Enter a procedure at the address which is below the flag.
• LDF: LOAD FLAG. Put the address which is in the flag register on top of the stack, and put the address of the top of the stack into the flag register.
• R: RETURN. Return from procedure.
• AIA: ARRAY INCREMENT ADDRESS. Increment the address which is next to the top of the stack by the integer which is on top of the stack, and replace these by the resulting address.
• FLP: FLIP. Interchange the top two terms of the stack.
• POP: POP. Pop up the stack.
• EDT: STRING EDIT; NOTE 1. Round the number which is on top of the stack to the nearest integer N. Move the given string into the print area so that its first character falls on print position N. In case this would cause characters to fall outside the print area, no movement takes place.
• PRT: PRINT. Print a line, then space and clear the print area.
• EJT: EJECT. Position the paper in the printer to the top line of the next page.
• RED: READ. Read the first N numbers from a card and store them beginning in the address which is next to the top of the stack. The integer N is the top term of the stack; pop out both the address and the integer. Cards are punched with up to 10 eight digit numbers. Decimal point is assumed to be in the middle. An 11-punch over the rightmost digit indicates a negative number.
• WRT: WRITE. Print a line of N numbers beginning in the address which is next to the top of the stack. The integer N is the top term of the stack. Pop out both the address and the integer. Twelve character positions are allowed for each number. There are four digits before and four digits after the decimal. Leading zeroes in front of the decimal are changed to blanks. If the number is negative, a minus sign is printed in the position before the first non-blank character.
• HLT: HALT. Halt.

CONSTANT AND CONTROL CODES:

• SP N: SPACE. N = 1-9; constant code producing N blank spaces.
• BLK NNN: BLOCK. Produces a block of NNN eight character words.
• END: END. Denotes the end of the program.
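A few of the pure stack operations above can be sketched as a single-step function. This sketch handles numbers only; addresses, the STACK1 register, the NOTE 1/NOTE 2 dereferencing, and all I/O codes are deliberately left out, and the dispatch-by-string style is my own choice rather than anything in the paper.

```python
def step(op, stack, arg=None):
    """Apply one VALGOL II-style stack operation to a list-based stack."""
    if op == "LDL":                  # load literal on top of the stack
        stack.append(arg)
    elif op == "ADD":                # replace top two with their sum
        b, a = stack.pop(), stack.pop()
        stack.append(a + b)
    elif op == "SUB":                # next-to-top minus top
        b, a = stack.pop(), stack.pop()
        stack.append(a - b)
    elif op == "MLT":                # replace top two with their product
        b, a = stack.pop(), stack.pop()
        stack.append(a * b)
    elif op == "DIV":                # next-to-top divided by top
        b, a = stack.pop(), stack.pop()
        stack.append(a / b)
    elif op == "LEQ":                # 1 if next-to-top <= top, else 0
        b, a = stack.pop(), stack.pop()
        stack.append(1 if a <= b else 0)
    elif op == "NOT":                # 0 becomes 1, anything else becomes 0
        stack.append(1 if stack.pop() == 0 else 0)
    elif op == "FLP":                # interchange the top two terms
        stack[-1], stack[-2] = stack[-2], stack[-1]
    elif op == "POP":                # pop up the stack
        stack.pop()
    else:
        raise ValueError(f"unhandled op {op!r}")
    return stack
```

For example, `LDL 3; LDL 4; ADD` leaves 7 on the stack, and a following `LDL 7; LEQ` leaves 1 (since 7 <= 7).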

---

## IBNIZ

A VM designed for size-constrained demoscene 'demos'; "a programming language that generates video and audio from very short strings of code".

" two-stack machine somewhat similar to Forth, but with the major exception that the stack is cyclical and also used as an output buffer. Also, as every IBNIZ program is implicitly inside a loop that pushes a set of loop variables on the stack on every cycle, even an empty program outputs something (i.e. a changing gradient as video and a constant sawtooth wave as audio). ... runs the program in two separate contexts with separate stacks and internal registers: the video context and the audio context ... The scheduling and main-looping logic is the only somewhat complex thing in IBNIZ. All the rest is very elementary, something that can be found as instructions in the x86 architecture or as words in the core Forth vocabulary. Basic arithmetic and stack-shuffling. Memory load and store. An if/then/else structure, two kinds of loop structures and subroutine definition/calling. Also an instruction for retrieving user input from keyboard or pointing device. Everything needs to be built from these basic building blocks. And yes, it is Turing complete, and no, you are not restricted to the rendering order provided by the implicit main loop. " [13]

" Word width: 32 bits (arithmetic in 16.16 fixed-point) Address space: 2^20 words (4 megabytes, ~3 of which free user RAM) Video output: 256x256 pixels at 60 Hz, 32 bits per pixel (VVUU.YYYY) Audio output: 61440 Hz mono (30720 Hz stereo), 16 bits per sample " [14]
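The spec above says arithmetic is done in 16.16 fixed-point inside a 32-bit word. A minimal sketch of how such values behave (my own illustration in Python; names like `to_fix` and `fix_mul` are invented here, not taken from IBNIZ):

```python
# Illustrative model of 16.16 fixed-point arithmetic in a 32-bit word,
# in the style IBNIZ's spec describes (not code from the IBNIZ source).
MASK = 0xFFFFFFFF  # 32-bit word

def to_fix(x):
    """Encode a Python float as a 16.16 fixed-point 32-bit word."""
    return int(round(x * 0x10000)) & MASK

def from_fix(w):
    """Decode a 16.16 word back to a float, treating it as signed."""
    if w & 0x80000000:
        w -= 0x100000000
    return w / 0x10000

def fix_mul(a, b):
    """Multiply two 16.16 values: sign-extend, take the full product,
    then shift right 16 to renormalize."""
    sa = a - 0x100000000 if a & 0x80000000 else a
    sb = b - 0x100000000 if b & 0x80000000 else b
    return ((sa * sb) >> 16) & MASK

print(from_fix(fix_mul(to_fix(0.5), to_fix(3.0))))  # 1.5
```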

Instructions (from [15]):

• arithmetic: add sub mul div mod sqrt
• bitwise: and or xor right (ROR) left (<<) neg
• trig: sin atan
• conditional arithmetic: isneg ispos iszero
• stack: dup pop exchange (SWAP) trirot (ROT) pick bury
• data segment: getdata (can be used for reading the data segment sequentially. It fetches the given number of next bits from the data segment) startdata (end code segment, start data segment)
• control if: if else endif
• control loops:
  • times (i0 --) loop i0 times (push i0 and insptr on rstack)
  • loop decrement RSTACK[top-1], jump back if non-0
  • index (-- i) load value from RSTACK[top-1]
  • outdex (-- j) load value from RSTACK[top-3]
  • do begin loop (push insptr on rstack)
  • while (cond --) jump back if cond!=0
  • jump (v --) set instruction pointer to value v
• control subroutines:
  • defsub (i --) define subroutine (store pointer to MEM[i]). The return stack is used for storing the return addresses when visiting subroutines. Defsub ('{') stores the address of the next instruction to the memory address given by the value on top of stack and then skips instructions until '}' or end-of-code is reached.
  • return end of subroutine; pop insptr from rstack
  • visit (i --) visit subroutine pointed to by MEM[i]
• control return stack manipulation:
  • retaddr (-- val) (val --) moves from rstack to stack
  • pushtors (val --) (-- val) moves from stack to rstack
• misc: mediaswitch (switches between audio and video context) whereami (pushes exterior loop variable(s) on stack) userin (get data from input device) terminate (halt)
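The "implicit loop pushes loop variables; the stack doubles as the output buffer" idea quoted above can be modeled very roughly. This is a toy sketch under heavy simplification (one packed loop variable, a plain list instead of a cyclical stack; nothing here comes from the real IBNIZ VM):

```python
# Toy model of IBNIZ's implicit video loop: for each frame t and each
# pixel (y, x), the VM pushes a combined loop variable and runs the
# program; whatever is left on the stack becomes the pixel value.
# Simplified illustration only, not the real IBNIZ VM.

def run_frame(program, t, size=4):
    """program: a function from the packed loop variable to a pixel value.
    The empty program is the identity: the loop variable itself stays on
    the stack, which is why even '' outputs a gradient."""
    frame = []
    for y in range(size):
        row = []
        for x in range(size):
            tyx = (t << 16) | (y << 8) | x  # packed loop variable (toy encoding)
            row.append(program(tyx) & 0xFF)
        frame.append(row)
    return frame

empty_program = lambda tyx: tyx  # empty program: loop variable is the output
frame = run_frame(empty_program, t=0)
print(frame[0])  # → [0, 1, 2, 3]: the first row varies with x, a gradient
```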

---

## TIS-100

From the game TIS-100

http://www.zachtronics.com/images/TIS-100P%20Reference%20Manual.pdf

https://unwiredben.livejournal.com/308306.html

https://alandesmet.github.io/TIS-100-Hackers-Guide/assembly.html

---

## EXA

From the game EXAPUNKS

http://exapunks.wikia.com/wiki/EXA_instructions

---

## LC-3

Pedagogical language

Variants:

"The LC-3b ISA describes a modified version of the LC-3 that includes the following changes:

```
The machine's word size remains 16 bits, but its memory is now byte-addressable with the same address space.
The LD and ST instructions (load and store data using PC-relative addressing) have been removed.
Two instructions, LDB and STB, have been added to manipulate individual bytes of memory; the other load and store instructions continue to act on entire words.
The reserved opcode has been converted into a shift instruction, SHF, that supports arithmetic and logical shifts of arbitrary size in both directions.
```

" [16]
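The SHF instruction mentioned above combines logical and arithmetic shifts in both directions. The 16-bit semantics can be sketched as follows (my own illustration of shift semantics, not code from any LC-3b simulator; the function name `shf` is invented):

```python
# 16-bit shift semantics in the style of LC-3b's SHF (illustrative sketch).
MASK16 = 0xFFFF

def shf(value, amount, right=False, arithmetic=False):
    """Shift a 16-bit word left or right; an arithmetic right shift
    replicates the sign bit, a logical one shifts in zeroes."""
    value &= MASK16
    if not right:
        return (value << amount) & MASK16
    if arithmetic and (value & 0x8000):
        value -= 0x10000  # reinterpret as negative two's-complement
    return (value >> amount) & MASK16

print(hex(shf(0x8000, 1, right=True, arithmetic=True)))  # 0xc000 (sign copied)
print(hex(shf(0x8000, 1, right=True)))                   # 0x4000 (zero shifted in)
```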

---

## StoneKnifeForth

https://github.com/kragen/stoneknifeforth

bootstrap interpreter in Python:

(main part of?) StoneKnifeForth:

"a very simple language inspired by Forth. It is not expected to be useful; instead, its purpose is to show how simple a compiler can be. The compiler is a bit under two pages of code when the comments are removed. " -- [17]

" It’s a dumbed-down Forth with no compile-time execution, in which only the first byte of each word is significant, with some heterodox names for functions. Bytes preceded by ' are literals for their numerical value, e.g. “'A” is 65.

What that means, if you know how to program but don’t know Forth:

Each word of the program is generally a subroutine call. Words are separated by spaces. There are no types. There’s an implicit stack; words pop their input off the stack and push their output onto it. Numbers and literal bytes are pushed onto the stack.

So “2 3 + 4 -” calculates 2 + 3 - 4; “2 3 4 + -” calculates 2 - [3 + 4]. “'Z 'A -” calculates the ASCII value of “Z” minus the ASCII value of “A”, i.e. it calculates 25. [“+” isn’t in the base language implemented by the compiler; it’s a user-defined function. “-” is in the base language.]

“: foo” marks the beginning of a subroutine called “foo”; “;” is the return instruction. So “: . %flush u ;” defines a subroutine called “.” which consists first of calling “%flush”, then calling “u”, then returning.

“var foo” marks the beginning of a variable named “foo”; the word “foo” thereafter will push its address onto the stack. “# 37” includes the number 37 in the program at that point as a 32-bit number, so “var X # 0” creates a variable “X” and puts four bytes of zeroes there.

“@” and “!” fetch from and store to a memory address respectively. So “var X # 0 : dup X ! X @ X @ ;” creates a variable X, gives it 32 bits of storage initialized to 0, and defines a subroutine called “dup”. “dup” takes whatever is on the stack, stores it in those 32 bits of storage at “X”, then fetches it out of X twice, leaving two copies of it on the stack.

Combining all of the above, “var HERE # 0 : forward HERE @ + HERE ! ;” fetches the value at the location “HERE”, adds whatever was on the stack to it, and stores the result back at “HERE”.

Compile-time primitives:

• ( — defines a comment that extends to the next right-paren
• v — defines a dataspace label, like for a variable.
• b — compiles a literal byte, numerically, into data space.
• # — compiles a literal four-byte little-endian number into data space.
• ^ — the location where the program should start executing [everything else is just definitions]
• space, newline — ignored
• : — defines a function
• [ — begins a conditional block, which consumes the value on top of the stack and is skipped if that value is zero
• ] — ends a conditional block, matching the latest unmatched [
• { — begins a loop
• } — ends a loop; consumes the value on top of the stack and loops back to the matching { if it is nonzero

Run-time primitives:

• G — gets a byte from stdin
• W — given an address and size on the stack, writes the specified number of bytes to stdout
• Q — exits the program with a return code of 0
• - — subtracts two numbers. [Addition isn't a primitive because it's easy to synthesize from subtraction.]
• @ — fetches a word from memory
• ! — stores to a word in memory whose address is on top of the stack
• ; — returns from a function
• < — a less-than test
• m — not used directly since the interpreter doesn't implement it; this invokes an arbitrary system call with up to three arguments. The compiler's implementations of G, W, and Q use this.
• s — stores to a byte in memory whose address is on top of the stack " -- [18]
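The postfix fragments quoted above ("2 3 + 4 -", "'Z 'A -") can be checked with a toy stack evaluator. This sketch handles only number literals, 'x byte literals, and two arithmetic words; note that in real StoneKnifeForth "+" is user-defined rather than a base primitive, and is built in here purely for illustration:

```python
# Toy evaluator for StoneKnifeForth-style postfix fragments (my own
# sketch, not the actual StoneKnifeForth bootstrap interpreter).

def ev(src):
    stack = []
    for word in src.split():
        if word.startswith("'"):     # 'A pushes the byte value of A, i.e. 65
            stack.append(ord(word[1]))
        elif word == '-':            # second-from-top minus top
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif word == '+':            # user-defined in the real language
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        else:
            stack.append(int(word))
        # (the real language also has @ ! ; < G W Q s m, etc.)
    return stack

print(ev("2 3 + 4 -"))   # → [1]   : 2 + 3 - 4
print(ev("2 3 4 + -"))   # → [-5]  : 2 - (3 + 4)
print(ev("'Z 'A -"))     # → [25]  : ord('Z') - ord('A')
```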

" The entire “trimmed” source code is 1902 bytes, which is less than half the size of the nearest comparable project that I’m aware of, otccelf, which is 4748 bytes.

As demonstrated by the interpreter written in Python, the programming language itself is essentially machine-independent, with very few x86 quirks:

• items on the stack are 32 bits in size;
• arithmetic is 32-bit;
• stack items stored in memory not only take up 4 bytes, but they are little-endian.

...

Why? To Know What I’m Doing ... "Alan Kay frequently expresses enthusiasm over the metacircular Lisp interpreter in the Lisp 1.5 Programmer’s Manual. ... But if you try to implement a Lisp interpreter in a low-level language by translating that metacircular interpreter into it...you run into a problem. The metacircular interpreter glosses over a large number of things that turn out to be nontrivial to implement — indeed, about half of the code is devoted to things outside of the scope of the metacircular interpreter. Here’s a list of issues that the Lisp 1.5 metacircular interpreter neglects, some semantic and some merely implementational:

• memory management
• argument evaluation order
• laziness vs. strictness
• other aspects of control flow
• representation and comparison of atoms
• representation of pairs
• parsing
• type checking and type testing (atom)
• recursive function call and return
• tail-calls
• lexical vs. dynamic scoping

...

A metacircular compiler forces you to confront this extra complexity. Moreover, metacircular compilers are self-sustaining in a way that interpreters aren’t; once you have the compiler running, you are free to add features to the language it supports, then take advantage of those features in the compiler.

So this is a “stone knife” programming tool: bootstrapped out of almost nothing as quickly as possible.

Why? To Develop a Compiler Incrementally

When I wrote Ur-Scheme, my thought was to see if I could figure out how to develop a compiler incrementally, starting by building a small working metacircular compiler in less than a day, and adding features from there. I pretty much failed; it took me two and a half weeks to get it to compile itself successfully.

Part of the problem is that a minimal subset of R5RS Scheme powerful enough to write a compiler in — without making the compiler even larger due to writing it in a low-level language — is still a relatively large language. Ur-Scheme doesn’t have much arithmetic, but it does have integers, dynamic typing, closures, characters, strings, lists, recursion, booleans, variadic functions, let to introduce local variables, character and string literals, a sort of crude macro system, five different conditional forms (if, cond, case, and, or), quotation, tail-call optimization, function argument count verification, bounds checking, symbols, buffered input to keep it from taking multiple seconds to compile itself, and a library of functions for processing lists, strings, and characters. And each of those things was added because it was necessary to get the compiler to be able to compile itself. The end result was that the compiler is 90 kilobytes of source code, about 1600 lines if you leave out the comments.

Now, maybe you can write 1600 lines of working Scheme in a day, but I sure as hell can’t. It’s still not a very large compiler, as compilers go, but it’s a lot bigger than otccelf. So I hypothesized that maybe a simpler language, without a requirement for compatibility with something else, would enable me to get a compiler bootstrapped more easily.

So StoneKnifeForth was born. It’s inspired by Forth, so it inherits most of Forth’s traditional simplifications:

• no expressions;
• no statements;
• no types, dynamic or static;
• no floating-point (although Ur-Scheme doesn’t have floating-point either);
• no scoping, lexical or dynamic;
• no dynamic memory allocation;

And it added a few of its own:

• no names of more than one byte;
• no user-defined IMMEDIATE words (the Forth equivalent of Lisp macros);
• no interpretation state, so no compile-time evaluation at all;
• no interactive REPL;
• no do loops;
• no else on if statements;
• no recursion;

Surprisingly, the language that results is still almost bearable to write a compiler in, although it definitely has the flavor of an assembler.

Unfortunately, I still totally failed to get it done in a day. It was 15 days from when I first started scribbling about it in my notebook until it was able to compile itself successfully, although git only shows active development happening on six of those days (including some help from my friend Aristotle). So that’s an improvement, but not as much of an improvement as I would like. At that point, it was 13k of source, 114 non-comment lines of code, which is definitely a lot smaller than Ur-Scheme’s 90k and 1600 lines. (Although there are another 181 lines of Python for the bootstrap interpreter.)

It’s possible to imagine writing and debugging 114 lines of code in a day, or even 300 lines. It’s still maybe a bit optimistic to think I could do that in a day, so maybe I need to find a way to increase incrementality further.

My theory was that once I had a working compiler, I could add features to the language incrementally and test them as I went. So far I haven’t gotten to that part.

...

Related work

```
is a small imperative programming language that is just powerful enough to compile itself (compiler bootstrapping).
has no GOTO, but has while-break-wend and multiple return
has C-like string handling.
is implemented in less than 1000 source code lines for the compiler.
produces real binary programs for x86 processors, not P-code or Byte-Code.
uses the C compiler toolchain (assembler, linker)
uses C library functions like printf(), getchar(), strcpy(), isdigit(), rand() for run-time support.
```

Actually it produces assembly, not executables.

Version 0.9 was released 15 Jul 2006. The 1000 source lines include a recursive-descent parser and a hand-coded lexer.

Sample code ("return 1 if ch is in s, 0 else"):

```
func in(ch: char, s: array char): int
var i: int
begin
    i = 0
    while s[i] # 0 do
        if ch = s[i] then return 1 endif
        i = i + 1
    wend
    return 0
end
```

FIRST and THIRD, from the IOCCC entry.

Ian Piumarta’s COLA system.

Oberon.

Fabrice Bellard’s OTCC.

F-83.

eForth, especially the ITC eForth.

Jack Crenshaw’s Let’s Build a Compiler. This is a how-to book that walks you through an incrementally-constructed compiler for a toy language, written in Pascal, in about 340 pages of text. The text is really easy to read, but it will still take at least three to ten hours to read. It uses recursive-descent parsing, no intermediate representation, and it emits 68000 assembly code.

Ikarus.

Ur-Scheme.

PyPy.

Bootstrapping a simple compiler from nothing: Edmund GRIMLEY EVANS 2001 http://web.archive.org/web/20061108010907/http://www.rano.org/bcompiler.html

" -- [19]

```
/* StoneKnifeForth (currently) only uses the following 23
 * instructions, in decimal:
 *
 * - 15 157 192: setge %al
 * - 15 182 0: movzbl (%eax), %eax
 * - 15 190 192: movsbl %al, %eax
 * - 41 4 36: sub %eax, (%esp)
 * - 80: push %eax
 * - 88: pop %eax
 * - 89: pop %ecx
 * - 90: pop %edx
 * - 91: pop %ebx
 * - 116: jz (short, PC-relative) (1-byte operand)
 * - 117: jnz (short, PC-relative) (1-byte operand)
 * - 129 237: subtract immediate from %ebp (4-byte operand)
 * - 133 192: test %eax, %eax
 * - 135 236: xchg %ebp, %esp
 * - 136 8: mov %cl, (%eax)
 * - 137 229: mov %esp, %ebp
 * - 139 0: mov (%eax), %eax
 * - 143 0: popl (%eax)
 * - 184: load immediate %eax (e.g. mov $12345678, %eax) (4-byte operand)
 * - 195: ret
 * - 205 128: int $0x80
 * - 232: PC-relative call to absolute address (4-byte operand)
 * - 254 200: dec %al
 *
 * Moreover, jz and jnz are always preceded two instructions
 * previously by a test instruction, and setge is always preceded two
 * instructions previously by a sub instruction, so of the flags, only
 * ZF and SF are used, and their state is never saved or restored or
 * transferred across jumps.
 */
```

-- [20]

" Register-use conventions:

• `%eax` — the top of the operand stack
• `%esp` — normally, points to the rest of the operand stack in memory, in the usual `%esp` way: growing downward, predecrement, postincrement. Temporarily swapped with `%ebp` during `call` and `ret` instructions.
• `%ebp` — normally, points to the return stack in memory, in the usual `%esp` way.
• `%ecx` — occasionally used as a scratch register

Other registers are only used to pass arguments to Linux system calls. " -- [21]

## Inertia register machine

https://github.com/Conceptual-Inertia/Inertia/

https://github.com/Conceptual-Inertia/Inertia/blob/master/docs/instr_file.txt

i am assuming the following details from reading the above-referenced instr_file.txt; i could have misunderstood:

16 instructions: add div mul ltn (less-than) eql and not or inc dec print load goto if (conditional skip) return call

4 registers