Table of Contents for Programming Languages: a survey
In this book we'll focus on instruction sets, addressing modes, etc., rather than on other aspects of processors such as pipelining.
NOTE: regarding this and all of the chapters about assembly language/ISAs, the author has only programmed using 6502 assembly! Everything else here should be taken with a grain of salt.
assembly languages
- Close to machine language (but not exactly; most assembly languages allow the programmer to define alphanumeric labels for code positions and alphanumeric variable names)
- Linear imperative sequence of opcodes; statements, not expressions
- No assignment operator to assign to a variable; the 'alphanumeric variable names' mentioned above just map to a single memory location
- Registers or stack separate from memory
- Condition flags
- Untyped
- Goto and bne style control flow
- Addressing modes: at least 3: immediate, register (or memory), indirect (though see the Parallax Propeller, which uses self-modifying code in lieu of indirect)
- Operations on fixed width data (e.g. "assume these memory locations contain 8-bit ints and add them" or "assume these memory locations contain 32-bit floats and add them")
- Sometimes macros: for generating inline, as opposed to called, subroutines
... a 20-bit address, but by using a 16-bit address and 4-bit index. This is the root of the 64k segment size problem that dogs DOS to this day.
Since DOS only ran in 8086 mode there was no easy way to address more than one meg of memory, and various standards were set up to allow access to addresses beyond the 1 meg barrier. The bug in the '286 was useful in that it allowed 8086 mode programs to see the first 64k above 1 meg. It involved some weird messing about with the keyboard controller to toggle the state of the 21st address line, and this is why you still see on some ...
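As a rough illustration of the assembly-language features listed in this section (labels, variable names that are just memory locations, registers separate from memory, a condition flag, and bne style control flow), here is a minimal Python sketch of an interpreter for a made-up dialect. The opcode names (ldi, ld, st, add, dec, bne), the register names, and the 8-bit width are assumptions chosen for the example; they do not correspond to any real instruction set.

```python
def run(program, variables, memory):
    """Interpret a tiny, hypothetical assembly dialect (not a real ISA)."""
    # Labels are just names for code positions.
    labels, code = {}, []
    for line in program:
        if line.endswith(":"):
            labels[line[:-1]] = len(code)
        else:
            code.append(line.split())

    regs = {"a": 0, "x": 0}   # registers live outside memory
    zero = False              # a single condition flag
    pc = 0
    while pc < len(code):
        op, *args = code[pc]
        pc += 1
        if op == "ldi":       # load immediate
            regs[args[0]] = int(args[1]) & 0xFF
        elif op == "ld":      # load from the memory location named by a variable
            regs[args[0]] = memory[variables[args[1]]]
        elif op == "st":      # store to the memory location named by a variable
            memory[variables[args[1]]] = regs[args[0]]
        elif op == "add":     # untyped: just assume 8-bit ints and add them
            regs[args[0]] = (regs[args[0]] + memory[variables[args[1]]]) & 0xFF
            zero = regs[args[0]] == 0
        elif op == "dec":
            regs[args[0]] = (regs[args[0]] - 1) & 0xFF
            zero = regs[args[0]] == 0
        elif op == "bne":     # branch if the zero flag is clear
            if not zero:
                pc = labels[args[0]]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory


# Variable names map to single memory locations.
variables = {"count": 0, "step": 1, "total": 2}

# total = step * count, by repeated addition.
program = [
    "ldi a 0",
    "ld x count",
    "loop:",
    "add a step",
    "dec x",
    "bne loop",
    "st a total",
]

memory = run(program, variables, [4, 3, 0])
print(memory[2])  # 12
```

Note that there are no expressions anywhere: each statement names one opcode, and the loop is built from a flag plus a conditional jump to a label.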
CPU instruction set architectures
For more inspiration about the sorts of instructions that might go into a VM, one might look at popular CPU instruction sets.
My purpose in including this section is NOT to teach the reader the basics of assembly language and computer architecture; I assume that the reader already knows them. I just want to give you more food for thought about 'minimal' programming languages.
Links:
3 ISA paradigms for processors (RISC and non-RISC)
- Accumulator (certain registers for certain operations)
- stack machine (operations work on the top few elements of the stack)
- (general purpose) registers (GPRs)
the winner was: general purpose registers
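To make the three paradigms above concrete, here is a small Python sketch that computes result = (a + b) * c once in each style. The opcode names and tuple encodings are invented for the illustration and are not taken from any real machine; the point is only how the operands are named in each style.

```python
def run_accumulator(program, mem):
    """Every operation implicitly uses one accumulator register."""
    acc = 0
    for op, operand in program:
        if op == "load":
            acc = mem[operand]
        elif op == "add":
            acc += mem[operand]
        elif op == "mul":
            acc *= mem[operand]
        elif op == "store":
            mem[operand] = acc
    return mem


def run_stack(program, mem):
    """Operations take their inputs from the top of a stack."""
    stack = []
    for op, *operand in program:
        if op == "push":
            stack.append(mem[operand[0]])
        elif op == "pop":
            mem[operand[0]] = stack.pop()
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "mul":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
    return mem


def run_gpr(program, mem):
    """Operations name their source and destination registers explicitly."""
    regs = [0, 0, 0, 0]
    for op, *args in program:
        if op == "load":
            regs[args[0]] = mem[args[1]]
        elif op == "store":
            mem[args[1]] = regs[args[0]]
        elif op == "add":
            dest, src1, src2 = args
            regs[dest] = regs[src1] + regs[src2]
        elif op == "mul":
            dest, src1, src2 = args
            regs[dest] = regs[src1] * regs[src2]
    return mem


acc_program = [("load", "a"), ("add", "b"), ("mul", "c"), ("store", "result")]
stack_program = [("push", "a"), ("push", "b"), ("add",),
                 ("push", "c"), ("mul",), ("pop", "result")]
gpr_program = [("load", 0, "a"), ("load", 1, "b"), ("load", 2, "c"),
               ("add", 0, 0, 1), ("mul", 0, 0, 2), ("store", 0, "result")]

print(run_accumulator(acc_program, {"a": 2, "b": 3, "c": 4})["result"])  # 20
print(run_stack(stack_program, {"a": 2, "b": 3, "c": 4})["result"])      # 20
print(run_gpr(gpr_program, {"a": 2, "b": 3, "c": 4})["result"])          # 20
```

In this sketch the accumulator and stack programs need no register operands, so their instructions are short, while the register version spends bits on register numbers but is free to keep several values live at once.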
Special-purpose registers
- PC (program counter)
- zero register (always 0; a useful constant)
- Condition code/flag registers: Carry, Zero, Negative, Overflow
- Link register: stores the return address (where execution resumes in the caller); assigned during a subroutine call
- base register: added to memory addresses
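As an illustration of the condition flags in the list above, here is a short Python sketch of how Carry, Zero, Negative and Overflow could be derived from an 8-bit addition. The function name add8 and the choice of an 8-bit width are assumptions for the example; real processors differ in details such as which instructions update which flags.

```python
def add8(a, b, carry_in=0):
    """Add two 8-bit values; return the 8-bit result and the four flags."""
    total = a + b + carry_in
    result = total & 0xFF
    flags = {
        "C": total > 0xFF,            # carry out of bit 7 (unsigned overflow)
        "Z": result == 0,             # result is zero
        "N": bool(result & 0x80),     # bit 7 set: negative in two's complement
        # Signed overflow: both inputs have the same sign bit,
        # and the result's sign bit differs from it.
        "V": bool((a ^ result) & (b ^ result) & 0x80),
    }
    return result, flags


print(add8(0x7F, 0x01))  # 127 + 1: result 0x80 (128), N and V set
print(add8(0xFF, 0x01))  # 255 + 1: result 0, C and Z set
```

Carry and Overflow answer different questions about the same bits: Carry treats the operands as unsigned, Overflow treats them as signed.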
Register Addressing modes
From http://www.cl.cam.ac.uk/teaching/0405/CompArch/mynotes.pdf :
Classic RISC Addressing Modes:
- Register: Mov r0 <- r1
- Immediate: Mov r0 <- 42
- Register Indirect: Ldl r0 <- Mem[r1] (follow a pointer held in r1)
- Register Indirect with Displacement: Ldl r0 <- Mem[128 + r1]
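The four classic modes above differ only in how the operand is located. A minimal Python sketch, assuming registers are held in a dict and memory is a dict from address to value (the function name operand and the mode strings are made up for the example):

```python
def operand(mode, regs, mem, reg=None, imm=None):
    """Fetch an operand using one of the four classic RISC addressing modes."""
    if mode == "register":                        # Mov r0 <- r1
        return regs[reg]
    if mode == "immediate":                       # Mov r0 <- 42
        return imm
    if mode == "register_indirect":               # Ldl r0 <- Mem[r1]
        return mem[regs[reg]]
    if mode == "register_indirect_displacement":  # Ldl r0 <- Mem[128 + r1]
        return mem[imm + regs[reg]]
    raise ValueError(f"unknown addressing mode: {mode}")


regs = {"r1": 4}
mem = {4: 111, 132: 222}

print(operand("register", regs, mem, reg="r1"))                                # 4
print(operand("immediate", regs, mem, imm=42))                                 # 42
print(operand("register_indirect", regs, mem, reg="r1"))                       # 111
print(operand("register_indirect_displacement", regs, mem, reg="r1", imm=128)) # 222
```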
Less RISCy addr modes (ARM and PowerPC?):
- Register plus Register (Indexed): Ldl r0 <- Mem[r1 + r2]
- Register plus Scaled Register: Ldl r0 <- Mem[r1 + r2*k]
  - multiplier k is specified by the programmer; must be power of 2
  - use: index into array whose elements are of size k
- Register Indirect with Displacement and Update
  - like Register Indirect with Displacement, but composed with C's ++
  - two forms, pre- and post-; in C, *(++p) and *(p++)
  - use: creating stack (local) variables
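The less RISCy modes above can be sketched in the same style. This is an illustrative Python model only: memory is a flat list, registers are a dict, and the function names are invented. The two update functions mirror the C forms *(++p) and *(p++), generalised to an arbitrary displacement.

```python
def indexed(regs, mem, base, index):
    """Register plus Register: Mem[base + index]."""
    return mem[regs[base] + regs[index]]


def scaled(regs, mem, base, index, k):
    """Register plus Scaled Register: Mem[base + index * k], k a power of 2."""
    if k <= 0 or k & (k - 1):
        raise ValueError("scale factor must be a power of 2")
    return mem[regs[base] + regs[index] * k]


def pre_update(regs, mem, base, disp):
    """Update the base register first, then load (like *(++p))."""
    regs[base] += disp
    return mem[regs[base]]


def post_update(regs, mem, base, disp):
    """Load first, then update the base register (like *(p++))."""
    value = mem[regs[base]]
    regs[base] += disp
    return value


mem = [10 * i for i in range(32)]   # mem[addr] == 10 * addr, for easy checking
regs = {"r1": 8, "r2": 3}

print(indexed(regs, mem, "r1", "r2"))     # mem[8 + 3]  -> 110
print(scaled(regs, mem, "r1", "r2", 4))   # mem[8 + 12] -> 200
print(pre_update(regs, mem, "r1", 4))     # r1 becomes 12, loads mem[12] -> 120
print(post_update(regs, mem, "r1", 4))    # loads mem[12] -> 120, r1 becomes 16
```

With the scaled form, r2 can hold an array index directly and k supplies the element size, so no separate shift or multiply instruction is needed.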
CISC Addressing Modes:
- Direct (Absolute): Mov r0 <- (1000)
  - Offset often large
  - x86 Implicit base address
- Memory Indirect: Mov r0 <- Mem[Mem[r1]]
- PC Indirect with Displacement: Mov r0 <- Mem[ProgramCounter + 128]
- Wikipedia (http://en.wikipedia.org/wiki/Addressing_mode) cites some data on the frequency of various addressing modes:
  - "Fundamentals of Computer Design" p. 112-113: "Frequency of addressing modes for TI TMS320C54x DSP. The C54x has 17 data addressing modes, not counting register access, but the four found in MIPS account for 70% of the modes. Autoincrement and autodecrement, found in some RISC architectures, account for another 25% of the usage. This data was collected from a measurement of static instructions for the C-callable library of 54 DSP routines coded in assembly language.";
  - "Instruction Set Principles: Addressing Mode Usage (Summary)" by Dr. Sofiène Tahar, "3 programs measured on machine with all address modes (VAX)": "displacement mode" (bayle: register indirect with immediate offset) and "immediate mode" are used 75% of the time;
  - [1] "Efficient and Language-Independent Mobile Programs" by Ali-Reza Adl-Tabatabai, Geoff Langdale, Steven Lucco, and Robert Wahbe, 1995: "79% of all instructions executed could be replaced by RISC instructions or synthesized into RISC instructions using only basic block instruction combination."
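For comparison with the RISC modes, the CISC addressing modes listed earlier in this section (direct, memory indirect, PC indirect with displacement) can be sketched the same way. Again this is only an illustrative Python model with invented function names; memory is a dict from address to value.

```python
def direct(mem, address):
    """Direct (Absolute): the address is a constant in the instruction."""
    return mem[address]


def memory_indirect(regs, mem, reg):
    """Memory Indirect: Mem[Mem[r1]], i.e. two memory reads for one operand."""
    pointer = mem[regs[reg]]
    return mem[pointer]


def pc_relative(mem, pc, disp):
    """PC Indirect with Displacement: Mem[PC + disp]."""
    return mem[pc + disp]


regs = {"r1": 4}
mem = {1000: 7, 4: 20, 20: 99, 228: 55}

print(direct(mem, 1000))                 # 7
print(memory_indirect(regs, mem, "r1"))  # mem[mem[4]] = mem[20] -> 99
print(pc_relative(mem, 100, 128))        # mem[228] -> 55
```

Memory indirect is the one that does not decompose into a single load: on a machine with only the classic modes it would take two load instructions.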
Links:
ISA design tradeoffs
regularity vs code density:
- orthogonality vs code density
- fixed width instructions vs code density
Misc
Links