Bayle Shanks's website: proj-oot-old-150618-ootAssemblyThoughts

todos

So what if we say "ALIAS r3 to x[33]; ALIAS r4 to r3; ALIAS r3 to y[2]"; do changes to r4 now affect x[33], or y[2]?
- actually i think this may be resolved now if we use an address-of mode and GETALIAS and SETALIAS, see below
is the 'alias level' identical to the 'meta level' above a 'perspective' (view), or do we need yet another mode for that? i'm guessing it's a field within the metadata in the meta level. But mb the meta level is associated with the address of the cell, not with the value contained at that address.

motivations and goals

The idea here is that Oot might have 3 well-defined compilation stages:

Oot -> Oot Core -> this Assembly/Bytecode -> platform-specific code

"Oot" would be the high-level language. It would be built from Oot Core using Oot Core's metaprogramming facilities.

The rationale for Oot Core is:

to conceptually highlight which high-level operations are important in the language (for example, in Haskell we see that things like partially applied functions, closures (let-bound stuff), variables applied to arguments, and generic thunks are all pretty 'core')
to focus implementation effort around these without worrying about VM impedance mismatch until the next level down

The rationale for Oot Assembly is:

to inspire me with ideas for Oot Core 'from the bottom up'
to make porting Oot easier
to make the implementation of Oot Core easier to read by providing an additional layer decoupling the implementation of Oot Core constructs from platform-specific details

Oot Assembly will make porting Oot easier by:

providing a small, compartmentalized language; all a porter needs to do is (a) implement the interpreter loop, (b) implement each opcode, (c) execute the reference implementation of Oot Core (which will be compiled to Oot Assembly) on top of the VM they've built
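
As a rough illustration of steps (a) and (b), here is a minimal sketch of an interpreter loop with a table of opcode implementations. Everything in it is an assumption made for illustration: the opcode names (ADD, JMP, HALT), the tuple instruction format, and the choice to keep the PC in memory cell 0 (which follows the note later in this document). It is not the actual Oot Assembly encoding.

```python
def run(program, memory):
    """Interpret a list of (opcode, dst, a, b) tuples.

    Hypothetical format: 3-operand, memory-to-memory; memory[0] is the PC.
    """
    def op_add(dst, a, b):
        memory[dst] = memory[a] + memory[b]

    def op_jmp(dst, a, b):
        memory[0] = dst              # jumps go through a dedicated opcode

    ops = {"ADD": op_add, "JMP": op_jmp}   # (b) one entry per opcode

    memory[0] = 0
    while memory[0] < len(program):        # (a) the interpreter loop
        opcode, dst, a, b = program[memory[0]]
        memory[0] += 1
        if opcode == "HALT":
            break
        ops[opcode](dst, a, b)
    return memory
```

A porter would replace the two toy opcodes with the real opcode set; the loop itself stays about this small.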

Some properties we want Oot Assembly to have:

easy to implement
simple
portable, abstracted from implementation details
a small set of opcodes, to ease implementation and to keep it simple
preserve higher-level intent (expressiveness)
highly customizable
cross-platform preprocessing that must be done in any case should be done at a higher level above Oot Assembly; for example, Oot code must be parsed before Oot Assembly is generated
administrative automation such as garbage collection, lazy control flow, greenthreads, and copy-on-write should be done at a lower level beneath Oot Assembly, but should be exposed as primitives via Oot Assembly opcodes
parallelizable decoding (this is why we align to fixed-length chunks)
linear (but perhaps it should support more general graph shapes, too?)
support for annotations
efficient emulation
targeting a world of 'brain-like computers' with tens of thousands or more CPUs (or at least virtual threads or 'kernels'), each of which has a small amount of attached local memory

Some things that many traditional assembly languages do that we don't:

fixed length instructions
purely linear
purely imperative

What do we mean by 'preserve intent', and why do we want that?

What we mean is "don't map a specific operation S to a special case of a general operation G if, by looking at the result in the form of G, you cannot discover that this is actually S without doing non-local analysis".

Some examples:

we use 3-operand instructions, and memory-to-memory (rather than registers), in order to make the code more closely match its meaning (e.g. not obscure "c = a + b" by having MOVs in the middle of it). In particular, we want to make it so that, often, when there is one variable at the source code level, that translates into one particular memory location in the Oot Assembly level, rather than translating into a bunch of different memory locations with MOVs etc in between them (ie we want to avoid the need for inessential register transfers).
having a way to explicitly state that an output value is 'discard'ed
forcing jumps to use specific instructions (as opposed to just writing to the PC)

There are three reasons to 'preserve intent':

to defer platform-specific optimization choices (eg if many platforms provide a primitive that efficiently implements a high-level construct, it would be a shame to instead use a slow reimplementation in Oot Assembly code of that same construct; better to have that primitive be a single Oot Assembly bytecode)
to make it easier to write quick-and-dirty program analysis
elegance/readability

Efficiency: We want Oot Assembly to be reasonably efficient to interpret; however, efficiency should not be at the expense of preservation of intent or customizability. We expect that performance-minded platform-specific implementations might compile Oot Assembly into custom high-performance bytecode formats anyways.

Examples of the consequences of this choice:

use of a variable length encoding (in this case, customizability trumps efficiency)
mostly linear encoding (for efficiency)
separate opcodes for the same operation on different types, rather than polymorphic opcodes and a separate type field (for efficiency)
we don't precompute platform-specific addresses at compile-time, because we want to stay portable (but maybe at link-time?)

(copied from [1])

The idea of 'alias' and 'value' addressing is that each memory cell has two 'levels'; the 'alias' level, which might contain a pointer (or actually, maybe an entire 'alias record', which is a pointer with some attached metadata), or nothing if the cell is not a symlink but an actual value; and the 'value' level, which in the case of non-symlinked cells contains the contents of the cells, and which in the case of the symlinked cells looks like it contains the contents of the symlink target. So when you use value addressing on a symlink cell (that is, one with a non-empty alias level), you are really manipulating the cell that is the symlink target; and when you use alias addressing on this cell, you are reading or mutating its symlink record.

In order to CREATE an alias, a third addressing mode is needed, because you want to take the 'address' of a cell, and place it into the alias level of a different cell. Instead of 2 different alias-related addressing modes, we could have some special instructions. Note that an 'address-of' mode could not be written to, only read, so maybe that one should be omitted?

We also need a sentinel to represent an empty alias cell. We can use 0, which would mean that the PC cannot be aliased, which is no great loss.

Also, is this 'alias level' identical to the 'meta level' above a 'perspective' (view), or do we need yet another mode for that? i'm guessing it's a field within the metadata in the meta level

if we used an 'alias' addressing mode, and a special instruction/opcode for GET_ADDRESS_IDENTIFIER and CREATE_ALIAS then:

If we wanted to look at or manipulate metainformation in the alias record, we would first move the alias record itself into an ordinary value cell, by issuing a CREATE_ALIAS instruction whose input had alias addressing (r) and whose destination had value addressing (*r). This creates an alias from the destination cell to the alias record which is controlling the input cell. If we wanted to alias a destination cell to the same symlink target as a source cell, we would instead do CREATE_ALIAS with value addressing in the input (the more typical case). (note: if this is what we're doing, tho, then 'alias' addressing in CREATE_ALIAS's output is useless, which seems like a waste). Note: this is irregular in that the CREATE_ALIAS and GET_ADDRESS_IDENTIFIER opcodes would interpret value addressing differently from every other opcode.

A more regular solution would be to use an address-of addressing mode, and to have GETALIAS and SETALIAS operations. To create an alias from r4 to r3, you would SETALIAS(input= address-of(r3), destination= address-of(r4)). To create an alias from r4 to whatever r3 is directly aliased to, GETALIAS(input=address-of(r3), destination=r1); SETALIAS(input= r1, destination= address-of(r4)). Etc. I like this better.

A downside there is that address-of can't go in the destination operand. But i guess that's good, it means we have a spare 1/2 addressing mode. hmm, mb too clever, but we could use that in place of SETALIAS... presumably SETALIAS will be somewhat common, but GETALIAS will not. to clear a symlink, we can copy 0 in using this addressing mode

Note that SETALIAS is a special case of setting .__GET and .__SET.
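
The address-of / GETALIAS / SETALIAS scheme above can be sketched as a tiny model of two-level cells. This is only a sketch under assumptions: the class and method names are made up, 'address-of' is modelled as just the cell number, 0 is the empty-alias sentinel (so cell 0 cannot be an alias target), and value addressing is assumed to follow alias chains transitively, which the notes above do not settle.

```python
class Memory:
    """Hypothetical model: every cell has a 'value' level and an 'alias' level."""

    def __init__(self, size):
        self.value = [None] * size   # value level
        self.alias = [0] * size      # alias level; 0 means "not a symlink"

    def resolve(self, addr):
        """Follow alias links until a cell with an empty alias level is reached."""
        seen = set()
        while self.alias[addr] != 0:
            if addr in seen:
                raise RuntimeError("alias cycle")
            seen.add(addr)
            addr = self.alias[addr]
        return addr

    def load(self, addr):            # value addressing, read
        return self.value[self.resolve(addr)]

    def store(self, addr, v):        # value addressing, write
        self.value[self.resolve(addr)] = v

    def getalias(self, addr):        # GETALIAS(input=address-of(addr))
        return self.alias[addr]

    def setalias(self, target, addr):  # SETALIAS(input=target, destination=address-of(addr))
        self.alias[addr] = target
```

Under these assumptions the question in the todos has two answers depending on which form was used. If r4 is aliased to address-of(r3), later re-aliasing r3 to y[2] redirects writes through r4 to y[2]. If r4 is instead set from GETALIAS(r3) while r3 still points at x[33], r4 keeps pointing at x[33].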

current proposal

Design goals

Ease of implementation

(Massively) parallelizable decoding

supports linear instruction stream; also optionally supports tree structure

Message constituents with a hard upper bound on their length in bits (in this case, 32 bits is the upper bound on the length of werds, and 24 bits is the upper bound on the length of payloads)

supports at least 12-bit addressing (in fact, we support 24-bit addressing, although using some addressing modes on operands of more than 12 bits requires two instructions)

Preservation of HLL intent

Extensibility

note: efficiency is NOT a primary design goal; as with the Dalvik encoding of JVM bytecode, or the LuaJIT bytecode, we expect that performance-minded implementations may create their own variant encodings. Eg the primary purpose of the 64-bit frame alignment is to support parallelized decoding (although efficiency is a secondary design goal).

Note that in this design, the operation might be indirectly specified via a reference to memory, rather than being immediately specified in the bytecode. This should be useful for encoding of calling first-class functions assigned to local variables.

Syntax

Note that the syntax of Oot Bytecode is a very general syntax that could be used for multiple languages (by varying the operations available, the modalities, the constraints, and the semantics of memory cells). Indeed, within Oot, this syntax is used for at least two 'languages'; one is Oot Instruction Bytecode and the other is Oot Graph Bytecode.

Sentences and phrases: Oot Bytecode is divided into variable-length sentences. Sentences consist of one or more variable-length phrases. Phrases consist of one or more werds.

Each werd is either 16 bits or 32 bits. A 16-bit werd consists of an 8-bit header and an 8-bit payload. A 32-bit werd consists of an 8-bit header and a 24-bit payload.

An aside on terminology: The natural language linguistic concept of 'word' is a good fit for what is here called a 'werd' because, like linguistic words, Oot Bytecode werds are composed of a 'root morpheme' (the payload) and other syntactical and modifier morphemes (the header). However, in computing, the term 'word' is already in use to refer to architecture-specific fixed-length 'words', so to avoid confusion, i changed the spelling slightly. If this annoys you, feel free to call them "words"; in fact, in most contexts, i use the spelling "word" myself, and i only use "werd" when i am particularly worried about confusion with architecture-defined words.

Werds are grouped into 64-bit frames. Phrases cannot span multiple frames unless the parentheses construct is used.

Each phrase has one of 8 'roles' within the sentence. Each werd has one of (the same 8) subroles within the phrase. It's possible for multiple phrases within a sentence, or for multiple werds within a phrase, to have the same role or subrole. The ordering of phrases or werds with the same role or subrole is significant. Otherwise, the ordering is insignificant (except that some roles have positional constraints, namely, the first werd of a phrase is always subrole 0, and phrase roles 6 and 7 can only appear as the first phrase of a 64-bit frame).

The bits in the 8-bit werd header are as follows:

1 bit: is this a 16-bit or 32-bit werd (ie, is the payload of this werd 8 bits or 24 bits?)?
1 bit: EOS or BOP
3 bits: role
3 bits: addressing mode
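
A minimal sketch of packing and unpacking this header. The text only gives the field widths, so the bit positions are an assumption: fields are taken most-significant-bit first in the order listed, and a set size bit is assumed to mean a 32-bit werd. The names WerdHeader, decode_header and encode_header are made up for this sketch.

```python
from typing import NamedTuple

class WerdHeader(NamedTuple):
    is_32bit: bool     # True: 24-bit payload; False: 8-bit payload (assumed polarity)
    eos_or_bop: bool   # the EOS/BOP flag
    role: int          # 3 bits, 0..7
    mode: int          # 3 bits, 0..7 (addressing mode)

def decode_header(byte: int) -> WerdHeader:
    if not 0 <= byte <= 0xFF:
        raise ValueError("header must fit in 8 bits")
    return WerdHeader(
        is_32bit=bool((byte >> 7) & 1),
        eos_or_bop=bool((byte >> 6) & 1),
        role=(byte >> 3) & 0b111,
        mode=byte & 0b111,
    )

def encode_header(h: WerdHeader) -> int:
    return (int(h.is_32bit) << 7) | (int(h.eos_or_bop) << 6) | (h.role << 3) | h.mode
```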

EOS/BOP: if this is the first werd in the 64-bit frame, then it's an EOS. Otherwise it's a BOP. EOS means that this 64-bit frame is the last 64-bit frame in the current sentence. BOP means that this werd begins a new phrase.

role: if this is a BOP, then this is the role of this phrase within the sentence, and the role of this werd within the phrase is role 0. Otherwise, this is the subrole of the werd within the phrase. The 8 roles are:

0: operation/head/verb; this selects what sort of construct or operation is represented by this phrase or sentence. Somewhat analogous to opcode.
1: input/rvalue/source: Somewhat analogous to input operands.
2: output/lvalue/destination: Somewhat analogous to destination operands.
3: modality: modifications to the way that the meaning is processed. For example, lazy vs strict.
4: constraints ("such that" or "where" clauses): For example, "orange" in "Pick up the orange ball."
5: conjunctions. For example, 'a and b and c' or 'a or b or c' or 'a xor b xor c'
6: alternate formats. For example, can be used to select an alternate, spartan 'packed' format for this frame.
7: grouping constructs that span multiple frames. For example, hints about how long a multi-frame sentence is; parentheses; quoting and antiquoting; annotations
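
To make the frame/phrase/werd structure concrete, here is a sketch that walks one 64-bit frame and groups its werds into phrases. It rests on several assumptions that the text does not fix: the frame is 8 bytes with each werd's header byte first and a big-endian payload after it, header bits are laid out MSB-first in the order listed above, a set size bit means a 32-bit werd, and the first werd of a frame always starts a phrase. The role names in ROLES are shortened labels for the list above, and parse_frame is a made-up name.

```python
ROLES = ["operation", "input", "output", "modality",
         "constraint", "conjunction", "alternate-format", "grouping"]

def parse_frame(frame: bytes):
    """Return (eos, phrases); each phrase is (role_name, [(subrole, mode, payload), ...])."""
    if len(frame) != 8:
        raise ValueError("a frame is 64 bits")
    eos, phrases, i = False, [], 0
    while i < 8:
        header = frame[i]
        size = 4 if header & 0x80 else 2        # assumed: high bit set => 32-bit werd
        if i + size > 8:
            raise ValueError("werd would straddle the frame boundary")
        flag = bool(header & 0x40)
        role, mode = (header >> 3) & 7, header & 7
        payload = int.from_bytes(frame[i + 1:i + size], "big")
        if i == 0:
            eos = flag                          # first werd: the flag means EOS
        if i == 0 or flag:                      # new phrase; its first werd is subrole 0
            phrases.append((ROLES[role], [(0, mode, payload)]))
        else:                                   # continuation: role field is the subrole
            phrases[-1][1].append((role, mode, payload))
        i += size
    return eos, phrases
```

For example, the bytes 42 07 C8 00 00 05 10 09 (hex) would decode under these assumptions to an operation phrase followed by an input phrase whose second werd has subrole 2.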

addressing modes:

0: direct
1: indirect
2: immediate
3: immediate, from constant table
4: split; constant[direct]
5: split; direct[immediate]
6: split; direct[constant]
7: split; direct[direct]

the 'split' modes split the payload in half (so an 8-bit payload is split into 2 4-bit payloads, and a 24-bit payload into 2 12-bit payloads), and apply the specified addressing mode to each half, then combine the two in a 'GET' (or index-into) operation. For example, mode 5 retrieves the contents of the memory cell indicated by the first half of the payload (the direct mode), and then finds the index within that data structure indicated by the second half of the payload; for example, if the first half of the payload is '3' and the second half is '5', and memory location 3 contains an array, then the effective address indicated would be the 5th element within this array.
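
A sketch of how an input (role 1) operand might be read under these modes. Assumptions: the 'first half' is the high-order half of the payload, GET is modelled as ordinary Python indexing (so index 5 is 0-based here; the real GET semantics are language-specific), and indirect mode is left unimplemented. read_operand and SPLIT_MODES are made-up names.

```python
SPLIT_MODES = {
    4: ("constant", "direct"),     # constant[direct]
    5: ("direct", "immediate"),    # direct[immediate]
    6: ("direct", "constant"),     # direct[constant]
    7: ("direct", "direct"),       # direct[direct]
}

def read_operand(mode, payload, payload_bits, memory, constants):
    """Return the value an operand denotes; payload_bits is 8 or 24."""
    def simple(kind, p):
        if kind == "direct":
            return memory[p]
        if kind == "immediate":
            return p
        return constants[p]        # kind == "constant"

    if mode == 0:
        return simple("direct", payload)
    if mode == 2:
        return simple("immediate", payload)
    if mode == 3:
        return simple("constant", payload)
    if mode in SPLIT_MODES:
        half = payload_bits // 2
        first, second = payload >> half, payload & ((1 << half) - 1)
        outer, inner = SPLIT_MODES[mode]
        return simple(outer, first)[simple(inner, second)]   # the assumed GET
    raise NotImplementedError("indirect mode is language-specific")
```

With memory cell 3 holding an array, mode 5 with halves 3 and 5 reads memory[3][5], matching the example in the paragraph above.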

the semantics of memory cells, indirect addressing, and the GET operation are language-specific

more details on some of the roles:

role 0 phrase details

role 1 phrase details

in this role, addressing modes yield the value found at the effective address

werds with subrole 2 within this phrase can be used to pass 'named arguments'; eg the name of the argument would be in subrole 2, and the value of the argument would be in subrole 1 (or implicit in subrole 0)

role 2 phrase details

in this role, addressing modes yield an effective address

werds with subrole 2 within this phrase can be used to pass 'named return arguments'; eg the name of the return argument would be in subrole 2, and the lhs expression of the return argument would be in subrole 1 (or implicit in subrole 0)

role 3 phrase details

todo: there should be a way to include a single 8-bit modality payload, but also a way to include arbitrary settings of named modality fields to values.

role 4 phrase details

role 5 phrase details

role 6 phrase details

Role 6 is generically defined as anything that can be translated in a context-independent manner (that is, the translator is only allowed to look at a contiguous group of 64-bit frames at a time, and must be stateless in between groupings) into one or more sentences.

When the payload is 8 bits, the 8-bit payload is modality (that is, it is interpreted as if there were a single-werd phrase with role modality, where the werd had role 0, and this was the payload of the werd), and the rest of the 64-bit frame is as follows:

8-bit opcode + 3x4-bit operands
8-bit opcode (stack addressing assumed for all operands)
8-bit opcode + 3x4-bit operands
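
The three packed entries add up to 20 + 8 + 20 = 48 bits, which together with the 16-bit role-6 werd fills the 64-bit frame exactly. A sketch of unpacking such a frame, assuming the fields are laid out most-significant-bit first in the order listed and that the frame is given as a 64-bit integer (unpack_packed_frame is a made-up name):

```python
def unpack_packed_frame(frame: int):
    """Return (modality, [(opcode, operands), ...]) for a packed role-6 frame."""
    if not 0 <= frame < 1 << 64:
        raise ValueError("a frame is 64 bits")
    modality = (frame >> 48) & 0xFF          # payload of the leading 16-bit werd
    body = frame & ((1 << 48) - 1)           # remaining 48 bits

    def opcode_with_operands(x):             # 8-bit opcode + 3 x 4-bit operands
        return (x >> 12) & 0xFF, [(x >> 8) & 0xF, (x >> 4) & 0xF, x & 0xF]

    first = opcode_with_operands(body >> 28)
    stack_op = ((body >> 20) & 0xFF, [])     # operands come from the stack
    third = opcode_with_operands(body & 0xFFFFF)
    return modality, [first, stack_op, third]
```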

When the payload is 24 bits, the first 8 bits select the number of 64-bit frames included in the packed segment, the next 8 bits specify a language-specific packed representation, and the last 8 bits are language-defined. So far this is not used by Oot.

role 7 phrase details

Role 7 reinterprets the addressing mode field to indicate what type of grouping construct is present. All role 7 constructs apply at the granularity of entire 64-bit frames.

0: sentence length; Mandatory at the beginning of multi-frame sentences (if it is not present, length is assumed to be 1). Sentence lengths 0 and 1 are reserved for future use.
1: left parens
2: right parens
3: region annotation begin
4: region annotation end
5: quasiquote begin
6: quasiquote end
7: antiquote

The payload of role 7 is always split; the first half of the payload is how many 64-bit frames are spanned by the construct, and the second half is the 'type' of the construct. Frame lengths 0 and 1 may have special meanings, todo. Note that grouping-ending constructs such as right parens have an identical payload to the matching left-parens; since this includes the number of frames spanned, this makes it efficient to jump from the right parens to the corresponding left parens (or vice versa). The most-significant-bit of the 'type' of the construct is reserved. For quasiquote, this means whether or not there is any antiquote within this quasiquote; the semantics of this bit for other modes is reserved for future use.
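
A sketch of decoding a role 7 werd and of jumping between matching begin/end constructs. Assumptions: the 'first half' is the high-order half of the payload, and the span counts frames inclusively from the opening frame to the closing frame (the text does not say whether the count is inclusive). decode_grouping and matching_frame are made-up names.

```python
GROUPING_KINDS = [
    "sentence length", "left parens", "right parens",
    "region annotation begin", "region annotation end",
    "quasiquote begin", "quasiquote end", "antiquote",
]

def decode_grouping(mode, payload, payload_bits=24):
    """Return (kind, frames_spanned, reserved_bit, construct_type)."""
    half = payload_bits // 2
    span = payload >> half
    type_field = payload & ((1 << half) - 1)
    reserved_bit = type_field >> (half - 1)             # MSB of the type is reserved
    construct_type = type_field & ((1 << (half - 1)) - 1)
    return GROUPING_KINDS[mode], span, reserved_bit, construct_type

def matching_frame(frame_index, mode, span):
    """Frame index of the partner construct, assuming an inclusive span."""
    if mode in (1, 3, 5):        # opening constructs
        return frame_index + span - 1
    if mode in (2, 4, 6):        # closing constructs
        return frame_index - span + 1
    raise ValueError("not a paired grouping construct")
```

Because both ends carry the same span, the jump is a single addition or subtraction in either direction, with no scanning for the partner.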

As an optional extension, a language may support arbitrary-length constructs. In this case, if the length of the payload is the maximum value (2^24 - 1), then this indicates that the construct is actually arbitrarily longer than 2^24 - 2, and at displacement 2^24 - 2 will be found another construct opening werd. Most languages don't support this arbitrary-length construct feature, in which case payload value 2^24 - 1 is illegal.

some of these constructs are used to embed things from a different 'language' within bytecode of a default language. todo figure out which ones?

todo: can parens only enclose single phrases, or can they enclose whole sentences?

todo: What about EOS bits within these constructs?

todo is region annotation of length 0 a point annotation?

todo: can 'sentence length' also specify a 'foreign language' sentence?

note: the difference between 'foreign languages' in role 7 vs language-specific packed representations is that a role 7 'foreign language' uses the same format/syntax as described here, but varies the semantics (eg list of operations, list of modalities, list of constraints, semantics of memory cells, semantics of indirect addressing mode and GET), whereas role 6 is an extension mechanism to contain segments of arbitrary format/syntax.

languages

a language using this syntax must define:

a list of operations
the semantics of memory cells, including any 'special' memory cells (eg is location 0 the PC?)

and must either define the semantics of or forbid the use of:

a list of modalities
indirect addressing
GET
constraints
conjunctions
the semantics of role 7 constructs

and may define:

limits on sentence length, phrase length, and role 7 construct length, beyond those inherent in the syntax
limits on language-level semantics resulting from length limits
language-specific role 6 packed representations
a list of 'foreign languages' that may be embedded using role 7 constructs

numerology

There are 2^24 accessible memory cells (24-bit addressing). The size of the constant table must be less than 2^24.

Because an 8-bit payload can be split into two 4-bit payloads in split addressing mode, memory cells 0-15 can be accessed somewhat more easily than others. Similarly, constant table entries 0-15 can be accessed more easily than others.

Similarly, the largest memory cell location that can be accessed using split mode addressing is 2^12 - 1 (using a 24-bit payload split into 2 12-bit payloads). Similarly, constant table entries up to 2^12 - 1 can be accessed more easily than others.
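
These limits can be summarised as a small helper that reports the smallest werd able to name a given cell or constant-table index. The non-split 16-bit case (indices 0 to 255 fitting in an 8-bit payload) is derived from the payload sizes rather than stated above, and smallest_werd_bits is a made-up name.

```python
def smallest_werd_bits(index, split):
    """Return 16 or 32 (werd size in bits), or None if the index does not fit.

    split=True means the index occupies one half of a split-mode payload.
    """
    limit_16 = 1 << 4 if split else 1 << 8     # half of 8 bits, or all 8 bits
    limit_32 = 1 << 12 if split else 1 << 24   # half of 24 bits, or all 24 bits
    if index < 0:
        raise ValueError("index must be non-negative")
    if index < limit_16:
        return 16
    if index < limit_32:
        return 32
    return None
```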

profiles

There are many aspects of this format which aid extensibility but which may make implementation more difficult. Therefore, although they are described above, the 'default profile' turns them off.

role-6 segments of length more than 1 frame (b/c this makes it hard to tell if you are looking at a role 6 chunk if you plop down at an arbitrary place in the middle of one, which may make massively parallel decoding harder)
arbitrary-length role-7 constructs

todo: let's have short construct length limits by default so that a default-profile interpreter doesn't have to reserve much memory

segment format

todo: need to specify constant table format, format for file containing both constant table and bytecode, etc? or is this implementation-dependent? (i'm leaning towards implementation-dependent, although in that case we should still specify it for the Oot language)

natural languages

some inspiration was taken from the syntax of natural languages (i took a course once in head-driven phrase structure grammar, so i wouldn't be surprised if this turned out to be particularly close to that). In addition, this syntax is probably sufficient for encoding most natural language sentences (assuming you are willing to do things like sometimes use multiple Oot Bytecode sentences to represent one natural language sentence).

Here's how you might encode some natural language constructs:

subject: role 2
object (direct or indirect): role 1
verb, verb phrase: role 0
prepositional phrase: either role 4 (constraint) (eg "flip the switch above the red light" is constraint; the switch such that it is above the red light) or role 3 (modality) (eg "with great urgency" is modality)
verb tense, aspect, mood, conjugation, modality etc: subrole 3 (modality) within role 0 phrase (note that our use of 'modality' is broader than its use in linguistics)
auxiliary verbs, multiple verbs: subrole 3 (modality) within role 0 phrase (eg "We are trying to understand the difference"; the root verb would be 'understand', and 'trying' would be a modality)
noun phrases: role 1 or 2 depending upon whether this is an object or subject
adjectives: something within a noun phrase; involves either subrole 0 (head), subrole 3 (modality) or subrole 4 (constraint) (eg 'Pick up the orange ball'; 'orange' is a subrole 4 (constraint))
adverbs: something within a verb phrase; involves either subrole 3 (modality) or subrole 4 (constraint)
pronoun: At the referent, make an assignment to a memory cell. Then when the pronoun is used, reference this memory cell. When there is a pronoun whose referent is in the same sentence, a separate Oot Bytecode sentence would probably be needed to assign the referent to a memory cell, which could then be referenced in the pronoun-using sentence
interjections: role 6
conjunctions: role 5 or role 7
clauses aside from the root clause: role 7
possessives: usually role 4 (constraint) (Sam's dog = a dog with the constraint that it belongs to Sam), but possibly GET addressing mode (Sam's dog = sams_possessions[dog])
plural, quantifiers: i'm not sure. conjunction? modality?

semantics: Oot Instruction Bytecode

indirect: todo; this probably will have something to do with aliasing or symlinking but it's not certain (see [[ootAssemblyNotes5?]]). In any case, the 'direct' mode is the 'fast' one.

In direct and indirect mode, the werds reference memory cells. A memory cell is considered to hold an entire arbitrarily-sized data structure; that is to say, a memory cell number is semantically more like a local variable than it is like an actual location in memory.

memory cell 0 is the PC (i think 1-3 should be special too, relating to stack(s), todo)

only supports up to 2^12-4 local variables

does not support arbitrary-length role 7 constructs

does not support modules whose AST has more than approximately 2^24 nodes

does not support modules which have more than 2^24 module-level werds (i'm flirting with a 2^12 limit here, although i think that's too low for many existing libraries; but note that if you import/link to external libraries a,b,c, as long as the references to those libraries are hierarchical (a.f, not just f), then 4096 is probably fine)

does not support composite literals with more than about 2^12 AST nodes (eg an array of strings is a composite literal; a string is an atomic literal)

todo