Bayle Shanks's website: proj-oot-ootAssemblyNotes1

note that MOV can accomplish loads and stores by having one operand be a register or stack, and another be register indirect

note: if provide only CALLIF and not BNZ, then need a way to pass 'nil' to the result operand so that the return address is thrown out, so that CALLIF can simulate BNZ

Proposal: allow multiple custom opcode lookup tables in memory. This allows an interpreter to maintain computed goto versions of itself for each custom opcode table, which will execute faster. Now have a command to switch the current table (or, actually, i think this should usually be lexically scoped rather than dynamic?). In addition, have a command that says 'execute the specified command in the context of the specified opcodetable, then skip the next instruction'. This can be used for the effect of executing an instruction which is not currently assigned an opcode. Also, it seems we have an operand to spare there (one operand giving the location of the command to execute (usually the next instruction); one operand giving the desired codetable; and a third operand which would normally be a result, but is not needed now). Perhaps the codetables have a parameter which can be set here? This command does not need to be in the always-present 8-command set because it is just a way of allowing the user to have more than 16 custom commands usable without switching codetables; if the user needs that they can make it one of their custom commands.

should change MOV to CPY. A clearer name, and also it avoids confusion with 'move semantics' in HLLs (since we might actually want a MOV someday too).

special purpose registers: PC, data stack register, instruction stack register, link register? what else? how do we represent these if 'register mode' is just memory? where is the stack stored? is it the same as registers?

current guess: the stack is not the same as/mapped into low memory, it is separate, and only accessible using the stack addressing mode. This seems restrictive but it is not much since each operand always has its own addressing mode, so there's never a time you'd want to write to the stack and then read the value from low memory; the only time you'll hurt is if you write to low memory but then want to access that value using the implicit stack mode of tiny. in this case we can put registers into low memory (i'm thinking to put them above the first 32 bits, because these 32 bits are the only things accessible by 'small' instructions with 1-bit operands)

alternately you could just specify the meta stuff out-of-band by 'registering' a block of code ahead of time. This seems more efficient actually.

note: we can put the scaling parameter for an indexed,scaled addr mode in the shared operand, but then where is it in large instructions? note: we could put parameters like the comparison type for CMP in the shared operand, but then where is it in large instructions? note: maybe junk like that should all go into a register?

addr mode ideas:

would really also like: immediate with parameter register, stack indirect, indexed, memory, memory indirect

memory means that the value may be found/placed at the memory location denoted by the operand (whose interpretation differs depending on the type)

note: GET is used to index into a structure or array (e.g. GET(A, 5) = A[5] == A.__get(5), GET(A, 'b') = A.b == A.__get('b')), but is also used to apply a function (GET(f, 5) = f(5)). So to add two numbers you could do: GET(GET(ADD, 2), 3).

note: we really need ADD itself so that we can increment..

---

Proposal for opcode table: allow multiple custom opcode lookup tables in memory. This allows an interpreter to maintain computed goto versions of itself for each custom opcode table, which will execute faster. Now have a command to switch the current table (or, actually, i think this should usually be lexically scoped rather than dynamic?). In addition, have a command that says 'execute the specified command in the context of the specified opcodetable, then skip the next instruction'. This can be used for the effect of executing an instruction which is not currently assigned an opcode. Also, it seems we have an operand to spare there (one operand giving the location of the command to execute (usually the next instruction); one operand giving the desired codetable; and a third operand which would normally be a result, but is not needed now). Perhaps the codetables have a parameter which can be set here? This command does not need to be in the always-present 8-command set because it is just a way of allowing the user to have more than 16 custom commands usable without switching codetables; if the user needs that they can make it one of their custom commands.

so we need to design a good 8 opcode ISA for the fixed opcodes, and a good 4 addr modes for the fixed addr modes. note that CPY (MOV) can accomplish loads and stores by having one operand be a register or stack, and another be register indirect

special purpose registers: PC, data stack register, instruction stack register, link register, parameter register? what else? how do we represent these if 'register mode' is just memory? where is the stack stored? is it the same as registers?

note: if the PC is an accessible register and if you had a zillion operands, then you don't need BLE and BNE per se, you could have 'if A <= B then put C into D'. But unless it was register 0 or 1, then you couldn't do this in tiny mode, which would cripple tiny mode.

mb add a PC addressing mode? for oot's purposes though, since we dont want to expose linear memory too much, this isn't as useful (for relative jumps, maybe, but not for locating a data storage area relative to the PC)

note: JMP is accomplished by BEQ with the branching operands both set to immediate 0. wrong: BEQ is now relative note: CALLEQ is like BEQ but first pushes the return address (the address of the next instruction) onto the call stack note: multiple stacks in place of registers

---

ok stack addressing mode is really just indirect indexed, where the table to be indexed into is located at the top of the stack, where PUSH is postincrement, and POP is predecrement, and the top of the stack pointer points at the free location above the top item of the stack. So let's replace this with an indirect immediate indexed mode. But now we need postincrement and predecrement, right? I thought addressing modes were supposed to be pure functions without side effects? Well actually the addressing mode for the 3rd, output/target operand (mb that should be first to stick with assembly language convention) can have side effects, as the point of most instructions is to mutate that item anyways. So we can have PUSH. How to fit that into our instruction encoding? Well, as noted above, the IMMEDIATE addr mode is useless on the output, so we can replace that with indirect indexed, postincrement (PUSH). We are still missing POP, but that's OK, because POPing is not pure so we don't want to use it to get inputs in addressing modes.

so in order for this to work, the stack pointer must be accessible. So we put it in a register, which is memory mapped into low memory. Also, if we want to use indirect immediate indexed for relative jumps, then the PC should also be in a register. So lets have memory location 0 be the PC, and locatinion 1 be the stack pointer. Pointers have a standard size of 16 bits. E.g. indirect index mode uses location 1 as the implicit indirect pointer (the jump-off point), and the immediate value as the relative value (the index).

But we'd also like an indirect register index mode, where you use the contents of a register, not an immediate value, as the index.

So, combining these two with the choice of indirect index or index indirect, and also the choice of whether the choice of jump-off point or the choice of index is implicit, we would seem to get 8 possibilities:

(value given is immediate or memory indirect) x (value given is for base or for index) x (indirect index or index indirect)

however, there is no distinction between base and index except for the question of indirect index or index indirect, so we really only have 4 things here.

(value given is immediate or memory indirect) x (indirect index or index indirect)

namely:

(1) i give you a value. Take the contents of the index register, and add the value to that. This is a pointer to the target. (2) i give you a value. Deref the contents of the index register, and add the value to the result. This is a pointer to the target. (3) i give you a pointer. Retrieve the value at that pointer. Take the contents of the index register, and add the value to that. This is a pointer to the target. (4) i give you a pointer. Retrieve the value at that pointer. Deref the contents of the index register, and add the value to the result. This is a pointer to the target.

now we could use a trick from another architecture (i forgot which one), where we assume that there's never any reason for an instruction to read itself, and so if modes #3 or #4 specify the PC as the pointer, instead of returning the right thing, we consider that as a code for immediate mode. In which case #3 and #4 could be used for #1 and #2 by setting the operand to zero; but then you have to have more space for the actual value somewhere.

if we make all four of these modes fundamental, then we also want an immediate mode. So there's only room for 3 custom modes then.

note that an immediate mode is not strictly needed; we can always put a lookup table of immediate values in the code, put the pointer to the table into the index register, and use mode #1. But since our operands are small we'd have to be constantly switching lookup tables, which is annoying.

note that mode #1 can be used for PC-relative pretty easily.

note that memory absolute can be given either by setting the index reg to 0 and using mode #1, or by putting the target into the index reg and using mode #1 with operand 0.

a (possibly very good) alternative to the above would be separate LEA-like instructions for each mode.

---

what is the Haskell memory representation of partially applied function?

here it is:

https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapObjects#Partialapplications

https://en.wikipedia.org/wiki/X86-64#Architectural_features

do we need more stuff to mimic the x86-64 special registers?

how does haskell represent boxed types in memory? see

---

" Vacant codes are also needed by software producers for virtual instructions that can be emulated. Microsoft is using the code C4 C4 in Windows for such a purpose. This code now conflicts with the new VEX instructions, which is the reason why Intel had to disable VEX instructions in 16-bit real and virtual mode. Only two codes are reserved for software emulation. These are called UD1 (0F B9) and UD2 (0F 0B). " -- http://www.agner.org/optimize/blog/read.php?i=25#172

"today, we have pretty much arrived at a point where all types of data in a machine use the same size – 64-bit integers, floats, and pointers are the norm." -- http://jakob.engbloms.se/archives/2004

" We view belt machines as a new category of architectures, at rank equivalent to transport-triggered, or superscalar, or dataflow, or accumulator architectures " -- http://www.eetimes.com/author.asp?section_id=36&doc_id=1320257

http://ootbcomp.com/docs/belt/

i haven't watched this yet:

https://www.youtube.com/watch?v=QGw-cy0ylCc

Slides: 2013-07-11_mill_cpu_belt (.pptx)

http://millcomputing.com/blog/wp-content/uploads/2013/12/2013-07-11_mill_cpu_belt.pptx

" The belt machine model is inherently free of update hazards because all operation results go onto the belt by Single Assignment; in other words, once created they never change their value. Belt machines have no general registers and thus no rename registers that physically embody them " -- http://ootbcomp.com/docs/belt/

" By doing away with registers, the Mill avoids the very expensive renaming and managing register hazards in a conventional OOO machine. It is a potential big win in terms of power efficiency, since managing hundreds of rename registers for a user-visible set of a few dozen registers is a major painpoint in OOO processor design. The replacement is a fixed-length FIFO called the Belt (http://www.youtube.com/watch?v=QGw-cy0ylCc). Architecturally, values are read from the belt, fed into functional units, and put at the front of the belt. A bit like a stack machine, but with the ability to read from any value, not just the top of the stack. The Mill also separates between temporary values that are used to implement data flow between functional units, which are put on the Belt, and longer-term storage of values being used many times, which are put in a compiler-managed scratchpad memory. According to prior research, only some 13% of values are used more than once, the rest can easily be handled by temporary storage in a Belt. " -- http://jakob.engbloms.se/archives/2004

(in reference to http://jakob.engbloms.se/archives/2004 )

" Also, the metadata isn't like a None in Python, it's like a Maybe in Haskell! You can string them together and only throw an exception on side effects in a way that makes speculation (and vector operation) in this machine much nicer. "

an interesting thing about nock is thats it has no primitive binary operators which combine information about two inputs to produce an output in one step (e.g. a+b) -- everything is via state

in Liquid Republic, we have 5 branches of 'government':

forum
board
exec
chairs
courts

can we adapt this to a 5-fold conception of oot?

maybe:

forum:data
board:?
exec:control flow
chairs: safety
courts: interpretation, meta

what could the Board be? one thing obviously missing here is 'operations' If we add in 'programming language constructs' that bridges data and control flow, like the Board in LR

try making something 'quinary' like RDF but with more, and symmetrisized for 5

is symbolic thought binary or trinary (RDF)? maybe its quinary. todo; start with +,-,subject,verb,object and make something more symmetric. kant? quantity,quality,relation,modality?

polarity/quantity/(negation?), conditionality (from cs; e.g. if statements), (conditionality maybe or may not be the same as) role/relation, modality?, meta/paradox/escape/transcend (synthesis in thesis/antithesis/synthesis), concreteness/ identity in group theory e.g. successor/constants/projection fns in primitive recursion? and what about the object(s) to be operated on?

want 5 'places' each of which can take on 5 (kinds of?) values

bah, the idea of having each of 5 places having on of 5 values is so binary!

  note theres 5 things in. the primitive recursion defn, should pack more in there, those 5 can be the five values for one of the places

rdf provenance is like mapping onto another graph, configuration

so start with rdf, add provenance, now add one more quotation/meta/escape position?

subject/object/verb/meta/? space/time/matter/energy (or is it mass/energy?)/? configuration? knowledge/viewpoint/perspective? gravity? 0/1/-1/?/?

i guess configuration is like cs conditionality and mb role/relation?

are meta and perspective distinct? meta and configuration (annotation)?

3 = thesis/anti/synth, annotation/quotation, OR verb/role

so do we already get a whole graph structure for 3, as opposed to 'n values in n positional roles' for 2?

then what for 4,5? maybe something more like human language

hmm.. even simple positional assembly/machine language can have a 5_par scheme: opcode, 2 x (operand, addressing mode)

where does probability come in here?

here's one instance of 5s:

if you consider the powerset arising from elements True/False (e.g. a 'relational' logic), {}, {F}, {T}, {F,T}, we get the:

Indian catuskoti: true, false, neither true nor false, both true and false

(the Hasse diagram of First Degree Entailment logic:

   {T}
  /  \{T,F} {} \ / {F}

note that this is not identical to the set inclusion diagram in this case:

   {T,F}
  /    \{T} {F} \ / {}

)

we can add a fifth value, 'ineffable', to form a 5-valued many-valued logic.

If you then take the power set of that, you get plurivalent logic.

Plurivalent logic can be motivated by König’s paradox:

" Ordinals are numbers that extend the familiar counting numbers, 0, 1, 2, etc, beyond the finite. After we have been through all the finite numbers (of which there is, of course, an infinity), there is a next number, ω, and then a next, ω+1, and so on, forever. These ordinals share an interesting property with the counting numbers: for any set of them, if there are any members at all, there must be a least one. How far, exactly, the ordinals go is a vexed question both mathematically and philosophically. Nevertheless, one fact is beyond dispute: there are many more ordinals than can be referred to using a noun phrase in a language with a finite vocabulary, such as English. This can be shown by a perfectly rigorous mathematical proof.

Now, if there are ordinals that cannot be referred to in this way, it follows that one of them must be less than all the others, for that is true of any collection of ordinals. Consider the phrase ‘the least ordinal that cannot be referred to’. It obviously refers to the number in question. This number, then, both can and cannot be referred to. That’s our paradox. And since it cannot be referred to, one cannot say anything about it. So the facts about it are ineffable; but we can say things about it, such as that it is the least ordinal that can’t be referred to. We have said ineffable things. "

and similar problems in philosophy:

" (Kant) distinguished between two notions of noumenon, the realm beyond the senses: a positive one and a negative one. According to him, only the negative one is legitimate. We cannot talk about things of this kind; we just need to be aware of them to mark the limit of what we can talk about. Pardon? In explaining what they do, are we not talking about them? Well, yes, of course we are. "

" Gorampa was troubled enough by the situation that he attempted to distinguish between two ultimate realities: a real ultimate reality, which is ineffable, and a ‘nominal’ ultimate reality, which is what we end up talking about when we try to talk about the real ultimate. But wait a minute – the nominal ultimate is obviously effable: by definition, it is the reality that we can talk about. In that case, if we say that ultimate reality is ineffable and we are actually talking about the nominal ultimate, what we are saying is false. Thus Gorampa’s proposal refutes itself. "

-- http://aeon.co/magazine/world-views/logic-of-buddhist-philosophy/

---

note: the Hasse diagram for the 5-valued logic may be sort of 'pyramidal' if you draw the first four values as a planar diamond and the ineffable value on a different plane from the rest

note: which 5-valued value corresponds to 'neutral' in 3-valued logic? i would say 'ineffable'. So, if the first four values are drawn as a a diamond in the horizontal plane, and the ineffable value is a point on its own in the middle of the diamond but in another plane, forming a pyramid, then 3-valued logic would be a triangle with {T}, {F}, and ineffable as its points, standing vertically and cutting the pyramid diagonally.

---

start with something; add something else to trancend it; take the power set; repeat

so the sequence is:

1,2,4,5,32,33,2^33,...

so more compactly,

1,4,32,2^33, 2^(2^33+1)

as of this writing, this is not in the OEIS.

i guess i like this 5-valued logic as a structure for 5-valued polarity. It matches my "5 = 2^2 + 1", where the +1 represents an 'escape' or 'meta' and the 2^2 represents 2 polarities, then expanded to 2^2 to make a very symmetric object.

so again, some concepts that we have to work with in the search for a useful 'quinary' structure for computation:

5: 5-valued polarity, as above (in which 2-valued or 2-valued polarity may be embedded straighforwardly, as can the powerset of two elements)
5: the 5 axioms of primitive recursion (constant, projection (conditionals), successor, composition, primitive recursion)
4: RDF subject-object-verb, with a fourth term for provenance, e.g. graph annotation. verbs are also like roles or relations or operators, and subject, object are also like the two arguments to a binary operator
the 5th element is like meta,escape,quotation,annotation
4: binary operations on two objects, plus an operation on the operation, in such a way so that the two operations are symmetrical with the two objects
5: wu-xing
5: forum,board, exec, chairs, courts
5: -2, 1, 0, 1, 2
4: space, time, matter, energy
5: space, time, matter, energy, gravity
5: space, time, matter, energy, configuration
5: space, time, matter, energy, scale
4: kant's quantity,quality,relation,modality
scale? configuration? gravity? modality? quality? probability?
- and what about probability amplitudes, heisenberg-ish pairs of attributes, unitary transforms, and wave functions? entropy? temperature?
- in quantum mechanics, we often have two coordinates, real and imaginary, and we get the probability amplitude by taking the magnitude of this complex number
do types come into this at all?
human language stuff: verbs, nouns, modifiers, tense, aspect, mood, conjunctions, prepositions, negation, quantifiers, modality
kant's judgements of relation: is/isa, implication, xor
note: human languages have stuff to do with time, do we want that for computation?

i want the 5 elements to be somewhat symmetrical, if possible

my notion of 'polarity' seems similar to Kant's 'quality'. 'verb'/'relation'/'role' is probably like Kant's 'Relation'. I suppose quotation is like Kant's modality. So we still need Kant's 'quantity'.

Kant's quantity is, on the one hand, universal/particular/singular judgements (all x are y, some x are y, the x is y; singular can also be 'this x is y'; 'This rose is beautiful' is given as an example), and on the other, the categories of totality, plurality, unity, (mb not in that order). Unity has some correspondence with choosing a unit of measurement.

i guess this is kind of like, thinking of an individual object, vs. thinking of a bunch of objects as a plurality, vs 'set theory thinking' e.g. thinking of the set of all things satisfying some property and/or applying an operation to the set as a way of implicitly referencing the members of the set. the latter could relate to the idea of having Oot allow one to override __apply__ to make things like implicit list variables such that operations applied to the variable are actually mapped over the list hmm.. this suggests another route to symmetry.. consider things like the symmetries of functions which are polymorphic in the type of their arguments, and arguments which via things like '__apply__' are sort of 'polymorphic' depending on the function being applied to them

hmm, if we use types, then this is coming together a little in terms of finding symmetries.. types are like quotation, but they are also like lists/sets. computation can be lifted to the type level. if computations polymorph depending on both objects and functions, then those are sort of symmetric. so types, being setlike, are a lot like kant's 'quantity', and being quotation-like, are like his 'modality'.

so we have things like:

subject-object-verb-provenance-type
subject-object-verb-correspondence-type
subject-object-verb-quotation-type
polarity(quality)-quantity(type)-modality(quotation)-relation(verb,operation)-correspondence
polarity-quantity-modality-relation-interpretation

note that correspondence or modality could each take the role of the 'meta', more symmetry.

we know that the OOP stuff also gives symmetries, can we work that in?

also, how does this compare to the 'hypergraph' model?

the ying/yang question of whether visible is paired with active or passive is like a symmetric square, a 4.

is the (traditional or modern) square of opposition related?

http://plato.stanford.edu/entries/square/

hypergraphs.. hmm

subject (starting node)
object (edge that you are following)
are you ascending to the label or not (quotation/LEVEL)
if a mutation, then the element that you are assigning to this location
?

---

5-polarity fits with having relational variable values and operations i.e. like LIBRA

---

see also '5-polarity' way below

maybe need to have 'modifiers' as non-meta-level annotations, e.g. ways to modify nouns with adjectives and verbs with adverbs

human language moods:

https://en.wikipedia.org/wiki/Grammatical_mood

realis/indicative (assertion)

subjunctive (hypothetical)

conditional (true given a condition)

optative (wishes, or wishes capable of fulfillment)

imperative (direct command)

jussive and horatory (one should do this or 'let's do this')

potential (something considered likely in the opinion of the speaker)

inferential/mirative (how direct is your evidence for this statement)

interrogative (questions)

potential subjunctive (assertion that something is possible)

deliberative subjunctive (rhetorical question to self, e.g. what am i to do?)

concessive (concedes a point, e.g. "even though X, he is still a good friend")

-- https://en.wikipedia.org/wiki/Grammatical_mood , http://www.thelatinlibrary.com/101/Subjunctive1.pdf , http://people.bu.edu/tylert/Latin/independentsubjunctives.html

---

an interesting thing about nock is thats it has no primitive (0-5) binary operators which combine information about two inputs to produce an output in one step (e.g. a+b) -- everything is via state

---

also, remember that indirect addressing modes aren't strictly needed, as long as you have commands for LOAD, STO, LEA, and STOI (i think)

how would a biological system do it?

it probably wouldn't have equally-spaced frames, except on lower levels ('symbols'). it would have start and end delimiters.

like DNA.

the start and end delimiters can be absurdly many symbols. e.g. in a book, we have empty lines delimiting paragraphs, and many empty lines delimiting chapters, and empty pages delimiting multichapter parts. this is ok b/c they occur relatively infrequently.

so maybe we can have 'sentences.

views

map

fold

type attributes

oot assembly: two formats: fixed width, and variable width (adjectives, start and end delimiters)

channels

---

the most magical powers of two are given by:

2^1 = 2, 2^2 = 4, 2^4 = 16, 2^16 = 65536, etc

note: a looser set: instead of just this recursive sequence, could also just generate the set 2^(2^x). So 1,2,4,16,256,65536, .... So (within the usefully low range) the difference is just that we've added 256.

---

another instruction format: assume 16 bits (because it is a most magical power of 2). instruction size is 2*4*16 = 128 bits = 8 words of 16 bits each. Now use these as follows to make a 2-instruction VLIW pair by having mostly chunks of 1, 2,4, or 16 bits:

4 16-bit operands (2 input operands for the first op, the second op uses the output of the first op plus one more operand, and one result operand) 4 4-bit addressing modes 2 8-bit opcodes 4 4-bit type specifier (or maybe just type SIZE, in bits, specifier) 16 bits of misc control information: todo

conveniently, that's 2 64-bit words.

note: if 1,2,4,16 are the preferred bitfield size (but see below, in some cases we want to add 256), then a type size specifier that only chooses between those 4 only needs 2 bits. So, for 4 operands, if we want a 4-bit type specifier for each, then for each operand we could have 2 bits to choose the type size, and 2 bits for the type proper. Some types might start at a minimal size, rather than 1 bit, e.g. floating point choices may be: 16-bit, 32-bit, 64-bit, 256-bit. Some types might also substitute the largest choice for an arbitrary-precision format, e.g. instead of 256-bit float, a decimal.

What would our 4 types be? Maybe dynamic, pointer, integer, float? Or maybe dynamic, int, float, unicode?

or, to eliminate those pesky 8 bit opcodes:

4 16-bit operands (2 input operands for the first op, the second op uses the output of the first op plus one more operand, and one result operand) 4 4-bit addressing modes 2 16-bit opcodes+control info 4 4-bit type specifier (or maybe just type SIZE, in bits, specifier)

(but in this format, where do we put the specification of which of the second op's arguments is given by the result of the first op? we wand our 16-bit opcodes+control info to be symmetric, right?)

similarly, instead we could try to do 4-bit opcodes, and 2*4 = 8-bit instruction sizes, but then we can't even fit 3 4-bit fields into the instruction size. So 4-bit opcodes and 16-bit instruction size:

3 4-bit operands 1 4-bit opcode

note that there is no addressing mode in the above, so some of the 16 opcodes would have to include immediate and register addressing mode variants. If we reserve half of those for custom opcodes for the user, then we only have 8 opcodes. Simple but probably inefficient.

opcode ideas: LOADI LOAD STORE ADD SUBTRACT MUL LEA JNEQ

since we have so few, maybe omit MUL and have something else like MOV or JMP instead (or remove LEA and have both of those). Remember, the programmer also has 8 custom opcodes, MUL can be in there. But do we want to reserve an opcode for 'custom opcode table switch'?

there's something about working with only 4-bits that feels very 'assembly language'-ish and clean, which i like. but i guess the problem with having a VM with this few opcodes is that with so few opcodes, the compiler will be compiling complex operations to long sequences of simple opcodes, losing the optimization available by using platform-specific opcodes. Generally speaking, even the 8-bit microcontrollers don't tend to have this few opcodes (instead they have less than 3 operands). So maybe we should leave the land of 4 and go to the land of 16 if we want to have a VM with the union of opcodes available on various systems, instead of the intersection. Still, this makes me think again of the idea of having multiple fixed instruction size formats. But we already have fixed vs. arbitrary sized.. i guess that doesn't burn a bit, though, now that we're using a long string of things to indicate boundaries in the arbitrary-sized one. We'd have a special opcode for the switch, or something like that.

i guess if you wanted these two to be the two fixed sizes, you'd allow each one of the pair of instructions in the 128-bit VLIW instruction pair to be replaced by 3 16-bit instructions and one 16-bit header (of which 8 bits would be unused?!?).

or, instead of having 8 custom instructions in a custom instruction table, could just have 15 normal instructions and 1 custom instruction instruction, which means 'the next instruction, ignore the opcode and use this custom instruction, instead'. And instead of ignoring the opcode, could concatenate it with the 3 operands for the custom instruction instruction, to yield a 16-bit address for the custom instruction's code (e.g. that's 64k custom instructions!). Or, we could use only 1 operand plus the next instruction's opcode (8 bits) to specify the custom instruction, and use the other 2 operands in the custom instruction instruction to double the length of the input operands (concatenate to them) in the next instruction. Otoh if we only get to have 256 custom instructions to choose from then i guess we must have a memory-mapped custom instruction jump table anyway. Btw notice the general pattern here; we're really just proposing that one opcode means 'This instruction takes up two instruction words'. Which is variable-length instruction coding.

note: how long should the opcode field be? If we stick to the magic numbers 1,2,4,16,65536,.., then 4 bits seems too few (only 16 opcodes, less than e.g. the Lua VM instruction set), and 16 seems too many (65536 opcodes). a looser constraint would be just that the width of these atomic fields (composite field widths are sums of the atomic field widths within them) should be a power of 2. So, 2 bits, 4 bits, 8 bits, 16 bits, etc. So the numbers of allowed opcodes are then 2, 4, 16, 256, 65536, etc. So we've gained 256, which is closer to the sweet spot than either 16 or 65536. So use that. So 8-bit opcodes.

this gives one explanation of how the byte became so standard. nibbles can only express 16 values, which is good for small categorizations of things and basic building blocks (minimal basis sets), but not enough to be a set of practical standard building blocks for practical languages, and not enough to be the number of characters in text, which, after punctuation is added in, greatly exceeds 16 (although an artificial simple language with only 16 letters could be envisioned).

The next value of 2^(2^x) after 16 is 256. The next value after that is 65536, which is larger than the number of building blocks you'd want for any language, and is of similar magnitude as an estimate for the average vocabulary of an adult natural language speaker (around 40,000, according to Susie Dent: http://www.lingholic.com/how-many-words-do-i-need-to-know-the-955-rule-in-language-learning-part-2/) (although note that the number of words in dictionaries of natural languages is an order of magnitude greater than this: http://www.lingholic.com/how-many-words-do-i-need-to-know-the-955-rule-in-language-learning-part-2/ ). So from the point of view of an artificial language designer trying to provide only basic building blocks, 65536 is approximately infinity, in the same way that in poetic Chinese sometimes "10,000" was used as a synonym for an arbitarily large number.

So 2^(2^2) is too small, and 2^(2^4) is too large, so we use 2^(2^3), 256 e.g. 8 bits. 256 is enough space to embed multiple 16-value 'minimal basis sets' as well as a bit of each of 'convenience', platform-optimized composite primitives, punctuation, and other cruft.

---

" Haters gonna hate but I kinda like x86-16 althought I like some RISC architectures more. I think that x86-32 and x86-64 are horrible mess. My dream CPU would have following things: -RISC architecture -Flat memory model with logical to real address translation being realAddr=addrBase+logicalAddr and if addrBase+logicalAddr would be more than addrTop it would generate interrupt

addrBase and addrTop are changeable in supervisor mode where logical to real address translation would be realAddr=logicalAddr -Two protection modes (user, supervisor) -Supervisor interrupt (interrupt 0 in interrupt vector) would be run in super visor mode, everything else in user mode -Memory mapped IO

_________________ Using 700MHz Pentium III machine with 64MB of RAM because I feel like it. ed implementation in C: main(a){for(;;;){read(0,&a,1);if(a=='\n')write(1,"?\n",2);}} "

---

EMIT (event), SUBSCRIBE (to stream of events), (and do we need to filter streams separately, or does FILTER do that already?) incidentally, BIND, MAP, FILTER, FILTERMAPFILTER, FOLD

---

in the section starting with 'looks like consumer-priced massively parallel computers are still not available' i give an (unconvincing) argument for 16-bit operands.

so if we have 3 16-bit operands, we have 48 bits, and only 16 bits left until we hit 64 bits. Since we're close, may as well try to stay within 64 bits.

have 8 bits for the opcode, and 2 bits per operand for addressing. Now we only have 2 bits left. Leave one bit for an alternate instruction format/reserved special instructions/etc. So we have only one bit.

note that in this scheme, we dont have room for 4 bits of type info or type size info unless we burn that last bit and reduce the opcode to 5 bits, which is too few (even Lua has more than 32 opcodes), or unless we burn even the reserved bit and have 6-bit opcodes, which is just enough in all dimensions; and we have no custom addressing modes.

otoh if we fix bit width at 16, we don't need 4 bits for types. The CLR (.NET) has 15 primitive types: bool, byte, sbyte, char, decimal, double, float, int, uint, long, ulong, object, short, ushort, string. Omitting size and sign differences, that's: bool, char, decimal, float, int, object, string. So that's 3 bits.

---

Etherium has a good VM language:

A list 56 (as of this writing) opcodes is at https://github.com/ethereum/wiki/wiki/Ethereum-Development-Tutorial (section 'Virtual machine opcodes') ( http://web.archive.org/web/20140820233358/https://github.com/ethereum/wiki/wiki/Ethereum-Development-Tutorial ).

More info:

https://github.com/ethereum/wiki/wiki/Subtleties

also, they use 256-bit width! https://github.com/ethereum/wiki/wiki/Subtleties

i guess they wanted to use a fixed-width number representation but never ever really have to worry about overflow! but 2^64 is already bigger than most things you want (more than a quintillion), but i guess IPv6 is 128-bit so maybe they went one up from that (i almost did that; and maybe i still will). or maybe they chose that for some cryptographic reason that i don't know about (later: yeah, i think so; i think they use 256-bit hashs). also, 256 = ((22)2)2 but i don't see why that would be useful (it's 2(2(22)) for the sequence which says that each quantity can be addressed by a round number of bits in the previous quantity, but that's 65536)

yarg i want to include types because it makes a clean, regular language, even though it is an inefficient waste of instruction length space (since many opcodes don't make sense with many types, it is more efficient to conflate the opcode with the type; similarly to have highly non-uniform instruction encoding), and even if we push fixed-width instruction length over the edge from a reasonable 64-bits to an unreasonable larger length.

also, note that 16 bits addresses 64k, but if the unit of addressing is a bit rather than a byte, 64k bits is 8k bytes (so 1 bit addresses 2 bits, 2 bits address 4 bits, 4 bits address 16 bits, 16 bits address 8k bytes... or 64k 16-bit words, which is 128k bytes).

which adds more weight to 8k as the amount of local memory..

hmm... a similar progression.. 1 bit addresses 2 bits; 2 bits address 4 2-bit words, or 8 bits.. 8 bits address 256 8-bit words, or 2k bits..

the 1,2,4,16 sequence has the additional charm that you could use 2 bits to select which element of this sequence you're in.. and by its nature, you can use the second-to-last-element (4) to select any bit length from 0 to 15. and of course one could use 1 bit to select between bitwise or wordwise addressing (1 bit or 16 bits). and, as usual, since 2 bits seems not so useful, one could use 4 bits to select between 1,4,16,sentinel.

so let's go back to considering:

16 bit width words
types (16 types? or do we not need so many when we have variable bit lengths)
variable bit lengths (1,4,16, or other? or 16 powers of two, 1,2,4,8,16,32,64,128,256,512, 1024,2048,4096,8192,16384,32768 -- that seems excessive)
it's okay to have instruction length >64 bits

So, an instruction has 48 bits for 3 16-bit fields.. 4 bits for types, >=2 bits for variable bit lengths, ?>=5 bits for opcodes, ?>=2*3 bits for addressing modes, that's at least 65 bits, and what else?

what a shame, so close to 64 bits though! maybe just go back to a darn 2-operand register machine.

now we have 8 bits for 2x4 bit operands, 4 bits for types, >=2 bits for variable bit lengths, >=5 bits for opcodes, >=2*2 bits for addressing modes, that's only 23 bits so far. for LOAD/STORE instructions, we need 20 bits of operands though, hmm.

how about: just ditch the variable bit lengths. now we can have:

3x16 3 16-bit operands
2x3 addressing mode bits
6 bit opcodes
4 bit types

yarg it just seems so constraining to have 6 bit opcodes, when you are wasting 4 bits on types

and having registers isn't THAT complicated; if you dont want to optimize, just translate each

C = A + B to R1 = A R2 = B R3 = R1 + R2 C = R3

i suppose one could generalize the 4 bit "type" argument to just a 4-bit immediate argument to the opcode. Which means that it's not much different from having a 10-bit opcode, with a "major" opcode of 6 bits and a "minor" opcode of 4 bits. The 'minor' could be used in regular ways, for example, instead of having AND, OR, NOT, NAND, XOR, etc, you could just have 'bitwise binary operation', of which there are 16 possible ones (4 possible combinations of 2 input bits, and for each such combination, choose what the output will be; that's 2^4 = 16 possible binary ops).

how could we make that actually different from just 4 more opcode bits? we could use indirect addressing, i guess, so that the 4 bit argument is loaded from a register rather than immediate. but we don't have room for 2 more addressing mode bits, so we'd just have to pick one mode (indirect). Which seems silly, b/c usually you'd want immediate for this.

These 4 bits will get used a lot, though. E.g. as a third indexing operand for LEA. Or as a selection of any of the first 16 stack elements to pop; as a selection between comparator operations (LT, LTE, GT, GTE, EQ, NEQ, what else?). If nothing else, this is a nice convention that forces some structure on the opcodes.

(hmm i wonder if addition etc could be represented in a similar way; perhaps find the addition/multiplication/negation ops that correspond to each binary op, e.g. AND is MUL, OR is ADD, NOR is NEG(ADD), NAND is NEG(MUL), NOT is NEG, XOR is MUL(ADD, NEG(MUL)))

(but it would be nice to get division in there too; so mb make up our own scheme; ADD, SUB, MUL, DIV, NEG, NEG(ADD), NEG(SUB), NEG(MUL), NEG(DIV), EXP, LN, NEG(EXP), NEG(LN)... eh these are just a random list of things, nevermind, and anyways we can't fit the negations of each input)

i guess, in justifying not using a register machine, and having types even despite inefficiency, what i am looking for is an assembly language that is expressive in the sense that it reads like the intent of the operation, rather than half the code being some weird intermediate thing about pushing registers around; the latter is dumb because how that stuff should be in order to be optimal is platform-dependent, and so in order to optimize it for a platform with a different architecture you're going to have to try to recover the intent, anyway.

It makes me wonder, though, if we should just give up on fixed-length and have something like Forth that allows functions to take more than 2 arguments. The argument for fixed length is that instruction decoding can efficiently happen in parallel, but this can happen somewhat efficiently (though not nearly as much) anyways with the variable-length format with terminator sentinels; and the fixed-length decoding sounds dangerously close to a platform-specific consideration.

anyway, i want to reiterate the point that i guess a design criteria for Oot Assembly is that it not lose the intent of the Oot program.

What's the point of Oot Assembly anyways? I guess there are four motivations:

1) To learn the basics of computation by looking at each language paradigm, here, the assembly paradigm (which, note, is very imperative) 2) To learn the 'flavor' of assembly 3) To work 'from the bottom up' (from assembly) to create Oot Core, in addition to 'from the top down' (from Oot) 4) To create a portable target language, which could be efficiently run in a VM/interpreter if necessary 4a) minimal primitives, to ease implementation on new platforms 4b) minimal loss of higher-level intent, to ease optimization 4c) cross-platform preprocessing and optimization should have been done already, e.g. parsing, desugaring 4d) code density, and also predictable starting places for the next instruction, both help efficiency

The fixed-length instruction constraint is useful for (2) and somewhat for (4), and is irrelevant for (1), but is probably not necessary for (3). So maybe i should give up on it, as i've already learned a lot about (2), and it's most likely that there won't actually be a cross-platform Oot Assembly beneath Oot Core.

http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-146.pdf argues that new ISAs should support 128 bits, and https://news.ycombinator.com/item?id=8201932 argues about that.

hmm, the more i think about the 4 goals above, the more i think yeah, i should give up on the fixed-length encoding completely. We're supposed to be working towards an abstract Core language, not get bogged down in implementation details, especially when we'll probably be compiling this into some sort of native code anyways.

---

" For example, Motorola's 68300 chips have a cool TBLS (table lookup and interpolate) instruction that can replace hundreds of lines of C code. And it executes in six clock cycles. With one assembly instruction. C compliers never use it.

Hitachi's SH7708 processor has a 3D matrix transform instruction that does a whopping 16 multiplies and 12 adds at once, all with floating-point numbers. It's used for calculating the angle of reflection between a 3D light source and a 3D polygon. Obviously useful for video games. But only assembly programmers ever use it. "

...

" Is it therefore pointless for CPU companies to develop new instruction sets

Yes, that's what this means. Also recognize that most compilers use a tiny subset of the instruction set. And that the general trend is decreasing memory prices so code bloat/code density problems tend to self-correct.

Occasionally a killer instruction comes along -- like MAC -- but that's rare. "

Jim Turley About 60% of all processors sold are 8-bitters; 4-bitters and 16-bitters each make up about 15%, and 32-bit CPUs are the smallest category, at about 9%. (The portions are a bit smaller if you count DSPs.) Of the 32-bit processors, about 75% of them go into embedded systems. And these are new processors. The installed base is even more skewed toward low-end chips.

Embedded processors outsell PC processors by orders of magnitude, and always have. Remember, microprocessors were invented for embedded systems, not for computers.

This information is in my January column in Embedded Systems Programming. You can verify the data from the usual sources, such as WSTS or SIA. "

---

conclusion from ootLowEndTargets initial look at present-day OpenCL? GPGPU copied to here:

so max bit width for fixed length addressing:

8 bits for registers
16 bits for synchronizable shared local memory
8 bits for synchronizable thread number
(arbitarily large) bits for main memory
(arbitarily large) bits for number of non-synchronizable-memory-sharing tasks ("kernels")

note: this argues for:

arbitrarily large (or at least 256-bit) max bit-width for actual word size
arbitrarily large (or at least 128-bit) max bit-width for addressing main memory
arbitrarily large (or at least ??) max bit-width for addressing number of data-parallel tasks
fixed bit width (>=8) for number of registers
fixed bit width (>=8) for thread number
fixed bit width (>=16) for words in synch shared memory
could just make the last 3 all 16 bits to make everything uniform/easy (also i'm not terribly comfortable with putting a max of 256 on number of regs, number of threads, surely there will be weird emulation situations where you want more than the max 'efficient' number in hardware in your software)
i'm toying with the idea of just having the registers memory-mapped into the synch shared memory, e.g. the 16 or so lowest addresses are not really shared. Or, maybe just use the highest bit for this, e.g. 32k registers and 32k words in synch shared memory.

and min that we can rely upon:

4 registers
256 items in shared local memory
4 synchronizable threads (not sure if this is useful, since a parallel program can always be executed in serial, with 1 thread; the max is more relevant, because that's how much contention we have to worry about/the max that it's worth scaling up our synch parallel algorithms to)
256 kernels (actually 2048 is the smallest we've seen, 256 is rounding down from 2^11 to 2^8)
?? main memory (out of scope in this section; but elsewhere in this document, we've frequently seen L1 cache values of about 4k words)

in summary, we can rely on at least approximately:

4-16 registers, per processor (processor-local memory)
2048 data items, total, in shared local memory
8-way within-work-group (synchronizable access to shared local memory) parallelism (alternately; if we settle for 4-registers-per-processor, then up to 256-way; note that 16 regs-per-processor yields 64-way)
2^11-2^17 software-exposed data-parallelism

CONCLUSION/NOTE: BASED ON THESE NUMBERS, IT LOOKS LIKE A CONNECTION-MACHINE LIKE ACTIVE-DATA-STYLE PROGRAMMING COULD BE EMULATED BY PRESENT-DAY CONSUMER GPUs!! Even though there are only 2 cpus in lower-end consumer computers and ~8 gpu units, there's no need to wait for computers with 64k cpuS because, since in the paradigm we are targetting, (virtual) processors only need to synchronize with their immediate neighbors, all 64k of these don't share a synchronizable local memory, so we can emulate them using data-parallelism.

SUGGEST RE-TARGETTING OOT AT OPENCL

---

note: another argument for sticking with fixed-width is not for efficiency or to force design simplicity or to appreciate the character of assembly language design, but just for convenience of implementation. Truly arbitrarily large numbers are annoying to work with. If this is the motivation, then just have bit fields that are ridiculously big; this is way inefficient w/r/t code density and even interpretation speed (needlessly passing too much data), and does not truly support arbitarily large systems, but it is simple to implement. Also, make it so that, via indirect addressing into word locations with arbitarily large words, the system can seemlessly use a larger-width value for anything (up to the word size, which will not be set in the Oot assembly spec).

e.g.:

arbitary (and variable) word size
arbitary memory size? (or, 16-bit memory addressing in constants, but can use indirect addressing to go larger)
16-bit register (or rather, fast/cache/processor-local memory) addressing
- but each processor could of course have attached local memory of arbitrary size in addition to its 'fast' memory
16-bit shared local memory word addressing
16-bit task number constants

if word size is variable and needs to be specified per-instruction, and if we only consider power-of-2, in bit-width, word sizes, then we need at least 8 bits to choose between all sizes up to 256 bits, so i guess we could use 16 bits for that too, although one could argue that by omitting 2 bits and 4 bits, you could go up to 1024 bits which would allow us to be sufficient with 8 bits. But maybe, from the point of view of implementation simplicity although not efficiency, it's nice to have everything be 16 bits :)

---

for arbitrary-precision arithmetic:

ADCX — Unsigned Integer Addition of Two Operands with Carry Flag

Performs an unsigned addition of the destination operan d (first operand), the source operand (second operand) and the carry-flag (CF) and stores the result in the de stination operand. The destination operand is a general- purpose register, whereas the source operand can be a general-purpose register or memory location. The state of CF can represent a carry from a previous addition. The inst ruction sets the CF flag with the carry generated by the unsigned addition of the operands

The ADCX instruction is executed in the context of multi-pr ecision addition, where we add a series of operands with a carry-chain

ADOX — Unsigned Integer Addition of Two Operands with Overflow Flag

Performs an unsigned addition of the destination operan d (first operand), the source operand (second operand) and the overflow-flag (OF) and stores the result in the destination operand. The destination operand is a general- purpose register, whereas the source operand can be a gene ral-purpose register or memory location. The state of OF represents a carry from a previous addition. The instru ction sets the OF flag with the carry generated by the unsigned addition of the operands

The ADOX instruction is executed in the context of mult i-precision addition, where we add a series of operands with a carry-chain. At the beginning of a chain of addition s, we execute an instruction to zero the OF (e.g. XOR).

MULX — Unsigned Multiply Without Affecting Flags

Performs an unsigned multiplication of the implicit sour ce operand (EDX/RDX) and the specified source operand (the third operand) and stores the low half of the result in the second destination (second operand), the high half of the result in the first destination operand (first operan d), without reading or writing the arithmetic flags. This enables efficient programming where the software can inte rleave add with carry oper ations and multiplications. If the first and second operand are identical, it will contain the high half of the multiplication result

hmm... one the one hand there's the idea that a 'memory location' is just a numbered variable that could hold an arbitrarily-large-sized representation of a composite data type such as a struct... on the other hand, if Oot Assembly is to be able to manipulate data structures passed in and out via FFI, then it needs to be able to work with memory as a linear map of bits which can have within it fields of size smaller than the system word length, e.g. single byte fields on systems where the word length is 16-bit. But we can have both of these by passing the entire latter linear map as a struct within a single variable, and then considering manipulations within the linear map as just a special case of operating on an array.

there are all these different kinds of ADDs (eg what do you do with the carry? with the overflow?) and all these different number representations (e.g. various signed integer representations) and different bit widths. Do we represent these all explicitly and ortho? But then each implementation must implement every combo, a chore. Do we pick our own favorite set of them? But then there's an impedence mismatch on systems which support a different set, and also we can't take advantage of optimized primitives for the different set. Do we support only abstract operations such as arbitrary-precision add? Maybe, but then how do we do bitfield manipulation? Literally by only considering each bit an an element in a bit array? Can we at least ask to see that array in a different view, which shows us an array of bytes?

i guess i'd prefer to implement more abstract primitives such as arbitrary precision addition in the core language, and then build the weirder hardware primitives such as add with carry and/or overflow (and/or modular addition) as std library functions that are implemented in terms of arbitrary precision addition; then implementations can override the implementation of those with platform-specific primitives.