in Oot, do we want a base register to make addresses relative?
probably not; that's probably assumed/encapsulated away
---
ppl say that jsr/ret made the JVM verifier much more difficult to write. See [1], section The issue with JSR/RET. Why?
Maybe because ret jumps to an address in a register rather than to a fixed address?
Is jsr/ret the only computed goto in jvm?:
---
timeout bit
---
timeout bit
copied from [2]:
so what is boxing doing now? a lot:
we should also add:
in other words 'boxing' seems to be the default for our inefficienct, high-level, safe, dynamic, lazy, symlink-having language; and 'unboxing' is like a special mode that you can use to do more CTM stuff for tight loops, etc.
so still not clear on:
if we had more flags, we could also have one for how to deal with failure, eg whether to have (a maybe-null, or a straight nullable type), or to raise an exception
---
heck i dont wanna decide which flag bits are important, and decide between 8bit, 10bit, 12bit operands. unlike kant, i'm not confident that i'll think of all the modes i want forever. also it seems like a waste of space to have either large operands or many modes, but i dont know which one b/c that's empirical.
ok so maybe back to the variable-length, natural-languagey model.
design requirements: a variable length encoding that:
so accept variable length instruction 'sentences', where each 'instruction word' is fixed length(?) (and possibly has a marker to say if its 'ongoing' or not)
perhaps 'quoted' should also be part of the marker?
'referent' marker? eg 'this modifier modifies the word tagged 011'?
and/or 16 fixed part-of-speeches? eg destination, destination modifier, operand1, operand1 modifier, instruction, etc
sentence terminator bit, word terminator bit, clause terminator bit?
but if you allow conjunctions, then its already an AST
heck mb go that with anyway
---
yeah i think this sort of thing is the way to go.
---
two kinds of AND; simply parallelizing, and building a big noun (which i'll call 'constructive'). this is like whether to autoveectorize over a list, or to treat the list as a list
---
do we still need an 'oot bytecode' below oot core? probably (for things like COW). is there a need for an 'oot core+' which is like oot core, but more colloquial, and which is what the meetaprogramming matches upon (eg macros would run on oot, or oot core+, or on oot core)
---
kant's table of judgements... but want to generalize that sort of thing
'universal' (perhaps i should say 'generic', because i bet some natural languages dont have all of theese, anyway im constructing a programming language not a natural one languagee features): which-language (is this imperative programming, logic programming, etc?) POS-in-sentence (subject, verb, object.. or, opcode, destination, input1, input2) mode (and tense, etc) words, phrases, sentences (what abouut morphemes, etc?) base word of a phrase, and modifiers (eg adjectives in a noun phrasee vs the noun; perhaps addressing mode vs operand? how is mode distinct from this?) arguments (in bot nat. language and prog. lang) compound phrases, and conjunctions (common ones: and, or, xor) and parens lattice of truth values; not quoting annotation
again, should review natural language declensions and conjugations...
---
what can we do with 4 grammar bits?
sentence terminator, phrase terminator, 2 POS bits, allowing for 4 sentence components, eg, whole sentence, subject, verb, object. but some sentences have 5 components: he gave the apple to johnny in the house so need at least 3 bits for component choice (note: in that examplee, the componeets are like destination, operand, argument2, argument1, mode modifiers)
could impose an ordering rule that components must go in order, or at least, that 0 goes first (or last). this allows us to split sentences at the beginning of phrase 0. now we dont need the sentence teermination bit (at the cost of having to possibly read past the end of the sentence to know that the sentence has ended, if the rule is that 0 goes first). so we neeed 4 bits so far.
could also say that component 'base' goes eeither first or last in phrase; now we dont need to both identify the base, and also to identify phrase termination.
still need to distinguish modifiers from base and addressing modes.
so can do everything with 16 pos: component 0 component 1 base c2b c3b c4b c0addr c1addr c2addr c3addr c0mod c1mod c2mod c3mod annotation quote (more accurately, 'quoteed' or inside a quote) parens or other grammatical symbol
this doesnt quite suffice if component 0 can contain multiple preepositional phrases the preposition in prepositional phrases is like a verb; the rest is a nounphrase-like argument
how about: each component (recursively) has a base, an addr mode, up to 3 arguments, and modes/modifier subcomponents. that's too many, want at most 5. so combine base+addr mode. addr modes are subcomponents.
conjunctions are subcomponents (eg an AND would have base 'AND', a mode/modifier saying whether it's parallel or constructive, and 2 arguments)
---
so we say the base component must come first.
---
it take 3 bits to represent 5, so dont know what to do with the otheer 3 alternatives. One of them can be 'annotation'. Parens could be forms of annotation. So we have two left; could just waste them on parens, including a relative jump to allow a reader to quickly find the matching parens. So that's 3 bits. If we can do with no more than 1 more overhead bit we'll have a nice round 4 bits.
wait dont parens need to say which component they're part of? so we can't have them be their own components
is the last bit some kind of terminator? word, phrase, sentence?
---
mb instead of component i should say 'grammatical role'? grol for short? i think "phrasal category" is the linguistics term [3], but that's too long and the word 'category' has too many other meanings here.
---
so to summarize the current natural language proposal outlined here:
a variable length encoding that:
we have 'syllables' (or should we say grapheme? phoneme?)s which combine into 'words'; 'words' which combine into 'phrases'; 'phrases' combine into 'clauses'; and 'clauses' combine into 'sentences'
in our system, a 'clause' is ('is like'? or just 'is'?) a recursive, embedded, parethesized sentence. It is not a 'sentence' only because it is not toplevel. Semantically, due to quoting and unquoting and assigning first-class-functions to variables, etc, both clauses and non-clauses are similar; however, the language won't DO anything yet if it has just read a clause, because the context that clauses is embedded into might quote it, make it hypothetical, make it conditional, etc.
in our system, 'phrases' can be single-word or multi-word.
a 'phrase' might be an expression; e.g. in "x = 3 + (2*(*y))", "2*(*y)" is a "phrase"
each (word?) has a 4-bit 'morpheme' that give syntactic 'overhead'. 3 bits of this tells what its grammatical role is within its (phrase? clause?):
what we do with the other bit isn't decided yet
the rule is that role 0 comes first in any phrase. so if you see a non-role 0 and then a role-0, that's how you know you crossed a phrase boundary this isnt good enough because two adjacent phrases might both have only role 0s; so we still need a phrase delimiter; so mb use that bit for that
open and close parens contain the relative address of the matching parens
should we also mark when something is part of a quotation, to make it ez to skip without parsing?
should we also mark, for something which is acting like a 'pronoun', where the referent is, and for a modifier, where the modify-ee is?
two kinds of AND; simply parallelizing, and building a big noun (which i'll call 'constructive'). this is like whether to autoveectorize over a list, or to treat the list as a list
phrases can have subphrases whose grammatical role within the parent phrase is the same 5 things
eg
I gave the ball to the person who had given it to me:
"the person who had given it to me" is the indirect object, and in our system would be a subphrase, with argument1 'the person' and modifier 'who had given it to me'; 'who' here would just be an artifact of English, the modifier is that constraint tha 'person' 'had given it to me'.
similarly 'Billy's dad' is a subphrase with argument1 'dad' and (modifier?) Billy's
the 'modifiers' in the above might be 'base' in our system, eg like VERBS in natural language (contrary to the way linguistics treats them)
conjunctions are also subphrases; the 'base' is the type of conjunction, argument2 and argument3 are the two things being conjoined, the modifier specifies eg if this is a parallel or constituitive conjunction. eg "alpha and beta" has base AND, argument2='alpha', argument3='beta', and there is a mode too
modes are assumed to be defaults if not specified
---
rather than tagging the roles, could just say that the 4 bits say:
word ends phrase ends sentence ends (what's the 4th one? something to indicate when we skip a phrase, eg for a sentence with only phrase 0 and phrase 4? how would we do that?)
---
to cut down on subphrase overhead, we could make addr modes mandatory parts of words
---
one issue is that you can have multiple modifier phrases under a phrase, so you have to allow multiple phrase 4s
args 2 and 3 are symmetric, so if can have multiple, then why not just combine them, now just 4 choices = 2 bits for grammatical roles (just call these "parts of speech", POS, for short)
so now we're only 1 bit over our supposed budget of 4 bits (eow (end of word) + eop (end of phrase) + eos (end of sentence) + POS tag (2 bits)) = 5
we could eliminate the 'boxed' bit from the addr mode; so that now we would have 3 addr mode bits and 5 grammar bits. If words started at 16 bits long, then 8 bits would be left for the operand.
---
but do multi-word phrases even make sense if we can't markup each word in that phrase with its role in WITHIN THAT PHRASE?
hmm.. how about: if the "new phrase" bit is set, then role of this word within the clause is role 0 (base), and the inflection (2-bit of grammatical role) refers to the role of that phrase within the clause. Otherwise, the inflection refers to the role of that word within the phrase. but actually we want 'end-of-phrase' not 'beginning-of-phrase', right? so put 'base' at the end, then.
we say that 'syllables' (which may or may not correspond to 'words' on the underlying platform; our concept of 'word' is somewhat higher-level) are 16 bits.
so, now i think we have a complete system.
similarly to the fixed system, we can fit 4 fields into 64 bits, except we lost the 'boxed' bit, and we only have 8 operand bits instead of 12:
first 16-bit syllable:
second 16-bit syllable:
third 16-bit syllable:
fourth 16-bit syllable:
(note that there are two role 2s here, because there are two input operands)
note how easy it would be to have longer than 8-bit operands; just unset the EOW bit (in which case 11 of the 16 bits of the additional word are added to the operand, in big endian fashion; long operands still have the other 5 grammar field bits (but should they? we only really need the EOW one..)). So we can easily have a 19-bit operand, a 30-bit operand, etc.
if we need mode/modifier bits, either on the operand or the sentence level, we simply add some role 3s. (do role 3s have addr mode bits? i guess so, just for uniformity). we'll get 8 mode bits. If we need more, just unset the EOW.
---
so now what do annotations look like?
---
we need pronouns and conjunctions, too. i feel like a pronoun is like an addr mode. or do we need them? why not just load a referent into a register? that accomplishes the same thing.
---
'ootby' (oot assembly/oot bytecode) has a positional input argument syntax, is that ok? in that case we need a 'argument not provideed yet' sentinel (of 3 forms ?: the one that creates a function expecting that argument, and the one that assumes a default if there is one, and ow also expects that arg; and one that creates a partial fn that wont execute even if 0-ary)
---
how to give modes of more than 1bit?
should we provide a keyword arg option? (for role 2)
---
parens (subordinate clauses) might not be necessary, bc they can be replaced by previous sentences that load into registers
eg 'The bad man bit the dog':
r1 = 'the bad man' r1 bit the dog
---
even if recursive expressions with pronouns are not allowed, we at least want scoped regs, like haskell's let... in ..., so that we can tell when a temporary register that was just created for one expression is done. For this, however, we could just use an 'UNDEF' instr. i guess it wouldn't hurt to have type declaration annotations at variable initialization too...
---
maybe the concept of 'paragraph', paragraph-local vars
is this part of a general pattern of optimizing non-recursion, similar to link registers which optimize for only one level of subroutine? no, because in fact a parargaph can contain recursive calls, so this is a diffeerent concept
on the topic of link registers, do we want a similar req on fast fns? no b/c then ordinary partially-applied functions can't be used as fast functions.
---
'base' is a bad name, b/c in the case of 'noun phrases' (arguments), it makes you think of the operand, not the addressing mode, but it should be the addressing mode. mb it should be something like 'verb'.
addr mode is verb, not operand. but here we have addr mode/operand pairs, so that's ok. mb call it 'operation'? hmm.. really in general it's not necessarily active, it's just the type of meaning we're assigning to the rest. interpretation? too many other meanings. extensible grammar determinant? too long. root? naw, too man other meanings. 'base' doesnt sound do bad now. but 'operation' is probably the best one so far. maybe 'connective', as in logical connective? too abstract-sounding
so, 'op'.
note that there are 'main ops', which are the ops for top-level role 0, and then there are 'subops', which are ops for role 0 in other top-level roles.
note also that:
---
useful assembler directives:
ORG $400: what is to follow should be located at memory location $400 [http://alanclements.org/68k.html] DC.B label 12: declare a byte of memory, label its address 'label', and initialize it with 12 [http://alanclements.org/68k.html] END $400: "End of program and entry point" [http://alanclements.org/68k.html]
possibly useful instructions: STOP #$2700
"The 68K has an explicit halt instruction called STOP that usually takes the parameter $2700 (this parameter is loaded into the processor status word)." -- [4]
---
let's see again how the semwebish stuff looks in this format:
old fashioned AST: a = 2 + 3: assign(a, +(2,3)) us: +,a,(2,3) the boy is happy: subject: boy, verb:is, object: happy: us: is,boy,happy or: the boy is happy: subject: boy, edge:emotionalState, object: happy: us: emotionalState,boy,happy the boy is happy today: us: same as above, but with additional role3 with 'today' bob knows fred: triple: subject: bob, predicate: knows, object: fred us: knows, bob, fred a graph edge: a--label1-->b: us: label1, a, b or maybe: 'edge', label1, (a,b) hypergraph edge with roles: src1: A, src2: B, dest1: C, dest2: D, label1: E ?us?: 'edge', A, {src2: B, dest1: C, dest2: D, label1: E} ?us?: 'edge', , {src1: A, src2: B, dest1: C, dest2: D, label1: E}
and remember the hidden fifth field; the primary key (id) (address) of this instruction
or, in the case of graphs, we could stuff edge ids into the structure itself:
hypergraph edge with roles: src1: A, src2: B, dest1: C, dest2: D, label1: E ?us?: 'edge', A, {src2: B, dest1: C, dest2: D, label1: E} ?us?: 'edge', id-of-this-edge, {src1: A, src2: B, dest1: C, dest2: D, label1: E}
NOTE THAT HERE WE FIND OURSELVES NEEDING A DICT INSIDE THE EDGE PRIMITIVE
could get around this by putting the dict's keys (the roles) into role 1 (i forgot to switch roles 1 and 2, so currently role1 is still 'dest')
or, could represent hyperedge as a collection of edges:
'hyperedge', id-of-this-hyperedge, (subedge-id-1, subedge-id-2, subedge-id-3, subedge-id-4, subedge-id-5) 'subedge', subedge-id-1, ('src1', A) 'subedge', subedge-id-2, ('src2', B) 'subedge', subedge-id-3, ('dest1', C) 'subedge', subedge-id-4, ('dest2', D) 'subedge', subedge-id-5, ('label1', E)
or if we assume the ids are implicit:
'hyperedge', ?, (subedge-id-1, subedge-id-2, subedge-id-3, subedge-id-4, subedge-id-5) 'subedge', parent-hyperedge-id, ('src1', A) 'subedge', parent-hyperedge-id, ('src2', B) 'subedge', parent-hyperedge-id, ('dest1', C) 'subedge', parent-hyperedge-id, ('dest2', D) 'subedge', parent-hyperedge-id, ('label1', E)
or mb better:
'begin-hyperedge-annotation', ?, (subedge-id-1, subedge-id-2, subedge-id-3, subedge-id-4, subedge-id-5) 'subedge', 'src1', A 'subedge', 'src2', B 'subedge', 'dest1', C 'subedge', 'dest2', D 'subedge', 'label1', E 'end-hyperedge-annotation'
or:
'begin-hyperedge-annotation', ?, (subedge-id-1, subedge-id-2, subedge-id-3, subedge-id-4, subedge-id-5) 'src1-role', parent-hyperedge-id, A 'src2-role', parent-hyperedge-id, B 'dest1-role', parent-hyperedge-id, C 'dest2-role', parent-hyperedge-id, D 'label1-role, parent-hyperedge-id', E 'end-hyperedge-annotation'
can we merge the idea of a 'role' here and our grammatical 'role'? sure, the latter is a special case of the former, if we consider our instructions to be hyperedges with 5 edges (roles 0-3, plus an identifier for the edge).
... out of the above, i think i like the one where in the subedges of the hyperedge, dest=hyperedgeid, and both the role and the destination node were 'input arguments' (role 2)
---
maybe, just as instructions are arranged linearly, we should allow graph edges to stand in a partial order with each other?
not sure if that's a good idea though...
---
"For example, Motorola's 68300 chips have a cool TBLS (table lookup and interpolate) instruction that can replace hundreds of lines of C code. And it executes in six clock cycles. With one assembly instruction. C compliers never use it.
Hitachi's SH7708 processor has a 3D matrix transform instruction that does a whopping 16 multiplies and 12 adds at once, all with floating-point numbers. It's used for calculating the angle of reflection between a 3D light source and a 3D polygon. Obviously useful for video games. But only assembly programmers ever use it.
...
Occasionally a killer instruction comes along -- like MAC -- but that's rare. " -- http://www.embedded.com/electronics-blogs/shop-talk/4024514/Do-processors-and-instruction-sets-matter-
---
instead of 'operation' or 'op', one could say 'construct'. Nameclashes with 'cons' and 'construction' though.
---
role 1 is also 'output params', or lhs(s), and role 2 is also 'input params' or rhs(s)
---
y'know, since we have constants, we don't need address 0 to be a zero register. But we still want a way for an instruction to indicate that it discards its output. We could just omit the output phrase in that case, but i'm guessing we want to reserve that for implicit stack inputs/outputs. So how about we make 0 the PC, and say you can't write directly to the PC, and that any attempted write to the PC means 'discard'?
This seems to fit b/c for the same reason we want to allow an instruction to specify discard (to hint to the compiler/interpreter that the result value is unused), we want to force jumps to be via special instructions, instead of just arithmetic instructions output into the PC.
---
i like how the aesthetic/goals of Oot Assembly/Oot Bytecode are coming together. In the last section, we justified two things (having a way to explicitly state that an output value is 'discard'ed, and forcing jumps to use specific instructions) by expressiveness. Also, the variable length encoding is justified by expressiveness too, as is the use of memory-memory.
the variable length expressiveness also shows that we don't really value performance that much.
Although this is still a linearization of the program, not an AST.
---
need some sort of annotation to say "the next bunch of things are compile-time hints, if you're interpreting this, skip over X instructions" (where X is eg 16)
eg when we reach the end of a scope we may have a bunch of UNDEFs. An interpreter might choose to just skip all these at runtime, especially if it did a scope analysis upon initial load (load time) already.
---
actually since instructions are variable length, shouldn't jumps be specified in terms of actual numbers of bytes in the bytecode, rather than numbers of instructions?
what does the JVM do? (looked it up) yeah i think goto and jsr deal in byte offsets, not instruction offsets
---
interesting idea:
"and a 1-bit SCC field determines whether the condition code is to be updated" -- http://alanclements.org/rischistory.html
i don't think we have the bits for it right now though (without using the extended form). unless this is a bit taken out of the opcode, which it might be (error handling mode)
---
wow, this manual seem like it would be too detailed and irrelevant but i find it inspiring by illuminating the sort of 'intent' that needs to be preserved long enough for low-level optimizers to have a crack at it, as well for illuminating how some of the issues i've wrestled with are dealt with in the recent x86 ISA implementations:
see ootIntelOptimizeManualNotes1.txt
---
the cmov instruction on x86, conditional move, is different from a branch in that BOTH branches are speculatively executed.
so this is like my idea for a neural-net like concurrency where sometime both branches are executed in parallel by different sets of neurons, and then sometimes an actual branch is made.
---
Intel User/Source Coding Rule 17 suggests using the PAUSE instruction inside fast spin locks. This suggests that in Oot Core, either we should have PAUSE or a fast spin lock or something higher than that should be a primitive (i think that something higher than a spin lock should be a primitive).
---
Intel Optimization Manual 3.4.2.1 has some details on micro-ops: "Stores execute internally as two separate micro-ops: store-address and store-data."
there are a few other details on specific micro-ops in this document, but i didn't see a good overview. Maybe there is a table somewhere that says how many microops per instruction.
---
" 3.4.2.3 lea xor xor loop: add sub jnz rdx, buff - 4 rcx, LEN eax, eax eax, [rdx + 4 * rcx] rcx, 1 loop Length-Changing Prefixes (LCP) The length of an instruction can be up to 15 bytes in length. Some prefixes can dynamically change the length of an instruction that the decoder must recognize. Typically, the pre-decode unit will estimate the length of an instruction in the byte stream assuming the absence of LCP. When the predecoder encoun- ters an LCP in the fetch line, it must use a slower length decoding algorithm. With the slower length decoding algorithm, the predecoder decodes the fetch in 6 cycles, instead of the usual 1 cycle. Normal queuing throughout of the machine pipeline generally cannot hide LCP penalties. The prefixes that can dynamically change the length of a instruction include: • • operand size prefix (0x66) address size prefix (0x67) "
---
http://stackoverflow.com/questions/18113995/performance-optimisations-of-x86-64-assembly-alignment-and-branch-prediction says that Intel used to support 'branch hints', presumably that means hints about which branch is more probable. This could be a good idea in Oot, particularly because it fits in with my observation from speech recognition that prior probabilities are key and therefore that UIs should be told how probable the program thinks various UI actions are in this state (to enable UIs such voice recognition, swype, etc to work better). But specifying any more than which branch in a switch table is most probable is probably too many bits for a fixed encoding, and in addition this is only a hint, so this should be in annotations.
---
intel's rule for 'hinting' without explicitly hinting which branch is most probable:
Assembly/Compiler Coding Rule 3. (M impact, H generality) Arrange code to be consistent with the static branch prediction algorithm: make the fall-through code following a conditional branch be the likely target for a branch with a forward target, and make the fall-through code following a conditional branch be the unlikely target for a branch with a backward target.
---
i was wondering how Intel squares variable-length instructions with alignment and efficient JMPing. It doesn't:
intel says:
Assembly/Compiler Coding Rule 12. (M impact, H generality) All branch targets should be 16-byte aligned.
---
" You might not have expected this, but in 64-bit assembly for the AMD64, pointers to code and data whose addresses are within the executable are still only 32-bits. This ties in with the fact that RIP-relative addressing limits the size of the executable to 2GB. Pointers to external addresses, such as functions in Dlls, are 64-bit wide so that the function can be anywhere in memory see call address sizes.
...
RIP-Relative addressingtop Some instructions in the AMD64 processor which address data or code, use RIP-Relative addressing to do so. The relative address is contained in a dword which is part of the instruction. When using this type of addressing, the processor adds three values: (a) the contents of the dword containing the relative address (b) the length of the instruction and (c) the value of RIP (the current instruction pointer) at the beginning of the instruction. The resulting value is then regarded as the absolute address of the data and code to be addressed by the instruction. Since the relative address can be a negative value, it is possible to address data or code earlier in the image from RIP as well as later. The range is roughly ±2GB, depending on the instruction size. Since relative addressing cannot address outside this range, this is the practical size limit of 64-bit images.
Data alignment All data must be aligned on a "natural boundary". So a byte can be byte-aligned, a word should be 2-byte aligned, a dword should be 4-byte aligned, and a qword should be 8-byte aligned. A tword should also be qword aligned. GoAsm? deals with alignment automatically for you when you declare local data (within a FRAME or USEDATA area). But you will need to organise your own data declarations to ensure that the data is properly aligned. The easiest way to do this is to declare all qwords first, then all dwords, then all words and finally all bytes. Twords (being 10 bytes) would put out the alignment for later declarations, so you could declare all those first and then put the data back into alignment ready for the qwords by using ALIGN 8.
As for strings, in accordance with the above rules, Unicode strings must be 2-byte aligned, whereas ANSI strings can be byte aligned.
When structures are used they need to be aligned on the natural boundary of the largest member. All structure members must also be aligned properly, and the structure itself needs to be padded to end on a natural boundary (the system can write in this area). Because of the importance of this, from Version 0.56 (beta), GoAsm?