Difference between revision 17 and current revision
No diff available.in Oot, do we want a base register to make addresses relative?
probably not; that's probably assumed/encapsulated away
---
ppl say that jsr/ret made the JVM verifier much more difficult to write. See [1], section The issue with JSR/RET. Why?
Maybe because ret jumps to an address in a register rather than to a fixed address?
Is jsr/ret the only computed goto in jvm?:
---
timeout bit
---
timeout bit
copied from [2]:
so what is boxing doing now? a lot:
we should also add:
in other words 'boxing' seems to be the default for our inefficienct, high-level, safe, dynamic, lazy, symlink-having language; and 'unboxing' is like a special mode that you can use to do more CTM stuff for tight loops, etc.
so still not clear on:
if we had more flags, we could also have one for how to deal with failure, eg whether to have (a maybe-null, or a straight nullable type), or to raise an exception
---
heck i dont wanna decide which flag bits are important, and decide between 8bit, 10bit, 12bit operands. unlike kant, i'm not confident that i'll think of all the modes i want forever. also it seems like a waste of space to have either large operands or many modes, but i dont know which one b/c that's empirical.
ok so maybe back to the variable-length, natural-languagey model.
design requirements: a variable length encoding that:
so accept variable length instruction 'sentences', where each 'instruction word' is fixed length(?) (and possibly has a marker to say if its 'ongoing' or not)
perhaps 'quoted' should also be part of the marker?
'referent' marker? eg 'this modifier modifies the word tagged 011'?
and/or 16 fixed part-of-speeches? eg destination, destination modifier, operand1, operand1 modifier, instruction, etc
sentence terminator bit, word terminator bit, clause terminator bit?
but if you allow conjunctions, then its already an AST
heck mb go that with anyway
---
yeah i think this sort of thing is the way to go.
---
two kinds of AND; simply parallelizing, and building a big noun (which i'll call 'constructive'). this is like whether to autoveectorize over a list, or to treat the list as a list
---
do we still need an 'oot bytecode' below oot core? probably (for things like COW). is there a need for an 'oot core+' which is like oot core, but more colloquial, and which is what the meetaprogramming matches upon (eg macros would run on oot, or oot core+, or on oot core)
---
kant's table of judgements... but want to generalize that sort of thing
'universal' (perhaps i should say 'generic', because i bet some natural languages dont have all of theese, anyway im constructing a programming language not a natural one languagee features): which-language (is this imperative programming, logic programming, etc?) POS-in-sentence (subject, verb, object.. or, opcode, destination, input1, input2) mode (and tense, etc) words, phrases, sentences (what abouut morphemes, etc?) base word of a phrase, and modifiers (eg adjectives in a noun phrasee vs the noun; perhaps addressing mode vs operand? how is mode distinct from this?) arguments (in bot nat. language and prog. lang) compound phrases, and conjunctions (common ones: and, or, xor) and parens lattice of truth values; not quoting annotation
again, should review natural language declensions and conjugations...
---
what can we do with 4 grammar bits?
sentence terminator, phrase terminator, 2 POS bits, allowing for 4 sentence components, eg, whole sentence, subject, verb, object. but some sentences have 5 components: he gave the apple to johnny in the house so need at least 3 bits for component choice (note: in that examplee, the componeets are like destination, operand, argument2, argument1, mode modifiers)
could impose an ordering rule that components must go in order, or at least, that 0 goes first (or last). this allows us to split sentences at the beginning of phrase 0. now we dont need the sentence teermination bit (at the cost of having to possibly read past the end of the sentence to know that the sentence has ended, if the rule is that 0 goes first). so we neeed 4 bits so far.
could also say that component 'base' goes eeither first or last in phrase; now we dont need to both identify the base, and also to identify phrase termination.
still need to distinguish modifiers from base and addressing modes.
so can do everything with 16 pos: component 0 component 1 base c2b c3b c4b c0addr c1addr c2addr c3addr c0mod c1mod c2mod c3mod annotation quote (more accurately, 'quoteed' or inside a quote) parens or other grammatical symbol
this doesnt quite suffice if component 0 can contain multiple preepositional phrases the preposition in prepositional phrases is like a verb; the rest is a nounphrase-like argument
how about: each component (recursively) has a base, an addr mode, up to 3 arguments, and modes/modifier subcomponents. that's too many, want at most 5. so combine base+addr mode. addr modes are subcomponents.
conjunctions are subcomponents (eg an AND would have base 'AND', a mode/modifier saying whether it's parallel or constructive, and 2 arguments)
---
so we say the base component must come first.
---
it take 3 bits to represent 5, so dont know what to do with the otheer 3 alternatives. One of them can be 'annotation'. Parens could be forms of annotation. So we have two left; could just waste them on parens, including a relative jump to allow a reader to quickly find the matching parens. So that's 3 bits. If we can do with no more than 1 more overhead bit we'll have a nice round 4 bits.
wait dont parens need to say which component they're part of? so we can't have them be their own components
is the last bit some kind of terminator? word, phrase, sentence?
---
mb instead of component i should say 'grammatical role'? grol for short? i think "phrasal category" is the linguistics term [3], but that's too long and the word 'category' has too many other meanings here.
---
so to summarize the current natural language proposal outlined here:
a variable length encoding that:
we have 'syllables' (or should we say grapheme? phoneme?)s which combine into 'words'; 'words' which combine into 'phrases'; 'phrases' combine into 'clauses'; and 'clauses' combine into 'sentences'
in our system, a 'clause' is ('is like'? or just 'is'?) a recursive, embedded, parethesized sentence. It is not a 'sentence' only because it is not toplevel. Semantically, due to quoting and unquoting and assigning first-class-functions to variables, etc, both clauses and non-clauses are similar; however, the language won't DO anything yet if it has just read a clause, because the context that clauses is embedded into might quote it, make it hypothetical, make it conditional, etc.
in our system, 'phrases' can be single-word or multi-word.
a 'phrase' might be an expression; e.g. in "x = 3 + (2*(*y))", "2*(*y)" is a "phrase"
each (word?) has a 4-bit 'morpheme' that give syntactic 'overhead'. 3 bits of this tells what its grammatical role is within its (phrase? clause?):
what we do with the other bit isn't decided yet
the rule is that role 0 comes first in any phrase. so if you see a non-role 0 and then a role-0, that's how you know you crossed a phrase boundary this isnt good enough because two adjacent phrases might both have only role 0s; so we still need a phrase delimiter; so mb use that bit for that
open and close parens contain the relative address of the matching parens
should we also mark when something is part of a quotation, to make it ez to skip without parsing?
should we also mark, for something which is acting like a 'pronoun', where the referent is, and for a modifier, where the modify-ee is?
two kinds of AND; simply parallelizing, and building a big noun (which i'll call 'constructive'). this is like whether to autoveectorize over a list, or to treat the list as a list
phrases can have subphrases whose grammatical role within the parent phrase is the same 5 things
eg
I gave the ball to the person who had given it to me:
"the person who had given it to me" is the indirect object, and in our system would be a subphrase, with argument1 'the person' and modifier 'who had given it to me'; 'who' here would just be an artifact of English, the modifier is that constraint tha 'person' 'had given it to me'.
similarly 'Billy's dad' is a subphrase with argument1 'dad' and (modifier?) Billy's
the 'modifiers' in the above might be 'base' in our system, eg like VERBS in natural language (contrary to the way linguistics treats them)
conjunctions are also subphrases; the 'base' is the type of conjunction, argument2 and argument3 are the two things being conjoined, the modifier specifies eg if this is a parallel or constituitive conjunction. eg "alpha and beta" has base AND, argument2='alpha', argument3='beta', and there is a mode too
modes are assumed to be defaults if not specified
---
rather than tagging the roles, could just say that the 4 bits say:
word ends phrase ends sentence ends (what's the 4th one? something to indicate when we skip a phrase, eg for a sentence with only phrase 0 and phrase 4? how would we do that?)
---
to cut down on subphrase overhead, we could make addr modes mandatory parts of words
---
one issue is that you can have multiple modifier phrases under a phrase, so you have to allow multiple phrase 4s
args 2 and 3 are symmetric, so if can have multiple, then why not just combine them, now just 4 choices = 2 bits for grammatical roles (just call these "parts of speech", POS, for short)
so now we're only 1 bit over our supposed budget of 4 bits (eow (end of word) + eop (end of phrase) + eos (end of sentence) + POS tag (2 bits)) = 5
we could eliminate the 'boxed' bit from the addr mode; so that now we would have 3 addr mode bits and 5 grammar bits. If words started at 16 bits long, then 8 bits would be left for the operand.
---
but do multi-word phrases even make sense if we can't markup each word in that phrase with its role in WITHIN THAT PHRASE?
hmm.. how about: if the "new phrase" bit is set, then role of this word within the clause is role 0 (base), and the inflection (2-bit of grammatical role) refers to the role of that phrase within the clause. Otherwise, the inflection refers to the role of that word within the phrase. but actually we want 'end-of-phrase' not 'beginning-of-phrase', right? so put 'base' at the end, then.
we say that 'syllables' (which may or may not correspond to 'words' on the underlying platform; our concept of 'word' is somewhat higher-level) are 16 bits.
so, now i think we have a complete system.
similarly to the fixed system, we can fit 4 fields into 64 bits, except we lost the 'boxed' bit, and we only have 8 operand bits instead of 12:
first 16-bit syllable:
second 16-bit syllable:
third 16-bit syllable:
fourth 16-bit syllable:
(note that there are two role 2s here, because there are two input operands)
note how easy it would be to have longer than 8-bit operands; just unset the EOW bit (in which case 11 of the 16 bits of the additional word are added to the operand, in big endian fashion; long operands still have the other 5 grammar field bits (but should they? we only really need the EOW one..)). So we can easily have a 19-bit operand, a 30-bit operand, etc.
if we need mode/modifier bits, either on the operand or the sentence level, we simply add some role 3s. (do role 3s have addr mode bits? i guess so, just for uniformity). we'll get 8 mode bits. If we need more, just unset the EOW.
---
so now what do annotations look like?
---
we need pronouns and conjunctions, too. i feel like a pronoun is like an addr mode. or do we need them? why not just load a referent into a register? that accomplishes the same thing.
---
'ootby' (oot assembly/oot bytecode) has a positional input argument syntax, is that ok? in that case we need a 'argument not provideed yet' sentinel (of 3 forms ?: the one that creates a function expecting that argument, and the one that assumes a default if there is one, and ow also expects that arg; and one that creates a partial fn that wont execute even if 0-ary)
---
how to give modes of more than 1bit?
should we provide a keyword arg option? (for role 2)
---
parens (subordinate clauses) might not be necessary, bc they can be replaced by previous sentences that load into registers
eg 'The bad man bit the dog':
r1 = 'the bad man' r1 bit the dog
---
even if recursive expressions with pronouns are not allowed, we at least want scoped regs, like haskell's let... in ..., so that we can tell when a temporary register that was just created for one expression is done. For this, however, we could just use an 'UNDEF' instr. i guess it wouldn't hurt to have type declaration annotations at variable initialization too...
---
maybe the concept of 'paragraph', paragraph-local vars
is this part of a general pattern of optimizing non-recursion, similar to link registers which optimize for only one level of subroutine? no, because in fact a parargaph can contain recursive calls, so this is a diffeerent concept
on the topic of link registers, do we want a similar req on fast fns? no b/c then ordinary partially-applied functions can't be used as fast functions.
---
'base' is a bad name, b/c in the case of 'noun phrases' (arguments), it makes you think of the operand, not the addressing mode, but it should be the addressing mode. mb it should be something like 'verb'.
addr mode is verb, not operand. but here we have addr mode/operand pairs, so that's ok. mb call it 'operation'? hmm.. really in general it's not necessarily active, it's just the type of meaning we're assigning to the rest. interpretation? too many other meanings. extensible grammar determinant? too long. root? naw, too man other meanings. 'base' doesnt sound do bad now. but 'operation' is probably the best one so far. maybe 'connective', as in logical connective? too abstract-sounding
so, 'op'.
note that there are 'main ops', which are the ops for top-level role 0, and then there are 'subops', which are ops for role 0 in other top-level roles.
note also that:
---
useful assembler directives:
ORG $400: what is to follow should be located at memory location $400 [http://alanclements.org/68k.html] DC.B label 12: declare a byte of memory, label its address 'label', and initialize it with 12 [http://alanclements.org/68k.html] END $400: "End of program and entry point" [http://alanclements.org/68k.html]
possibly useful instructions: STOP #$2700
"The 68K has an explicit halt instruction called STOP that usually takes the parameter $2700 (this parameter is loaded into the processor status word)." -- [4]