Part of Oot Details.
Read Oot first.
Invalid behavior
These provide some clear examples of what we mean when we say that Oot does not prioritize performance; in (almost?) every case, we choose the inefficient but safe/non-surprising solution.
Whenever possible, we eschew 'undefined behavior' and instead specify that invalid code or behavior leads to one of: a compile-time error, a runtime exception, or the return of an undefined value.
- however, for many cases there is a facility for the programmer to assert that certain things can be statically assumed, so that the corresponding runtime checks could be omitted (eg that a variable is initialized before use; that bounds checking can be omitted; that an integer value will always be below 2^32; etc). If compiled in unsafe "trust the programmer's assertions" mode, then an optimizing compiler MAY trust these assertions and omit checks that would otherwise be required.
- even in these cases, we prefer (returning an undefined value OR trapping/crashing) over 'undefined behavior'. Again, though, possibly there is some compiler flag that tells the compiler that it may replace (returning an undefined value OR trapping/crashing) with 'undefined behavior'. This would allow truly performance-critical code which can be trusted to a high degree (eg the implementation of the self-hosting Oot compiler, and the Oot standard library) to be maximally optimized. It also gives a clean way to make more assumptions when compiling intermediate representations within the compiler (after type checking).
- however, even for that case, there is still an annotation 'this is a sanity check, do not optimize it away under any circumstances' so that sanity checks aren't considered 'dead code' paths and removed
now, onto the specifics:
- potential use of uninitialized variables is a compile-time error
- runtime check for integer overflow upon all arithmetic
- if disabled via annotations and unsafety, then two’s complement wrapping behavior at the bitwidth of the promoted type
- The result of any signed left-shift is the same as if the left-hand shift argument were cast to unsigned, the shift performed, and the result cast back to the signed type
- runtime check for oversized or negative shifts
- runtime range checks upon array accesses (note: this means that the internal representation of arrays must include their range)
- the compiler cannot move a side-effecting operation across any object pointer dereference that cannot be proven by the optimizer to be non-null
- NIL references are impossible (of course, you can wrap a reference in an Option type, however)
- no unsafe typecasting (internal representations are opaque and cannot be directly accessed except via '(un)serialize' or '(un)marshal')
- there are no 'pointers', only 'references', except in the FFI
- no pointer arithmetic
- structural equality and pointer equality comparisons are allowed on references
- no < comparisons on references/pointers
- masking pointers is not part of the language (reference/pointer implementation is opaque), although perhaps there is a part of the FFI that explicitly masks pointers, if possible on the current platform
- It is unspecified whether or not a reference requires storage (following C++ (§8.3.2; para. 4))
- references to references, arrays of references are allowed; however, there are annotations that can prohibit this to make it easier for the compiler to optimize
- references can refer to anything
- bit fields are not part of the language (except maybe the FFI?)
- alignment is not part of the language
- exceptions (and, if enabled, any unsafe trapping behavior) are externally visible side-effects that must not be reordered with respect to other externally visible side-effects (much less be assumed to be impossible)
- by default, memory reads and writes may not be reordered (optional annotations are available to suggest otherwise)
- the result of a data race is as if: threads are compiled independently and then data races have a result that is dictated by the details of the underlying scheduler and memory system. Sequentially consistent behavior may not be assumed when data races occur.
- when using memcpy via the FFI, it has memcpy semantics, not memmove semantics: the destination CANNOT overlap the source. A request to copy zero bytes is a no-op.
- when a function returns without returning a value, NIL is returned to the caller.
- values of different types can be compared for pointer equality or for structural equality, in which case the comparison always fails
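The integer rules in the list above (overflow check, wrapping when the check is disabled, signed left-shift, shift range check) can be modelled roughly as follows. This is only a sketch in Python, not Oot syntax; the names OverflowTrap, to_signed, checked_add and shift_left are made up for illustration, a fixed 32-bit width stands in for 'the promoted type', and an 'oversized' shift is assumed to mean a shift amount greater than or equal to the bit width.

```python
class OverflowTrap(ArithmeticError):
    """Stands in for the runtime exception raised on invalid arithmetic (hypothetical name)."""


def to_signed(value, bits):
    """Reinterpret the low `bits` bits of value as a two's complement signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def checked_add(a, b, bits=32, unsafe_wrap=False):
    """Add two signed integers of the given bit width.

    By default an overflow raises. With unsafe_wrap=True (modelling a check
    that was disabled via annotations in unsafe mode) the result wraps in
    two's complement instead.
    """
    result = a + b
    lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if lowest <= result <= highest:
        return result
    if unsafe_wrap:
        return to_signed(result, bits)
    raise OverflowTrap(f"{a} + {b} does not fit in {bits} bits")


def shift_left(value, amount, bits=32):
    """Signed left shift: range-check the amount, shift as unsigned, cast back."""
    if amount < 0 or amount >= bits:
        raise OverflowTrap(f"shift amount {amount} is out of range for {bits} bits")
    unsigned = value & ((1 << bits) - 1)        # cast to unsigned
    return to_signed(unsigned << amount, bits)  # shift, then cast back to signed
```

Under these assumptions, checked_add(2**31 - 1, 1) raises, the same call with unsafe_wrap=True wraps to the most negative 32-bit value, and shift_left(-1, 1) gives -2.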
?not sure, todo:
- Does the value of a reference to an object whose lifetime has ended remain the same as it was when the object was alive (and is inspectable by eg comparing with ==)?
- if you serialize the representation of a pointer and then unserialize it, what happens?
The environment
Variables and labels and functions are all in the same namespace.
Semiglobals
note: semiglobals are like the 'environment' variables in GNU/Linux; eg:
$ echo $hi
$ bash
$ export hi='there'
$ echo $hi
there
$ bash
$ echo $hi
there
$ exit
$ echo $hi
there
$ exit
exit
$ echo $hi
$
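The shell session above can be mimicked with a small Python model. This is a sketch of the analogy only, not of how Oot would implement semiglobals; the Scope class and its method names are hypothetical, and it assumes (as with environment variables) that a nested scope starts from a copy of its parent's bindings.

```python
class Scope:
    """A dynamic scope holding semiglobals (hypothetical model, not Oot syntax)."""

    def __init__(self, inherited=None):
        # Like a child process, a new scope starts with a copy of its parent's values.
        self.vars = dict(inherited or {})

    def spawn(self):
        """Enter a nested scope, as running `bash` does in the transcript."""
        return Scope(self.vars)

    def export(self, name, value):
        """Bind a semiglobal; visible here and in scopes spawned afterwards."""
        self.vars[name] = value

    def lookup(self, name):
        """Return the value, or the empty string if unset (what `echo $hi` prints)."""
        return self.vars.get(name, "")


outer = Scope()
inner = outer.spawn()         # $ bash
inner.export("hi", "there")   # $ export hi='there'
innermost = inner.spawn()     # $ bash
```

Here inner and innermost both see 'there', while outer still sees nothing, matching the last `echo $hi` in the transcript.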
.get and .set
.get must not have side-effects (in the current state mask).
.set must be idempotent (in the current state mask).
(why do we say 'in the current state mask'? because this allows for eg profiling or logging code to be executed within a .get or .set)
(todo: should we require that (x.set a; x.get == a)? I suspect not, because that would prevent eg using .get and .set to access locations in shared memory, which could be changed by another process in between the set and the get.)
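One way to picture the two rules and the 'state mask' caveat is the following Python sketch. Everything here is illustrative: LoggedCell, Accumulator, masked_state and the two checker functions are hypothetical names, and the 'state mask' is modelled simply by choosing which attribute counts as observable state.

```python
class LoggedCell:
    def __init__(self):
        self.value = None   # state inside the current state mask
        self.log = []       # logging state, assumed to be masked out

    def get(self):
        self.log.append("get")   # allowed: only touches masked-out state
        return self.value

    def set(self, new_value):
        self.log.append("set")
        self.value = new_value   # setting twice leaves the same masked state as setting once


class Accumulator(LoggedCell):
    def set(self, new_value):
        # Violates the rule: a second identical set changes the masked state again.
        self.value = (self.value or 0) + new_value


def masked_state(cell):
    """The part of the cell's state that the .get/.set rules apply to."""
    return cell.value


def get_is_side_effect_free(cell):
    before = masked_state(cell)
    cell.get()
    return masked_state(cell) == before


def set_is_idempotent(make_cell, new_value):
    once, twice = make_cell(), make_cell()
    once.set(new_value)
    twice.set(new_value)
    twice.set(new_value)
    return masked_state(once) == masked_state(twice)
```

LoggedCell passes both checks even though it logs on every call, because the log is outside the mask; Accumulator fails the idempotence check.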
functions are objects
(todo: ? is there a protocol 'apply' for them? what about closures? do we use something like Python's func_closure (called __closure__ in Python 3)?)
how module loading works
To minimize PATH manipulation, the PATH must be manipulated via the path_insert() and path_remove() functions. To minimize filesystem queries, the contents of filesystem directories in the PATH may be examined and cached at any time (they may or may not be examined lazily rather than at program start); you must call the system path_cache_invalid(subpath) function if you need to discard part of this cache after a possible mid-runtime change in the contents of these directories.
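A minimal Python sketch of such a cached PATH is given below. The function names path_insert, path_remove and path_cache_invalid come from the text, but their parameters, the find_module helper and the module-level _path and _cache variables are assumptions made for illustration; the sketch examines directories lazily, which is one of the behaviors the text permits.

```python
import os

_path = []    # ordered list of directories searched for modules
_cache = {}   # directory -> set of entry names, filled lazily


def path_insert(directory, index=0):
    """Add a directory to the PATH (by default at the front)."""
    if directory not in _path:
        _path.insert(index, directory)


def path_remove(directory):
    """Remove a directory from the PATH and drop its cached listing."""
    if directory in _path:
        _path.remove(directory)
    _cache.pop(directory, None)


def path_cache_invalid(subpath):
    """Discard cached listings for subpath and everything beneath it."""
    prefix = os.path.join(subpath, "")
    for directory in list(_cache):
        if directory == subpath or directory.startswith(prefix):
            del _cache[directory]


def find_module(filename):
    """Return the full path of the first PATH entry containing filename, or None."""
    for directory in _path:
        if directory not in _cache:
            # The filesystem is queried at most once per directory until invalidated.
            _cache[directory] = set(os.listdir(directory))
        if filename in _cache[directory]:
            return os.path.join(directory, filename)
    return None
```

With this sketch, a file created after a directory was first examined is not found until path_cache_invalid is called for that directory, which is the situation the last sentence above warns about.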
The init() functions of modules are run at some point before the namespace exported by a module is accessed by other modules, but possibly lazily, in parallel, and in any order. A module may import a module that imports it (cyclic imports are allowed). However, the init()s of modules must not cyclically depend on one another.
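The distinction between import cycles (allowed) and init() dependency cycles (not allowed) can be sketched in Python using the standard library's graphlib (Python 3.9+). The init_order function and the shape of its input are assumptions for illustration; a real implementation might run init()s lazily or in parallel rather than computing one fixed order.

```python
from graphlib import CycleError, TopologicalSorter


def init_order(init_deps):
    """Return one valid order in which to run module init() functions.

    init_deps maps a module name to the set of modules whose init() must
    already have run before its own init() runs. Import relationships that
    do not affect init() are simply left out of this mapping, so cyclic
    imports are fine as long as this graph is acyclic.
    """
    try:
        return list(TopologicalSorter(init_deps).static_order())
    except CycleError as err:
        # err.args[1] is the list of nodes forming the detected cycle.
        raise ValueError(f"init() functions depend on each other cyclically: {err.args[1]}") from err
```

For example, if modules a and b import each other but only a's init() needs b, then init_order({"a": {"b"}, "b": set()}) puts b before a; if each init() needs the other, a ValueError is raised.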
If a module is imported by multiple other modules, these may or may not be realized as separate instances of the module.
Authentication
A module is considered 'weakly authenticated' or just 'authenticated' if it is signed by a public key trusted by the implementation (typically this will include keys given on the commandline and keys in a persistent keystore), and if all of its imports are authenticated (or, in the case of cyclic imports, if all modules in the cycle are signed by a public key trusted by the implementation). It is considered 'strongly authenticated' if it is authenticated and all of its non-official import statements include an inline public key or public key hash, and all of its imports are strongly authenticated (or, in the case of cyclic imports, if all modules in the cycle meet these conditions).
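The two definitions above, including the special case for import cycles, can be computed as a greatest fixpoint: start from every candidate module and repeatedly drop any module that has an import outside the set. The Python sketch below is one possible reading, not a specification; the dictionary keys "signed", "imports", "pinned" and "official" are hypothetical, signature checking itself is abstracted into the "signed" flag, and an 'official' import is assumed to mean an import of a module flagged as official.

```python
def _shrink(candidates, modules):
    """Drop modules until every remaining module imports only remaining modules."""
    ok = set(candidates)
    changed = True
    while changed:
        changed = False
        for name in list(ok):
            if any(dep not in ok for dep in modules[name]["imports"]):
                ok.discard(name)
                changed = True
    return ok


def authenticated(modules):
    """Names of (weakly) authenticated modules.

    modules maps a name to a dict with "signed" (signed by a trusted key)
    and "imports" (list of imported module names).
    """
    signed = {name for name, m in modules.items() if m["signed"]}
    return _shrink(signed, modules)


def strongly_authenticated(modules):
    """Names of strongly authenticated modules.

    A module qualifies locally if it is authenticated and each of its
    imports is either listed in its "pinned" entry (an inline key or key
    hash) or refers to a module flagged "official".
    """
    weak = authenticated(modules)
    local = {
        name for name in weak
        if all(dep in modules[name].get("pinned", ()) or modules[dep].get("official", False)
               for dep in modules[name]["imports"])
    }
    return _shrink(local, modules)
```

Because the fixpoint only removes a module when one of its imports has been removed, a cycle of signed modules whose other imports are all authenticated survives intact, which matches the parenthetical about cyclic imports.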
Oot implementations may include commandline switches to determine:
- whether to cause an error upon unauthenticated modules or weakly authenticated modules
- whether to prefer strongly authenticated and authenticated versions of modules when different versions with different authentication levels are available
- which unsigned files to exempt from these preferences
misc
- there is a way to programmatically get a list of strings representing input parameter names of a function, and also to get a list of strings representing output parameter names, and the function's type signature
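As a rough analogy for what such a facility might look like, Python offers something similar through its inspect module. This is only an illustration of the idea in another language, not a proposal for Oot's API; the describe helper and the divide example are made up, and Python has no named output parameters, so only input names and the signature are shown.

```python
import inspect


def describe(func):
    """Return (list of input parameter names, signature as text) for a callable."""
    signature = inspect.signature(func)
    return list(signature.parameters), str(signature)


def divide(numerator: int, denominator: int) -> float:
    return numerator / denominator


names, signature_text = describe(divide)
```

Here names is the list of strings 'numerator' and 'denominator', and signature_text describes the parameter types and the return type.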