Bayle Shanks's website: proj-oot-ootMainSyntax

Read Oot first.

Details of Oot Syntax

Here we describe, in detail, the syntax of Oot.

(todo: some of this isn't just syntax)

General points regarding Oot syntax

Parsing occurs as a separate stage prior to the rest of Oot compilation/execution. There are scoped metaprogramming constructs that allow custom parsing of individual, clearly-marked strings, lines, or blocks within code, and there is a per-file 'source filter' preprocessing facility, but there are no metaprogramming constructs that can alter the behavior of Oot parsing in a non-local way. This guarantees that if you are reading Oot code outside the scope of the above-mentioned metaprogramming constructs, you can be assured that it parses in a standard way.

The precedence of operators is determined by which symbols they are composed of; although users can define custom binary operators, you never have to look up a function definition just to see how the source code will parse.

Echoing [1], one might say that Oot's parser has syntax defined in terms of characters, and the Oot language has syntax defined in terms of symbols, graphs, and literals. The parser is available to the user via the function ootparse, which reads the next form from a stream, and returns the objects represented by that form.

Any Oot program can be expressed as a single line, by using ';'s in place of newlines, by closing all parentheses explicitly instead of using paragraphs, and by using "\n"s in place of multiline strings.

Each line is either:

an assignment
a mutation
an expression on the last line of a block
an assertion (that is to say, lines which are just expressions, not assignments, mutations, or side-effectful procedure calls, and which are not at the end of a block, are automatically interpreted as assertions)

. (period) is a shortcut for attribute access and pipelining/reversing the flow of information in a function composition

todo i think this is out of date: (a) no longer for reversing, (b) .. is not pipeline, infix

Periods are syntactic sugar for various things.

One period is a shortcut for applying a function on the left to a keyword on the right, without having to put the keyword in uppercase. When the thing on the left is a data structure, the function is the 'getter' for the data structure, and this acts as 'attribute lookup', for example:

P employeeRecord.address == P (employeeRecord ADDRESS)

todo does this affect the way setters work when on the LHS? Eg does 'a.b.c = y' mean something different than 'a b c = y', perhaps relating to whether the setter first called is the one for 'a' or the one for 'c'; eg do you actually run "a b c" and then call the setter on the result, or does the entire expression "a b c" have a special meaning b/c its on the lhs?

todo does this still work when '.' is unattached?

todo somewhere explain that 'attribute access' eg getters is just ordinary function application, and 'setters' is unified with function application on the LHS

Two periods are a shortcut to 'pipeline' the application of a function on the right to the data on the left, for example:

doSomething (sort (myFilter (x))) == x..myFilter..sort..doSomething

In the above example, '..' is binding tightly, because it is being used infix attached. If it is unattached, it binds loosely:

[3 2 5] .. sort .. map timesTwo == ([3 2 5])..(sort)..(map timesTwo) == map timesTwo (sort [3 2 5]) == map timesTwo ([2 3 5]) == [4 6 10]
map timesTwo [3 2 5] .. sum == (map timesTwo [3 2 5])..sum == sum (map timesTwo [3 2 5]) == sum ([6 4 10]) == 20
3 + [5 2]..max == 3 + ([5 2]..max) == 3 + (max [5 2]) == 3 + 5 == 8
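
The three '..' examples above can be mimicked in Python, used here only as a stand-in since this page does not give an Oot implementation. This is a minimal sketch: the helper name pipeline and the stand-ins times_two and map_times_two are assumptions invented for illustration; only the left-to-right flow of data mirrors the text.

```python
from functools import reduce

def pipeline(value, *functions):
    """Feed value through functions left to right, like x..f..g in the text."""
    return reduce(lambda acc, fn: fn(acc), functions, value)

def times_two(n):
    # Hypothetical stand-in for the 'timesTwo' used in the examples.
    return n * 2

def map_times_two(items):
    # Stand-in for the partially applied 'map timesTwo'.
    return [times_two(n) for n in items]

# [3 2 5] .. sort .. map timesTwo
sorted_then_doubled = pipeline([3, 2, 5], sorted, map_times_two)
# map timesTwo [3 2 5] .. sum
doubled_sum = pipeline(map_times_two([3, 2, 5]), sum)
# 3 + [5 2]..max
three_plus_max = 3 + pipeline([5, 2], max)
```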

: (colon) is for implicit grouping

A suffix colon indicates an implicit parenthetical grouping starting to the right of the identifier to which the colon is attached, and going until either the end of the containing grouping, or the next suffix colon (on the same level, ie not in an (explicit or implicit) parenthetical or block sub-grouping), even over multiple lines. In addition, the suffix colon indicates a parenthetical grouping starting right after the identifier to which the colon is attached, and going until either the next prefix colon (on the same level) or the end of the containing grouping or line.

A prefix colon indicates an implicit block starting to the right of the identifier to which the colon is attached, and going until either the next prefix colon or the end of the containing grouping, even over multiple lines, and additionally indicates a keyword argument, functioning similarly to '/'; the identifier to which the colon is attached is the keyword (the left hand side of the '/'), and the block to the right of it is the argument.

For example:

if: condition
  :then
    first
  :else
    second

is equivalent to:

if (condition) then/{first} else/{second};

Note that these constructs implicitly nest, provided that the prefix and suffix colons are alternating. For example:

if: condition
  :then
    if: condition2
      :then 
        handle-true-and-true
      :else
        handle-true-and-false
  :else
    handle-false

is equivalent to:

(if (condition) then/{(if (condition2) then/{handle-true-and-true} else/{handle-true-and-false})} else/{handle-false});

Infix colons create implicit parentheses on both the lefthand and the righthand of the colon, until the next infix colon or the end of the containing group or the next implicit semicolon. Infix colons have higher precedence than ('bind tighter than') prefix or suffix colons or implicit semicolons. Multiple infix colons do the same thing, but have lower precedence the more of them there are.

For example,

a b : c d

is equivalent to

(a b) (c d);

Another example:

a b :: c d : e f 
g h : i j

is equivalent to:

(a b) ((c d) (e f));
(g h) (i j);
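
One possible reading of the infix colon rule, sketched in Python as a stand-in for Oot. It assumes a single line already split into whitespace-separated tokens, treats the longest run of colons as the loosest-binding separator, and ignores prefix/suffix colons and semicolons entirely. The function name group_infix_colons is made up for this example.

```python
def group_infix_colons(tokens, top_level=True):
    """Return the explicitly parenthesized form of one line of tokens."""
    colon_runs = [t for t in tokens if t and set(t) == {":"}]
    if not colon_runs:
        # No infix colons left: this stretch becomes one parenthesized group.
        return "(" + " ".join(tokens) + ")"
    # More colons means lower precedence, so split on the longest run first.
    loosest = max(colon_runs, key=len)
    parts, current = [], []
    for token in tokens:
        if token == loosest:
            parts.append(current)
            current = []
        else:
            current.append(token)
    parts.append(current)
    grouped = " ".join(group_infix_colons(p, top_level=False) for p in parts)
    # Nested results need their own parentheses; the outermost line does not.
    return grouped if top_level else "(" + grouped + ")"

first = group_infix_colons("a b : c d".split())
second = group_infix_colons("a b :: c d : e f".split())
```

Under these assumptions, first and second reproduce the explicit forms given in the examples above (without the trailing semicolons).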

todo: what do commas do? i think: on the rhs: implicit data constructor; on the lhs: bind multiple return values

todo: is there something that does Haskell's a b $ c d $ e f == a b (c d (e f))? (ps is that even how $ works in haskell?)

// is for things like cond statements

'//' places implicit parentheses around everything to its left, to the beginning of the line or to the end of the containing grouping, and also places an implicit block around everything to its right, until the next line (on this grouping level) containing a '//', or the end of the block (much like a prefix colon). Then it functions like a '/' between the value to its left and the block to its right, except that instead of treating the thing to its left as a keyword, it evaluates it (at runtime).

todo: why evaluate the thing on the left? just leave it as a block

todo: is this really a good replacement both as a separator in 'cond' lists, and also for Prolog's ':-'?

Note: 'cond' is like 'switch' in C

For example:

i = 1
j = 2
cond: i
  0     // "zero"
  1     // "one"
  j     // "two"

is equivalent to:

i = 1; j = 2; (cond (i) (0)$/"zero" (1)$/"one" (j)$/"two");

and

i = 1
j = 2
cond: i
  i == j  // "equal"
  i -= j  // "not equal"

is equivalent to:

i = 1; j = 2; (cond (i) (i == j)$/"equal" (i -= j)$/"not equal");

todo: i no longer understand the dollar signs i put in there

Atomic literals, by example

Integer: 3

Floating point number: 3.0, INF, NAN

String: "Hello!"

String, with interpolation and newline substitution: "Hello {name}!\n"

String, raw: r"Hello, i can include the '{' and \n characters!"

Multiline string ('HERE documents'): """ This is a string named {name} that is 3 lines long. """

Multiline string with custom delimiter: """xyz This is a string named {name} containing """triple quotes""" that is 3 lines long. """xyz

Multiline raw string with custom delimiter: r"""xyz This is a string containing """triple quotes""" and a { and a \n and which is 3 lines long. """xyz

In addition to 'r', string delimiters may be prefixed by '#' and then a string, which indicates that the string is a literal that is to be passed to a macro.

symbol: THIS-IS-A-SYMBOL

A symbol literal is just like a string literal, except (a) you don't have to enclose it in double-quotes, (b) the implementation is encouraged to represent it internally as an integer to aid performance, (c) translation tables can't contain mappings for keywords. Although you can try to coerce symbol values to strings and print them out, this is really only for debugging and most implementations will just print out the internal integer representation unless this is a debug build.

nil: NIL

booleans: FALSE, TRUE (synonyms: F, T)

Unicode, internationalization and translation

If your Oot implementation supports Unicode source files, non-ASCII characters are ONLY allowed within strings.

Oot has a facility to attach a separate file providing translations of strings. So, for instance, in the source code you could write (x = "Hello!"), and then in the translation file for French you could map "Hello!" to "Bonjour!", and then at runtime, if Oot is in a French locale, x will be set to "Bonjour!" when the line (x = "Hello!") is encountered in the source code.
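
The lookup described above can be pictured with a small Python sketch (Python as a stand-in for Oot). The table layout, the name TRANSLATIONS, and the function localized are assumptions for illustration; the page does not specify the translation file format or what happens when no mapping exists, so falling back to the source literal is a guess.

```python
# Hypothetical in-memory form of per-locale translation files.
TRANSLATIONS = {
    "fr": {"Hello!": "Bonjour!"},
}

def localized(literal, locale):
    """Return the translation of a string literal, or the literal itself."""
    return TRANSLATIONS.get(locale, {}).get(literal, literal)

x_in_french = localized("Hello!", "fr")   # value x would take in a French locale
x_in_english = localized("Hello!", "en")  # no table for "en": literal is kept
```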

Alphanumerics-with-dashs (generalized identifiers) and case: identifiers, symbols, annotations

The meaning of a string consisting of alphanumerics-with-dashs depends on its case:

Lowercase: an ordinary identifier

Capitalized: an annotation

Uppercase: a symbol literal (see above)

mixedCase: the upper-case parts are actually macro operators (we call these 'attached macros') operating on the surrounding lowercase words. eg "mixedCase" would be read as a macro 'C' applied to 'mixed' and 'ase'. (todo: do longer-than-one-letter such 'macros' exist? if not, then just the first letter, reading from the left, is the operator, right?)

Single-letter identifiers (or identifiers with only a single letter followed by a number) are considered to be uppercase, not capitalized; that is, annotations must have at least two letters at the beginning.

Any alphanumerics-with-dashs that BEGINS with dashes has a special meaning, depending on how many dashes there are:

one dash means symbol literal. This is just an alternative to writing it in uppercase, meant to make it easy to type. ootfmt will convert these to uppercase. E.g. '-this-is-a-symbol' means the same thing as 'THIS-IS-A-SYMBOL'
two dashes means 'private' and will not be exported or directly accessible from the containing module (although the containing module can choose to return it from a function called from outside of the module). E.g. --this-is-a-private-identifier (TODO: we should switch this to private-by-default)
three or more dashes means 'reserved' for the use of the Oot language. E.g. ---ROOT. If you see one of these, be aware that although it will parse like any other keyword, semantically the language might treat it specially in some way.
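
The case and leading-dash rules above can be summarized as a small classifier. This is a Python sketch (standing in for Oot), not the actual parser: the function name classify and the returned labels are invented here, and treating every remaining mix of cases as an 'attached macro' is an assumption, since the text leaves the details of mixedCase open.

```python
def classify(word):
    """Classify an alphanumerics-with-dashes token by dashes and case."""
    dashes = len(word) - len(word.lstrip("-"))
    if dashes == 1:
        return "symbol"        # -this-is-a-symbol
    if dashes == 2:
        return "private"       # --this-is-a-private-identifier
    if dashes >= 3:
        return "reserved"      # ---ROOT
    letters = "".join(ch for ch in word if ch.isalpha())
    if not letters:
        return "other"         # e.g. a number; not covered by these rules
    if letters.islower():
        return "identifier"
    if letters.isupper():
        return "symbol"        # includes single letters such as X or X1
    if letters[0].isupper() and letters[1:].islower():
        return "annotation"    # needs at least two letters by construction
    return "attached macro"    # mixedCase
```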

To prevent confusion, although the case of a string is significant, there is only one namespace; you are not allowed to, for example, have an ordinary identifier, and an annotation, with the same name except that one is capitalized and one is not.

Prefix and postfix

When you see alphanumerics-with-dashs smashed together with other punctuation character(s) without intervening spaces, then the punctuation characters act as operators or modifiers acting upon the alphanumerics-with-dashs they are attached to. A given (string of) punctuation character(s) may have three DIFFERENT MEANINGS depending on how it is placed:

freestanding: it is standing alone, not attached to an alphanumerics-with-dashs. Eg "a + b"
prefix (leading): it is attached to the left of an alphanumerics-with-dashs, without any intervening space. Eg "+a".
postfix (trailing): it is attached to the right of an alphanumerics-with-dashs, without any intervening space. Eg "a+".

Generally the prefix and the postfix meanings are related; they are usually rough inverses of some sort, where prefix can be imagined as going 'up' (constructing) and postfix goes back 'down' (deconstructing) in some very abstract space. Sometimes the composition of a postfix with the corresponding prefix is an identity. For example, y = &x takes a pointer or reference to x, and y& dereferences the pointer in y; (&x)& === x.

Strings of commas and strings of punctuation containing =s operate the same whether they are attached or unattached to their neighbors.

Attached punctuation binds tighter than unattached operators.

Whitespace

Any form of whitespace separates non-whitespace. Note that the meaning of punctuation may change depending on whether it is attached to something, or separated from it by a space.

Newlines

Oot usually inserts semicolons at every newline. There are two exceptions:

if there is a '\' (backslash) at the end of the line (ie after all non-whitespace characters; the backslash is effective even if there is whitespace between it and the newline character)
if the line contains opening parentheses which have not been closed by subsequent closing parentheses, or if a previous line did and these have not been closed yet

In debug mode, it is possible to mark a certain dynamic scope of the program so that expressions in that scope which were delimited by newlines, as opposed to explicit semicolons, automatically print out the value of the expression on that line.

Blank lines and paragraphs

A region of text which does not contain blank lines and which is surrounded by either one or more blank lines and/or the boundaries of a block is called a 'paragraph'. A blank line is two newlines with no non-whitespace characters in between them (excluding comments). The only function of paragraphs is that, upon reaching the end of a paragraph, any levels of parentheses within the current block scope which have not yet been closed are implicitly closed (just as if the blank line contained a string of closing parentheses of length sufficient to balance the parentheses in this block); and a semicolon is inserted at the end.

For example, each of the following is equivalent to "print (1+1);":

print (1+1)

print (1 +
1)

print (1 +
1
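
The newline and paragraph rules can be combined into one small normalizer. The following Python sketch (Python standing in for Oot) is only an approximation: it counts '(' and ')' naively, so it ignores strings, comments, and the other grouping constructs, and the name make_explicit is invented for this example.

```python
def make_explicit(source):
    """Rewrite newline/paragraph layout as explicit ';' and ')' characters."""
    statements = []   # finished statements, each ending in ';'
    current = []      # pieces of the statement being accumulated
    depth = 0         # number of '(' not yet closed

    def flush():
        nonlocal depth
        if current:
            # Paragraph or statement end: close leftover parens, add ';'.
            statements.append(" ".join(current) + ")" * depth + ";")
            current.clear()
        depth = 0

    for line in source.splitlines():
        text = line.rstrip()
        if not text:
            flush()                      # blank line ends the paragraph
            continue
        continued = text.endswith("\\")  # trailing backslash suppresses ';'
        if continued:
            text = text[:-1].rstrip()
        current.append(text.strip())
        depth += text.count("(") - text.count(")")
        if depth <= 0 and not continued:
            flush()                      # ordinary newline acts as ';'
    flush()                              # end of input ends the paragraph
    return " ".join(statements)
```

With this sketch, the second and third layouts shown above both come out as "print (1 + 1);" (the first keeps its original spacing, "print (1+1);").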

Comments

Two or more adjacent semicolons, followed by a space, has the effect of a newline and begins a comment that continues until the end of the line. Two or more adjacent semicolons, followed by a non-whitespace string, begins a comment that continues until the same delimiter is encountered, even over multiple lines; note that no newline is inserted. For example:

print 1+1  ;; nice day today
print 1+1  ;;;;; yay!
print 1+  ;;xyz one ;;xyz 1
print 1+  ;;xyz man this is
                quite a long
                comment
                ;;xyz 1

If a block has some lines with comments started with three or more semicolons, and some lines without that (with no comments, or with comments started with only two semicolons), then this is a hint to the reader that the lines with three or more semicolons are the more important ones, containing the 'happy path' or the 'main idea' of the block, in comparison to the other lines, which are mere 'details' such as error handling. IDEs are encouraged to have modes that collapse the less important lines, highlight the important lines, etc.

At the beginning of the file, if one or more lines begin with the character '#', then these lines are also treated as comments (to facilitate the Unix shebang convention on the first line, as well as build systems etc that can use the following lines to annotate).

Comments are removed at an early stage of parsing and have no further effects besides those mentioned above. Comments within strings, or within raw text being fed to metaprogramming facilities, are not removed.
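
A rough Python sketch (Python standing in for Oot) of the two semicolon comment forms. It deliberately does not handle the exceptions mentioned above (comments inside strings or raw macro text) or the leading '#' lines, and the name strip_comments is an assumption for illustration.

```python
import re

# Two or more semicolons, optionally followed by a non-whitespace delimiter tail.
COMMENT_OPEN = re.compile(r"(;{2,})(\S*)")

def strip_comments(source):
    """Remove ';; ...' line comments and ';;xyz ... ;;xyz' delimited comments."""
    out = []
    i = 0
    while i < len(source):
        match = COMMENT_OPEN.match(source, i)
        if match is None:
            out.append(source[i])
            i += 1
        elif match.group(2) == "":
            # Semicolons followed by whitespace: skip to end of line,
            # keeping the newline itself.
            end = source.find("\n", match.end())
            i = len(source) if end == -1 else end
        else:
            # Delimited comment: skip to the next copy of the same delimiter.
            delimiter = match.group(0)
            close = source.find(delimiter, match.end())
            i = len(source) if close == -1 else close + len(delimiter)
    return "".join(out)
```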

Grouping

Aside from whitespace, there are four grouping constructs:

(): parentheses affect the order of evaluation of expressions (and also enter code context from within a data constructor, see 'Data context and code context' below)
{}: curly braces construct first-class representations of expressions and lists of statements, and in addition serve as a scope for variables and certain metaprogramming facilities. Note that the boundaries of a block always constitute 'paragraph' boundaries and hence unbalanced parentheses are implicitly closed at the end of each block (see 'whitespace', above)
[]: square brackets are data constructors
{{}}: double curly braces are 'regions' of code or data. Code regions are used for things such as transactions.

Regions

The following region is associated with the keyword value HELLO:

v = 3
HELLO{{print "Hi"}}HELLO

If no value is given, the associated value defaults to NIL:

v = 3
{{print "Hi"}}

Every region must be terminated in the sense that every region opening double braces must correspond to exactly one point in the same lexical scope as the opening at which the region has closed; however, regions do not have to nest in lexical scopes.

So one might see the following:

REGION1{{
  if x > y: {
    do_something
    do_something_else
    }
  else: {
    do_a_third_thing
  }
}}REGION1

But the following is illegal (assuming this is not an excerpt but the entire file), because REGION1 is never closed:

REGION1{{
  if x > y: {
    do_something
    do_something_else
    }
  else: {
    do_a_third_thing
  }

Because regions don't have to nest lexically, the same region-opening might be closed by two or more places, providing that these places are not lexical ancestors of one another. For example, one might see:

REGION1{{
  if x > y: {
    do_something
    }}REGION1
    do_something_else
    }
  else: {
    do_a_third_thing
    }}REGION1
  }

And similarly for region openings. For example, one might see:

  if x > y: {
    do_something
    REGION1{{
    do_something_else
    }
  else: {
    REGION1{{
    do_a_third_thing
  }
}}REGION1

But the following would be illegal, because REGION1 is never terminated in the ELSE branch:

REGION1{{
  if x > y:
    do_something
    }}REGION1
    do_something_else
  else:
    do_a_third_thing

Similarly, the following is illegal, because REGION1 is terminated both in the first branch of the if, and also at a later point in the lexical parent of this branch:

REGION1{{
  if x > y:
    do_something
    }}REGION1
    do_something_else
  else:
    do_a_third_thing
}}REGION1

Regions associated to different values do not (syntactically) affect one another, and can overlap:

REGION1{{
REGION2{{
  if x > y:
    do_something
    }}REGION1
    do_something_else
    }}REGION2
  else:
    }}REGION2
    do_a_third_thing
    }}REGION1

A region can be closed and then re-opened later, or can be opened multiple times, but these closings and openings must 'nest' in the sense described above.

The following is legal:

REGION1{{
x = 3
}}REGION1
y = 4
REGION1{{
  if x > y: {
    do_something
    REGION1{{
    do_something_else
    }}REGION1
    }
  else: {
    do_a_third_thing
  }
}}REGION1

These restrictions ensure that:

for each region, each lexical location (and location within the source code) can be uniquely associated with a non-negative integer, showing how many times that region is nested at that location
for each region, each lexical location (and location within the source code) is either associated with a nesting integer of 0 for that region, or there is a point in the current block or in a lexical ancestor of that block before which the nesting integer was strictly smaller, and at which either there was a region opening at that point or there was a net region opening in every lexical line of descent from that point. (and similarly, symmetrically, for region closings)
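
The nesting-integer idea can be checked mechanically on a simplified model. In the Python sketch below (Python standing in for Oot), a block is a list of items, where an item is ("stmt",), ("open", NAME), ("close", NAME), or ("branch", [arm, arm, ...]) for an if/else whose arms are themselves blocks. This encoding, and the names depth_after and is_legal, are assumptions made for illustration; the real rules are stated over Oot source text and may be subtler.

```python
def depth_after(block, region, depth=0):
    """Return the nesting integer for `region` after running through `block`."""
    for item in block:
        kind = item[0]
        if kind == "open" and item[1] == region:
            depth += 1
        elif kind == "close" and item[1] == region:
            depth -= 1
            if depth < 0:
                raise ValueError("region closed more often than opened")
        elif kind == "branch":
            # Every arm must leave the same nesting integer behind.
            outcomes = {depth_after(arm, region, depth) for arm in item[1]}
            if len(outcomes) != 1:
                raise ValueError("branches disagree about the region")
            depth = outcomes.pop()
    return depth

def is_legal(program, region):
    """A whole file is legal for `region` if it ends with nesting integer 0."""
    try:
        return depth_after(program, region) == 0
    except ValueError:
        return False

S = ("stmt",)
OPEN, CLOSE = ("open", "REGION1"), ("close", "REGION1")
closed_in_both_arms = [OPEN, ("branch", [[S, CLOSE, S], [S, CLOSE]])]
closed_in_one_arm = [OPEN, ("branch", [[S, CLOSE, S], [S]])]
never_closed = [OPEN, ("branch", [[S, S], [S]])]
```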

todo regions in graphs

Expressions

Every line and block is an expression in Oot. The value of a block is the value of the last expression in it. The value of a conditional control construct such as 'if' is the value of the last line on whichever part of it executes.

Assignment statements return the value being assigned (ie the 'rhs', the right hand side).

Operators and macros

Operators are unary if and only if they end with the character '-', except that the operator '-' is unary negation if it is an attached prefix, and otherwise it is the 2-ary subtraction operator. Unary operators are attached prefixes.

The following characters can be used in names for binary operators: todo

todo: what is the operator to negate a number or boolean or to invert a function?

Data constructors (also called graph constructors or node constructors)

todo: in many (but maybe not all) ways, the earlier G-constructor syntax (see below, in OLD section (later: did i mean ootOldMain.txt?)) is better than this; merge these 2 proposals

Oot has one primitive structure/composite data constructor, []. It is used to construct Oot Graphs. Oot Graphs can be used as lists, as associative arrays/dicts/hashes, as trees (acyclic graphs), or as (potentially cyclic) graphs. Examples:

a list:

l = [1 2 3]; l 1 == 2;

an associative array with string keys:

d = ["apple"="red" "pumpkin"="orange"]; d "apple" == "red";

an associative array with keyword keys:

d = [APPLE="red" PUMPKIN="orange"]; d APPLE == "red";

an associative array with variable keys:

key1 = APPLE; val2 = "orange"; d = [key1="red" PUMPKIN=val2]; d APPLE == "red";

a tree (edges by node syntax):

tree1 = [SALLY=[$BOB, $ALICE]; BOB=[] ALICE=[$SALLYS_GRANDDAUGHTER] SALLYS_GRANDDAUGHTER;]; tree1 SALLY ALICE 0 == tree1 SALLYS_GRANDDAUGHTER

In the previous example, we used prefix dollarsign (eg '$BOB') to indicate that, eg instead of SALLY's first child being the keyword BOB, rather SALLY's first child is the node whose LABEL is the keyword BOB. Prefix dollarsigns are resolved within the context of the currently-in-scope data context.

alternate syntax for declaring edges (edgelist syntax):

tree2 = [SALLY BOB ALICE SALLYS_GRANDDAUGHTER SALLY/BOB SALLY/ALICE ALICE/SALLYS_GRANDDAUGHTER]; tree2 == tree1;

using newlines:

tree3 = [
SALLY BOB ALICE SALLYS_GRANDDAUGHTER
SALLY/BOB
SALLY/ALICE
ALICE/SALLYS_GRANDDAUGHTER
]; tree3 == tree2;

a cyclic graph:

cg = [A B C  A/B  B/C  C/A]; cg A B C A == cg A

a graph with a self-loop:

sl = [NODE1 = [--SELF]]; sl NODE1 0 == sl NODE1;

a graph with an edge to the graph itself:

gwaettgi = [NODE1= [--ROOT]]; gwaettgi NODE1 == gwaettgi;

a graph with an edge whose target is another edge, rather than a node:

gwet = [NODE1  NODE2  NODE1/NODE2  NODE2/(NODE1 NODE2).]; gwet NODE2 0 --SRC == gwet NODE1

the same graph, with the addition of a NODE3 which is a node representing a reified edge:

gwet = [NODE1  NODE2  NODE1/NODE2  NODE3=(NODE1 NODE2).  NODE2/NODE3]; gwet NODE2 0 --SRC == gwet NODE1; gwet NODE2 0 == NODE3;

Here a postfix '.' means 'the last edge along the path just given'. For example, 'x y Z.' would refer to the edge whose source is the result of evaluating x y, and whose label is Z.

Graphs can contain 'metadata', which are invisible to ordinary graph operations, but accessible using the 'metadata' view of the graph. Metadata is given using the '^' prefix operator. For example, here is a graph with two nodes, labeled APPLE and PUMPKIN, with metadata indicating sharding:

key1 = APPLE; val2 = "orange"; d = [key1="red" PUMPKIN=val2 key1^[shard=1253] PUMPKIN^[shard=543]]; d APPLE == "red";

Function application

Function application associates to the left. 'Multiargument' functions are curried. Function application is by juxtaposition. Eg:

sqrt 4 == 2
add 2 3 == (add 2) 3 == 5

Graphs can be accessed as functions:

d = ["apple"="red" "pumpkin"="orange"]; d "apple" == "red";
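
For readers more used to call syntax with parentheses, the application rules in this section can be imitated in Python (a stand-in for Oot). The Graph class here is a deliberately tiny assumption, just a lookup table that can be called, and says nothing about how Oot Graphs are really implemented.

```python
import math

def add(x):
    # Curried: 'add 2' yields a function still waiting for its second argument.
    return lambda y: x + y

class Graph:
    """Toy stand-in for an Oot Graph: applying it to a key follows that edge."""
    def __init__(self, edges):
        self._edges = dict(edges)

    def __call__(self, key):
        return self._edges[key]

l = Graph(enumerate([1, 2, 3]))                   # l = [1 2 3]
d = Graph({"apple": "red", "pumpkin": "orange"})  # d = ["apple"="red" ...]

root_of_four = math.sqrt(4)   # sqrt 4
five = add(2)(3)              # add 2 3 == (add 2) 3
second_item = l(1)            # l 1
apple_colour = d("apple")     # d "apple"
```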

Defining functions

function_name parameter1 parameter2 = function_body

f x = x + x

The stuff to the left of the equals sign ('function_name parameter1 parameter2' above) is called an 'argument specification' or argspec.

Curly braces ('{}'s) must be used to enclose the function body if it has multiple statements (and even if the function body is a single statement, if that statement includes an equal sign ('='), that equals sign and the function-defining equals sign must be separated by some sort of grouping; {} is recommended but () will also work; this is because two '='s which are syntactic siblings are the 'named return argument' construct, see below). Eg:

add_3 x = {k = 3; x + k}

Function definitions are also expressions which return the function that was defined. For example, the following defines a function named 'f' with one parameter, and then assigns this function to the variable 'a':

a = (f x = c)

Note that surrounding the function definition with some sort of grouping is mandatory, because otherwise this would be recognized as the 'named return argument' construct, see below.

Anonymous functions

Anonymous functions are written with a '=>' to separate the parameters from the function body. {}s can be used for functions with multiple lines, eg:

x => x + x
x => {y = 3; x + y}

0-ary functions can be defined using this syntax by having no parameters, eg:

=> 3

Named functions are just syntactic sugar for anonymous functions. 'f x = body' is syntactic sugar for:

f = (x => body)
(f@meta).name = 'f'
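
A loose Python analogue of this desugaring, offered only as an illustration (Python is a stand-in for Oot, and its __name__ attribute is not claimed to correspond exactly to Oot's meta view):

```python
# f = (x => body): bind an anonymous function to a variable.
f = lambda x: x + x
name_before = f.__name__   # Python's anonymous functions carry a generic name

# (f@meta).name = 'f': record the name as metadata on the function.
f.__name__ = "f"
name_after = f.__name__
```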

todo change the previous line to use the real View syntax (instead of @) and the real 'get string name of' meta syntax (instead of 'f')

todo can we omit this if we use identifiers like _1, eg is "_1 + _1" equal to "x $ x + x"?

Named and multiple return values

Functions with named or multiple return values (multiple return values require named return values) are defined using the syntax:

named_return_value1, named_return_value2 = function_name arg1 arg2 = function_body

The motivation for using an '=' on the left here is so that you can easily copy-and-paste from the function definition to the places where you are writing code to use the function. For example, in the above you can copy 'named_return_value1, named_return_value2 = function_name arg1 arg2', and paste that into the call site, then edit it to be what you need. The motivation for using an '=' sign on the right is so that function definitions without multiple return arguments are still very easy to type, and so that they look like ordinary variable assignment/binding.

To call a function and bind multiple return values, use:

return_value1, return_value2 = function_name arg1 arg2

Note that this function call looks somewhat like a function definition. The way to tell them apart is that the LHS (LHS means left-hand-side of an equals-sign) of a function callsite binding named return values has a comma in it, eg 'r1, r2 = f x'; by contrast, a function definition would look like 'f x = body', where 'f' and 'x' are separated by spaces. Note that commas on the LHS are significant; when seen on the LHS, they are NOT a shortcut for [return_value1 return_value2]; '[return_value1 return_value2] = function_name arg1 arg2' is NOT equivalent (that would be a destructuring bind on the primary return value).

If you need multiple statements in the function_body, you must make it a block (ie surround it with curly braces ('{}')).

The value of the last line of the block is called the 'primary return value' and the others are called 'secondary return values'. The function body gives the primary return value on the last line of the function, or by using 'return'.

Note that having two equal-signs ('=') like this does NOT mean a chained assignment (chained assignment is a construct that some other languages have; Oot does not have chained assignment); it is a special form for a function definition including named return values. On the other hand, if the two '='s are not syntactic 'siblings', but rather are explicitly grouped separately via parentheses (or similar, todo decide exactly what is meant here), then they are treated as two separate =s. For example, 'a = (f x = c)', which we saw above.

Examples:

r1, r2 = f x = {r2 = 3; x + x}
primary_result, z = f 1      ;; accessing the r2 result by position
primary_result, r2/z = f 1   ;; accessing the r2 result by name
primary_result == 2
z == 3

An anonymous function with named return values can be defined using the same syntax, but with '_' in the place of the function name:

anonymous_fn = (named_return_value1, named_return_value2 = _ arg1 arg2 = function_body)

'return'

'return' works similarly to its counterparts in languages like C and Python. When encountered within a function body, it causes the function to exit, returning the value given to it. For example:

division_except_divide_by_zero_is_zero x y = {if y == 0 (return 0); div x y}

(note to self: 'return' is alphabetic because it is not needed in Oot Core because it is semantically equivalent to throwing an exception which is caught at the function level, and replacing the last line of the function with a construct that returns what was caught by such an exception, if there was one, or by the result of the expression on the actual last line otherwise; of course for performance reasons the Oot implementation will have to treat return specially, unless the compiler is smart enough to handle it as an exception but then to recognize that this exception is always caught within the function and optimize it to a goto, or even an actual return if we can)
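
The equivalence described in the note can be sketched in Python (a stand-in for Oot): an early return becomes an exception that is caught at the function boundary. ReturnSignal, early_return and with_early_return are names invented for this illustration, and true division is used where the Oot example calls 'div'.

```python
class ReturnSignal(Exception):
    """Carries the value of an early return up to the function boundary."""
    def __init__(self, value):
        super().__init__(value)
        self.value = value

def early_return(value):
    raise ReturnSignal(value)

def with_early_return(body):
    """Wrap a function body so that ReturnSignal becomes its result."""
    def wrapped(*args):
        try:
            return body(*args)        # value of the last line otherwise
        except ReturnSignal as signal:
            return signal.value       # value given to the early return
    return wrapped

@with_early_return
def division_except_divide_by_zero_is_zero(x, y):
    if y == 0:
        early_return(0)
    return x / y
```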

Summary listing of syntactic contexts

ordinary expression/'code context'
graph constructor/'data context'
LHS of variable binding/assignment
argument specification of a function definition (LHS in ordinary function definitions; middle of the two =s in named return value function definitions)

Expression evaluation

Expression evaluation is non-strict but not necessarily lazy. I'm not quite sure exactly how this will work, but right now i'm thinking that the requirement will be that the program should not diverge or give an error unless a lazy evaluation strategy would also diverge or give an error; EXCEPT if the entire expression being evaluated was (a) created at the behest of the current module AND (b) not marked as lazy.

In terms of implementation, each thunk will be marked as to the modules in the call chain at the time of its origin, and marked if it is 'lazy' or not. If it originated in the current module and it is not marked lazy, it is evaluated eagerly. Otherwise, it is evaluated lazily yet with some heuristically bounded speculative eager evaluation, perhaps involving the maximal lexical nesting depth of the current module (static, i suppose, lest some macro make a really deep depth?); but if an error is generated, this error is not delivered until the value would have been demanded lazily.
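
One piece of this plan, speculative eager evaluation whose errors are held back until the value is actually demanded, can be sketched in Python (a stand-in for Oot). The class name SpeculativeThunk is made up, and the sketch leaves out the module-origin marking, the 'lazy' marking, and any bound on how much work is done speculatively.

```python
class SpeculativeThunk:
    """Evaluates eagerly, but delivers any error only when forced."""
    def __init__(self, compute):
        self._value = None
        self._error = None
        try:
            self._value = compute()      # speculative eager evaluation
        except Exception as error:
            self._error = error          # remember, but do not raise yet

    def force(self):
        # The point where a lazy strategy would have demanded the value.
        if self._error is not None:
            raise self._error
        return self._value

fine = SpeculativeThunk(lambda: 2 + 2)
broken = SpeculativeThunk(lambda: 1 // 0)   # no error is delivered here
```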

# is for footnotes

Footnotes are a special form of compile-time macro.

'#x', where x is a 'footnote identifier' is a 'footnote' (or more formally, a 'footnote invocation'). A footnote identifier has the syntax of either an identifier or an integer constant (the purpose of allowing integers as identifiers is to allow/encourage authors to not waste mental time/space in coming up with a meaningful identifier). A footnote captures and replaces everything to its left within the current explicit group (ie until the previous EOL or open parens or open braces or open bracket). Within the footnote definition, the ## is replaced by the captured text.

The captured block is not 'run' unless/until requested by the footnote.

Footnote x is 'defined' using the special syntax '#x= ...', where ... is a placeholder for an expression within which '##' refers to the captured text. Note that the '=' is postfix attached; '#x = ...' is not the same.

The 'definition' of the footnote identifiers comes AFTER their invocation. The same footnote identifier may be defined multiple times; the rule is that the definition that comes lexically soonest after the invocation is the one that is used.

Data context and code context

Within the scope of a data constructor (that is, after an unquoted [ but before the closing ]), we are said to be in 'data context'. Otherwise, we are said to be in 'code context'. Some syntax, and the operation of macros, differs depending on whether we are in data context or code context.

Parentheses within data context enter code context. A nested data context can be entered from this code context, etc. Eg:

[(1 + 1), ([1 2] + [3 4])] == [2 [1 2 3 4]]

Base context and metacontext

Base context is the program that is being executed. Metacontext is annotations and other data attached to parts of the program being executed. For example, static type annotations are in metacontext. Annotations (metacontext) may be attached to parts of the program within code context and also to parts within data context.

From base context, metacontext is entered via the prefix ^. From metacontext, base context is entered via postfix ^. As a shorthand, any capitalized word is automatically placed in metacontext. Eg:

Int i = 3 is shorthand for ^Int i = 3

To place multiple space-separated words into metacontext, use ^ with parentheses, eg:

^(List int) l = [1,2,3]

To place multiple lines into metacontext, use ^ with blocks, eg:

^{
Wrapper1
List int
}

Both base and metacontext can have both data context and code context within them.

Assertions

An AssertionException is raised if any expression in the block, aside from the last, evaluates to FALSE. Eg in the following block, the line 'y = 5' will never be reached, because execution will terminate with an AssertionException when the line 'x == 4' evaluates to FALSE:

x = 3
x == 4
y = 5
y
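
The rule above can be mimicked in Python. This is only a sketch under assumptions: the lines are written in Python syntax rather than Oot, run_block is a hypothetical helper, and the AssertionException class is a stand-in for Oot's exception of the same name.

```python
import ast


class AssertionException(Exception):
    """Stand-in for Oot's AssertionException (assumed name and shape)."""


def run_block(lines):
    """Run a block line by line. Expression lines that are not last act as
    assertions; the value of the last expression line is returned."""
    env = {}
    result = None
    for index, line in enumerate(lines):
        node = ast.parse(line).body[0]
        if isinstance(node, ast.Expr):
            result = eval(line, {}, env)
            is_last = index == len(lines) - 1
            if not is_last and result is False:
                raise AssertionException(line)
        else:
            exec(line, {}, env)      # assignments and other statements
    return result
```

Calling run_block(["x = 3", "x == 4", "y = 5", "y"]) raises AssertionException at the second line, so the assignment to y never runs.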

' is for errors

' involves passage between two modes of error handling: one in which exceptions are raised immediately, but variables are guaranteed not to contain nils; and one in which exceptions are always caught and captured as Fail values using Option types.

'x evaluates x, and, if this evaluation raises an error, it catches that error and puts it into the erroneous case of an Option type. If x does not raise an error, it puts the result into the successful case of an Option type.

x' takes an Option type and tests if it is an erroneous case or a successful case. If it is erroneous, it raises the contained error. If it is successful, it returns the contained result value.

x'e is like x' except that if x is an erroneous case of an Option type, it raises not the contained error, but rather the result of applying the function e to the contained error.

For example:

'(1/1) == Succeed 1
'(1/0) is Fail DivideByZeroError
(Succeed 1)' == 1
(Fail DivideByZeroError)' raises the contained DivideByZeroError
(Fail DivideByZeroError)'(fn e Exception("my error")) raises the value Exception("my error")

(todo is the syntax right to terminate the optional arguments of the DivideByZeroError constructor here?) (todo is the lambda function syntax right here?)
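
For readers who think in terms of exceptions and option types, the behaviour of prefix and postfix ' can be approximated in Python as follows. The names Succeed and Fail come from the examples above; capture and release are hypothetical names chosen for this sketch, and Python's ZeroDivisionError stands in for DivideByZeroError.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Succeed:
    value: Any


@dataclass
class Fail:
    error: Exception


def capture(thunk: Callable[[], Any]):
    """Analogue of prefix ': run thunk, wrapping the outcome in Succeed/Fail."""
    try:
        return Succeed(thunk())
    except Exception as err:
        return Fail(err)


def release(option, handler: Optional[Callable[[Exception], Exception]] = None):
    """Analogue of postfix ' (and of x'e when a handler is given)."""
    if isinstance(option, Succeed):
        return option.value
    # Raise either the contained error or the handler's replacement for it.
    raise handler(option.error) if handler else option.error
```

Here capture(lambda: 1 // 0) yields a Fail holding a ZeroDivisionError, and release of that value re-raises it; passing a handler swaps in a different exception before raising.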

is

'is' is a boolean operator. 'subject is predicate' returns True (T) if any of:

subject == predicate
bool(predicate subject) (todo should this be hasattr(subject, predicate) and bool(subject.predicate)?)
isa subject predicate
issubclass subject predicate

and False (F) otherwise.
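
One possible reading of these four cases, sketched in Python. This is an assumption-laden illustration: oot_is is a hypothetical name, Python's isinstance and issubclass stand in for isa and issubclass, and the predicate case follows the first reading (apply the predicate to the subject) rather than the attribute-based alternative mentioned in the todo.

```python
def oot_is(subject, predicate):
    """Return True if subject 'is' predicate under any of the four rules."""
    if subject == predicate:
        return True
    if isinstance(predicate, type):
        # Types are callable in Python, so handle them before the
        # predicate-function case to avoid invoking a constructor.
        if isinstance(subject, predicate):
            return True
        return isinstance(subject, type) and issubclass(subject, predicate)
    if callable(predicate):
        return bool(predicate(subject))
    return False
```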

TODO see the longer list in ootSyntaxThoughts and add it here

TODO mb 'is' should NOT check predicates, because then you cant test the identity of the predicate itself via 'is'? or mb thats an unimportant special case?

truthiness

Some Oot values can be coerced to booleans using the function 'bool'. Oot values that can be coerced to bools are called 'boolable'. Values are boolable when one of the following is true:

the value is in the domain of the 'len' function
the value is in the domain of the 'nonzero' function
the value is a Failure (see Error Handling)

bool(x) is:

TRUE if: len(x) > 0
TRUE if: nonzero(x) == TRUE
FALSE otherwise
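
A small Python sketch of this rule follows. It assumes the 'len' case is meant to test for non-emptiness (len(x) > 0), treats Python numbers as the domain of 'nonzero', and uses a hypothetical Fail class as a stand-in for an Oot Failure; oot_bool is likewise a made-up name.

```python
class Fail:
    """Stand-in for an Oot Failure value (hypothetical)."""

    def __init__(self, error):
        self.error = error


def oot_bool(x):
    """Sketch of 'bool': TRUE for non-empty or nonzero values, else FALSE."""
    if isinstance(x, Fail):
        return False                      # Failures are boolable and falsy
    if hasattr(x, "__len__"):
        return len(x) > 0                 # in the domain of 'len'
    if isinstance(x, (int, float, complex)):
        return x != 0                     # in the domain of 'nonzero'
    raise TypeError(f"{x!r} is not boolable")
```

Values outside all three cases are rejected rather than silently treated as FALSE, which matches the statement that only some values are boolable.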

Autocoercion

Autocoercion of subset types

Some types are subsets of another type; for example, the integers are a subset of the reals (the integers are called the 'subset type' and the reals are called the 'superset type'). A function that takes an integer value and returns the corresponding real value is called an 'embedding'. Oot provides a facility for declaring subset types and embeddings.

When a chain of embeddings can be found, the smaller type will autocoerce to the larger (for example, integers will autocoerce to reals) but not the other way around. For example, "sqrt 3" is legal even though 3 is an integer, because it will be coerced to 3.0. Likewise, corresponding values in the subset and superset types will compare as equal (eg 3 == 3.0). Note that a value of the subset type will acquire all of the per-element type attributes of the superset type.

There are certain requirements on embedding functions:

they must be pure functions (deterministic and side-effect-less)
they must be injective: there must not be any two distinct values in the subset type for which an embedding function returns the same value from the superset type
an example of an 'inclusion path between embedding functions' is integers < rationals < reals. If there are embedding functions provided for each step along an inclusion path, then this implies that their composition is a valid embedding function; for example, if an embedding function is provided from integers to rationals, and another embedding function is provided from rationals to reals, then this implicitly defines an embedding function from integers to reals.
any two embedding functions between a given subset and superset type must yield exactly the same result on every input
if there is an embedding function in both directions between two types, then these types must be equivalent
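
To make the idea of composing embeddings along an inclusion path concrete, here is a Python sketch. The registry layout, the type names as strings, and the function find_embedding are all assumptions made for illustration; rationals are modelled as (numerator, denominator) pairs.

```python
from collections import deque

# Hypothetical registry: (subset, superset) -> embedding function.
EMBEDDINGS = {
    ("int", "rational"): lambda n: (n, 1),        # n becomes n/1
    ("rational", "real"): lambda q: q[0] / q[1],
}


def find_embedding(src, dst, embeddings=EMBEDDINGS):
    """Breadth-first search for a chain of embeddings from src to dst.
    Returns the composed function, or None if no chain exists."""
    queue = deque([(src, [])])
    seen = {src}
    while queue:
        current, chain = queue.popleft()
        if current == dst:
            def composed(value, chain=chain):
                for step in chain:
                    value = step(value)
                return value
            return composed
        for (sub, sup), fn in embeddings.items():
            if sub == current and sup not in seen:
                seen.add(sup)
                queue.append((sup, chain + [fn]))
    return None
```

With this registry, find_embedding("int", "real") composes the two declared steps, while find_embedding("real", "int") finds nothing, reflecting that coercion only goes from subset to superset.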

Intuitively, these requirements will always be met as long as the values in the subset type are actually thought of as members of the superset type (as opposed to just things that have some special correspondence to members of the superset type; eg integers are not actually strings, even though 0 can be mapped to 'zero', 1 can be mapped to 'one', etc).

todo: adjust language; in oot-speak, embedding functions are on 'representations', not 'types', because in terms of 'types', the value in the subset and superset type are the same value

todo: we dont use the word 'subtype' because (a) these are 'representation' and (b) i think ppl sometimes use subtyping for things where the subtype isnt ontologically the same thing as the supertype, although i cant think of an example right now.

todo See also 'inclusive' vs 'coercive' subtypes in https://en.wikipedia.org/wiki/Subtyping#Subtyping_schemes . http://www.slideshare.net/RalphMohr/subtypes-vs-roles suggests that 'inclusive' subtypes are when one object can be a member of a supertype and of more than one of its subtypes at the same time; and also that in practice these things are often really orthogonal 'facets' that should be thought of as roles.

todo see also 'type inclusion' in Oberon http://www.excelsior-usa.com/doc/xds/o2rep12.html#47 , which has been eliminated as of Oberon 2007 ( http://www.inf.ethz.ch/personal/wirth/ProjectOberon/PO.System.pdf )

(lack of) Other autocoercion

Aside from embeddings, there is no 'autocoercion' in Oot. However, some of the builtins do coerce their inputs:

'if' applies 'bool' to its condition input
'print' applies 'str' to the input to be printed

builtin exceptions hierarchy

todo

i generally have the idea that instead of just having one 'nil' value, like Python's None, we'll have many, to disambiguate.

sentinel which is assigned to defaultable arguments which are not overridden, meaning 'use the default value for this argument if there is one'; can be explicitly passed to eg allow a chain of functions that call each other with default values where the actual default value is only specified in the innermost function
what else? see below:

" The problem is that the null reference has been overloaded to mean at least seven different things:

    a semantic non-value, such as a property that is applicable to some objects but not others;
    a missing value, such as the result of looking up a key that is not present in a key-value store, or the input used to request deletion of a key;
    a sentinel representing the end of a sequence of values;
    a placeholder for an object that has yet to be initialized or has already been destroyed, and therefore cannot be used at this time;
    a shortcut for requesting some default or previous value;
    an invalid result indicating some exceptional or error condition;
    or, most commonly, just a special case to be tested for and rejected, or to be ignored and passed on where it will cause unpredictable problems somewhere else.

Since the null reference is a legitimate value of every object type as far as the language is concerned, the compiler has no way to distinguish between these different uses. The programmer is left with the responsibility to keep them straight, and it’s all too easy to occasionally slip up or forget. " -- Anders Kaseorg, http://qr.ae/CS2A6

i would also consider breaking the following into two: "a semantic non-value, such as a property that is applicable to some objects but not others;"; in true/false/nonsense, 'nonsense' is sort of a 'non-value', but this could sometimes be different from a property that is sometimes inapplicable?

also, true/false/maybe might be distinguished from true/false/don't know (eg dont know means i refuse to take any position, maybe means both are reasonable guesses), and true/false/nonsense, and true/false/neither (implying there is some other truth value besides true or false which is applicable here, eg in logics that admit other qualitatively different ones)

to deal with distributed systems:

maybe have three basic error conditions, for four result conditions in total:

(1) success (2) clean failure (3) success or failure unknown; also, partial success possible (4) corruption detected (eg ecc fail error)

so, if a side-effectful RPC was sent but there was a network failure that prevents us from knowing if the RPC request was received or not, (3) would be returned.

also this is a joke but it points out some common types of errors that need to be distinguished: " Most famous is JavaScript’s implementation of arrays, which erases a litany of bugs by returning undefined if the provided index is out of bounds. Universal values like undefined and null are useful values to return if you want to indicate system failure, out of memory error, user error, invalid memory location, timeout, nothing, something, anything. "

todo; also, this isnt syntax

object protocols

--GET and --SET

todo; also, this isnt syntax

Destructuring bind

todo

? is for patterns and logic

Oot contains constructs to transform between imperative programming functions and declarative values.

These values are called 'patterns' (reads, possibly non-deterministic reads from an external data source) and 'facts' (writes, possibly side-effectful writes to some external data store).

A statement like '(x => filter (mod x 2 == 0))' can be thought of as a little computer program, or it can be thought of as a query pattern. A query is the application of a pattern to some data source.

Patterns are first-class in Oot.

TODO graph representation of patterns. Note that we need to represent forall, joins, etc

TODO of course this section doesn't actually say what the question mark does. i think ? as a prefix sigil will indicate that the identifier is some sort of logical keyword, eg a quantifier, a modal logical symbol, etc. Eg ?forall, ?exists, ?necessary, ?possible.

TODO and what about meta-variables? mb it should be forall? exists? necessary? possible?, and ?metavariable; eg prefix is a metavariable, postfix is a logic something-or-other.

TODO mb we should switch $ and ?; ? seems more appropriate for metavariables, doesn't it? otoh i guess it's easy to remember that '?' is for semantics that is not quite defined in the Oot language

TODO couldn't/shouldn't we just use KEYWORDS for stuff like forall?

! is for impurity and side-effects

A call to a function which interacts with the state of the environment (called an 'impure' function) must be postfixed with a '!'. There are exceptions: the interactions of a function can be associated with 'state domains', and state domains can be 'masked', meaning that side-effects associated with those domains are implicitly allowed. For example, the state domains 'logging', 'caching', 'debugging', and 'console' (eg 'print') are masked by default (you may want to unmask 'console' in programs for which correct interaction with the console streams is important).

An assignment to a reference variable, or to an object with a side-effectful 'setter', is considered a side-effectful operation, and must be postfixed with a '!' as above, unless the reference variable or setter is in a currently masked state domain.

For example:

delete-file!('testfile')
x =! 3

Adding unnecessary postfix '!' to pure functions is harmless (except for getting in the way of some compiler optimizations; and except for marking the parent function as impure).

In the absence of specific 'non-sequential' delimiters, all lines within a block containing !s are considered to be sequenced with respect to each other. For example, in the following, 'file deleted' will not be printed before 'delete-file' is executed:

delete-file!('testfile')
print! 'file deleted'

Note that the one-letter macro 'P' is short for 'print!', not just 'print'.

todo: does prefix and infix and standalone ! mean anything?

Something that looks like an assignment but with an expression attached to/prefixing the '='s is an in-place assignment

For example:

x = 2
x += 3
x == 5

In general, 

something (function)= argument1 argument2

(and there might be more than 2 arguments) 

is equivalent to:

something = function something argument1 argument2

However, it also hints to the implementation that this assignment should be done as an in-place mutation on the current value of 'something', if possible, rather than by creating a new value and then replacing the old value with it. (if the implementation is using copy-on-write under the hood, this DOES NOT mean that other supposedly separate copies of this value will be affected by this; the implementation is responsible for transparently detecting and avoiding that situation)

todo: if the value assigned to something is a reference, ie changes to it are SUPPOSED to have side-effects, then this also indicates that this is such a change (not just a copy of the value). I'm not quite sure of the syntax for reference variables yet so i havent provided an example here yet.


todo: do we need this if we have statement prefix "$" for "$ f; == ($ = f $;)"?




_ (underscore) is for dummy results, partially applying to non-leading positional parameters, and marking argument positions

When unattached on the LHS (left-hand-side), '_' is a 'write-only' identifier that can be assigned to unused or 'dummy' values. For example:

_, maxPos = max([3,2,4])
maxPos == 2

Unattached on the RHS, _ implicitly creates a lambda function (another way to say this: it marks a parameter to be left open while partially applying other parameters). Eg:

halve = div _ 2
halve 10 == div 10 2
halve == x => div x 2

Multiple uses of _ creates an anonymous function taking multiple separate variables:

plus == _ + _ == x y => x + y
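
The RHS use of _ resembles placeholder-based partial application. A minimal Python sketch of that idea, assuming holes are filled left to right; with_holes and the _Hole marker are hypothetical names, and operator.floordiv stands in for div:

```python
import operator


class _Hole:
    """Marker for an argument position left open (stand-in for Oot's '_')."""


_ = _Hole()


def with_holes(fn, *args):
    """Return a function whose parameters fill the '_' positions left to right."""
    n_holes = sum(1 for a in args if a is _)

    def partial(*supplied):
        if len(supplied) != n_holes:
            raise TypeError(f"expected {n_holes} arguments, got {len(supplied)}")
        it = iter(supplied)
        return fn(*(next(it) if a is _ else a for a in args))

    return partial


halve = with_holes(operator.floordiv, _, 2)   # like: halve = div _ 2
plus = with_holes(operator.add, _, _)         # like: plus = _ + _
```

In Oot the lambda is created implicitly by the parser; the explicit with_holes call here only exists because Python has no such syntax.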

Attached as a suffix, _ can also be used to 'mark' argument positions for use by certain other operations. For example, the 'inverse' function will, if an inverse has been defined, return the inverse of a given function with respect to the marked argument:

inverse (pow(e, x_)) == log

If multiple positions must be distinguished, a secondary identifier may be provided after the _ (so in this case the '_' is infix attached); for example:

special_operator_thingee f x_2 y_1 z_3

--cwn-- is not the current working directory

Although at the beginning of the program, --cwn-- is implicitly set to the current working directory, altering the current working node does not have the side-effect of changing the current working directory.

Precedence

When we speak of an 'operator', we mean a token with punctuation in it which corresponds to an ordinary function. The purpose of using operators instead of just ordinary alphanumeric identifiers using the ordinary function application syntax is just to make it easier to read expressions by reducing the need for parentheses with syntactic rules.

Infix attached operators group tightest; you can think of something like '3+5' being equivalent to '(3+5)'. Unattached operators have varying precedence. Unattached operators are never unary; all unary operators are prefix (in their punctuation form; if you want to use them 'alone', just use their ordinary alphanumeric identifier). In all cases, the precedence of an operator can be discerned from its leading (eg first few characters of) punctuation.

The following table gives precedence. When two instances of things with the same precedence level follow each other, there are only four possibilities: (1) the table below lists them as left associative, (2) the table below lists them as right associative, (3) the table below lists them as associative (eg multiplication is associative) (if different associative operators are intermixed, the composite is non-associative; only when the same operator follows itself does associativity count), (4) the expression is non-associative; explicit parentheses or some other equivalent grouping method must be used to say which has higher precedence.

From highest precedence (tightest binding) to lowest:

any operator, when it is infix attached
function application (left associative)
operators starting with '*' (associative)
operators starting with '+' (associative)
todo (cons, string concatenation) (non-associative)
== != < <= > >=, in/elem, <... ...> (trinary ops), && (non-associative)

. operator (right associative)
= (not really an operator)

note: the subtraction operator, '-', is treated specially; 'a - b' is treated as a macro for 'a + -b', so 'a + b - c + d' is associative even though + and - are intermixed, because it is treated as 'a + b + -c + d'.
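
Because precedence depends only on an operator's leading punctuation, a lookup needs nothing but the token itself. The following Python sketch illustrates this for part of the list above; the numeric levels and the function name precedence are inventions of this sketch, and levels that are still marked todo are left out.

```python
# Levels follow the order of the precedence list; the numbers are this sketch's own.
COMPARISONS = ("==", "!=", "<=", ">=", "<", ">", "&&")


def precedence(op, attached=False):
    """Return a binding level for an operator token; smaller binds tighter."""
    if attached:
        return 0                      # any infix attached operator
    if op.startswith("*"):
        return 2                      # level 1 is function application
    if op.startswith("+"):
        return 3
    if op.startswith(COMPARISONS):
        return 5                      # level 4 is the cons/concatenation todo
    if op == ".":
        return 6
    if op == "=":
        return 7
    raise ValueError(f"no precedence rule for {op!r}")
```

A user-defined operator such as '*.' or '+++' gets its level from its first character, so a reader never has to consult its definition to see how an expression groups.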

todo: should fn application have precedence over binary operators, like in haskell? what about the oneletter macros?

todo: more ideas on precedence in ootSyntax, ootSyntaxThoughts

Verification

! at the beginning of a statement is for assertions that you want the compiler to prove:

n = 1 + 1
! n == 2

!! is for assertions that you want the compiler to accept on faith:

n = 1 + 1
!! n == 2

!? is for pre-conditions (a pre-condition is some property that should hold before a function is called; this should be assumed when proving things within the function, but then must be proven at the callsite of the function):

!? n == 1
n = n + 1
! n == 2

Loop invariants are just expressed by putting an assertion within the loop.

To show termination of loops, often an expression must be given which strictly decreases upon each loop iteration. This can be done with '!<', which binds to the enclosing loop:

i = 5
while i>0 {
  !< i
  i = i - 1
}

If the compiler can't prove the !< but you want the compiler to take your word for it, use '!!<' instead of '!<':

i = 5
while i>0 {
  !!< i
  i = i - 1
}
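
Oot is meant to check '!<' statically, but the underlying idea (a measure that strictly decreases on each iteration) can be illustrated with a runtime check in Python. The helper run_with_variant is hypothetical. Note that a strictly decreasing measure only implies termination when it is also bounded below, for example a non-negative integer.

```python
def run_with_variant(state, cond, body, variant):
    """Run 'while cond(state): state = body(state)', checking that
    variant(state) strictly decreases on every iteration."""
    iterations = 0
    while cond(state):
        before = variant(state)
        state = body(state)
        if not variant(state) < before:
            raise AssertionError("variant did not strictly decrease")
        iterations += 1
    return state, iterations
```

The loop from the example corresponds to run_with_variant(5, lambda i: i > 0, lambda i: i - 1, lambda i: i).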

Sometimes in assertions and preconditions you may need to assert some condition which is only true in some circumstances, or over some range. To do this you may need to introduce a variable in the condition; you can do this with '@'; '@' is read 'forall' (it is an example of a 'quantifier') and '@i condition(i)' means 'for all i, condition(i) is true'. You can use '->' for boolean implication (which is a short-circuiting operator, so the thing on the right side is not evaluated when the thing on the left side is false). So, for example, to assert that all items in an array named 'arr' are != 0, you can quantify over indices into the array:

! @i (0 <= i) & (i < len(arr)) -> arr[i] != 0

Since quantifying over all valid indices into a collection is common, you can also use '%%', the 'in' operator in a 'forall' variable introduction:

! @x%%arr x != 0
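
The two quantified forms have close analogues in Python, which may help in reading them. In this sketch, implies models the short-circuiting '->' by taking its right-hand side as a thunk; all three function names are hypothetical, and a finite range of candidate indices stands in for 'all i'.

```python
def implies(p, q_thunk):
    """Short-circuiting implication: q_thunk is only called when p is true."""
    return (not p) or q_thunk()


def all_nonzero_by_index(arr, candidates):
    """Analogue of quantifying over indices with a bounds guard."""
    return all(
        implies(0 <= i < len(arr), lambda i=i: arr[i] != 0)
        for i in candidates
    )


def all_nonzero(arr):
    """Analogue of quantifying directly over the elements of arr."""
    return all(x != 0 for x in arr)
```

Because the implication short-circuits, out-of-range candidates never reach arr[i], which is why the index form is safe to state over every i.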

In proof mode, by default Oot will attempt to prove termination of loops. If you want Oot to take your word for loop termination, use '!! ^terminating':

while i>0 {
  !! ^terminating
  i = i - 1
}

If you want to mark something as non-terminating:

while i {
  !! ^nonterminating
  i = i
}

If you don't want to decide whether something is terminating or not and you don't want Oot to try to prove termination:

;; collatz conjecture
while x != 1 {
  ^mbterminating
  ife {mod x 2 == 0} {x = x / 2} {x = 3*x + 1}
}

Functions which call non-terminating loops are themselves marked as non-terminating, and functions which call mbterminating loops are themselves marked as mbterminating (unless overridden with !! ^terminating).

If you want to find all the places where you told the compiler to 'take your word for it', you can just search the source code for the substring '!!'.

Control structures

if {i > 0} {dosomething}

  or equivalently:

if i > 0 // {dosomething}

ife (if with else):

ife {i > 0} {dosomething} {dosomethingelse}

  or equivalently:

ife i > 0 // {dosomething} {dosomethingelse}

while:

while {i>0} {i = i - 1}

  or equivalently:

while i>0 // {i = i - 1}

whilst: this is like 'while' except that the first iteration is executed regardless, without doing the test:

whilst {i>0} {i = i - 1}

or equivalently:

whilst i>0 // {i = i - 1}
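
The difference between 'while' and 'whilst' only shows when the test is false on entry. A small Python sketch, with the loop state threaded through explicitly (the function names and this state-passing style are choices of the sketch, not of Oot):

```python
def while_loop(cond, body, state):
    """Test first, then run body; may run zero times."""
    while cond(state):
        state = body(state)
    return state


def whilst_loop(cond, body, state):
    """Run body once unconditionally, then repeat while cond holds."""
    state = body(state)
    while cond(state):
        state = body(state)
    return state
```

Starting from i = 0 with the test i > 0, while_loop leaves i at 0, whereas whilst_loop runs the body once and ends with i = -1.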

switch:

switch {x} {
  1 // pr "x is 1"
  2 // pr "x is 2"
}

cond:

cond {
  x > 2 // pr "x > 2"
  x > 1 // pr "x is > 1 but <= 2"
  // pr "x is <= 1" ;; default
}

goto:

place = LABEL2
goto place
LABEL1: pr "reached 1"
LABEL2: pr "reached 2"

for (foreach)

break

continue

Metaprogramming

"" (two double quotes next to each other) is the literal for an empty string when unattached, but when prefix attached to something, involves the passage from literal strings in the source code to identifiers. ""xyz is shorthand for 'xyz'. ""[a b c] is shorthand for ["a" "b" "c"]. xyz"" means to lookup the value of variable xyz at the time of evaluation, and then to substitute this string into the source code in place of xyz"". For example: a=3; xyz = 'a'; xyz"" == 3;

^ (caret) involves the passage from ordinary code to metadata and annotations about that code (see 'Base context and metacontext').

Attached macros can only 'see' the things they are attached to. They see the raw string text of those things, however, and can use postfix "" to get the variables if they want, or they can just use them as strings (eg the way that the unit abbreviation is just passed to a lookup table for U). Users can define custom attached macros, but their names must be more than one character long. Note that attached macros can only be used with an identifier or data constructor (not an operator) on their left, to allow them to be distinguished from capitalized words (annotations) and from keyword arguments to operators (todo is that exactly correct?).

`` (backquotes/backticks) is a form of quoting that refers to the AST generated by parsing the thing quoted. For example, the following assigns an Oot Core AST representing "a > 3" to the variable ast-for-a-greater-than-3.

ast-for-a-greater-than-3 = `a > 3`

There are (at least 3) 'phases' of code:

text string (the source code)
AST
opaque blocks

The operation of going from a string of source code to an AST is called 'parsing' (this can be broken down into lexing and 'parsing proper'). The operation of going from an AST to an opaque block is (somewhat confusingly) called 'compiling'. The difference between the AST phase and the opaque block phase is that metaprogramming constructs such as macros can look inside an AST and rewrite it, but they cannot look inside an opaque block; the implementation is free to represent a block however it wishes and to optimize it.

To construct a string of Oot source code, use ""s (double quotes) (just like other strings). To construct an AST, use ``s (backticks). To construct a block, use {}s (curly braces).

For example, code to assign a first-class function to the variable 'increment' might be written as a block:

increment = {x => x + 1}

or it might be written as an AST:

increment = `x => x + 1`

The difference is that in the latter case, a macro or other metaprogramming construct could look inside the value of 'increment' and see what it is doing, whereas in the former case it cannot (but otoh using the former case might give the implementation many more opportunities for optimization). The AST form might in addition be useful when you are writing code for some purpose other than directly executing it in this Oot process (eg you are writing an expression to be used as part of a query that will be sent to a database server; the server may or may not be implemented in Oot).
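
Python offers a loose analogy for the AST/block distinction, which may make it easier to picture: a parsed ast tree can be inspected and rewritten, while a compiled function is normally just called. The sketch below uses Python's standard ast module; the AddToSub rewrite and the compile_ast helper are illustrative inventions, not Oot features.

```python
import ast

# Opaque block analogue: a compiled function that callers simply run.
increment_block = lambda x: x + 1

# AST analogue: a data structure that metaprogramming code can inspect.
increment_ast = ast.parse("lambda x: x + 1", mode="eval")


class AddToSub(ast.NodeTransformer):
    """Example rewrite: turn every '+' into '-'."""

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node


def compile_ast(tree):
    """'Compile' an expression AST into a callable (an opaque block)."""
    return eval(compile(ast.fix_missing_locations(tree), "<ast>", "eval"))


increment_from_ast = compile_ast(increment_ast)
# Parse a fresh tree for the rewrite, since NodeTransformer edits in place.
decrement = compile_ast(AddToSub().visit(ast.parse("lambda x: x + 1", mode="eval")))
```

The rewrite step is only possible on the tree form, which mirrors the point that macros can look inside an AST but not inside a block.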

To change code into lazy return values and a lazy list ('stream') of events (reads and (writes, or 'side effects')), use 'eval'. To execute code or a lazy list of events in the context of ("against") an environment, use 'exec'.

todo must ASTs be explicitly 'compiled' before being executed, or does executing an AST implicitly compile it if needed? i'm thinking the latter

---

so {} are like "quote". is "!" like "unquote" or is it "pass on the sideeffects"? in some sense i guess sideeffects are themselves "quoted" in a functional reactive stream. prefix vs postfix "!" to apply to the thing itself, or to its first return value? what about impure reads (not sideeffectful)? what about sideeffectful reads, where the rest of the stream cannot be produced until these are executed against some environment? hmm.. "!" as meaning "execute against the current environment"; mb "f!e" for "exec f against environ e"; exec means not just reading from the environ, but also writing to it.. environ can "catch" these writes (and reads!) if it wants, in order to produce a functional reactive stream instead. ! is like eval, not antiquote; and this is not quite like quote/eval b/c blocks are opaque, they are not ASTs (and not source code text strings) that are preserved until runtime (or the end of this 'stage' at least, in the case of compiletime macros) for use in metaprogramming

---

Modulenames and filenames:

The filename must match the modulename (except that on platforms that support it, an '.oot' extension must be added at the end), except on weird platforms where this isn't practical. Modulenames must start with a letter and contain only ASCII lowercase letters, numbers, and underscore. The length of every import path must be less than 256 (so modulenames should be well under 256). Motivated by Windows compatibility [2], the following module names are forbidden: CON, PRN, AUX, NUL, COM1, COM2, ..., COM9, LPT1, LPT2, ... LPT9.
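
These naming rules are easy to check mechanically. A Python sketch follows, under some assumptions: reserved Windows names are compared in lowercase (since valid modulenames are lowercase anyway), the 256 limit on import paths is applied to a single name as a loose upper bound, and is_valid_modulename is a hypothetical helper.

```python
import re

# Reserved device names, lowercased for comparison (an assumption of this sketch).
FORBIDDEN = (
    {"con", "prn", "aux", "nul"}
    | {f"com{i}" for i in range(1, 10)}
    | {f"lpt{i}" for i in range(1, 10)}
)

# A lowercase letter followed by lowercase letters, digits, or underscores.
MODULE_NAME = re.compile(r"[a-z][a-z0-9_]*")


def is_valid_modulename(name):
    """Check a candidate modulename against the rules described above."""
    return (
        MODULE_NAME.fullmatch(name) is not None
        and len(name) < 256
        and name not in FORBIDDEN
    )
```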

---

todo describe different kinds of macros depending on what compilation stage they interfere with?

todo source filters (file scope)

line macros: In general, identifiers that consist of one lowercase letter repeated exactly twice are macros that operate on the raw source code string thru the earlier of either the end of the line, or the first ';' character not escaped by a preceding backslash.
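
The extent of the raw text handed to a line macro can be sketched in Python as below. This assumes the macro's text begins at a known offset, and it treats any ';' directly preceded by a backslash as escaped (it does not try to handle an escaped backslash before a ';'); line_macro_span is a hypothetical name.

```python
def line_macro_span(source, start):
    """Return the raw text a line macro would receive, beginning at 'start'
    and ending before the first newline or unescaped ';'."""
    i = start
    while i < len(source):
        ch = source[i]
        if ch == "\n":
            break
        if ch == ";" and (i == start or source[i - 1] != "\\"):
            break
        i += 1
    return source[start:i]
```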

block-scoped macros (given parsed representation, return new parsed representation)

racket-like syntax metaprogramming in block scope?

todo is some sort of a metaprogrammy thing

todo first-class macros, fexprs (call-by-text) (are those the same thing?)?

todo need complete syntax/punctuation table in tutorial