proj-oot-old-150618oot

Project status

I am just writing down ideas. I have many pages of notes on things i may or may not want to put into Oot. Please note that my philosophy is, whenever i take notes for myself, if there is no reason to keep them private, to put them on my website; these notes are therefore here, not because they are intended to be readable by anyone other than me, but just as notes to myself. Over time i hope to evolve them into something readable and after that i'll remove this disclaimer.

At this point, the design is unfinished; there is no coherent concept of a language called 'Oot', just a bunch of ideas towards such a proposal. There is no timeframe to actually implement this, and it most likely will never get finished.

To be clear, i repeat myself: there is no such thing as Oot. Oot has neither been designed nor implemented. These notes about 'Oot' contain many conflicting proposals.

This document is very out of date and no longer serves as a good introduction to the evolving language proposal.

You may want to look at proj-oot-whyOot.

There are a bunch of other files in this folder with more details and notes on various parts of Oot: proj-oot.

Oot

Oot is a readable, massively concurrent, programmable programming language.

Oot is a general-purpose language that particularly targets the following application domains:

Why not Oot?

Oot aspires to combine the readability and ease of Python, the functional laziness of Haskell, the commandline suitability of Perl, the straightforwardness of C, the metaprogrammability of Lisp, and the simplicity of BASIC (well, not quite), with massive concurrency.

Hello world

print "Hello world!"

Overview

In this section we briefly explain the basics of Oot. Our purpose is to give you a sense of Oot, and also the background you'll need for understanding the examples in the Tour.

Overview of syntax

Initial gotchas of Oot syntax

If you are coming from another language, you may be surprised by the following aspects of Oot's syntax and builtin functions:

General points

Unlike e.g. Python, the AMOUNT of whitespace does not matter in Oot; but the PRESENCE or ABSENCE of whitespace, and the TYPE of whitespace (whether it is a space, a newline, or a blank line), do matter.

Only punctuation affects the way things are parsed; there are no alphanumeric 'keywords'.

A person reading code does not have to look up any function or macro definitions to see how things will parse (except of course within metaprogramming constructs that essentially pass a string to something like a custom 'eval').

Identifiers and uppercase/lowercase

Identifiers can consist of alphanumeric characters and dashes. The function of a string of alphanumeric characters and dashes varies depending on its case:

Grouping and precedence

Grouping is supplied by parentheses (), blocks {}, and graph constructors [].

Blocks are constructors for first-class (lists of) expressions.

The precedence of infix operators can be determined by the characters in them (eg you don't have to look up the function definition of custom infix operators in order to determine how they parse).

Whitespace: spaces, newlines, and blank lines; prefix and postfix punctuation

Any Oot program can be expressed in a single line. Newlines usually represent implicit semicolons (except when the line ends with a '\', or when some parentheses have not yet been closed). A region of text bounded by blank lines or block boundaries is called a 'paragraph'. Paragraphs implicitly close open parentheses. Eg:

print 1 + 1
print (1 +
1)
print (1 +
1

print 2

is equivalent to:

print 1 + 1; print (1 + 1); print (1 + 1); print 2;

In Oot, it sometimes matters whether an operator is separated from its argument by a space (freestanding), or whether it is 'attached', and if it is attached, whether it appears on the left or the right side of its argument. Often prefix versions of operators are something like a 'constructor', and the postfix version is the corresponding 'destructor'. For example:

&x   ;; like C's &x
x&   ;; like C's *x

Expressions, function application, and assignment

Everything is an expression. The last line of a function is its (primary) return value. The last line of a branch within an 'if' statement is its return value.

To apply the function 'add' to 1 and 'x' and store the result in 'result':

result = add 1 x

(function application is by juxtaposition)

Partial function application is achieved just by not giving all required arguments:

increment = add 1
result = increment x
result == increment x == (add 1) x

(functions are curried and left-associative)

Functions can have keyword arguments:

result = bake-n-pies 3 variety="pumpkin"

Defining functions:

TODO

Overview of data

Atomic Literals

Atomic literals consist of ints (3), floats (3.0), strings ("hi"), booleans (TRUE), and NIL.

Some Oot implementations may support Unicode in source files. In this case, Unicode is permitted only within string literals.

Graph constructors

TODO

Overview of functions

Overview of the type system

Overview of metaprogramming

Lisp is a homoiconic language built upon lists. Oot is a near-homoiconic language built upon labeled graphs.


Tour

In this section we briefly highlight some of the features of Oot.


Syntax

Here we describe, in detail, the syntax of Oot.

General points regarding Oot syntax

Parsing occurs as a separate stage prior to the rest of Oot compilation/execution. There are scoped metaprogramming constructs that allow custom parsing of individual, clearly-marked strings, lines, or blocks within code, and there is a per-file 'source filter' preprocessing facility, but there are no metaprogramming constructs that can alter the behavior of Oot parsing in a non-local way. This guarantees that if you are reading Oot code outside the scope of the above-mentioned metaprogramming constructs, you can be assured that it parses in a standard way.

The precedence of operators is determined by which symbols they are composed of; although users can define custom binary operators, you never have to look up a function definition just to see how the source code will parse.

Echoing [1], one might say that Oot's parser has syntax defined in terms of characters, and the Oot language has syntax defined in terms of symbols, graphs, and literals. The parser is available to the user via the function ootparse, which reads the next form from a stream, and returns the objects represented by that form.

Any Oot program can be expressed as a single line, by using ';'s in place of newlines, by closing all parentheses explicitly instead of using paragraphs, and by using "\n"s in place of multiline strings.
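For example, echoing the example above, the two-line program

print 1 + 1
print 2

can equivalently be written as the single line:

print 1 + 1; print 2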

Atomic literals, by example

Integer: 3

Floating point number: 3.0, INF, NAN

String: "Hello!"

String, with interpolation and newline substitution: "Hello {name}!\n"

String, raw: r"Hello, i can include the '{' and \n characters!"

Multiline string ('HERE documents'): """ This is a string named {name} that is 3 lines long. """

Multiline string with custom delimiter: """xyz This is a string named {name} containing """triple quotes""" that is 3 lines long. """xyz

Multiline raw string with custom delimiter: r"""xyz This is a string containing """triple quotes""" and a { and a \n and which is 3 lines long. """xyz

In addition to 'r', string delimiters may be prefixed by '#' and then a string, which indicates that the string is a literal that is to be passed to a macro.

keyword: THIS-IS-A-KEYWORD

A keyword literal is just like a string literal, except (a) you don't have to enclose it in double-quotes, (b) the implementation is encouraged to represent it internally as an integer to aid performance, (c) translation tables can't contain mappings for keywords. Although you can try to coerce keyword values to strings and print them out, this is really only for debugging and most implementations will just print out the internal integer representation unless this is a debug build.

nil: NIL

booleans: FALSE, TRUE (synonyms: F, T)

Unicode, internationalization and translation

If your Oot implementation supports Unicode source files, non-ASCII characters are ONLY allowed within strings.

Oot has a facility to attach a separate file providing translations of strings. So, for instance, in the source code you could write (x = "Hello!"), and then in the translation file for French you could map "Hello!" to "Bonjour!", and then at runtime, if Oot is in a French locale, x will be set to "Bonjour!" when the line (x = "Hello!") is encountered in the source code.

Alphanumerics-with-dashes (generalized identifiers) and case: identifiers, keywords, annotations

The meaning of a string consisting of alphanumerics-with-dashes depends on its case:

Lowercase: an ordinary identifier

Uppercase: a keyword literal (see above)

Capitalized: an annotation

mixedCase: the upper-case parts are actually macro operators (we call these 'attached macros') operating on the surrounding lowercase words. eg "mixedCase" would be read as a macro 'C' applied to 'mixed' and 'ase'.

Any alphanumerics-with-dashes token beginning with one dash is 'private' and will not be exported or directly accessible from the containing module (although the containing module can choose to return it from a function called from outside of the module). E.g. -this-is-a-private-identifier

Any alphanumerics-with-dashes token beginning with two dashes is 'reserved' for the use of the Oot language. E.g. --ROOT. If you see one of these, be aware that although it will parse like any other keyword, semantically the language might treat it specially in some way.
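As a rough summary of the case rules above, here is a sketch (all of the names are made up; annotations and attached macros are described in more detail below):

my-count = 3      ;; all lowercase: an ordinary identifier
color = RED       ;; all uppercase: a keyword literal
Int x = 3         ;; Capitalized: an annotation (here a type annotation, shorthand for ^Int x = 3)
[4,16]Msqrt       ;; mixedCase: the uppercase 'M' is an attached macro (map) acting on its neighbors
-helper = 7       ;; a leading dash marks the identifier as private to the containing module
--ROOT            ;; a leading double dash is reserved for the Oot language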

Prefix and postfix

When you see alphanumerics-with-dashes smashed together with other punctuation character(s) without intervening spaces, then the punctuation characters act as operators or modifiers acting upon the alphanumerics-with-dashes they are attached to. A given (string of) punctuation character(s) may have three DIFFERENT MEANINGS depending on whether it is freestanding, attached as a prefix (to the left of the alphanumerics-with-dashes), or attached as a postfix (to the right).

Generally the prefix and the postfix meanings are related; they are usually rough inverses of some sort, where prefix can be imagined as going 'up' (constructing) and postfix goes back 'down' (deconstructing) in some very abstract space. Sometimes the composition of a postfix with the corresponding prefix is an identity. For example, y = &x takes a pointer or reference to x, and y& dereferences the pointer in y; (&x)& === x.

Strings of commas and strings of punctuation containing =s operate the same whether they are attached or unattached to their neighbors.

Attached punctuation binds tighter than unattached operators.

Whitespace

Any form of whitespace separates non-whitespace. Note that the meaning of punctuation may change dependent on whether it is attached to something, or separated from it by a space.

Newlines

Oot usually inserts semicolons at every newline. There are two exceptions: when the line ends with a '\', and when some parentheses in the current block have not yet been closed.

In debug mode, it is possible to mark a certain dynamic scope of the program so that expressions in that scope which were delimited by newlines, as opposed to explicit semicolons, automatically print out the value of the expression on that line.

Blank lines and paragraphs

A region of text which does not contain blank lines and which is surrounded by one or more blank lines and/or the boundaries of a block is called a 'paragraph'. A blank line is two newlines with no non-whitespace characters in between them (excluding comments). The only function of paragraphs is that, upon reaching the end of a paragraph, any levels of parentheses within the current block scope which have not yet been closed are implicitly closed (just as if the blank line contained a string of closing parentheses of length sufficient to balance the parentheses in this block); and a semicolon is inserted at the end.

For example, each of the following is equivalent to "print (1+1);":

print (1+1)

print (1 +
1)

print (1 +
1

Comments

Two adjacent semicolons, followed by a space, has the effect of a newline and begins a comment that continues until the end of the line. Two adjacent semicolons, followed by a non-whitespace string, begins a comment that continues until the same delimiter is encountered, even over multiple lines; note that no newline is inserted. For example:

print 1+1  ;; nice day today
print 1+  ;;xyz one ;;xyz 1
print 1+  ;;xyz man this is
                quite a long
                comment
                ;;xyz 1

Comments are removed at an early stage of parsing and have no further effects besides those mentioned above. Comments within strings, or within raw text being fed to metaprogramming facilities, are not removed.


=== Grouping ===
Aside from whitespace, there are three grouping constructs:
* (): parentheses affect the order of evaluation of expressions (and also enter code context from within a data constructor, see 'Data context and code context' below)
* {}: curly braces construct first-class representations of expressions and lists of statements, and in addition serve as a scope for variables and certain metaprogramming facilities. Note that the boundaries of a block always constitute 'paragraph' boundaries and hence unbalanced parentheses are implicitly closed at the end of each block (see 'whitespace', above)
* []: square brackets are data constructors

=== Commas (convenience only; not in Oot Core) ===

A single comma has the lowest precedence aside from = and multiple commas, and has the effect of placing an implicit data constructor around its area of precedence, and surrounding each side of itself with implicit parens, unless this area is itself delimited on both sides by an explicit data constructor. In addition, within this implicit or explicit data constructor, it serves as a shorthand for binding each comma-delimited item to a node labeled by its ordinal position within the comma-delimited list, starting with 0. For example, "a + b, c = f x" is the same as "a + b,c = f x" is the same as "[(a + b), (c)] = f x"; and all are the same as "[0=(a+b); 1=(c);] = f x".

A string of multiple commas has different effects depending on whether it is found in code context or in data context. In code context, it does nothing except by virtue of its precedence (a double comma's precedence is one step lower than a single comma, and each additional comma is one step lower); eg "f x ,, f y == (f x) (f y)"; eg "f x ,, g z ,,, f y == ((f x) (g z)) (f y) == f x,,g z,,,f y"

In data context, a string of multiple commas creates a multidimensional array. The first dimension of the array is delimited by single commas, the next dimension by double commas, etc.


todo:

commas for arrays is shorthand for:
[a,b] is [0=a; 1=b]

multidim arrays are shorthand for:
[a,b,,c,d] is [0,0=a, 0,1=b, 1,0=c, 1,1=d] is [[0=0,1=0]=a, [0=0,1=1]=b, [0=1,1=0]=c, [0=1,1=1]=d]


todo:

note (and think about) the idea of using node literals as nodenames for multidims


=== Expressions ===
Every line and block is an expression in Oot. The value of a block is the value of the last expression in it. The value of a conditional control construct such as 'if' is the value of the last line on whichever part of it executes.

Assignment statements return the value being assigned (ie the 'rhs', the right hand side).
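For example, a minimal sketch of the 'assignments return the rhs' rule (the variable names are illustrative):

y = (x = 3)
y == 3    ;; the inner assignment evaluated to 3, its rhs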

=== Operators and macros ===
Unary operators start with the character '+'.

The following characters can be used in names for binary operators: todo

todo: what is the operator to negate a number or boolean or to invert a function?


=== Data constructors (also called graph constructors or node constructors) ===
todo: in many (but maybe not all) ways, the earlier G-constructor syntax is better than this, merge these 2 proposals


Oot has one primitive structure/composite data constructor, []. It is used to construct Oot Graphs. Oot Graphs can be used as lists, as associative arrays/dicts/hashes, as trees (acyclic graphs), or as (potentially cyclic) graphs. Examples:

a list: l = [1 2 3]; l.1 == 2;

an associative array with string keys: d = ["apple"="red" "pumpkin"="orange"]; d."apple" == "red";

an associative array with keyword keys: [APPLE="red" PUMPKIN="orange"]; d.APPLE == "red";

an associative array with variable keys: key1 = APPLE; val2 = "orange"; d = [key1$="red" PUMPKIN=val2]; d.APPLE == "red"; (the postfix '$' interpolates the value of key1 to use as the key; see 'Metaprogramming' below)


a tree (edges by node syntax): tree1 = [SALLY=[$BOB, $ALICE]; BOB=[] ALICE=[$SALLYS_GRANDDAUGHTER] SALLYS_GRANDDAUGHTER;]; tree1.SALLY.ALICE.0 == tree1.SALLYS_GRANDDAUGHTER

In the previous example, we used a prefix dollarsign (eg '$BOB') to indicate that SALLY's first child is not the keyword BOB itself, but rather the node whose LABEL is the keyword BOB. Prefix dollarsigns are resolved within the context of the currently-in-scope data context.

alternate syntax for declaring edges (edgelist syntax): tree2 = [SALLY BOB ALICE SALLYS_GRANDDAUGHTER SALLY/BOB BOB/ALICE ALICE/SALLYS_GRANDDAUGHTER]; tree2 == tree1;

using newlines:
tree3 = [
SALLY BOB ALICE SALLYS_GRANDDAUGHTER
SALLY/BOB
BOB/ALICE
ALICE/SALLYS_GRANDDAUGHTER
]; tree3 == tree2;

a cyclic graph: cg = [A B C  A/B  B/C  C/A]  cg.A.B.C.A == cg.A

a graph with a self-loop: sl = [NODE1 = [--SELF]]; sl.NODE1.0 == sl.NODE1;

a graph with an edge to the graph itself: gwaettgi = [NODE1= [--ROOT]]; gwaettgi.NODE1 == gwaettgi;

a graph with an edge whose target is another edge, rather than a node: gwet = [NODE1  NODE2  NODE1/NODE2  NODE2/NODE1.NODE2.]; gwet.NODE2.0.--SRC == gwet.NODE1

the same graph, with the addition of a NODE3 which is a node representing a reified edge: gwet = [NODE1  NODE2  NODE1/NODE2  NODE3=NODE1.NODE2.  NODE2/NODE3]; gwet.NODE2.0.--SRC == gwet.NODE1; gwet.NODE2.0 == gwet.NODE3;

Here a postfix '.' means 'the last edge along the path just given'. For example, 'x.y.Z.' would refer to the edge whose source is the result of evaluating x.y, and whose label is Z.


=== Function application ===
Functions associate to the left. 'Multiargument' functions are curried. Function application is by juxtaposition. Eg:

sqrt 4 == 2
add 2 3 == (add 2) 3 == 5

Graphs can be accessed as functions:

d = ["apple"="red" "pumpkin"="orange"]; d "apple" == "red";

The 'dot' operator is a synonym for this that allows tight binding through attachment:

d."apple" == red
d2 = [A=[0=$B] B=[1=$C] C=[2=$D] D]
d2.A.0.1.2 == d2.D == d2 A 0 1 2 == (((d2 A) 0) 1) 2

todo: Perhaps the dot semantics will differ from ordinary function application when used on the lhs of an assignment? Perhaps something about testing for existence at each step, although i think we've got that covered with (d2 A 0 1 2). Or is this useful at all?

When on the lhs of an assignment, two dots can be used to create the desired edge if one doesn't yet exist:

d3 = [D]
d3..A..0..1..2  = d3.D

Note that in this case, the nodes newly created are all labeled NIL.

Three dots can be used to create a first-class path object by defining the starting and ending nodes (todo really?):

d2 = [A=[0=$B] B=[1=$C] C=[2=$D] D]
path = d2.A...0.1...2
len(path) == 1

todo

Expression evaluation

Expression evaluation is non-strict but not necessarily lazy. I'm not quite sure exactly how this will work, but right now i'm thinking that the requirement will be that the program should not diverge or give an error unless a lazy evaluation strategy would also diverge or give an error; EXCEPT if the entire expression being evaluated was (a) created at the behest of the current module AND (b) not marked as lazy.

In terms of implementation, each thunk will be marked as to the modules in the call chain at the time of its origin, and marked if it is 'lazy' or not. If it originated in the current module and it is not marked lazy, it is evaluated eagerly. Otherwise, it is evaluated lazily yet with some heuristically bounded speculative eager evaluation, perhaps involving the maximal lexical nesting depth of the current module (static, i suppose, lest some macro make a really deep depth?); but if an error is generated, this error is not delivered until the value would have been demanded lazily.

Data context and code context

Within the scope of a data constructor (that is, after an unquoted [ but before the closing ]), we are said to be in 'data context'. Otherwise, we are said to be in 'code context'. Some syntax, and the operation of macros, differs depending on whether we are in data context or code context.

Parentheses within data context enter code context. A nested data context can be entered from this code context, etc. Eg:

[(1 + 1), ([1 2] + [3 4])] == [2 [1 2 3 4]]

Base context and metacontext

Base context is the program that is being executed. Metacontext is annotations and other data attached to parts of the program being executed. For example, static type annotations are in metacontext. Annotations (metacontext) may be attached to parts of the program within code context and also to parts within data context.

From base context, metacontext is entered via the prefix ^. From metacontext, base context is entered via postfix ^. As a shorthand, any capitalized word is automatically placed in metacontext. Eg:

Int i = 3 is shorthand for ^Int i = 3

To place multiple space-separated words into metacontext, use ^ with parentheses, eg: ^(List int) l = [1,2,3]

To place multiple lines into metacontext, use ^ with blocks, eg: ^{ Wrapper1 List int }

Both base and metacontext can have both data context and code context within them.

Errors

The ' operator involves passage between a mode of error-handling in which exceptions are handled by immediately raising the exception, but variables are guaranteed not to contain nils, and a mode in which exceptions are captured as Fail values using Option types, so that exceptions are always caught.

'x evaluates x, and, if this evaluation raises an error, it catches that error and puts it into the erroneous case of an Option type. If x does not raise an error, it puts the result into the successful case of an Option type.

x' takes an Option type and tests if it is an erroneous case or a successful case. If it is erroneous, it raises the contained error. If it is successful, it returns the contained result value.

x'e is like x' except that if x is an erroneous case of an option type, it does not raise the contained error, but rather returns the result of applying the function e to the contained error.

For example:

(todo is the syntax right to terminate the optional arguments of the DivideByZeroError constructor here?) (todo is the lambda function syntax right here?)
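A rough sketch of what this might look like (the names 'divide' and 'handle-div-error' are made up, and the exact constructor/lambda syntax is still open per the todos above):

r = '(divide 1 0)        ;; the division raises an error; prefix ' catches it into the erroneous case of an Option
s = '(divide 6 2)        ;; no error is raised; s is the successful case containing 3
s' == 3                  ;; postfix ' unwraps the successful case
r'handle-div-error       ;; like r', but instead of re-raising, returns (handle-div-error applied to the contained error)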

is

'is' is a boolean operator. 'subject is predicate' returns True (T) if any of:

and False (F) otherwise.

truthiness

Oot values are auto-coerced to booleans when booleans are expected and when the value is in the domain of either the 'len' function, or the 'nonzero' function.

The coercion uses the builtin function 'truthy'. 'truthy x' works as follows:

Other autocoercion

todo

i'm not sure; maybe some coercion between ints and floats, but not much else? Autocoercion hurts readability. Otoh shouldn't users be able to implement their own numerics? Maybe limit to acyclic (on the type graph level) coercions, and to embeddings, eg int coerces to float?

i suppose it's also nice for things to coerce to strings for debugging. But maybe pp, and Exception, and print, should just explicitly coerce.

builtin exceptions hierarchy

todo

i generally have the idea that instead of just having one 'nil' value, like Python's None, we'll have many, to disambiguate.

" The problem is that the null reference has been overloaded to mean at least seven different things:

    a semantic non-value, such as a property that is applicable to some objects but not others;
    a missing value, such as the result of looking up a key that is not present in a key-value store, or the input used to request deletion of a key;
    a sentinel representing the end of a sequence of values;
    a placeholder for an object that has yet to be initialized or has already been destroyed, and therefore cannot be used at this time;
    a shortcut for requesting some default or previous value;
    an invalid result indicating some exceptional or error condition;
    or, most commonly, just a special case to be tested for and rejected, or to be ignored and passed on where it will cause unpredictable problems somewhere else.

Since the null reference is a legitimate value of every object type as far as the language is concerned, the compiler has no way to distinguish between these different uses. The programmer is left with the responsibility to keep them straight, and it’s all too easy to occasionally slip up or forget. " -- Anders Kaseorg, http://qr.ae/CS2A6

i would also consider breaking the following into two: "a semantic non-value, such as a property that is applicable to some objects but not others;"; in true/false/nonsense, 'nonsense' is sort of a 'non-value', but this could sometimes be different from a property that is sometimes inapplicable?

also, true/false/maybe might be distinguished from true/false/don't know (eg dont know means i refuse to take any position, maybe means both are reasonable guesses), and true/false/nonsense, and true/false/neither (implying there is some other truth value besides true or false which is applicable here, eg in logics that admit other qualitatively different ones)

to deal with distributed systems:

maybe have three basic error conditions, for four result conditions in total:

(1) success (2) clean failure (3) success or failure unknown; also, partial success possible (4) corruption detected (eg ecc fail error)

so, if a side-effectful RPC was sent but there was a network failure that prevents us from knowing if the RPC request was received or not, (3) would be returned.

todo; also, this isnt syntax

object protocols

--GET and --SET

todo; also, this isnt syntax

Destructuring bind

todo

Multiple return values

todo (and commas for destructuring bind on the lhs)

Attached macros

M for map: eg [4,16]Msqrt == [2,4]

F for fold: todo

T for filter: eg [-5,5,-10,100]T(>0) == [5,100]

Z for zip: todo (and what about zipWith?)

P for pluck: todo (see underscore)

C for cross-product

U for unit: eg 3Um means '3 with units of meters'. xUm == (unit (--unitabbrevs 'm') x)

todo: check out Perl's 'metaoperators' again and see what else we're missing

todo: and what about that Ruby thing that is like a partially-applied __get?

Metaprogramming

$ involves the passage from variables to their values. In ordinary code, "x" would be replaced by the value of variable x when it is evaluated; but "$x" means the VARIABLE x, not the value of that variable. Similarly, in a context (such as in the lhs (left-hand-side) of an =s in a data constructor) in which "x" would be assumed to just be a name, "x$" means the interpolation of that name, that is, the substitution of x's value for x$.

"" involves the passage from literal strings in the source code to identifiers. ""xyz is shorthand for 'xyz'. xyz"" means to lookup the value of variable xyz at the time of evaluation, and then to substitute this string into the source code in place of xyz"". For example: a=3; xyz = 'a'; xyz"" == 3;

^ involves the passage from ordinary code to metadata and annotations about that code (see 'ordinary context and metacontext').

Attached macros can only 'see' the things they are attached to. They see the raw string text of those things, however, and can use postfix "" to get the variables if they want, or they can just use them as strings (eg the way that the unit abbreviation is just passed to a lookup table for U). Users can define custom attached macros, but their names must be more than one character long. Note that attached macros can only be used with an identifier or data constructor (not an operator) on their left, to allow them to be distinguished from capitalized words (annotations) and from keyword arguments to operators (todo is that exactly correct?).

todo describe different kinds of macros depending on what compilation stage they interfere with?

todo source filters (file scope)

line macros: In general, identifiers that consist of one lowercase letter repeated exactly twice are macros that operate on the raw source code string thru the earlier of either the end of the line, or the first ';' character not escaped by a preceding backslash.

block-scoped macros (given parsed representation, return new parsed representation)

racket-like syntax metaprogramming in block scope?

todo is some sort of a metaprogrammy thing

The environment

Variables and labels and functions are all in the same namespace.


Oot Core

'Oot' is built around a simple subset of itself called Oot Core. So,

todo

note: oot core and oot have the same lexer and parser, but oot has a bunch of macros and metaprogrammed syntax that oot core does not

teach oot core first, then oot, then metaprogramming

todo

todo

where to put this?

"pp" is a macro that surrounds the rest of the line in double quotes, and sends result to the default console output. A synonym is 'print'. Eg

pp Hello World!

OLD, OUT OF DATE: the oot programming language (under construction)

Lisp is a homoiconic language built upon lists. Oot is a homoiconic language built upon labeled (hyper multi reified) graphs.

Why Oot?
* built around a powerful data structure: labeled graphs
* interface-based (or "attribute-based") type discipline
* constraint satisfaction
* readable, concise syntax
* memory managed. static typing, type inference. lazy evaluation. higher order functions.
* goal: the referential transparency, higher order functions, and type system of Haskell, the readability and convenience of Python

see [2] for more detail.

audience: general purpose. programmers.

statically typed. memory managed. type inference. lazy evaluation. higher order functions.

support for the following paradigms: imperative, oop, functional (referential transparency), logic, macros

priorities: power, readability, conciseness

syntax that compiles to cleaner core

primary data structure(s): labeled (reifiable) graph. this also provides a syntax for lists, association tables, arrays, trees, graphs, relations, structs, objects with managed attributes.

some goals:

tasks that could drive development

some features:

anti features:

some other design decisions:

A single

Syntax


General note on syntax

In some languages punctuation characters are also separators, so you aren't required to put a space before or after them, but in Oot, many punctuation characters mean something different if they are attached to another token as opposed to being surrounded by whitespace. For example, in Oot,

  x = 3

is not interchangeable with

  x=3

The '=' in "x = 3" is called "freestanding". The '=' in "x=3" is called "attached".

Basic function calling syntax

Put the function arguments to the LEFT of the function, separated by spaces.

Example: if f is a function that takes arguments x and y: y x f

You can pass keyword arguments using keyword=value. The order of keyword arguments doesn't matter. All keyword arguments must go to the right of all positional arguments.

Example:

  x = 3 [] lb="apple" ins
  x == ["apple"=3]

G-constructor syntax

G-constructors are literals used to construct directed graphs (for technical notes on what we mean by directed graph, see [3]).

A graph in Oot is something that you might draw by drawing a bunch of circles and then drawing some arrows between some of the circles, and then attaching notes to some of the arrows. To non-CS people, a graph might be called a "network". We call the circles "nodes", the arrows "arcs", and the notes on the arrows "edge labels".

G-constructors are constructs surrounded by [] delimiters (can be implicitly closed by indentation).

The outermost node in the constructor is called the "root node". The constructor can only construct one connected component, although other unconnected nodes can be added later by merging graphs.

Example: a graph with only the root node:

  []

Nodes can contain a list of objects, separated by spaces. These objects are considered to be other nodes such that directed edges go from this node to those. These edges are labeled by their position in the list (zero-indexed).

Example: a node that contains a list of the first two even numbers:

  [2 4]

This represents the following directed graph:

   root
    /\
 0 /  \ 1
  /    \
 2      4

If a variable X holds a node, then that node's edges can be traversed using the "." operator.

Example: after putting the above node into "x", x.1 == 4:

  x = [2 4]
  x.1 == 4    # evaluates to t (true)

Edges can have multiple labels. To assign an additional label to an edge, use "label = destination".

Example: same as above, but the edge to 2 is labeled "yellow" as well as 0:

  ["yellow"=2 4]

This represents the following directed graph (note: should we use '/' instead of '=' to make the directedness clearer?):

            root
             /\
 0,"yellow" /  \ 1
           /    \
          2      4

Any of the labels may be used in traversal:

  x = ["yellow"=2 4]
  x."yellow" == 2

An object pointed to by a node may itself be a node that points to other objects. However, primitive values, such as "2", cannot point to anything.

Example: A node that points to two nodes, one that points to strings describing some fruits, and one that points to strings describing some vegetables.

  ["fruits"=["apple" "pear"] "vegetables"=["carrot" "brocoli"]]

                 root
                  /\
      0,"fruits" /  \ 1,"vegetables"
                /    \
               /      \
              /        \
             /          \
            /            \
           /              \
          /\              /\
         /  \            /  \
        /    \          /    \
  "apple"  "pear"  "carrot"  "brocoli"

todo num labels

"." is left-associative and may be used to traverse: x = ["fruits"=["apple" "pear"] "vegetables"=["carrot" "brocoli"]] x."fruits".1 == "pear"

 <code>[name | value1, value2, etc]</code> is notation for a node with a node id (global label) name. It is an error (compile-time if possible, but runtime in general) to assign the same name to two different nodes. 

Example:

 ["x" | fruits=["apple" "pear"] vegetables=["y" | "carrot" "brocoli"]]


                 root, "x"
                  /\
      0,"fruits" /  \ 1,"vegetables"
                /    \
               /      \
              /        \
             /          \
            /            \
           /              \
          *               "y"
          /\              /\
         /  \            /  \
        /    \          /    \
  "apple"  "pear"  "carrot"  "brocoli"

(the * represents the unnamed node in the example)

Labeled nodes may be pointed to using the notation x..name, where x is a variable that refers to the G-constructor.

  x = ["x" | fruits=["apple" "pear"] vegetables=["y" | "carrot" "brocoli"]]
  x.."y".0 == "carrot"

Within the constructor, ".." may be used with no variable to its left. Using node labels, cycles may be created:

 Example: ["s" | .."s"]
 "s"
 / \
 \_/
 

By referencing parts of other G-constructors, one can create nodes with multiple parents without cycles.

Example:

  x = ["a" | ["b" | 2], 3]
  y = ["c" | x.."b"]

Now y looks like:

 root,"c"  "a"
        \  / \
        "b"   3
         |
         2

Note: the graph in "x" is unchanged! the values of x and y have not been "linked" together in any way, it is as if a copy was made of some of the contents of x and put into y. If you don't want time and memory to be spent making a copy, and you don't have any more use for the graph in x, then don't use the variable "x" afterwards and the compiler will probably notice and optimize under the hood by just using the part of memory that it used to call "x" for y and adding node "c" in-place. If you want to make sure, use the variable "x" throughout, "x = ["a" | ["b" | 2], 3]; x = ["c" | x.."b"]", which guarantees that the compiler will mutate x in-place rather than copying. In any case, the compiler may choose to defer the copy unless and until it wants to do it. Unless x is a mutuable variable, the compiler will probably simply internally refer to x when "b" is called for.

Note: only the part of the graph that is reachable by a directed path from x.."b" is included in y.

Note: the node labels are only available from variables bound to the G-constructor in which they were created:

Example:

  x = ["a" | ["b" | 2], 3]
  y = ["c" | x.."b"]

  x.."b"    # succeeds
  y.."c"    # succeeds 
  x.."c"    # error
  y.."b"    # error

The reason it works this way is that a G-constructor actually returns not the root node, but rather an "index node object", which is a node that contains an assoc table from node labels to nodes. One of the nodes is labeled "root". For convenience, "." is actually a shortcut for "..root."; ".." is used to directly access the index node.
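For example, a sketch (assuming the index entry for the root node is reached via the label "root"):

  x = ["a" | 1 2]
  x.0 == 1            # "." is shorthand: go through the index to the root node, then follow edge 0
  x.."root".0 == 1    # the same traversal, written out via ".." access to the index node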

To create a variable with all the labels, use the midx ("merge index") command, which takes a list and returns a list of the same length. Each item in the return list contains the same node as the corresponding item in the input list; but all items in the return list can access any of the node names anywhere in the input list:

  [x y] = ([x y] midx)
  x.."c"    # succeeds

Again, however, it should be noted that the values of the nodes are not "linked".

It is an error if there is a name clash between the labels of any of the nodes being merged (except for the label "root"). This can be avoided when names are hardcoded by using symbols for names (todo). When labels are not symbols, use of midx generates a compiler warning. When labels are dynamically computed at runtime, errors can be avoided in a few ways: use the midxrn function, which renames clashing names and returns a table listing the new names (todo); or use midxic ("merge index ignore clash"), which resolves name clashes differently for each variable in the list by preferring that index's names over incoming names, and then preferring name bindings from the beginning of the list over names from the end; or use midxo ("merge index overwrite"), which resolves name clashes uniformly for all variables in the list (i.e. all variables end up with copies of the same index) by preferring name bindings from the beginning of the list.

As you might guess from the compiler warning, my recommendation is that you avoid midx except for when you are using hardcoded labels (preferably symbols). Usually midxrn is the right choice otherwise.
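For example, a sketch of how the clash-resolving variants described above might behave (the graphs are made up):

  x = ["a" | 1 2]
  y = ["a" | 3 4]              # both G-constructors use the node label "a"
  [x2 y2] = ([x y] midxic)     # each variable prefers its own index: x2.."a" is x's node, y2.."a" is y's node
  [x3 y3] = ([x y] midxo)      # one shared index; bindings from the beginning of the list win, so both x3.."a" and y3.."a" are x's node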

Within a graph, the reserved word "this" always refers to the current node.

Example: a node with a self-loop may also be created as follows: [this]

Within a G-constructor, the reserved word "root" refers to the root node. Example:

[1, [2, root]]


    root _
     /\   |
    /  \  |
   1   /\_|
      2  

Using the "unshadow" token operator, "^", on "this", the (lexical) parent node may be referred to. Repeating the unshadow operator allows reference to the (lexical) grandparent.

Example:

 [1, [2, ^this]]
    root _
     /\   |
    /  \  |
   1   /\_|
      2  
 [1, [2, ^this.0]]  
    root 
     /\   
    /  \  
   1   /\
      2  1
  1. note; the ^this.0 is evaluated at the time of graph construction; if one of the 1s is changed later, the other won't be affected
 [1, [2, [3, ^^this]]]
    root ___
     /\     |
    /  \    |
   1   /\   |
      2 /\  |
       3  \_|

Not all edges of a node have to have integer labels (in fact, none of them need to). To separate those edges with integer labels from those without, use the "--" symbol. Edges with integer labels go on the left. Edges on the right must have an explicit label. The integer labels are no different from other labels; the -- syntax just tells the constructor not to automatically add them.

  [1 -- "yellow"=2]
            root
             /\
          0 /  \ "yellow"
           /    \
          1      2
  [-- "yellow"=2]
            root
             /
   "yellow" /  
           /  
          2   

many different edges from the same node may all point to the same destination object:

  [1, ["red"=root, "yellow"=root]]
        <-------------|
    root<--           |
     /\   |           |
    /  \  |1, "yellow"|
   1   /\_|           |
      |               |
      |_______________|  
           1, "red"

(btw the only reason i'm not drawing arrows on most edges here is that it's hard to do -- all edges are directed)

A node may have multiple edges with the same label, but in this case . will only follow the "first" one (first is the one with the lowest integer label; if all the edges with the given label are non-integer, todo)
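For example, a sketch:

  x = ["yellow"=2 "yellow"=9]   # two edges labeled "yellow", which also carry the integer labels 0 and 1
  x."yellow" == 2               # "." follows the "first" one, i.e. the edge with the lowest integer label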

To add an edge programmatically, use "ins":

x = [1 2]
x = 3 x ins
x == [1 2 3]
x = x "red"="apple" ins
x == [1 2 3 "red"="apple"]
# alternative
x = [1 2 3]
x = "apple" x lb="red" ins
x == [1 2 3 "red"="apple"]
# use i=n keyword arg to ins to not assign the next integer label to the new edge
x = "banana" x lb="yellow" i=n ins
x == [1 2 3 "red"="apple" -- "yellow"="banana"]
# or, alternatively, if x.__ordered = f, ins will not assign an integer:
x = [1 2 3]
x.__ordered = f
x = x "yellow"="banana" ins
x == [1 2 3 -- "yellow"="banana"]

To get or set a node's label use its "lb" edge:

 x = [1 [2]]
 x.lb = "apple"
 x == ["apple" | 1 [2]]

To operate on nodes other than root, use assignment syntax:

 x = [1 [2]]
 x.1.lb = "apple"
 x == [1 ["apple" | 2]]

To access a node which represents an edge (this is called "reifying" the edge), use the '^.' operator in place of '.'. The reified edge has labels "src", "dst", "lb". Note that the contents of lb is a list:

 x = [1 "yellow"=2]
 x^.1.dst == 2
 x^."yellow".lb == [1, "yellow"]

You can change an edge's labels or even its source or destination:

 x = [1 "yellow"=2]
 x^."yellow".lb = [3 "red"]
 x == [1 -- [3,"red"]=2]

To insert the contents of one list into another, you can use the @ operator:

  x = [2 3]
  y = [1 @x 4]
  y == [1 2 3 4]

example: a single node with value "10": ex = [10] ex.n == 10

example: a single node that points to itself:

ex = ['s | s]
ex = [this]

todotodo

example: a list

ex = ["apple" "banana" "cherry"]
ex = ["apple" "banana" "cherry"]
ex.0 == "apple"
ex.1:2 = ["banana" "cherry"]
ex = $[apple banana cherry]
ex = "apple", "banana", "cherry"
ex2 = ["grapefruit", @ex]
ex2 = ["grapefruit","apple","banana","cherry"]
fruit2 = "banana"; ex = $[apple `fruit2 cherry]
fruit23 = $[banana cherry]; ex = $[apple `@fruit23]

example: an association table
ex = [
        apple = red
        banana = yellow
        cherry = red
]
ex = assoc $[[apple red] [banana yellow] [cherry red]]
ex = assoc $[
        apple red
        banana yellow
        cherry red
]
ex = assoc [
        "apple" "red"
        "banana" "yellow"
        "cherry" "red"
]

todotodo

idea: when multiple edges have the same name, allow to get the set, but also "any" and "all" operators. "any" picks an arbitrary edge out of the set (like, the first one), and "all" does something to all of them (parallel programming). actually, better: the std get is short for getSet composed with any (or should it be all? these are the same in the degenerate case of unique names). use the same mechanism to deal with multiple nodes with the same name. so, when a node is linked to a named node, it can be linked to "any" node with that name (creating one edge), or to "all" of them (creating possibly more than one edge). now you don't have the ugly problem of namespace conflicts when you merge graphs.


Function definition

Just use an '=' sign to write an equation. The function on the left will be defined in the current namespace.

To make f work like "+" (although f won't be infix):

 y x f = x y +

Partial function application

If function f takes 2 arguments, and you want to call it with its first argument x and its second one y, then

  result = y x f

is the way to do that. However, you can also do

  g = x f

In this case, the value of "g" is a (partially applied) function, namely, the function f but with its first argument fixed to x. That leaves one argument to be specified, so g is a function that takes one argument. You can call g just like any function. The result of g is defined by

  y g == y x f

Oot's expressions are right-associative, for example,

  y x f == y (x f)

So, in Oot, passing multiple arguments to a function is equivalent to passing it just one argument and creating a partial function, then passing that function a second argument to make another function, etc.
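For example, assuming a two-argument 'add' function:

  g = 2 add                    # partially apply add; g waits for one more argument
  3 g == 5
  3 2 add == 3 (2 add) == 5    # right-associativity makes the multi-argument call equivalent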

Code inside G-constructors

Children of G-constructors are whitespace-delimited. That means that, in order to put code inside of a G-constructor, it must contain no spaces. So

 [y x f]

isn't interpreted as y applied to the result of x applied to f, but rather as three separate children, y, then x, then f. To get y applied to the result of x applied to f, you'd do:

 [(y)(x)(f)]

or

 [y,x,f]

Another example:

 [y,x,f b,a,g] has two children, y applied to the result of x applied to f, and b applied to the result of a applied to g.

Infix operators

An "operator" is any non-reserved symbol which is not alphanumeric. (todo)

Operators are used just like functions, but an operator which is at least binary may be used as a tight-binding binary infix operator by attaching it to its left and right operands, when those operands are themselves alphanumeric or surrounded by parens (todo). This has the effect of putting parens around the attached tokens, putting a space in between them, and moving the operator to the right. For example,

 a*b c +
 ->
 (a b *) c +
 (a d +)*(b) c +
 ->
 ((a d +) b *) c +

Operators may be attached on one side, which has the effect of making the attached chain the result of partially applying the operator to the attached operand:

 ((a d +) *b) c +
 ->
 ((a d +) (b *)) c +

In addition, there are "loose binding infix operators". The predefined loose binding infix ops include '==', '!=', '||', '&&', '=~', '>', '<', '>=', '<=', '<==>' ("spaceship operator"), 'in' (todo: find others to include). When freestanding, a loose-binding infix operator has the effect of putting parentheses around everything to its left (within its containing subexpression) and of putting parentheses around everything to its right (within its containing subexpression). Example:

  2 1 + <= (2 5 +) 5 -
  -> 
  (2 1 +) <= ((2 5 +) 5 -)

Loose-infix operators may not be attached on just one side. To partially apply a loose infix to one operand, and present the result as an ordinary function which is waiting for the other side, use parens, e.g.:

 3 (5 <=) == (5 <= 3)

To use a loose-infix like an ordinary function, put parens around it alone, e.g.:

 (5 3 (<=)) == (3 <= 5)

(note: this is like what haskell calls "sections") (todo: reverse the ordering there?)

Surrounding a non-operator with parens and attaching the result to something has the effect of putting parens around the whole thing and putting spaces in between the components:

 (y)(x)(f) == (y x f)

(these spaces do not make this into separate items in a list)

Users can define new loose infix operators but they must be named by symbols starting or ending with at least one of '=|&~<>' and which don't contain any characters besides those and '!-'. User-defined non-loose-infix operators may not start or end with '=|&~<>'. This makes it easy for readers to classify unfamiliar operators as loose infix or not by sight. No special declaration is needed to make an operator loose infix; these rules are sufficient to determine that.
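For example (both operator names here are made up):

 a <~> b      # '<~>' starts and ends with characters from '=|&~<>', so by sight it is a loose infix op
 x%%y         # '%%' neither starts nor ends with those characters, so it cannot be loose infix; attached, it is a tight-binding infix use of an ordinary operator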

If there are multiple distinct loose infix operators in one expression, or multiple distinct tight infix operators in one attached chain, they must be disambiguated with another grouping construct; an expression like "a || b == t" is illegal, as is "a*b+c".

If there are multiple instances of the same loose infix operator in one expression, or multiple instances of the same tight infix op in one attached chain, then by default they must be disambiguated with another grouping construct, but an infix operator may be declared associative, or it may be declared to have an associative combination operator by which its results are combined. Examples:

 && infixassoc
 && <= infixcombine

the former means that (a && b) && c == a && (b && c), which renders an expression like a && b && c unambiguous. The latter means that an expression like a <= b <= c is translated to (a <= b) && (b <= c). With infixcombine, the combination operator (the second, or leftmost argument to infixcombine) must itself be declared infixassoc.

Note: subeq and sub (subset containment) are expressed by <= and <.

Note: if you know other languages with various different "precedence levels" and associativity rules for operators, Oot can be roughly characterized as having:

^~<>. A loose infix op may have either no attached args or 2, but not 1.

Subtraction

Negative number literals are written with a '-' attached on the left. A freestanding '-' denotes the subtraction function, which is a normal postfix function.

Fancy grouping syntax

Parentheses group in the usual way.

Without any grouping, expressions are right associative. So

  z y x f = z (y (x f))
  

(actually, mb z y x f should be like z(y,x,f) in other langs; one-way auto-currying, rather than using Haskell's convention of never having tuple arguments)

(Freestanding) '//' is a grouping construct that has the effect of putting parentheses around everything to its left (until it hits an infix operator or the boundary of the containing subexpression). If there are multiple ones on a line, the one on the left "acts first". It is the equivalent of the Haskell's '$'. So, for example,

   x (y g // f) h == x ((y g) f) h

',' is a grouping construct that has the effect of putting parentheses around the clause to its left, and also putting parentheses around the clause to its right, where a clause is everything until you hit the beginning or the end of the containing subexpression, or an infix operator, or one of ',', '//', skipping over subexpressions.

(after ditching right-associativity, aren't '//' and ',' the same then? not sure)

','s may be attached or freestanding.

For example,

 (2 1 f  , 4 3 g  , h)   (2 1 f  , 4 3 g  , j)  a // 5 b // 6 c
 ==
 ((2 1 f ,  4 3 g  , h)   (2 1 f  , 4 3 g  , j)  a) 5 b // 6 c
 ==
 (((2 1 f ,  4 3 g  , h)   (2 1 f  , 4 3 g  , j)  a) 5 b) 6 c
 ==
 ((((2 1 f)   (4 3 g) (h))   ((2 1 f) (4 3 g) (j))  a) 5 b) 6 c
 ==
 ((((2 (1 f))   ((4 (3 g)) h))   (((2 (1 f)) ((4 (3 g)) j))  a)) (5 b)) (6 c)

(Attached) '.' is a weird grouping construct and a special infix operator. It "binds tightly", like an infix operator, and has the effect of putting a '(' to the left of the subexpression or function immediately to its left and a ')' to the right of the subexpression or function immediately to its right. '.'s are left-associative with other '.'s. It must be attached to both its left and right operands. What it does is to evaluate the thing on its left, and then feed it to the thing on its right.

So,

 3 + x.y.z.w + 2 == 3 + (w (z (y x))) + 2

So it reverses the things embedded in the sequence. In addition, '.' has a special meaning when it is found on the left hand side ("lhs") of a freestanding '='. We'll see later that '.' is useful for things that work like attribute access (struct field access), or things that work like indexing into a list, or things that work like looking up a value from an association table based on a key, or things that work like tree or graph traversal.

Variadic functions start with *

todo: use a non-chorded prefix like - instead of *?

If a function's name starts with *, for example "*func", then it is a variadic function, meaning that it takes an arbitrary number of arguments. An expression whose head is a variadic function, and where the rest of the arguments do not contain an uncontained loose infix operator, is parsed differently; it is not right-associative; instead, each item is passed as a separate argument to the function. If the rest of the arguments on the line do contain an uncontained loose infix operator, then it is as if everything but the *func on the line is surrounded by parens, and constitutes a single argument to the *func. Subexpressions within parens (or on subsequent, indented lines, which amount to the same thing) are parsed normally -- but subsequent, indented lines each constitute a separate argument to the *fn. This is all easier to understand by examples:

 a b c d *blah
   -> (a) (b) (c) (d) *blah
 c d *blah
  b
  a
   -> (a) (b) (c) (d) *blah
 a <= b c d *blah
   -> (a <= (b (c d))) *blah
 e (a <= b c d) f *blah
  -> (e) (a <= (b (c d))) (f) *blah
 f *blah
  a <= b c d
  e
  -> (e) (a <= (b (c d))) (f) *blah
 e f (a <= b c d) *blah
  -> (e) (f) (a <= (b (c d))) *blah
 a <= b c d *blah
  f
  e
  -> (e) (f) (a <= (b (c d))) *blah

Some variadic functions that you should know:

The variadic forms of conditionals. Examples: a <= b c d *if t e

is like "if a <= b c d then t else e" in some other languages, and equivalent to "e t (a <= b c d) if" in oot.

   a <= b c d *if
     t
     e

optional convention: the "then" is indented extra, to make it easier to distinguish from the "else", so you should write the above like:

   a <= b c d *if
       t
     e
 defaultFoo *cd
  a < b
   foo1
  a == b
   foo2
  a > b
   foo3

->

 [[a<b foo1] [a==b foo2] [a>b foo3]] defaultFoo cd

(remember, cd is like "cond" in lisp)

optional convention: the actions associated with each condition in "cond" are indented an extra step

 defaultFoo x *sw
  'a'
    foo1
  'b'
    foo2
  ->

[['a' foo1] ['b' foo2]] defaultFoo x sw

(sw is "switch" -- it's like a cond where each conditional is an == comparison of the switch expression (x) to the relevant item ('a', 'b'))

also, many basic operators, such as arithmetic operators, have a variadic sister form in the standard library that does a fold over the arguments with that operator.

1 2 3 4 *+ == 1+2+3+4
1 2 3 4 ** == 1*2*3*4

if you have a graph node, and you want to pass its edges into a function as arguments, use @:

l = [1 2 3 4]; @l ** == 1*2*3*4

the region of arguments to which a variadic function applies goes to the left, up until '//', ',', or the edge of the containing subexpression.

Exceptions: the following predefined functions are not variadic: * (fold) (multiplication)

How to read Oot expressions

If there is just one symbol then you're done, if there's just one subexpression then recurse :)

First, if there is an infix operator, then it divides everything to its left from everything to its right. todo: update

For the most part, the computation proceeds from the left to the right.

Next, note each ')' or '//'. The things immediately to the left of these, and also the last thing on the right side of the line, are the functions that will be acting on arguments, the "active functions". These things may be symbols denoting functions, or they may be subexpressions which evaluate to functions.

Look at the rightmost active function on the line -- this is the last function that will be applied, taking as arguments all of the other things to its left, up to any //s (if there is a //, then everything to its left is the last argument of the current active function). If there is a //, then look to its left to find the active expression, and look over it. Repeat. Now you have a general, top-down sense of what functions are demanding the various arguments in the expression.

Now start at the left and read left-to-right, to understand how each subexpression is calculated. This is a bottom-up viewpoint, in the sense that you will start out by seeing how each subexpression is calculated, and then after progressing through each set of arguments, you will see the active function that eats those arguments.

When there are ,s, they separate two subexpressions which are each arguments of the same active function.

When there are //s, the things to the left of the // are computed first, and then they are fed to the active function controlling the // as its final argument. So //s represent a pipeline (or composition of functions) which can be read from left to right.

Sequences of things separated by '.'s are, like '//'s, pipelines where the thing on the left is calculated first, and then the result is fed to the thing on the right. The difference is that the "thing on the left" is as narrow as possible, rather than as wide as possible as with '//'. Technically, '.' is an active function, but TODO prob here

The intent of '.' is for it to be used for things where it is easier to think about what is going on by reading the things in between the '.'s from left to right. For example, imagine that lookup in a 2-D array is done like this: pass the row number to a row lookup function, and then, representing that row, you get back a "column lookup function" for that row. Now pass the column number to the column lookup function, and it returns the value at that location. So, to lookup row 5 column 2, you do "2 (5 row_lookup)"; "(5 row_lookup)" evaluated to the column lookup function for row 5, and you pass 2 to that, and get back the value stored at row 5, col 2. You could write this "row_lookup.5.2" which is shorter and seems to be easier to read.
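Restating the 2-D array lookup from that paragraph as code (row_lookup is the hypothetical curried lookup function):

 row_lookup.5.2 == 2 (5 row_lookup)    # both expressions fetch the value at row 5, column 2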

Example:

 (2 1 f  , 4 3 g  , h)   (1+1 1 f  , 4 x.y.z g  , j)  a // 5 b // 6 c

Here we have a 3-stage pipeline with active functions a, b, c. First "(2 1 f , 4 3 g , h) (1+1 1 f , 4 x.y.z g , j) a" will be computed, then its result will be used as the last argument of b (with 5 being the other argument to b), and finally the result of that will be used as the last argument of c.

a takes 2 arguments, each of which is a subexpression. The active function in the subexpression which computes a's first argument is j, and the active function in the subexpression for a's last argument is h.

h itself takes two arguments, given by (4 3 g) and (2 1 f). j also takes two arguments. j's first argument, "4 x.y.z g", is a subexpression which has an active function g which takes two arguments, x.y.z and 4. "x.y.z" means "z (y x)". j's last argument is "(1+1) 1 f".

Why?

Oot is "backwards" from most programming languages so that the main flow of the computation within an expression goes left-to-right rather than right to left. This means that to get a top-down sense of what is going on, you must read Oot from right to left, and to get a bottom-up sense, read it from left to right.

I find that for short or easy to understand expressions, I want to read it once top-down and then I understand. For long or hard to understand expressions, I find that I first glance through it to get a top-down overview, and then I slowly read through the expression bottom-up. It is easier to read from left-to-right as compared to right-to-left, because the spelling of each word is left-to-right. For a short expression, the difference doesn't really matter, because it's easy to move your eyes to either the leftmost or the rightmost end of the expression.

So, traditional programming languages are good for the top-down phase but bad for the bottom-up phase, whereas Oot is good for the bottom-up phase but bad for the top-down phase. Although the top-down phase is more common, this is due to the prevalence of expressions that are short and easy to read in any case; it seems that the long, difficult expressions will be easier in Oot.

These are just personal observations which have not been thought about much or checked extensively or objectively.

The // operator is inspired by Haskell's $ and is a concise, readable alternative to the common "pipeline" situation described above. The . operator mimics how object attribute access reads in other languages such as Python. The ',' acts much like it does in languages with a f(x,y) syntax; it allows you to separate subexpressions without typing a zillion parentheses (also, it is easier to read something with parens at a high nesting level with commas inside them than it is to read something using nested parens, i.e. parens on both nesting levels).

The approach to infix operators is motivated by a desire to allow users to define infix ops but to avoid having user-defined precedence levels which cannot be immediately inferred by sight, as this makes code hard to read. Although there are in effect three precedence levels, users can tell whether an operator is infix, and what its precedence is if so, by sight (are there attached args? does the operator start or end with the loose infix chars?). If there are multiple infix operators present on the same precedence level, then either their grouping doesn't matter, or it must be specified explicitly; so the reader doesn't have to apply associativity rules to determine how infix operators group. If an operator is not infix, then it is right-associative. So all the reader has to do is to identify operators with attached arguments, identify operators that start or end with the loose infix chars, and apply right-associativity to the rest.

In my view, there are three reasons to want tight-binding infix operators; (1) saving a few keystrokes ("a*b" vs "a b *"); (2) removing parens via associativity ("1*2*3*4" vs "((1 2 *) 3 *) 4 *"); (3) removing parens via tight binding ("a*b c +" vs "(a*b) c +"). #1 is trivial, #2 can be accomplished in a concise manner using the * (fold) token operator.

In my view, the main reason to want loose-binding infix operators is that it seems easier to read "blah blah blah <op> blah blah blah" than "(blah blah blah) (blah blah blah) <op>", probably because the operator in the middle reminds you what the relationship between the arguments is as you pass from one to the other. Another reason is that symmetric comparison operators are things which compare two things, so it's visually nice to be able to have them symmetrically in between the two things being compared.

Keyword arguments

 exponent base g = base^exponent
 1 2 g # == 2
 base=2 exponent=1 g # == 2
  exponent=1 2 g # == 2
  exponent=1 base=2 g # == 2
  2 exponent=1 g # == 2
  1 base=2 g # ==2

As each keyword argument is encountered, if it matches a keyword slot in the function definition, it is applied to that slot. If it does not match any slot, and the function in question has a *more argument (see below), then the function can access that keyword argument using *more. Otherwise, if it does not match any slot, a compile-time error is emitted.

Note that if you use a keyword to assign to an argument, then that argument is "skipped over" when its position comes up, requiring care in reading:

 toMultiplyBy toSubtract toStartWith h = (toStartWith toSubtract -)*toMultiplyBy
 
 3 2 1 h # == -3
 3 1 toSubtract=2 h # == -3; the same, because toSubtract was skipped
 # that is to say, toMultiplyBy toSubtract toStartWith h == toMultiplyBy toStartWith (toSubtract=toSubtract h)

It is possible to give a keyword argument which is discarded if it does not match; to do this, use keyword?=value

e.g.

 exponent=1 2 unused?="hi" g # == 2
 2 exponent=1 unused?="hi" g # == 2

It is possible to give a keyword argument which, if it does not match, is treated as a positional argument, as if there were no keyword; to do this, use keyword-=value

e.g. exponent=1 badKeyword-=2 g # == 2

If no value is specified for a keyword argument in a function call, then 't' is assumed to be the value. This saves you one character when sending optional arguments to functions which are really just binary switches that turn on some non-default behavior, e.g. word ignorecase= find

Usually, once you apply an argument to a function, you can't change it later with a keyword. If you want to do that, use the function applyArgsAsDefault:

 1 base=3 base=2 g # error; base assigned to twice
 1 base=2 base=2 g # error; base assigned to twice
 a = (base=2 g); 1 base=2 a # error; base assigned to twice
 a = (2 g); 1 base=2 a # error; base assigned to twice
 a = [2] g applyArgsAsDefault; 1 base=3 a # == 3
 a = base=2 g applyArgsAsDefault; 1 base=3 a # == 3
 1 2 blah?=4 g # == 2
A "*more" argument must be the last (leftmost) in the function definition, and signals that the function does something with arbitrary keyword arguments. Within the function body, the unasked-for keywords and their values are edges and children of a node assigned to *more.

 "unasked-for keyword"="val" "hi" printFirstKeyword # hi\nunasked-for keyword

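The behavior described above is close to Python's keyword arguments plus **kwargs; a sketch of the analogy (g and print_first_keyword here are stand-ins written for this illustration):

 def g(exponent, base):
     return base ** exponent

 assert g(1, 2) == 2                   # positional, like "1 2 g"
 assert g(exponent=1, base=2) == 2     # both slots filled by keyword
 assert g(1, base=2) == 2              # the keyword fills its slot; the positional fills the rest

 def print_first_keyword(greeting, **more):
     # roughly what a "*more" argument does: collect the unasked-for keywords
     print(greeting)
     for name in more:
         print(name)
         break

 print_first_keyword("hi", **{"unasked-for keyword": "val"})   # prints "hi" then "unasked-for keyword"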
Default arguments

In function definitions, default arguments must be the "first", or rightmost, arguments. Arguments which have defaults ARE NOT positional, and can only be given by keywords.

 x=1 f = x + 3
 f == 4
 5 f # is an error; the x argument is not positional, only keyword
 x=5 f == 8
 exponent base=e g = base^exponent
 
 1 g == 2.718...
 1 base=2 g == 2
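The keyword-only behavior of defaulted arguments resembles Python's keyword-only parameters; a sketch of the analogy (not a claim about how Oot would implement it):

 import math

 def f(*, x=1):
     # x has a default, so it can only be supplied by keyword
     return x + 3

 def g(exponent, *, base=math.e):
     return base ** exponent

 assert f() == 4
 assert f(x=5) == 8
 # f(5) would raise TypeError: the defaulted argument is not positional
 assert abs(g(1) - math.e) < 1e-9
 assert g(1, base=2) == 2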

Neutralizing a function

If you have given a function all its required arguments (i.e. those without defaults), but rather than evaluating it, you want it to sit as a partially applied function waiting for possible default overrides, you can "neutralize" it with the "neu" higher order function. A neutralized function eats its arguments like normal, but then when all its required args have been supplied, it yields itself (with the arguments applied), rather than whatever value it is supposed to yield. Then to make it "go" later, use "unneu".

 exponent base=e g = base^exponent
 1 (g neu) #== 1 (g neu)
 (1 (g neu)) unneu # == 2.718...
 a = 1 (g neu); b = base=2 a; b unneu # == 2
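Neutralizing is loosely reminiscent of partial application that is only forced later; a very rough Python analogue using functools.partial (calling the final partial object plays the role of "unneu"; this is an analogy, not a faithful model of neu):

 import functools
 import math

 def g(exponent, base=math.e):
     return base ** exponent

 a = functools.partial(g, 1)          # like "1 (g neu)": the argument is applied but nothing runs yet
 b = functools.partial(a, base=2)     # like "base=2 a": override the default
 assert b() == 2                      # like "b unneu": now make it go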

sq

todo: NO, this stuff should be always, sq is for strict eval

sq (short for "sequence") labels a body of consecutive lines that are written as if they will be executed in sequence. For example,

x f = sq
  y = x + 3
  z = y * 2
  z + 1

This looks like an imperative sequence of steps, but actually the compiler will translate it to an expression; the value of the last line will be returned as the value of the expression, with the variable binding from the previous line substituted into it, and then the variable binding from the line before that substituted into that, and so on.

The effect is similar to this Haskell code (todo: check for mistakes):

f x = 
  let y = x + 3 in
    let z = y * 2 in
      z + 1

sq is similar to Haskell's "do" except that it doesn't use monads or produce a monadic type; it's the let-binding syntactic sugar of Haskell's do without the monads. sq also has some extra syntactic sugar, see below.

Convenience mutation

 x = 5
 x = x + 1
 x pr

->

 x = 5
 x2 = x + 1
 x2 pr

(so, this is basically saying that symbols act like normal "variables", even when statements are otherwise unordered)

In-place convenience mutation

If you want to use a variable within an expression, and then replace the value of that variable with the result of the expression, then attach a postfix '=' to the variable:

x= sort
->
x = x sort
-> (reducing using convenience mutation, as described above)
x2 = x sort
.. ('x's below replaced by 'x2's) ..
x = 0
x= ++1
->
x = x ++1
->
x2 = x ++1
.. ('x's below replaced by 'x2's) ..

In-place convenience mutation shortcuts for ++1 and --1

'x+++' is a shortcut for "x= ++1" and "x---" is a shortcut for "x= --1"

assignment to graph objects

'.' must be the operator on the top level of the l-value

 g = [3 4]
 g.0 = 1
 -> g = 1 0 g __set 
 -> ... (convenience mutation)
 g.0 == 1
 g = [3 4]
 g.1+++
 ->
 g = [3 4]
 g.1 = g.1 ++1
 -> ... (assignment to graph objects)
 g == [3 5]

end of sq section


Pronouns

"it"

When you call a function (or use an infix operator) inside an sq, the symbol "it" may (or may not) be rebound by that function. That is, the function gets to reach out of its lexical context and change the value of "it" in your lexical context.

For example, if you match a string against a regular expression using the =~ comparison operator, "it" is bound to a list of matches.

 "hi there" =~ r<<th..e>>
 it.0.txt == "there"

This is similar to how Perl binds $1, $2, etc. when you match a regex.

"it" may be rebound once each line (using convenience mutation).

 "hi there" =~ r<<th..e>>
 "hi dude" =~ r<<d..e>>
 it.0.txt == "dude"

If multiple functions or operators on a single line try to rebind "it", then the outermost one wins.

To convert any function that binds something to "it" into a function that returns an ordered table, use the "giveit" higher order function; the first element of the table, labeled 'x, is the value that the function returns, and the second element, labeled 'it, is the value that the function binds to "it". This allows you to get those "it" values when you are not in an sq block:

 g = "hi dude" r<<d..e>> ((=~) giveit)
 g.0 == t
 g.1.0.txt == "dude"
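The =~ / "it" behavior is roughly what Python's re module gives you explicitly through the match object (a sketch; Python has no implicit "it", so you hold on to the match yourself):

 import re

 it = re.search(r"th..e", "hi there")
 assert it.group(0) == "there"    # like it.0.txt

 it = re.search(r"d..e", "hi dude")
 assert it.group(0) == "dude"     # a later match shadows the earlier one, as in the sq example above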

$$

todo: how can this work for fns to call both $$ and non-$$ unless keywords are also positional? mb use -- for keyword only?

$$, outside of a list, is replaced with:

  fn i=n s=n z=n y=n x=n :

i.e.

 f ($$ x g) 
 ->
 f (fn i=n s=n z=n y=n x=n : x g)

in order to make functions passed to higher order functions easy to read, functions that require functional arguments (call these 'subfunctions') are encouraged to call the subfunctions that are passed in with keywords. The following conventions obtain:

For example, foldr is defined like this:

foldr  { [b] a [a b a] = a}
xs0 z0 f foldr = xs0 z0 rgo
  [] s rgo     =  s
  (xs x ins) s rgo = xs (s=s x=x f) rgo

You can write a fold that finds if any item in a list is even like this:

  ($$ s || x 2 mod // !) foldr

(todo: check for mistakes) the 's' and the 'x' help the reader see which argument (s) is the boolean state which is being passed between invocations of the anonymous function, and which argument (x) is the list item.

List comprehensions

<<[1,2,3] | x>=2 >> == [2,3]

<<x*2 : [1,2,3]>> == [2,4,6]

<<x*2 : [1,2,3] | x>=2 >> == [4,6]

Uses x pronoun.
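For comparison, the same three comprehensions written in Python (the x pronoun becomes an explicit loop variable):

 assert [x for x in [1, 2, 3] if x >= 2] == [2, 3]         # <<[1,2,3] | x>=2 >>
 assert [x * 2 for x in [1, 2, 3]] == [2, 4, 6]            # <<x*2 : [1,2,3]>>
 assert [x * 2 for x in [1, 2, 3] if x >= 2] == [4, 6]     # <<x*2 : [1,2,3] | x>=2 >>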

Graph comprehensions

<<< : list comprehension over leaves, any boundary
<flavor<< : list comprehension over leaves, flavor boundary
<!flavor<< : list comprehension over leaves, non-flavor boundary

<<<< : same but all nodes
...
...

<<<<< : same but interior nodes

todo: how to work in graph patterns?

Assertions

To make an assertion, put '!' at the end of the expression to be asserted:

 1 1 ++ == 2 !

another example:

 x printAge
  x > 0 !
  prr "i see that you are $$x years old!"

Depending on the debug mode, the assertion may or may not be evaluated. If the assertion fails, an AssertExc (short for AssertException) will be raised. If you would like to provide a string error message along with the assertion, just put it after the '!':

 1 1 ++ == 2 ! 'the sum of one plus one didn''t evaluate to two'

If you want to manually construct the exception that is raised, put an expression after the '!':

 x > (x 1 -) ! s="An exception occurred when we tried to subtract 1 from (x+1)" CustomExceptionConstructor
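Python's assert statement covers similar ground, including the optional message and raising a custom exception; as in Oot, whether the check runs depends on a debug switch (python -O skips asserts). A sketch (CustomException is a made-up name):

 x = 5

 assert 1 + 1 == 2
 assert 1 + 1 == 2, "the sum of one plus one didn't evaluate to two"

 class CustomException(Exception):
     pass

 if not x > x - 1:
     raise CustomException("An exception occurred when we tried to subtract 1 from x")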

Referential transparency

Oot is supposed to be a referentially transparent language, like Haskell -- except that it supports some constructs which do not seem to be referentially transparent, but which can be thought of as syntactic sugar for expressions which are (since the power of all general programming languages is in some sense equivalent, it is certainly debatable whether this makes Oot any more referentially transparent than imperative languages, which make no pretensions of referential transparency). Some of the simplest of those constructs are the sq syntax detailed above. Others are part of the error handling facilities. Two other complex ones are covered here.

"&" is used as a prefix for variables that refer to state, "references". Any function that references a & variable (references a reference, heh heh) must declare itself to be using state, and declare "which" state (or states) it uses (todo: define syntax for this); the compiler is prompted by this declaration to, under the hood, convert the function's type to monadic type with a monad similar to Haskell's state monad. This change in type allows the compiler to make sure that other related functions which also, covertly, carry state are declared as such. The compiler API allows IDEs and users to query which functions are "contaminated" by (which) state.

To bind two reference symbols to the same value, use :=, e.g. &a = 3; &b := &a

Functions that are nondeterministic or based on values returned by I/O must also be prefixed by &.

Another way in which referential transparency would be broken is by the use of I/O, specifically, output side-effects. Oot has an equivalent to Haskell's I/O monad, and when you use an I/O function in your function, your function is transparently converted to be within the I/O monad. As with state, the compiler keeps track of which other functions are tainted and can tell you or your IDE; however, no explicit declaration is needed to do I/O, provided the values returned don't affect your result.

Error handling is permitted without declarations. This means that, in theory, you could smuggle arbitrary input or non-deterministic values into parts of the program marked as "pure" by hiding them in exceptions which are then caught. This is, of course, not recommended. But hey, even Haskell has unsafePerformIO.

All the business about stacking the different monads is handled by the compiler.
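What the compiler would be doing under the hood for a '&' function is roughly explicit state threading; a hand-desugared Python sketch (the counter example is made up, not taken from these notes):

 # a "stateful" function desugared: it takes the current state and
 # returns (result, new state) instead of mutating anything in place
 def tick(counter_state):
     new_state = counter_state + 1
     return new_state, new_state

 s0 = 0
 value1, s1 = tick(s0)
 value2, s2 = tick(s1)
 assert (value1, value2) == (1, 2)
 assert s0 == 0    # the old state value is untouched; only new values were produced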

Graph assignment

[] form

@ form

Token operators

Unary postfix token operators

todo

These bind tighter than infix operators and

! : negation (boolneg) t! == f

!f : apply f
   x maketrue = t
   f!maketrue == t

modules

A module is a block of code which shares a namespace (the "module namespace") and which exports a namespace (the "module's exported namespace"). Modules may span multiple files, and one file may contain many modules, but a module may only span files within one directory. Module names should be globally unique, although in the case of a conflict, the module search path defines which module definitions shadow which others.

The contents of a module's namespace is explicitly defined in the module code. The contents of a module's exported namespace is implicitly defined. A module's exported namespace includes the same names as in its own namespace, except for:

A module begins at the first line in a file that is not a comment or a % directive, or at a module declaration. A module declaration looks like

 %module modulename

A module continues until the next module declaration or the end of the file.

A module which is implicitly begun by code at the beginning of a file, rather than being explicitly begun by a %module line, is given a module name which is the name of the directory which contains the file.

Modules may have a version. A version is a sequence of integers separated by dots, with the last element being an integer (not a dot). A version with fewer dots is equivalent to the version obtained by appending ".0"s to it. A total ordering on versions is given by: version A > version B iff they differ and the first differing integer, reading from left to right, is larger in A. For example, 1.1.1 > 1.1 > 0.1 > 0, and 1 = 1.0 = 1.0.0. A module's version is assigned by the %ver directive. Modules without a %ver directive are version 0. Examples:

%ver 1
%ver 0.0.1
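The ordering rule can be checked mechanically by padding with zeros and comparing the integers left to right; a small Python sketch of that rule (ver_cmp is a made-up helper):

 def parse_ver(v):
     return [int(part) for part in v.split(".")]

 def ver_cmp(a, b):
     a, b = parse_ver(a), parse_ver(b)
     # pad the shorter version with zeros: 1 == 1.0 == 1.0.0
     width = max(len(a), len(b))
     a += [0] * (width - len(a))
     b += [0] * (width - len(b))
     return (a > b) - (a < b)    # the first differing integer decides

 assert ver_cmp("1.1.1", "1.1") > 0
 assert ver_cmp("1.1", "0.1") > 0
 assert ver_cmp("0.1", "0") > 0
 assert ver_cmp("1", "1.0.0") == 0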

If the line immediately after a module declaration contains a ./ /.-delimited comment, then its contents are considered the module's docstring. If the same module spans multiple parts, and if multiple parts have such a comment, they are all concatenated in unspecified order.

Files can have docstrings too -- if the first line of the file, excluding any lines beginning with #, contains a comment, that is the file's docstring.

Module names can't begin with the "_" character.

import

"imp" (import) is used to add the contents of some other module's exported namespace into the current module's namespace.

Forms of import (these can be mixed and matched):

 %imp modulename
   #import modulename

 %imp modulename/version
   #import modulename, require it to be version "version"
 %imp modulename/minversion:maxversion
   #import modulename, require it to be some version in between minversion and maxversion, inclusive
 %imp modulename/minversion:suggested_version:maxversion
   #import modulename, require it to be some version in between minversion and maxversion, inclusive, and
   #if "suggested_version" is available, use that one
 %imp exp modulename
   #import modulename, and also place the imported names into this module's exported namespace
 %imp modulename1 modulename2
   #import multiple modules
 %imp exp modulename1 modulename2
   #import multiple modules, and also place the imported names into this module's exported namespace
 %imp exp modulename=
   #import modulename, but prefix the imported names with modulename.; for example, "a" becomes "modulename.a"
 %imp exp modulename=newname
   #import modulename, but prefix the imported names with newname

To import a module's own namespace, rather than its exported namespace, put a "_" in front of its name. This causes a compiler error unless the "--exposeprivates" command-line option is given to the compiler.

Pattern matching

Oot patterns are sort of like regular expressions, generalized to match on graphs. They are used in three ways in Oot:

1) To define functions by cases, using pattern matching as a convenient syntax to define the cases
2) To constrain the shapes of data interfaces
3) As part of the specification of graph transformations

Note that since Oot is lazy, some nodes may have infinitely many arcs, in which case an attempt to match such a structure would result in an infinite loop as Oot attempted to check every arc. However, uses #2 and #3 don't involve checking every arc and so would still be relevant in such a situation.

To match a pattern against something, use the infix "=~" comparison operator, which returns a boolean, and in the case of a match also binds, using convenience mutation, each of the pattern variables in the pattern to their values in the match. Note: for type safety, the compiler must be able to know what the symbols will be and what their types are. If you hardcode the pattern, the compiler will infer this. But, in case you want to dynamically construct a pattern within the program, a mechanism is provided to statically specify the pattern variables and their types; see below.

When =~ is used inside an sq, then in the case of one or more matches, the contents of "it" will be set to a list of tables. Each item in the list is a table that corresponds to one match. Each table maps each pattern variable to its value in that match.

Oot patterns are written within {}s.

Oot patterns look sort of like graph constructors and match top-down, that is, whatever you write will first be matched against the node you are matching against, and then the children of the root of the pattern will each be matched against each child of the node you are matching against, etc, recursively.

The empty pattern always matches:

  x =~ {} 
  == t

All implementation types are Oot patterns. For example, {Bool}.

You can define a new pattern and bind it to a symbol like this:

 x = {Bool}

You can refer to another pattern within a pattern. Here is a pattern that matches a list of two Booleans:

 x = {Bool}
 y = {[x, x]}
 y == {[Bool, Bool]}

Inside patterns, the occurrence of a symbol means, "bind whatever is here to this symbol for use later". For instance

todo
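As a point of comparison only (not Oot semantics), Python 3.10's structural pattern matching also matches a shape top-down and binds bare symbols to whatever they match; a small sketch:

 def describe(value):
     match value:
         case [bool(), bool()]:
             return "a list of two Booleans"          # like the pattern {[Bool, Bool]}
         case [x, y]:
             return f"a pair binding x={x}, y={y}"    # bare symbols capture what they match
         case _:
             return "anything (the empty pattern always matches)"

 assert describe([True, False]) == "a list of two Booleans"
 assert describe([1, 2]) == "a pair binding x=1, y=2"
 assert describe("hi") == "anything (the empty pattern always matches)"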

use of partial inverses is like ... :^-1 is \exists xs s.t. h:xs = l. See http://en.wikipedia.org/wiki/Curry_%28programming_language%29 for the connection to logic programming. Basically, in Curry, you can define a fn "last" which yields the last element of a list as:

 last l | xs++[e] =:= l = e where xs,e free

in oot the equivalent would be

 last l = {[@?xs ?e] == l; e}

(or, using +- (which may be the tail version of cons):

 last l = {?xs +- ?e == l; e}
)

Function definitions, pattern matching, and guards

todo

Guards are arbitrary other conditions on a partial function definition, which use the "if" keyword:

 y x f if y > 2 =
   x y div

Interfaces

todo

Different parts of the program may request different interfaces from the same value. For example, there may be a function that creates a binary tree, which supports the "tree" interface, and then passes it into a "map" function which requires the unordered list interface. The actual value which is created will support both interfaces, but the first function sees it only as a tree, and the second only as an unordered list ??todo??

patterns on graphs are the data interface, not the functional interface. Functional interface is a list of fn names and signatures. (umm.. what's the diff?, since __get is fn application? none fundamentally, i guess, but it seems diff)

a "behavior" completely determines the semantics of the value (although behaviors will almost always be unspecified, except where there is default types..)

interfaces have versions, but they must match their module versions

so, the type inference system assigns to each lexical symbol a type, that is, an interface. an interface type may be composite, that is, a conjunction or disjunction of interfaces.

used-as-by-argument: The used-as type of a symbol which is the argument to a function is at least the used-as type of the function's signature for that argument.

used-as-by-assignment-source: The used-as type of a symbol which is assigned to another symbol is at least the used-as type of the other symbol.

the used-as type of a symbol is the conjunction of all of the used-as types found by used-as-by-argument and used-as-by-assignment-source.

next, the compiler (actually, this is sort of the linker, i guess?) makes a list of all known interfaces, and connects them into a graph by using interface adaptors as edges. (todo: with parametric interfaces a more complicated approach may be needed?). with each interface, it associates the list of reachable interfaces.

next, the compiler makes a list of all known implementations. For each one, it determines the set of supported interfaces by starting with the set of directly supported interfaces, then substituting each interface with its set of reachable (adaptable-to) interfaces.

next, for each constructor in the program, an implementation must be chosen. the implementation must support all of the used-as interfaces (behaviors, actually; and note that no implementation can support more than one behavior with the same interface; altho mb behaviors are only specified with constructors -- and "my choice" types?) of the constructor. if different implementations are chosen for the constructors that may be assigned to the same symbol, then vtables will have to be used -- compiler command-line switches can specify how important it is to avoid this (in cases where the other hints, namely type tags and specificity, argue towards using vtables). no, the behavior can be specified anywhere along the inference chain, not just at the constructor. i don't think "my choice" (i.e. universal vs. existential types) are needed (anyone, the provider and the client, can add interfaces, but each will only see their interfaces), but i'm not sure.

Type annotations

Type annotations tell the compiler that a certain variable must support a certain interface.

Examples:

 x{int} = 3
 x = 3{int}
 ["a"="apple" "b"="banana"]{node}
 ["apple" "banana"]{list}

Type tags

Type tags are hints to the compiler or interpreter about which interface implementation to use for a particular variable.

Type tags are written within type annotations, following a colon after the interface type. Multiple type tags may be used, separated by commas. Boolean type tags are either present or absent; and parameterized type tags use the same keyword=value syntax as arc labels. Type tags which are earlier in the list have some amount of priority over those later in the list. Even after the implementation is selected, the chosen implementation has access to the type tags, so they can be used to specify optional implementation-level details such as the initial capacity of a resizable array. Example:

 {node: fastLookup, fastInsert, unordered, capacity=100}

This example specifies an object of type "node", requests an implementation with the properties fastLookup, fastInsert, and unordered, and sets the capacity to 100. Perhaps the compiler will choose a hash-table implementation for this node.

Type tags should never change the semantics of an object, only "implementation-level details" that may affect performance.

You may be thinking, "but I know exactly how I want this variable to be implemented, why can't I just say that?". Well, you can. Just make up a new type tag that says exactly what kind of implementation you want; "redBlackTree" or whatever.

Special type tags include the "myChoice" boolean tag, and the "implementation=" keyword tag.

In addition to paying attention to type tags, the compiler tries to choose the "most specific" implementation that supports the required interfaces. That is, if a variable must support interfaces A and B, and implementation X supports A directly and supports B via an A->B adaptor, while implementation Y supports both A and B directly, the compiler will prefer implementation Y.

Remote tagging

If you control the source code of an implementation, you can add tags to it there. But even if you do not, you can remotely "tag" an implementation with a boolean tag. You can also "untag", or remove existing tags:

 %tag moduleName:implementation myNewTagName1 myNewTagName2
 %untag moduleName:implementation tagToRemove1 tagToRemove2

Remote changes you make to tags only affect other modules if your module is imported using the "tags" or "all" modifiers to "import".

Directly choosing the implementation

If you need to constrain some variable to use a particular implementation, use a type tag of the form "implementation=typename". Don't do this, it's bad style. This is provided for debugging purposes. The preferred style is to create a new type tag to express whatever property that implementation has that you want; this way, others can create new implementations later which advertise your new tag. Other than "implementation=" type tags, import statements, and reflection on modules, there is intentionally no way to refer to individual interface implementations.

Implementations of interfaces

In many cases, you will want to just write one implementation and as you add new functions to the implementation, you want them to be exported by the interface. So that you don't have to go and write the same thing twice, you can just use the declaration

 iface name auto

which automatically defines an interface of name "name" based on the functions defined in the current module -- except for functions whose name starts with '_', which are considered "private" and not included in the interface, assuming that the first character of the interface name is alphabetical. To define an interface which also includes functions whose name starts with '_', use the above but with a name that starts with the '_' character (todo: thx dana).

By default the compiler will give an error if an interface whose name starts with '_' is used. This is to make sure that it is clear that the convention is not to do this, and it is only for debugging and other weird situations. To make it compile, give the compiler the --exposeprivates flag.

There is a compiler command to produce a normal interface declaration for any interface generated using the "auto" keyword.

Within the implementation of an interface, the type tag "notme" can be used to indicate that the variable may not have this implementation (and, furthermore, the implementation used cannot depend even indirectly on this implementation). This allows an implementation to use delegation/composition (as a substitute for subclassing; see van Roy, "...dummies", p. 32), e.g. something can claim to implement the "list" interface, when actually within the implementation, another "list" implementation is called. the prohibition on indirect dependency prevents a circular dependency between two delegators that each want the other one to do the real work.

Boundarys

In a language where data values of all types are represented as graphs, and containment of one object within another is represented as a link from some node of one object to some node of the other object, it can be hard to tell where one object ends and another begins.

For example, if one had a binary tree, in whose leaves were stored lists, some of which may have nested lists, how would one distinguish a list of two lists from an interior node of the tree? For instance, the graph might look like:

 [ [[5 3]] [[[3 1] 2]] [[3 2]] ]

is [[3 1] 2] an interior node or a leaf?

Note that this problem doesn't occur when you know beforehand the shape of the graph -- it only appears when you want to have general algorithms that apply to graphs of different depths and shapes.

One solution to this problem is to put leaf nodes inside a wrapper that identify them as such. In Oot, a wrapper type is provided, called __boundarys__. The wrapped values are called __leaf__ values, and the graph as a whole is said to be __bounded__. There are a number of functions in the standard library that recurse through a graph, do something with the leaf values, and then terminate the recursion, i.e. even if the leaf values are themselves nodes, the algorithm doesn't recurse into them and follow their arcs. A concise syntax is provided to make an expression terminal within a G-constructor:

  {{expr}}

Now we can disambiguate the previous example:

  [ [{{[5 3]}}] [[{{[3 1]}} 2]] [{{[3 2]}}] ]

is the G-constructor used to express the case where [[3 1] 2] is an interior node, and

  [ [{{[5 3]}}] [{{[[3 1] 2]}}] [{{[3 2]}}] ]

is the case where [[3 1] 2] is a leaf.

The boundary syntax is only available within G-constructors. The terminal value may be of any type. You cannot have a terminal value outside of a graph; if you fetch a terminal value, you just get the value, i.e. it is transparently unwrapped:

  [{{3}}].0 == 3
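The boundary idea can be imitated in Python with an explicit wrapper class that generic recursion refuses to descend into (a sketch; Leaf and map_leaves are made-up names for illustration):

 class Leaf:
     # wrapper marking a value as terminal, like {{...}} in a G-constructor
     def __init__(self, value):
         self.value = value

 def map_leaves(f, node):
     # recurse through nested lists, but do not descend into Leaf-wrapped values
     if isinstance(node, Leaf):
         return Leaf(f(node.value))
     if isinstance(node, list):
         return [map_leaves(f, child) for child in node]
     return f(node)

 g = [[Leaf([5, 3])], [[Leaf([3, 1]), 2]], [Leaf([3, 2])]]
 lengths = map_leaves(lambda v: len(v) if isinstance(v, list) else v, g)
 # the wrapped lists are treated as leaves: f sees them whole,
 # and the traversal never follows their contents
 assert lengths[0][0].value == 2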

Flavored boundarys

In some cases we may want to have a single graph which contains various different object boundaries. For example, perhaps we have TODO.

Oot provides a way to attach an arbitrary value to a boundary, called the boundary's __flavor__, with the idea being that the flavor tells you what kind of object is being bounded (or, to put it another way, what kind of boundary it is). We might have called this "labeled boundarys", except that a terminal object may have a node label and may be the destination of a labeled edge, so we want to avoid confusion.

Functions in the core library which make use of boundarys can be told to ignore all boundarys except those of a certain flavor, or to ignore all boundarys of a certain flavor. Functions are also provided to find all boundarys of a given set of flavors and replace them with a different flavor (flavor map).

The flavor of a boundary is specified by a value, followed by a colon, in the boundary:

 {{flavor : terminal value}}

Terminators without a flavor are of flavor nil.

Graph transformations

A convenient way to structurally transform a graph is to pattern match on it and then to construct a new graph based on the values bound to the pattern variables. This can be composed with two graph comprehensions (allowing only a map and a conditional): one for the forward direction (for gets), and a simple map function for the reverse (for "__set"s); unfortunately, Oot cannot calculate the inverse of a function. Together these yield a restricted way to transform both the graph's structure and the values within it. This is called an Oot __graph transform__.

There is a convenient syntax for specifying a new data interface as a graph transform of an existing data interface: TODO

Significant whitespace

Like Python and Haskell, Oot has significant whitespace. Like Haskell, this is optional.

A line in Oot is either a physical line, terminated by \n, or a ';'-terminated line. These are the same in Oot, except that '#' denotes a comment that runs until the end of the physical line.

An indented block in Oot is either a physically indented block, that is, a contiguous group of nonempty physical lines all prefixed by at least certain number of spaces, (the indentation), or an explicitly delimited block, delimited by parentheses (in code; but see below), or by []s (in g-constructors). Tabs are interpreted as 4 spaces. Within a G-constructor, a line with only whitespace terminates a physically indented block, but a line with only a comment does not; within code, a physically indented block is terminated by another line without as much indentation. In both cases the end of file terminates the block.

Physical blocks are interpreted differently in ordinary code and inside g-constructors. Inside g-constructors, the contents of an indented block is interpreted as if it had []s around it. To open a G-constructor from within code, use a '[' as the last thing on the line; the G-constructor goes until the end of the indented block. Similarly, an indented block after a '|' as the last thing on a line is interpreted as the edges of that node, rather than as a child node. In other cases, an indented block within a G-constructor represents a child node of the containing node. To make the first line inside a G-constructor be a grandchild rather than a child of the root, either have multiple children, separated by blank lines, or have a [. For example,
 x = [
       "apple"
       "banana"

is the same as

 x = ["apple" "banana"] 

and

 x = [[
         'name="apple"
         'color="red"
         'price=40
 pr x.0.name

is the same as

 x = [['name="apple" 'color="red" 'price=40]]; pr x.0.name

and

 x = [
         'name="apple"
         'color="red"
         'price=40
 
         'name="banana"
         'color="yellow"
         'price=50
 pr x.1.name

is the same as

 x = [['name="apple" 'color="red" 'price=40] ['name="banana" 'color="yellow" 'price=50]]; pr x.1.name
 x = [
       appleState |
	    'transitions=
	        "0"=self "a"=self "c"=cherryState
	    'color="red"
       bananaState |
	    'transitions=
	        "a"=appleState
	    'color="yellow"
       cherryState |
	    'transitions=
 	        "0"=self "a"=appleState
	    'color="red"
 pr x..bananaState."a".'color #prints "red"

is the same as

 x = [[appleState | 'transitions=["0"=self "a"=self "c"=cherryState] 'color="red"] [bananaState | 'transitions= ["a"=appleState] 'color="yellow"] [cherryState | 'transitions=["0"=self "a"=appleState] 'color="red"]]; pr x..bananaState."a".'color

more examples:

 [ 1 2
 == [1 2]

and

 [
   [
     1
  
     2
 ==
 [[
   1
  
   2
 == [[1 2]]

and

 [
   [
     1
   [
     2
 ==
 [
   1
   2
 == [[1] [2]]

and

 [
   [
     1
   [[
     2
 ==
 [
   1
   [2]
 == [[1] [[2]]]

In ordinary code, a physical block is the same as:

surrounding the contents of each line in the physical block with parens, then inserting the elements of the physical block on the left of the previous line, with the first element down going immediately to the left, and the next one down going to the left of that, etc., and then surrounding the whole thing (including the original contents of the previous line) with parens. E.g.

 f
     x
     y

is the same as

 (y x f)

and

 (2 1 f  , 4 3 g  , h)   (2 1 f  , 4 3 g  , j)  a // 5 b // 6 c

is the same as

 c
     6
     b
         5
         a
             j
                 g
                     3
                     4
                 f
                     1
                     2
             h
                 g
                     3
                     4
                 f
                     1
                     2

or, slightly more readably,

 c
     6
     b
         5
         a
             j
                 4 3 g
                 2 1 f
             h
                 4 3 g
                 2 1 f

or,

 6 c
     5 b
         a
             j
                 4 3 g
                 2 1 f
             h
                 4 3 g
                 2 1 f

->

 (6 c)
     (5 b)
         (a)
            ((2 1 f) (4 3 g) (j))
            ((2 1 f) (4 3 g) (h))

->

 (6 c)
     (5 b)
         (((2 1 f) (4 3 g) (h)) ((2 1 f) (4 3 g) (j)) (a))

->

((((2 1 f) (4 3 g) (h)) ((2 1 f) (4 3 g) (j)) (a)) (5 b)) (6 c)

or (imo the next one is the most readable),

 a // 5 b // 6 c
     j
         4 3 g
         2 1 f
     h
         4 3 g
         2 1 f

or,

 a // 5 b // 6 c
     2 1 f , 4 3 g , j
     2 1 f , 4 3 g , h

Conditionals

if

cd

Short for "conditional". Like Lisp's "cond".

 [[a==1 "one"] [a==2 "two"]] "other" cond
   
   == 
    "other" cond
      [
        a==1 "one"
        a==2 "two"
   ==
    "other" cond
      [
          a==1
          "one"
          a==2
          "two"

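A cond of this shape is easy to emulate as a function over (condition, result) pairs; a small Python sketch (cond here is a made-up helper; thunks are used so only the chosen branch is evaluated):

 def cond(pairs, default):
     # pairs: list of (condition_thunk, result_thunk); return the first result whose condition holds
     for condition, result in pairs:
         if condition():
             return result()
     return default()

 a = 2
 answer = cond([(lambda: a == 1, lambda: "one"),
                (lambda: a == 2, lambda: "two")],
               lambda: "other")
 assert answer == "two"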
sw

Resources

Oot has a mechanism to refer to "resources", which are values stored in files separately from code. Typical uses of resources include storing strings in various languages, binaries such as icons, graphics, or sounds, and GUI layout specifications.

To refer to a resource, prefix an alphanumeric token with a dollar sign. The name of the token is the name of the resource.

A resource may store a node, in which case it can be the hierarchical parent of other resources which can be accessed (for example, $blah.'strings.'blahstring1)

If no resource with the given name can be found, the value of the token is a string that is the token. So, $blah can be used as a shortcut for "blah", with the additional benefit that the string can be easily changed later without modifying source code, by adding a resource named blah. This default behavior can be changed.

The Resource Manager

The way that resources are stored and retrieved is fully customizable using the %rsrcGetFn directive, which allows you to give arbitrary Oot code that will be used to fetch resources. The supplied function will be given a string (the resource name) and must return a resource (or throw an exception, which will become a compile time warning or error).

All resources will be retrieved at compile time and typechecked.

The Resource Manager is not used for keywords.

Symbols

per module todo

Keywords

Keywords are a special type of symbol that are globally non-unique; the value of a given keyword is the same across all modules. Basically, they behave like strings, except that at compile time the compiler may assign unique small integers to each keyword in order to make comparison operations between keywords faster than string comparison. They are written with a prefix of $', e.g.:

 $'this_is_a_keyword

Exceptions

Resume-in-place. Severity Destination (who to tell) Logging

Some syntactic sugar

$[]

A dollar sign attached to the left of a left bracket indicates a G-constructor, except that by default tokens inside the G-constructor, which would usually be evaluated, are instead interpreted as strings, as if they were quoted. A token or parenthetical expression which is prefixed by a backtick is evaluated as if it were inside an ordinary [], rather than a $[]. A token which is prefixed by a single quote is treated as a symbol. []s immediately inside a $[] are treated as if they were also $[]s, unless they are prefixed by a backtick: "$[ `[] ]", in which case they are treated as ordinary []s. Examples:

 x = ["apple" "banana"] 

is the same as

 x = $[apple banana] 

is the same as

 x = $[
     apple
     banana

and

 x = ["fruit"="apple" "color"="red"]

is the same as

 x = $[fruit=apple color=red]

and

 x = [['name="apple" 'color="red" 'price=(40+1)]]

is the same as

 x = $[['name=apple 'color=red 'price=`(40+1)]]

$$ inside strings

Inside a string (but not just inside a $[]), $$token or $$(expr) evaluates the token or expression and inserts the result into the string at that location (the result of the evaluation must be of a type that implements the str interface)

For example,

 a = "dude"
 "hi $$a" prr

prints "hi dude"

$$ inside []s

Inside []s, $$ means "treat the rest of this physical line as code". This means that whitespace will not serve as graph item separators until the end of this physical line.

For example,

 a = "dude"
 g = $[
     apple
     banana
     $$ "hi " ++ a
 g == ["apple" "banana" "hi dude"]

$$[] is like putting a $$ at the beginning of each line in the list. So,

  g = $$[
      3 + 4
      3
      5 + 2
  g == [7 3 7]

Metaprogramming

x<<Y>> construct. x is construct name. $$ (expression eval) and '\<' -> '<' are done by Oot, and then Y is tokenized and parsed by construct, and construct returns a value. Note that if Y is dynamically generated, construct will be compiled in and made available at runtime.

       ??construct may accept parameters by giving a regex in place of a static construct name, i.e. <flavor<<.
       mb constrain to just x/comma-sep-keyword-or-no-params<<
         </flavor<<
	 r/i<<
	 x/keyword=value,k2=v2<<
       
       keywords interpreted as strings by default (i.e. k=v -> "k"=v), and "k," with no v -> "'k=t", hence
         in r/i<<>> the args passed to r are "i"=t    
       if you want to pass in the values of symbols, use
         c = "i"; r/`c<<>>
       list of built-in constructs:
             << >> 		graph comprehension
	     r<< >>		regex
	     s<< >>		shell (like backticks in perl)

%x Y ?? macro. x is macro name. Y is tokenized and parsed by Oot and converted to Oot Core (note: unshadowing is resolved here, among other things). x is a fn from Oot Core to Oot Core. Hygienic except for pronouns.

%begin x Y %end x preprocessor. x is construct name. Y is tokenized and parsed by construct; construct returns a string of valid Oot which will replace Y in the source and be tokenized and parsed by Oot later (returned string may itself contain metaprogramming directives including %begin). Should compiler provide anything about the rest of the code?

&& reflection, see below

eval Oot compiler/interpreter which can perhaps access symbols in calling scope (mb only via references) in a typesafe way. if this is not present in the program source (after preprocessing :)) then compiler can elide the Oot interpreter from compiled code.

other multi-stage programming/futures constructs (typesafe substitution, etc)

library fns for modular parts of the typechecking process

Regexs

string =~ r<<>>

matches -> it

case insensitive:

string =~ r/i<<>>

Comments

'#' makes the rest of the physical line into a comment

./ and /. delimit the beginning and end of comments, which may or may not be multiline. They may be nested; ./ hi ./ dude /. /. is all comment


Interfaces

In some languages, there is a way to say that objects of a certain types support a list of function with specified signatures (and possibly semantics, although these are merely written down in natural language, not checked by the compiler). Examples are Java "interfaces" and Haskell "typeclasses" (although there are differences between Java interfaces and Haskell typeclasses), as well as "APIs". The analogous concept in Oot is called a __function interface__.

In addition to function interfaces, there is another concept in Oot called a __data interface__, which will be described below. An Oot __interface__ consists of both a function interface and a data interface component (although either one may be empty).

Data interfaces

todo

 t = Tree([[1 2] [3 4]])
        root
        /  \
       /\  /\
      1 2  3 4

 t*leaflist looks like [1 2 3 4]

 print(t*leaflist)
 [1 2 3 4]

 def f(lst) =
   for i=0:len(lst)
     if lst[i] == 3: lst[i] = 4
   return lst

 y = t
 f(t*leaflist) == Tree([[1 2] [4 4]])
        root
        /  \
       /\  /\
      1 2  4 4

 but y still == Tree([[1 2] [3 4]])

 print t*tree
 [[1 2] [3 4]]

 print(f(t*leaflist)*tree)
 [[1 2] [3 4]]

 f([1 2 3 4]) == [1 2 4 4]

so, data interfaces allow you to write functions that take a particular type of graph (like f in the example, which operates on a list of ints), but then use them to manipulate other types of objects, provided that the target object provides a data interface of the required type. the data interface specifies how to translate operations on the data interface value into operations on the target object itself.

  1. so, expressions of the form

so, the type of t*leaflist is list of ints -- however, although it can be manipulated like an ordinary value, t*leaflist is not an ordinary value, and operations on it correspond to operations on t. when you traverse the graph presented by t*leaflist, the result is dynamically generated by t's data interface, and when you apply functions to t*leaflist in order to create mutated versions of it, the mutated versions are actually values of type t generated by t's data interface according to which change you attempted to make in t*leaflist. the list of ints value that t*leaflist appears to present is really just a facade generated by t's data interface.

the burden is on the data interface to guarantee that the result of mutating t by way of mutations on t*leaflist leads to the expected changes in the illusory value t*leaflist (this is a law)

the list of ints value that t*leaflist presents is called the illusory data interface value. If a mutation of t*leaflist is made, the actual changes made to the value of t in order to create t' are opaque if t is an interface type.
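One way to picture a data interface in Python is as a pair of functions that present a nested tree as a flat leaf list and rebuild a new tree from a modified leaf list (a hand-written sketch with made-up names; in Oot the data interface would be declared, not coded like this):

 def leaflist(tree):
     # flatten the tree's leaves into a plain list (the "illusory" value)
     if isinstance(tree, list):
         return [leaf for child in tree for leaf in leaflist(child)]
     return [tree]

 def with_leaflist(tree, leaves):
     # rebuild a structurally identical tree from a (possibly modified) leaf list
     leaves = iter(leaves)
     def rebuild(node):
         if isinstance(node, list):
             return [rebuild(child) for child in node]
         return next(leaves)
     return rebuild(tree)

 t = [[1, 2], [3, 4]]
 flat = leaflist(t)                          # looks like [1, 2, 3, 4]
 changed = [4 if x == 3 else x for x in flat]
 t2 = with_leaflist(t, changed)              # a new tree; t itself is untouched
 assert t2 == [[1, 2], [4, 4]]
 assert t == [[1, 2], [3, 4]]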

Default wrapper type constructors

todo

 data Tree a = Leaf a
             | Branch (Tree a) (Tree a)

 wrapper Tree a = Leaf a
                | Branch (Tree a) (Tree a)

Self

Each lexical context provides a symbol that refers to itself, namely "self".

Example:

0 f = 0
x f = x 1 - / self
->
0 f = 0
x f = x 1 - / f

Constraints

Tokens prefixed by ? represent constraint variables. For instance,

 ?a*b < 12

represents the constraint that a*b is less than 12. similarly,

 ?a*?b < 12

represents the same constraint; but now both a and b are permitted to vary subject to the constraints.

A collection of constraints is asserted by creating a predicate, that is, a function that returns t iff the constraints are satisfied, and passing it constraint variables, on a single line. Functions that yield results based on nondeterministic constraint solving must be marked '&'. If they are not, then deterministic constraint solving will be used, if available (hopefully this can be implemented by saving a random seed for the life of the program), or if not, all solved constraints will be cached for the life of the program (!).

 n findADivisorOfN = 
  ?a*?b == n
  ?a
 between3And6 =
  ?x > 3
  ?x < 6
  ?x

Just like normal variables, constraint variables are lexically scoped, and the constraints are solved only when one of their values are needed. Once the value of any constrained variable is fixed, the other variables participating in the same (lexically scoped) constraint network are guaranteed to meet the constraints.

 n findLinkedDivisorsOfN = 
  ?a*?b == n
  [?a ?b]
 ./ "12 findLinkedDivisorsOfN" might return "[1 12]" but won't return "[1 3]"...
    even [(12,findLinkedDivisorsOfN).0 (12,findLinkedDivisorsOfN).1] is guaranteed not to be [1 3]
    but... /.
 n &ndFindLinkedDivisorsOfN = 
  ?a*?b == n
  [?a ?b]
 ./  [(12,&ndFindLinkedDivisorsOfN).0 (12,&ndFindLinkedDivisorsOfN).1] might be [1 3] /.

Constraint variables may be sent through ordinary functions:

 x plusOne = x+1

 between3and6 =
  ?x plusOne > 2
  ?x < 6
  ?x

If a failure is encountered, a CSFail exception is raised.

Implementation possibly due to http://www.gecode.org/ (which apparently includes graph constraints).

todo: there should be some form of "any/all" mode for constraints to allow the "all" to do something in parallel to everything meeting the constraint.
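A brute-force Python sketch of the "linked" behavior, just to make the intended semantics concrete (find_linked_divisors is a made-up name; a real implementation would hand the constraints to a solver such as Gecode rather than searching):

 def find_linked_divisors(n, limit=1000):
     # solve ?a * ?b == n by search and return one consistent (a, b) pair;
     # because a and b come from the same solution, [a, b] can never mix
     # values taken from different solutions (e.g. [1 3] for n == 12)
     for a in range(1, limit + 1):
         for b in range(1, limit + 1):
             if a * b == n:
                 return [a, b]
     raise Exception("CSFail")

 a, b = find_linked_divisors(12)
 assert a * b == 12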


focus on immutable data structures

with copy-on-write


goroutines


smart pipes


immutability


views


apply

lets the programmer add custom things like auto-vectorized lists


Some basic functions

arithmetic:

++	addition
--	subtraction
**	multiplication (NOT variadic)
div	division
pow	exponentiation

logical:

|	or
&	and
_	not

lists:

+	cons an element onto the end of a list
++	concatenate two lists together

Shortcuts for map and fold

map, fold, and elementwise versions of operators are such common operations that they are given convenient shortcuts. The '-' (dash) character is used for mapping and elementwise application, and the '*' (star) character is used for folding, as follows:

Map

Standalone

An unattached '-' is map:

  [1 2 3] f - == [1,f 2,f 3,f]

As a shortcut, to map an operator over an operand, prefix the operator with a '-', and mark the affected operand with a postfix '-':

  [1 2 3]- 2 -* == [1 2 3] (2 *) map == [1*2 2*2 3*2]

If an operand is marked with a '-' but no operator is prefixed with either a '-' or a '*', then the rightmost operator in the containing subexpression will be taken as the mapping operator:

  [1 2 3]- 2 * == [1 2 3] (2 *) map == [1*2 2*2 3*2]

To map an operator elementwise over all operands to the left of it in the current subexpression, prefix it with a '-':

  [1 2 3] [4 5 6] -+ == [1+4 2+5 3+6]

To map an operator elementwise over some operands, prefix the operator with a '-', and mark the affected operands with a postfix '-':

  //recall:\\ "hello" 2 4 slice == "hello" .2:4 == "ll"
  ["hello" "there"] - 2 [4 5]- slice == ["ll" "ere"]
  quadratic = x a b c : ((a***2)*x b*x +) c +
  11 [1 2 3]- 7 [4 5 6]- -quadratic == [(((1**2)*11,7*11,+) 4+)   (((2**2)*11,7*11,+) 5+)   (((3**2)*11,7*11,+) 6+)]

fold

Standalone

An unattached '*' is fold (todo: foldr or foldl?):

[3 4 5] 1 * == ((1 3 *) 4 *) 5 *

Shortcut variadic fold

To fold an operator over all operands to the left of it in the current subexpression, prefix the operator with a '*'. The initial value will be the result of "(operator) init".

  //note:\\ ++ init == 0
  1 2 3 *++ == (((0 1 +) 2 + ) 3 +) == 6
  1 2 3 *** == 1 2 3 *(**) ==  (((1 1 *) 2 *) 3 *) == 6 /// note: ** init == 1

To use an initial value other than "(operator) init", or if "(operator) init" is not defined, attach the initial value to the left of the "*" which is to attached to the left of the operator:

  ["e" "l" "l"] "ho"*(1 ins) == "hello"

If an operator is variadic, then it must be enclosed in parenthesis before the additional '*' is appended. So, if "*variadic_op" was a variadic operator, then the way to "prefix the operator with a '*'" is "*(*variadic_op)".

To fold an operator while specifying which of its inputs is the "fold" input, prefix the operator with a '*' and mark the fold operand with a postfix '*':

  //note:\\ ins init == []
  ["h" "e" "l" "l" "o"]* -1 *ins == "hello"

If an operand is marked with a '*' but no operator is prefixed with a '*', then the rightmost operator in the containing subexpression will be taken as the folding operator:

  ["o" "l" "l" "e"]* 1 ins == "hello"

If an operand is marked with a '*', and one or more other operands are marked with a '-', then the operand marked with the '*' will be the fold operand, and the others are taken as lists of inputs to be fed elementwise to the folding operator in synchrony with the fold:

  //note: ins init == []\\
  ["o" "l" "e" "l" "h"]* [0 0 0 1 0]- ins == "hello"

Beware of the asymmetry in the shortcuts: prefixing an operator with "-" and not marking any operands is ELEMENTWISE map, but prefixing an operator with "*" and not marking any operands is VARIADIC fold.
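In Python terms, the '-' family is map/elementwise application and the '*' family is reduce; a few direct translations of the examples above (sketches using only the standard library):

 import functools
 import operator

 # [1 2 3] f -        map f over the list
 assert list(map(lambda x: x * 2, [1, 2, 3])) == [2, 4, 6]

 # [1 2 3] [4 5 6] -+  elementwise addition
 assert [a + b for a, b in zip([1, 2, 3], [4, 5, 6])] == [5, 7, 9]

 # [3 4 5] 1 *         a fold given an explicit initial value (here folding multiplication)
 assert functools.reduce(operator.mul, [3, 4, 5], 1) == 60

 # 1 2 3 *++           variadic fold of addition starting from "++ init" (0)
 assert functools.reduce(operator.add, [1, 2, 3], 0) == 6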

Exceptions

A '-' before a number literal is a negative number.

'**' is the multiplication operator; it is not "variadic fold of a fold". '***' is a variadic fold of multiplication.

Reflection

Reflection on any symbol is provided by prefixing the symbol by '&&', which refers to a read-only(?) reflection object.

A reflection object supports the following edge labels:

'___uneval   	     some Oot Core expression of the value bound to this symbol
  note: evaluating &&anything.___uneval is guaranteed to terminate
'___eval     	     ___uneval but with the additional guarantee that the expression has been fully evaluated
  note: the contents of ___eval itself may not be evaluated, however, until its evaluation is forced by printing it or whatever
'___type    	     a primitive expression of the type of the value bound to this symbol
'___symbolname	     the name of the symbol to which && was prepended in the Oot source code
'___symbolfile	     the Oot source code file in which this symbol was defined (may be "eval"!)
'___symbolfileline   the line of the Oot source code  file in which this symbol was defined
'___origsrcexpr	     the expression in the original source code which defined this symbol
'___origsrctype	     the expression in the original source code which defined the type bound to this symbol
'___origsrctypefile
'___origsrctypefileline
'___lexicalcontext   namespace reflection object
'___module	     module reflection object

Note: if && never appears, then this information, and supporting functionality, can be compiled out; similarly if && is only bound to static symbols, info about most symbols can be compiled out.

Note: &&self may come in handy


syntax (ordered by approximate descending ease of typing):

-   	postfix or unattached: negate/inverse? map token op??    note: also valid within identifiers						
=   	bind
()  	parens
;   	cons? map token op?? eol?
'   	string quote? postfix: nullable/escaped type/meta'ed type
,   	comma grouping
.   	attached: attribute access (follow arc with keyword). attached with one arg: map token op?
          actually isn't follow arc the same as fn application? mb this is just a.b.c = (a b c)
/   	apply (with grouping)
--	subtraction
==  	eq (actually, === is eq, == is eq or isa)
;;	eol? eol and opt-in print?
''      string (ended by '')
,,	vertical separator (like eol in matrix)
..  	index/follow arc
//	comment to EOL?
!   	assert? mutate?
%   	meta
&	and
*       repeat? reference? syntactic fold? macro defns: kleene-star? or \* in macro defns?
_   	infixfy? reference? negate?? prefix: privacy
+	append
{}   	block? type?
:	separate vars from code. annotation? typing? attached: map token op?  slicing? 
"	metaquote?
<   	less than   
>   	greater than
?   	prefix: constraint var; alone: unapplied var marker
[]	list delimit		note: not in ISO 646, but is in T.61	
|   	or  graph comprehensions: where		note: not in ISO 646, but is in T.61	
@   	sub in list, forall numeric children?		note: not in ISO 646, but is in T.61	
#	comment to EOL? forall numeric children, sub in list?  		note: not in ISO 646, but is in T.61							
$   	sugar, as above?		note: not in ISO 646, but is in T.61	
\   	backslash (character escape) note: not in T.61
`	note: not in T.61
^   	annotation?              note: not in T.61
~   	negate?   note: not in T.61, apparently hard to type
!!	assert? mutate?
%%  		
&&      reflection prefix (see Reflection above)? and (see loose infix below)?
**	multiplication. repeat?					
__  	
++	addition, extend?
::  	type?
""	un-metaquote?
<< >> 	character metaconstruct (default: graph comprehension)
??  	
@@  			note: not in ISO 646, but is in T.61
##		note: not in ISO 646, but is in T.61
$$	anon fn helper?	  	note: not in ISO 646, but is in T.61
||		note: not in ISO 646, but is in T.61

other:
./ /.	comment
^=    	struct eq
r'	raw string (no escape sequences processed)
r''     same, ended by ''
'''	string ended by '''
r'''	raw string ended by '''
''''	string ended by ''''
r''''	raw string ended by ''''
  ... for any number of 's ? that might make parsing hard ...
.XXX    short for 'me.XXX'
$[]	list of resources (often strs)
$'[]	list of symbols
->	implication
<->	biconditional
=>	macro defn? production rule?
%=>	macro defn?
%=	compile time (recursive eager?) binding and me (re)binding

alpha ids for some things you might think would be punctuation:
div	division
pow	exponentiation

double punctuation whose interpretation is induced by meaning of single char:
(( ))  	just two parens
\\      escaped backslash	 note: not in T.61
[[ ]]  	list within list		note: not in ISO 646, but is in T.61	


punctuation unassigned b/c it might be hard to type on some keyboards:
~~  	not in T.61, apparently hard to type
{{ }}  	note: not in T.61, apparently hard to type 
^^		note: not in T.61
``   	      note: not in T.61


unassigned roles:

boundary metaquote behavior view reference I/O process ownership const strictify, unboxed shared/unique/immutable/lockfree

umm, some more reserved words: atom seq me


Some library functions

assoc: takes an association table in this form:

$[[apple red] [banana yellow] [cherry red]]

and returns one in this form:

$[apple=red banana=yellow cherry=red]
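
In other words (a sketch in Python, not Oot; 'assoc' here is just the analogous dictionary conversion):

  # Sketch only: the analogous conversion in Python.
  def assoc(pairs):
      return dict(pairs)   # list of [key value] pairs -> key=value mapping

  assert assoc([["apple", "red"], ["banana", "yellow"], ["cherry", "red"]]) == \
         {"apple": "red", "banana": "yellow", "cherry": "red"}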

Some token operators

-  	      map over items in list

--     	      map recursively over bounded leaves
-flavor-      map recursively over bounded leaves of flavor "flavor", ignoring other boundaries
-!flavor-     map recursively over bounded leaves of any flavor but "flavor", ignoring boundaries of flavor "flavor"

---	      map recursively over interior nodes and bounded leaves
--flavor-     map recursively over interior nodes and bounded leaves of flavor "flavor", ignoring other boundaries
--!flavor-    map recursively over interior nodes and bounded leaves of any flavor but "flavor",
	        ignoring boundaries of flavor "flavor"

----	      map recursively over interior nodes, stopping at boundaries
---flavor-    map recursively over interior nodes, stopping at boundaries of flavor "flavor", ignoring other boundaries
---!flavor-   map recursively over interior nodes, stopping at boundaries of any flavor but "flavor",
	        ignoring boundaries of flavor "flavor"
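
For intuition, here is a sketch in Python (not Oot) of the difference between the plain map '-' and a recursive map over leaves like '--', using nested lists in place of bounded graphs; 'deep_map' is my own name:

  # Sketch only: plain map vs. recursive map over leaves, on nested lists.
  def deep_map(f, x):
      if isinstance(x, list):                      # interior node: recurse
          return [deep_map(f, item) for item in x]
      return f(x)                                  # leaf: apply f

  assert list(map(lambda n: n + 1, [1, 2, 3])) == [2, 3, 4]
  assert deep_map(lambda n: n + 1, [1, [2, [3]]]) == [2, [3, [4]]]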


coordinatewise (like binary "map")

-/ foldr
-\ foldl

scans? what else? repeats? self-repeats? flips?

todo: convert term of any of a set of flavors to one flavor

Note that the effects of foldr and foldl are in some senses "backwards" from their effects in Haskell, because Oot is basically right-associative and Haskell is basically left-associative.
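
For reference, a sketch in Python (not Oot) of the foldr/foldl distinction being referred to; functools.reduce is a left fold, and the right fold is written by hand:

  # Sketch only: left fold vs. right fold; they differ for non-associative ops.
  from functools import reduce

  def foldl(op, init, xs):
      return reduce(op, xs, init)                  # ((init op x1) op x2) op ...

  def foldr(op, init, xs):
      acc = init
      for x in reversed(xs):
          acc = op(x, acc)                         # x1 op (x2 op (... op init))
      return acc

  assert foldl(lambda a, b: a - b, 0, [1, 2, 3]) == -6   # ((0-1)-2)-3
  assert foldr(lambda a, b: a - b, 0, [1, 2, 3]) == 2    # 1-(2-(3-0))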

postfix:

' : strictify (strict in all arguments, and all functions called are strictified and forced eval'd (i think that's redundant, but whatever))

Some interfaces

bool

boolneg

partial order

set

ord

relation

(see table in 'Binary relations by property' http://en.wikipedia.org/wiki/Binary_relation for some others)

Some type tags

fastLookup

fastInsert

slowInsert

lowMemory

Some more library functions

cat (concat, 1st arg is dim)
sz
sub (substitute, i.e. string or otherwise)
! not (can be unary prefix? postfix?)
join (on posets), meet
; (or just ++, 'append', if column vectors are default)

btw, what is the fn to iterate addition to produce multiplication? iter? (a: yes; in haskell, "iterate")
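
For example (a sketch in Python, not Oot, of the Haskell-style iterate idea; 'mul' is my own name):

  # Sketch only: multiplication as iterated addition, in the spirit of Haskell's iterate.
  from itertools import islice

  def iterate(f, x):
      while True:
          yield x
          x = f(x)

  def mul(n, x):
      # n-th element of the stream 0, x, x+x, x+x+x, ...
      return next(islice(iterate(lambda acc: acc + x, 0), n, None))

  assert mul(4, 3) == 12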

here's how join operates on "database style tables" to produce a "natural inner join": it finds all the keys in table 2 which are equal to keys in table 1, and keeps the entries for which the values of all equal keys match. Left outer join (where rows from table 1 whose keys have no matches in table 2 are still included, with nils in the table 2 spots) and full outer join (where rows from both tables whose keys have no matches in the other table are still included) are also supported (ojoin, symojoin).
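
A sketch in Python (not Oot) of the natural inner join described above, on tables represented as lists of dicts (the table and field names are illustrative only):

  # Sketch only: natural inner join; rows are kept when all shared keys match.
  def natural_join(t1, t2):
      out = []
      for r1 in t1:
          for r2 in t2:
              shared = set(r1) & set(r2)
              if all(r1[k] == r2[k] for k in shared):
                  out.append({**r1, **r2})
      return out

  people = [{"id": 1, "name": "ada"}, {"id": 2, "name": "bob"}]
  langs  = [{"id": 1, "lang": "oot"}]
  assert natural_join(people, langs) == [{"id": 1, "name": "ada", "lang": "oot"}]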

btw, a table's primary key can be identified by tab.0. a foreign key is a key which is eq to another table's primary key.

loose infix:

&& and

or

eq

~ match

in member
^= relative structural eq (referentially-transparent syntactic sugar like '='; calls ^= on left object with right object and selectors)
!= > < >= <=
<==> spaceship operator
mag abs

max min amax argmax amin sgn sign
transpose (tsp? unary postfix ^T)
conj (conjugate)
ctran (conjugate transpose) (ctsp?) (unary postfix? ^* unary prefix?)

trig fns, pi, e, nrt, sqrt

pinv: "pseudoinverse". "constructor pinv" returns a function that, given a data object constructed by "constructor", returns a function call argument object that would produce that data object. (Note: the constructor does not need to be invertible; many function call argument objects may map to the same data object.) (todo: is pseudoinverse an appropriate name? i think so; http://planetmath.org/encyclopedia/Pseudoinverse.html says that "the term 'pseudoinverse' is actually used for any operator pinv satisfying M pinv(M) M = M")
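
To illustrate the law (a sketch in Python, not Oot; 'point' and 'point_pinv' are made-up names):

  # Sketch only: a constructor and one valid pseudoinverse for it.
  def point(x, y):
      return {"x": x, "y": y}

  def point_pinv(data):
      return (data["x"], data["y"])    # some argument tuple that rebuilds data

  p = point(1, 2)
  assert point(*point_pinv(p)) == p    # the pinv law: constructor(pinv(d)) reproduces d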

inv: "fn inv" produces the inverse of function "fn", if it has one (law: x fn fn inv == x) (unary postfix? ^-1)

prr: print with newline. pr: print. Use the "to" keyword to specify a stream, e.g. "dude, there was an error" to=stderr prr prints to stderr.

[...] diff: true if all pairs of elements in the list are != (this is especially useful in constraint propagation)
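
i.e., something like this sketch (Python, not Oot; assumes hashable elements):

  # Sketch only: true iff all elements are pairwise distinct.
  def diff(xs):
      return len(set(xs)) == len(xs)

  assert diff([1, 2, 3]) and not diff([1, 2, 1])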

relations: closure

tree traversal algorithms:

tree algs:

sort algs: qsort

graph algorithms: min spanning tree, topo sort,

the following basic graph ops, direct-producted with these choices: arc vs. node; return a mutated version of the structure vs. return a list; and, for nodes, operate only on terminal nodes vs. only on non-terminal nodes vs. on all nodes; the input to the subfunction includes its position in the graph vs. is independent of position (see the sketch after this list):

 label map
 map over arcs/nodes of label satisfying predicate
 delete arcs/nodes of label satisfying predicate
 fold (mutable, like map) over arcs/nodes of label satisfying predicate, with various tree/graph traversal choices (i.e. mutation by a "visitor")
 change or delete arcs/nodes of label satisfying predicate
 visit arcs/nodes and return arbitrarily changed graphs
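
One point in that space, as a sketch in Python (not Oot; the adjacency-dict representation and the names are mine): map a function over the labels of nodes satisfying a predicate, returning a changed copy.

  # Sketch only: relabel nodes satisfying a predicate in a {node: [children]} graph.
  def map_node_labels(g, pred, f):
      rename = {n: (f(n) if pred(n) else n) for n in g}
      return {rename[n]: [rename.get(c, c) for c in children]
              for n, children in g.items()}

  g = {"a": ["b"], "b": []}
  assert map_node_labels(g, lambda n: n == "a", str.upper) == {"A": ["b"], "b": []}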

network algorithms:

SAT and NP-complete algs: 2SAT (which is actually in P), 3SAT,

hofs: applyArgsAsDefault

boundaries: flavor map (flavm), tdepth (bound at depth)

state machines:

other automata:

strings: regex, split, escape

matrices:

files:

net:

stacks and queues (actually this is just a type of list): dup, pop (head), push (cons), ?? (see Cat, Joy)

logic: Prolog-style, unification, constraint satisfaction, k-SAT (CNF and DNF) (resolution, also stochastic local search), theorem proving (propositional and first-order logic)

"save" (not "pickle")

also, check out main ruby, python, haskell libs, as well as Boost

Compiler and IDE features

Compiler

IDE


see [Self-ideas-computer-oot-ootDesignTodos]

see [Self-ideas-computer-oot-ootDesignToReads]


use cases

rosetta

other

other names

mb i'll name it Moebius instead of Oot.

Other Oot-related pages on this site

See other items in the directory [4].

Also, see the directory [5], and some items from the directory [6].


Footnotes:

1. Y