proj-oot-old-ootSyntaxNotes

---

discarded in favor of `` for ASTs and 'logic', functional reactive streams for 'capturing' side-effects, 'eval' for generating these streams from blocks, and ! as a sigil for marking (unmasked) side-effects and exec

! relates to impure functions and facts

TODO: really? maybe we should stuff all the logicy stuff into ? and leave ! for 'running' blocks, and for marking reference (impure) variables.

A statement like "x = 3" can be thought of as a change to a memory location, but it can also be thought of as the declaration of a fact. Facts are first-class in Oot.

To refer to the fact represented by "x = 3", use the "!=" operator instead of "=":

the-fact-that-x-is-3 = x != 3

A function with multiple return values can be used to construct a composite fact by applying the prefix '!' operator. Within the block, any mutation to any of the return values will be a fact within the composite fact. The primary return value of this operation is the fact, but a secondary return value with label 'result' is the primary result of the block. Eg, the following indicates a composite fact containing the facts 'x != 3' and 'y != 4':

fact1 = !{x y = $ x = 3; y = 4; 5}
fact1, the-result = !{x y = $ x = 3; y = 4; 5}
the-result == 5

A fact can be 'applied', meaning that its contained mutations are applied to the current environment, with the postfix ! operator. Eg:

x == 0
fact1 = !{x y = $ x = 3; y = 4}
fact1!
x == 3

Note that, unlike with suspended functions, the code in the block was executed at the time that block was evaluated, which may (depending on laziness and strictness operators, see the section on '$') occur when the fact was created, rather than when the fact was applied.
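A minimal Python sketch of the capture/apply semantics described above. It assumes an environment is just a dict and a block is a Python function that writes into it. RecordingEnv, capture and apply_fact are hypothetical names, and for simplicity every write is recorded, not only writes to declared return values. The block runs at capture time, and the resulting fact is a plain, inspectable dict until it is applied.

```python
class RecordingEnv(dict):
    """Hypothetical stand-in for the '!' prefix operator: a dict copy that records writes."""

    def __init__(self, base):
        super().__init__(base)  # dict.__init__ does not go through __setitem__
        self.writes = {}

    def __setitem__(self, key, value):
        self.writes[key] = value
        super().__setitem__(key, value)


def capture(block, env):
    """Run block now against a recording copy of env; return (fact, result).
    env itself is not modified."""
    rec = RecordingEnv(env)
    result = block(rec)
    return dict(rec.writes), result


def apply_fact(fact, env):
    """Analogue of the postfix 'fact!': apply the recorded mutations to env."""
    env.update(fact)


def example_block(e):
    # corresponds to !{x y = $ x = 3; y = 4; 5}
    e["x"] = 3
    e["y"] = 4
    return 5


env = {"x": 0, "y": 0}
fact1, the_result = capture(example_block, env)
before = dict(env)          # env is untouched until the fact is applied
apply_fact(fact1, env)      # analogue of 'fact1!'
print(fact1, the_result, before, env)
```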

A fact is not an opaque object. The information in a fact is represented as a graph, and this information can be accessed without ever applying the fact.

Note that the '!' operator transforms imperative functions into declarative facts, whereas the '?' operator goes the other way, from declarative query patterns into functions.

TODO representation of facts as graphs. Note that where this gets tricky is representing abstract logical properties, eg forall, eg two things being equal, etc.

TODO i guess these representations (for patterns too) need to be pluggable, because we cant pick a perfect logic that will be good for all situations

TODO where

TODO do we need a dual thingee that records the reads of a block (non-determinism/reliance on outside information), rather than its writes (side-effects)? see queries, above

TODO we might need our logical language to be 'pluggable' since it is unclear which the 'best' representation for logical statements is

---

yknow, it's a pain to type in uppercase. So maybe we should use uppercase for labels, and have something like ".sym" for either "symbols" (the equivalent of :sym in ruby) or annotations.

otoh i find it easier to read SYM than :sym, and clearer for someone coming from another language.

we could have BOTH uppercase and a prefix sigil, and then have ootfmt transform the prefix'd symbols to uppercase

the easiest to type punctuation chars are: -=[]';,./`\

yknow... we can always fit more into the 'prefix -' system by adding more -s to the existing guys... so eg could use:

-x    symbol
--x   private
---x  oot-defined

i kinda like that.
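A minimal sketch of how a lexer or ootfmt might read the dash-prefix scheme, assuming the dash depth maps to a kind exactly as listed above and that a symbol '-name' is rewritten to uppercase 'NAME' (as with FIELD1 vs -field1 later in these notes). classify_prefixed and ootfmt_symbol are hypothetical helpers; binary minus and negative numbers are ignored.

```python
# Hypothetical mapping from the number of leading '-' to the kind of name.
PREFIX_KINDS = {1: "symbol", 2: "private", 3: "oot-defined"}


def classify_prefixed(token):
    """Return (kind, bare_name) for a token such as '-x', '--x' or '---x'."""
    stripped = token.lstrip("-")
    dashes = len(token) - len(stripped)
    if dashes == 0:
        return ("plain", token)
    kind = PREFIX_KINDS.get(dashes)
    if kind is None:
        raise ValueError(f"unsupported prefix depth {dashes} in {token!r}")
    return (kind, stripped)


def ootfmt_symbol(token):
    """Hypothetical ootfmt rewrite: a '-name' symbol becomes 'NAME'; others are unchanged."""
    kind, name = classify_prefixed(token)
    return name.upper() if kind == "symbol" else token


print(classify_prefixed("---set"), ootfmt_symbol("-field1"))
```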

---


thinking about anonymous fn syntax; ppl seem to like js's => but can we do better? i had been thinking surrounding variables with `` but mb we can do it with just one character, if that character represents an 'arrow' that is a binary operator. A Haskell-like syntax for fn defn also seems nice, 'f x = y' instead of 'f = \x -> y'.

For (named) function defn syntax, we also want the 'matlab copy and paste' feature, where the fn defn looks like eg 'function result1, result2 = functionname(x,y) ... end' and then you can just copy and paste that to where you are using the fn. In oot then i guess it would be something like 'result1, result2 = functionname x y {...}'. Since we would have to have the substring 'result1, result2 = functionname x y', we can't just have the Haskell-style 'functionname x y = {...}', unless we use chained, double assignment for this (instead of setting many things equal to each other, like in Python):

result1, result2 = functionname x y = {...}

hmm actually i kinda like that.

yknow, for anon fns, => isn't THAT hard to type, b/c the keys are close to each other. So mb just stick with that, after all, everyone will recognize it. Right now we are using '$', which is slightly easier to type, but harder for newbies to learn.

so right now i'm leaning towards

=> for anonymous functions

result1, result2 = functionname x y = {...} for named functions ({}s are optional)

but, have to look into if removing ordinary chained assignment causes other problems (the name of the language feature a = b = c is 'chained assignment'). i dont use it much in my code, but maybe it's good for something i'm unaware of. some links:

http://programmers.stackexchange.com/questions/165532/multiple-attribution-in-python-js

"In many languages, the = operator returns the value that was assigned."

"Languages that consider assignment to be expressions have this type of feature.. C, C++, Java, and C# support the syntax you provided. VB does not support chained assignment (since assignment is a statement and not an expression like in the C family)."

https://en.wikipedia.org/wiki/Assignment_%28computer_science%29#Chained_assignment

in Python and Javascript, "x = y = somefunction()" is not the same as "x = somefunction(); y = somefunction()" because in the chained assignment case somefunction() is only called once, so x will be aliased to y, but this is not so in the ordinary case
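A small Python check of the aliasing point; somefunction and the calls counter are illustrative names. The function returns a fresh list on each call, so object identity shows whether one or two objects were created.

```python
calls = 0


def somefunction():
    """Return a fresh empty list on every call, counting the calls."""
    global calls
    calls += 1
    return []


# Chained assignment: somefunction() runs once and both names share one object.
x = y = somefunction()
x.append("shared")
print(calls, x is y, y)   # 1 True ['shared']

# Separate assignments: two calls, two independent objects.
calls = 0
a = somefunction()
b = somefunction()
a.append("only a")
print(calls, a is b, b)   # 2 False []
```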

http://blogs.msdn.com/b/ericlippert/archive/2010/02/11/chaining-simple-assignments-is-not-so-simple.aspx seems to show that chained assignment leads to unnecessary confusion about the way the language works

http://pandas.pydata.org/pandas-docs/stable/indexing.html says "Warning Whether a copy or a reference is returned for a setting operation, may depend on the context. This is sometimes called chained assignment and should be avoided. See Returning a View versus Copy "

http://deathofagremmie.com/2012/12/27/a-c-python-chained-assignment-gotcha/ gives an example where he was porting from C to Python and copied over a chained assignment and it didnt work right b/c these languages have subtly different interpretations of it
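The blog's exact code isn't reproduced here; the following is a hypothetical linked-list case showing one such difference. In C, '=' is right-associative, so x = x->next = new behaves like x->next = new followed by x = new. Python instead evaluates the right-hand side once and then assigns the targets from left to right, so x is rebound before x.next is set.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.next = None


head = Node("head")
new = Node("new")
x = head

# Python assigns targets left to right: first x = new, then x.next = new.
x = x.next = new
print(head.next)        # None: head was never modified
print(new.next is new)  # True: new now points to itself
```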

https://www.wakari.io/sharing/bundle/hayd/pandorable says it's annoying for libraries to make sure this works right: "Chained assignment Pandas jumps though hoops for this to work as often as possible (or warn if it's not going to), but it's kindof easy to find edge cases which will fail without the warning, best to avoid.". https://github.com/pydata/pandas/issues/7585 gives an example of the trouble.

random similar scala hack: http://www.scala-lang.org/old/node/12478

http://codeschool.org/python-continued/ notes that Python (and Javascript) support chained assignment (but via different rationales; assignments are expressions in Js but not in Python).

as an aside, http://codeschool.org/python-continued/ also notes that they find it annoying that python supports a special case for chained (in)equality operators to be treated as implicit logical conjunctions, eg a < b < c is equivalent to '(a < b) and (b < c)' (like in math), whereas other languages would parse it as (a < b) < c
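A quick Python check of the two readings; the values 3, 2, 1 are arbitrary, chosen so that the chained reading and the C-style left-grouped reading disagree.

```python
a, b, c = 3, 2, 1
chained = a < b < c     # Python: (a < b) and (b < c) -> False
nested = (a < b) < c    # C-style grouping: False < 1, i.e. 0 < 1 -> True
print(chained, nested)  # False True
```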

so my tentative conclusion is that not only CAN we get rid of chained assignment, we SHOULD, b/c it appears to be an impediment to readability.

my remaining hesitation is whether chained assignment could/should be used to represent the case where we 'mutate' a field within an immutable object by returning a copy of the immutable object with the desired mutation applied, and then assign this copy to some other variable. One could imagine that being done with: 'y = x.field1 = 3'. If we really wanted to do it like that, we could get away with both at once: x.field1 is in the function-name position, but '.' is invalid in a function name, so if a '.' is there we know we aren't doing a fn defn. Not sure if that's a great idea, though.

Also, not sure if we really should be using '=' for applying mutations (that dont alter the lhs) at all, if we're already using it for 'let'-type reassignment; b/c for ppl who have seen other languages, 'y = x.f = 3' doesn't intuitively look like it is taking immutable data x, doing something to a copy of it, then putting the result in y. mb '!=' would be good here? but we wanted to use that for not-equals (we could always use ~= for that; but we wanted that for approximately-equals!). Also, it's hard to type. mb '/='? kinda hard to type, too. mb '-='; that's real easy to type. All we gotta do is tell ppl that Oot doesn't support stuff like "a += 3". (but mb we want -= for not-equals!). Other alternatives for not-equals (inequality) are <> and =/= (Erlang). Here's a table: https://en.wikipedia.org/wiki/Relational_operator#Standard_relational_operators

the thing is, mutation of a field member will be typed extremely frequently, probably more frequently than not-equals. So we probably dont want to give up '-='.

otoh, since we have local mutation already anyways, we could accomplish 'assign a mutated copy of x to y' in two lines already:

y = x
y.field1 = 3

this is clear and doesnt require the introduction of another operator. If ppl want the relevant (immutable) fn for hof purposes, they can use:

y.---set(-field1, _)

(this doesnt mutate anything, it just returns the mutated copy; there is no way to do the mutation in a fn because our let-style local mutations are just syntactic sugar hiding renaming of immutable variables)
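Python's frozen dataclasses give a rough analogue of these semantics. The sketch assumes a hypothetical record type Rec with a field field1; dataclasses.replace returns an updated copy and leaves the original alone, which mirrors both the two-line form (read as rebinding y to a modified copy) and the non-mutating hof form. set_field1 is a hypothetical helper.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Rec:
    # hypothetical record type standing in for an immutable Oot value
    field1: int
    field2: str


x = Rec(field1=1, field2="a")

# Two-line form: y starts as x, then 'y.field1 = 3' rebinds y to an updated copy.
y = x
y = replace(y, field1=3)


# Non-mutating higher-order form, roughly x.---set(-field1, _):
def set_field1(value):
    return replace(x, field1=value)


ys = [set_field1(v) for v in (10, 20)]
print(x, y, [r.field1 for r in ys])
# Rec(field1=1, field2='a') Rec(field1=3, field2='a') [10, 20]
```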

---

the current anon fn syntax is:

Anonymous functions are written with a dollarsign to separate the arguments from the function body. {}s can be used for functions with multiple lines, eg:

x $ x + x
{x $ y = 3; x + x}

with return arguments that's eg:

{r2 = x $ r2 = 3; x + x}

this is a little confusing b/c it's not immediately clear that the leftmost = is to separate the return arguments from the arguments in the anonymous function; it seems like mb $ should bind looser than =, dividing the expression into two (which i guess it does, but that's confusing)

hmm.. maybe a better syntax would be to reuse the named fn defn syntax but with _ as the fn name:

r2 = fnname x = {r2 = 3; x + x}
r2 = _ x = {r2 = 3; x + x}

if we have no return args:

_ x = {r2 = 3; x + x}

not so bad but not quite as easy to type, or to read, as: x $ {r2 = 3; x + x}

but as noted above, WITH return arguments, this becomes harder to read.

is => any better?

{r2 = x => r2 = 3; x + x}

a little but not much

should we even use = to the right of a fn name and params for the function body? we could use eg '=>'. but = is appropriate because a named fn defn is like a let binding in the current scope.

so... i guess _ x = {...} is best for anonymous fns after all

wait how about => instead of _? that's even more typing but:

=> x = {...}

now that seems much clearer to me. with return args:

{r2 = => x = r2 = 3; x + x}

(let's add in the implicit {}s on the right)

{r2 = => x = {r2 = 3; x + x}}

the = => looks clumsy; mb:

{r2 <= x = {r2 = 3; x + x}}

nah.. could always go back to using backticks:

{r2 `x` {r2 = 3; x + x}}

huh that makes a lot of sense

otoh, could make a fn 'anonymous' just by having a fn defn return the fn. Then we get back a rationale for the _ in place of fn name, and we're back to:

fn = (r2 = _ x = {r2 = 3; x + x})

it's not quite as easy to read or to type, but it's simpler. The rule is just:

if there is one '=', and the lhs of the '=' has multiple words separated by spaces (as opposed to '.'), then this is a fn defn. The leftmost word is the fn name; the others are the arguments. If there is a second '=' to the left of this '=' that is a syntactic sibling (eg not outside of an enclosing pair of parens), then to the left of the second one are the return arguments. This construct binds the function name to the fn value just like a 'let' equals. It is an expression which returns the bound function. You can use '_' for the fn name if you want an anonymous fn (_ is never bound to).
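A minimal Python sketch of this rule, assuming a statement arrives as a single string. split_top_level and classify are hypothetical helpers; an '=' nested inside (), [] or {} is not treated as a split point, and '==', '=>' and string literals are deliberately not handled. The "error" case for three or more top-level '=' without a fn header is an assumption based on the proposal above to drop chained assignment.

```python
def split_top_level(stmt):
    """Split stmt on '=' characters not nested in (), [] or {}.
    Simplification: ignores '==', '=>' and string literals."""
    parts, depth, current = [], 0, []
    for ch in stmt:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        if ch == "=" and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def classify(stmt):
    """Hypothetical classifier: fn defn, plain assignment, expression or error."""
    parts = split_top_level(stmt)
    if len(parts) == 1:
        return {"kind": "expression"}
    words = parts[-2].split()  # the segment just left of the last top-level '='
    is_header = len(words) >= 2 and not any("." in w for w in words)
    if is_header and len(parts) <= 3:
        name, args = words[0], words[1:]
        return {
            "kind": "fn_defn",
            "name": None if name == "_" else name,  # '_' is never bound
            "args": args,
            "returns": parts[0].replace(",", " ").split() if len(parts) == 3 else [],
            "body": parts[-1],
        }
    if len(parts) == 2:
        return {"kind": "assignment", "target": parts[0], "value": parts[1]}
    return {"kind": "error", "reason": "chained assignment is not supported"}


print(classify("r2 = fnname x = {r2 = 3; x + x}"))
print(classify("anon = (_ x = x + x)"))
```

With this rule, 'anon = (_ x = x + x)' is a plain assignment whose value happens to be a parenthesized anonymous fn defn, so nesting handles the "returning the bound fn" case without any special syntax.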

one issue with this, however, is that, if this is an expression returning the bound fn, then why not make all equations expressions, and therefore chainable! but if they are chainable, then using '=' for return arguments is an ugly special case. But if you do not do that, then you dont have the matlab-copy-and-paste-from-fn-defn property.

perhaps the best option for readability would be to have => separating the fn header and body, eg

r2 = fnname x => {r2 = 3; x + x}
x => {r2 = 3; x + x}
r2 = x => {r2 = 3; x + x}

the third line still is confusing though; it looks like a 0-ary function named 'x' AND it looks like assigning the anon fn to 'r2'. So mandate the fn name/_:

r2 = _ x => {r2 = 3; x + x}

Hm, that's approaching readability. Remaining complaints: (a) the previous line needs 4 keystrokes, 3 of which are unnecessary, to make the lambda fn ('_' and '=>'); (b) the '=>' is semantically unnecessary; and (c) this adds a lot of keystrokes to Haskell-style programming where you define lots of small functions:

f1 x => y
f2 x => z

ok, how about this then: the => is only needed when you are using = for return arguments. Otherwise, you can use '=' like in Haskell. Now we have:

r2 = fnname x => {r2 = 3; x + x}
fnname x = x + x
anon = (fnname x = x + x)
anon = (_ x = x + x)
anon = (r2 = _ x => x + x)

ok i kinda like that.

---

new syntax idea for fn defn, and anon fn defn (todo add this to ootSyntaxMain if you like it):

ordinary fn defn:

fnname x = x + x
fnname x => x + x

ordinary fn defn, returning a value used as an anonymous fn:

anon = (fnname x = x + x)
anon = (fnname x => x + x)

if you dont want to think of a 'fnname' for an anonymous fn:

anon = (_ x = x + x)
anon = (_ x => x + x)

fn defn with return args:

r2 = fnname x => {r2 = 3; x + x}

anonymous fn with return args:

anon = (r2 = _ x => x + x)

0-ary anonymous fn:

_ => 3

0-ary named fn:

f => 3

field mutation:

x.field1 = 3

field mutation as a non-mutating hof: x.---set(FIELD1, _) (equivalently, x.---set(-field1, _))

---

new syntax idea for fn defn, and anon fn defn (todo add this to ootSyntaxMain if you like it):

ordinary fn defn:

fnname x = x + x

if you dont want to think of a 'fnname' for an anonymous fn:

anon = (x => x + x)

fn defn with return args:

r2 = fnname x => {r2 = 3; x + x}

anonymous fn with return args:

anon = (r2 = _ x => x + x)

0-ary anonymous fn:

(=> 3)

field mutation:

x.field1 = 3

field mutation as a non-mutating hof: x.---set(FIELD1, _) (equivalently, x.---set(-field1, _))

on second thought, i'm not too happy with this. Why should the = change to a => when there are return arguments? it's confusing.

just stick with 'r = f x = body', like we had before. => for anons. 'r = _ x = body' for named anons.

---

slightly new syntax idea for fn defn, and anon fn defn (todo add this to ootSyntaxMain if you like it):

ordinary fn defn:

fnname x = x + x

if you dont want to think of a 'fnname' for an anonymous fn:

anon = (x => x + x)

fn defn with return args:

r2 = fnname x = {r2 = 3; x + x}

anonymous fn with return args:

anon = (r2 = _ x = x + x)

0-ary anonymous fn:

(=> 3)

field mutation:

x.field1 = 3

field mutation as a non-mutating hof: x.---set(FIELD1, _) (equivalently, x.---set(-field1, _))

---

does 'f x = body' style of fn defn syntax prevent us from being an LL(1) grammar? if so, perhaps we should add a 'def' on the left like Python, a 'fun' on the left like ML, or a 'function' on the left, like Octave? but that's annoying to type. But i'm not sure if it actually prevents it. After all, in eg Python you can have 'a = 3+2' even though the '=' isnt seen until later. How does that work? It seems that the 'a' is just parsed as an ordinary expression; i guess its 'l-value'-ness is deduced later, in the transition from parse tree to AST:

in python 3, AST:

>>> import ast; ast.dump(ast.parse('a = 3'))
"Module(body=[Assign(targets=[Name(id='a', ctx=Store())], value=Num(n=3))])"

how 'a = 3+2' is parsed in the parse tree in Python 3:

import parser, token, symbol, pprint  # 'parser' and 'symbol' were removed in Python 3.10, so this needs Python 3.9 or earlier

def recur_map2(fun, data):
    # recursively map fun over nested lists; strings are iterable too, and a
    # one-character string iterates to itself, so 'elem != data' stops the recursion
    if hasattr(data, "__iter__"):
        return [(elem != data and recur_map2(fun, elem)) or fun(elem) for elem in data]
    else:
        return fun(data)

def prettyprint_parse_tree_for_code(code_string):
    # get the concrete syntax tree as nested lists of ints and strings
    parsed = parser.suite(code_string).tolist()
    # replace grammar-symbol and token numbers with their names
    parsed_mapped = recur_map2(lambda x: ((type(x) == int) and x in symbol.sym_name and symbol.sym_name[x]) or ((type(x) == int) and x in token.tok_name and token.tok_name[x]) or x, parsed)
    return pprint.PrettyPrinter().pprint(parsed_mapped)

>>> prettyprint_parse_tree_for_code('a = 3+2')
['file_input', ['stmt', ['simple_stmt', ['small_stmt', ['expr_stmt', ['testlist_star_expr', ['test', ['or_test', ['and_test', ['not_test', ['comparison', ['expr', ['xor_expr', ['and_expr', ['shift_expr', ['arith_expr', ['term', ['factor', ['power', ['atom', ['NAME', ['a']]]]]]]]]]]]]]]]], ['EQUAL', ['=']], ['testlist_star_expr', ['test', ['or_test', ['and_test', ['not_test', ['comparison', ['expr', ['xor_expr', ['and_expr', ['shift_expr', ['arith_expr', ['term', ['factor', ['power', ['atom', ['NUMBER', ['3']]]]]], ['PLUS', ['+']], ['term', ['factor', ['power', ['atom', ['NUMBER', ['2']]]]]]]]]]]]]]]]]]], ['NEWLINE', ]]], ['NEWLINE', ], ['ENDMARKER', ]]

see, for example, Python/ast.c (eg http://svn.python.org/projects/python/tags/r26b3/Python/ast.c), function ast_for_expr_stmt, section '/* a normal assignment */'. We see that arbitrary code is executed here to check things like whether there is an equals sign after the LHS, and whether the LHS is a yield (not allowed; if so, an error is emitted). Indeed, if you do:

def f():
    (yield 3) = 4

in Python, you get:

SyntaxError: can't assign to yield expression

So i guess the grammar for the initial parser can be LL(1) even if arbitrary logic is needed to transform that parse tree (concrete syntax tree) into an AST, and even if syntax errors may still be emitted upon the transition to AST.

so i guess we just need to make sure that the parser can do GROUPING
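The parser module used above is gone from current Python releases, but the same two observations can still be checked with the standard ast module and the built-in compile(): the target of 'a = 3+2' comes back as an ordinary Name marked with a Store context, and an invalid target such as a yield expression is rejected with a SyntaxError before any code runs. The exact error wording varies between Python versions, so no expected message is shown for it.

```python
import ast

# The assignment target 'a' is parsed as an ordinary name; the AST marks it Store.
tree = ast.parse("a = 3+2")
target = tree.body[0].targets[0]
print(type(target).__name__, type(target.ctx).__name__)   # Name Store

# An invalid assignment target is rejected at compile time, before anything runs.
rejected = False
try:
    compile("def f():\n    (yield 3) = 4\n", "<example>", "exec")
except SyntaxError as err:
    rejected = True
    print("SyntaxError:", err.msg)
```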

---