proj-oot-ootSyntaxThoughts

want LL(1) at the high level, parsed via recursive descent, plus Pratt parsing (or shunting-yard, but i've heard Pratt parsing is supposed to fit together more nicely with recursive descent?)

https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html https://matklad.github.io/2020/04/15/from-pratt-to-dijkstra.html https://matklad.github.io/2023/05/21/resilient-ll-parsing-tutorial.html

https://www.reddit.com/r/rust/comments/g0eusf/blog_post_simple_but_powerful_pratt_parsing/ https://www.reddit.com/r/rust/comments/g1p1mn/blog_post_from_pratt_to_dijkstra/ https://lobste.rs/s/o1jwxo/from_pratt_dijkstra https://internals.rust-lang.org/t/proposal-grammar-working-group/8442/46
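(a minimal sketch, in Python, of how the Pratt loop slots into a recursive-descent parser; the token set and binding powers here are invented for illustration, not Oot's:)

```python
import re

# binding powers (precedence): (left_bp, right_bp); right_bp > left_bp
# gives left associativity. these levels are illustrative only.
BINDING_POWER = {"+": (1, 2), "-": (1, 2), "*": (3, 4), "/": (3, 4)}

def tokenize(src):
    return re.findall(r"\d+|[+\-*/()]", src)

def parse_expr(tokens, min_bp=0):
    """Pratt loop: parse a primary, then greedily consume operators
    whose left binding power is at least min_bp."""
    tok = tokens.pop(0)
    if tok == "(":                      # recursive descent handles grouping
        lhs = parse_expr(tokens, 0)
        assert tokens.pop(0) == ")"
    else:
        lhs = tok
    while tokens and tokens[0] in BINDING_POWER:
        op = tokens[0]
        left_bp, right_bp = BINDING_POWER[op]
        if left_bp < min_bp:
            break
        tokens.pop(0)
        rhs = parse_expr(tokens, right_bp)
        lhs = f"({op} {lhs} {rhs})"     # emit an s-expression for clarity
    return lhs

def parse(src):
    return parse_expr(tokenize(src))
```

so parse("1+2*3") gives "(+ 1 (* 2 3))" and parse("1-2-3") gives "(- (- 1 2) 3)": precedence and associativity fall out of the binding-power table, with no grammar rewriting.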

---

unlike Haskell, we don't want to allow arbitrary operator precedence, because then you have to look up (or have memorized) every function in code that you read just in order to parse it.

like Haskell and Lisp and unlike Python, we separate fn arguments by spaces, not commas, and we don't require them to be surrounded by parens

like Haskell and unlike lisp, we don't require the outer fn invocation to be surrounded by parens;

like haskell unlike lisp, we have function application/composition syntax (e.g. haskell's $ and .)

like python and unlike haskell and lisp, we have syntax for list/array/dict access and mutation

like haskell, we have optional whitespace

like lisp and octave and unlike python, we allow space-separated list literals

like javascript, we have a type of map literal that implicitly quotes the key names

like ruby and perl (supposedly; see http://news.ycombinator.com/item?id=3068819 ), good options for quoting within literals

like octave, we have a way to quickly print out the result of each step (but ours is opt-in, not opt-out; e.g. mb if there IS a semicolon, then print)

(not syntax)

like python and unlike haskell and lisp, our fundamental data structure is dict-like, not list-like





pattern matching partial functions? seems messier than a pattern-matching switch statement to me; that way it's all in one place. is there any benefit to doing it the partial way? also, look at simon's advanced guards thingee -- is there some corecursion thing going on there that makes partial functions look more appropriate?

--- i just wrote:

"

Read left-to-right (standard orientation)

That is to say, in Oot you write "x = 1++(f y)", not "1++(y f) = x". "1++(y f) = x" (the nonstandard way that we didn't adopt) would be clearer, because when you read it from left to right, that follows the order that data actually flows thru the expression.

However, programming text is usually left-aligned, which makes it easy to scan down the screen with your eye and look at the leftmost words.. and hard to scan down looking at the rightmost words. A common use of scanning is to look for where some variable is defined. So, the lhs of the assignment operator should indeed be on the left hand side.

todo: mb mix and match? x = 3++(y f) ? "

hmm, mb should do:

anything with a side-effect, including assignment, goes on the leftmost side. So, there can only be one side-effect per line. The side-effectful thing is executed last. Everything else is 'backwards', e.g. x f instead of f x, i.e. the left-er stuff is executed earlier.

hmmm but now where/how do we put parens now that everything is backwards?

instead of:

defun (map f xs if (unit? xs unit cons f car xs,, map f, cdr xs

we have

defun ( unit xs cdr, f map,, xs car ..

no no no, the whole idea of doing it backwards is for the order of evaluation to be followed. so you can't put something after a conditional first. so i guess the order of arguments should not be reversed

defun unit? xs unit cons xs car f ,, xs cdr, f map ) map xs f

no, too irregular. since we want the side-effect to go on the left, and the side-effect is usually the last thing, we either have to put the first stuff on the right, as usual, or we do it backwards but then we have an irregularity because we switch directions between the first thing on the left and the second.

so let's stick with the usual direction.

---

i do like the rule that a side-effectful command/function may only be leftmost on each line (and hence there can be at most one per line). But this means that when you change something from pure to side-effectful, callers all the way up the call chain must rewrite the way they call you.


right associative, but left associative with ',' (which only changes parsing to the right of the comma); a newline while parens are open is left associative w/r/t the opening parens through the line boundary. In other words it's like http://chrisdone.com/z/ except using opening parens instead of indentation, and with commas to indicate multiple arguments.

If there is a newline without unbalanced opening parens on that line, then the line is implicitly surrounded with parens.

empty lines are sugar for {}s which are like 'big parens' that define 'blocks'. Blocks are like big parens which auto-close parens as needed at the end of the block. They also provide scope for block-scoped macros.
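(a toy sketch of the implicit-paren line rule above, my own hypothetical implementation: a line with no unbalanced opening paren gets implicitly wrapped in parens, while a line that opens more parens than it closes is left alone so the parse continues across the newline:)

```python
def wrap_line(line):
    """Implicitly parenthesize a line unless it leaves a paren open
    (naive count; ignores strings and comments for simplicity)."""
    stripped = line.strip()
    if not stripped:
        return line                # empty line: block boundary, left untouched
    if stripped.count("(") > stripped.count(")"):
        return line                # unbalanced opening paren: expr spans lines
    return "(" + stripped + ")"
```

e.g. wrap_line("f x y") gives "(f x y)", but wrap_line("g (h x") is left as-is so the open paren can be closed on a later line.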

since we are using {} and () now for grouping, and we are using [] for data literals, we need another bracket grouping symbol for annotations and/or type system directives. How about <>? Use single < and > for comparison.

---

todo read http://en.wikipedia.org/wiki/Bracket#Uses_of_.22.28.22_and_.22.29.22 and subsequent 'uses of' sections.

---

for punctuation used as operators and also as normal symbols, you want the languagey things for single uses and the normal symbols as repetitions, because the languagey uses are much more common, e.g. - is inverse, -- is subtraction. But for punctuation used as grouping, you want the single uses as symbols and repetitions as grouping, because once you think you see a grouping symbol you don't want to have to look ahead to see if it's not really one, and also you want weird grouping stuff to stand out when you are skimming code
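(the tokenization rule this implies can be sketched with maximal munch: list the doubled punctuation first, so seeing one '-' never needs lookahead to choose between inverse and subtraction; the token set here is invented for illustration:)

```python
import re

# longer alternatives come first so the regex munches maximally:
# '--' (subtraction) wins over '-' (inverse), '++' over '+'.
TOKEN = re.compile(r"--|\+\+|[-+()]|\w+")

def lex(src):
    # findall skips whitespace because no alternative matches it
    return TOKEN.findall(src)
```

e.g. lex("a--b") yields ["a", "--", "b"] while lex("-a") yields ["-", "a"]: the single/doubled distinction is resolved locally, with no lookahead.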

--

mb start with right associative, and switch into left association mode when you enter parens; or start with left associative, and switch into right association mode when you enter parens

don't alternate/switch again with nested parens; that's too confusing

--

in the context of wondering whether "a , b , c = 1 , 2 , 3" should work, and whether, if it did, that would seem to imply that a, b, c == [a b c]: what is really gained from Haskell-ish currying, e.g. always having f(x)(y)(z) instead of f([x,y,z])?

http://www.haskell.org/haskellwiki/Currying says "The major advantage of considering all functions as curried is theoretical: formal proofs are easier when all functions are treated uniformly (one argument in, one result out). Having said that, there are Haskell idioms and techniques for which you need to understand currying."

http://www.haskell.org/haskellwiki/Composing_functions_with_multiple_values notes the imbalance between curried inputs and tuple outputs

oh i remember one thing it lets you do; it lets you not distinguish between:

can this be easily mimicked with tuples and partial application? partial application can take the second and turn it into the first, but is there a situation where you need to go the other way somehow?

i think that's it. the benefit of currying instead of having tuple arguments is that it lets an arbitrary multiargument function be implemented as a function that takes some of its arguments and returns a function that takes the rest. i guess that's important.
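(the back-and-forth above can be made concrete in Python; the helper names here are made up for illustration: partial application recovers curried-style use from a tuple-style function, and uncurrying goes the other way:)

```python
from functools import partial

def add3(x, y, z):                 # "tuple-style": all arguments at once
    return x + y + z

def curried_add3(x):               # Haskell-style: one argument at a time
    return lambda y: lambda z: x + y + z

def uncurry3(f):
    """Turn a curried 3-argument function back into a tuple-style one."""
    return lambda x, y, z: f(x)(y)(z)

# partial application mimics supplying the first arguments of a curried fn
add_1_2 = partial(add3, 1, 2)
```

add_1_2(3), curried_add3(1)(2)(3), and uncurry3(curried_add3)(1, 2, 3) all give 6; so tuples + partial application can simulate currying in both directions, at the cost of saying so explicitly at each call site.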

--

remember that big blocks are also scopes for things like macros and transactions (should we allow transactions with 'dynamic boundaries' too? probably)

-- ppl seem to dislike function scope, and prefer block scope but with closures (e.g. something defined in an ancestor block is available in all descendants)

http://www.adequatelygood.com/JavaScript-Scoping-and-Hoisting.html

"

Don’t tell me it’s got lexical scope, because JavaScript’s scoping is an abomination in the face of God. Guy Steele isn’t even dead and JS scope makes him pre-emptively roll in his not-yet-occupied grave.

...

At the same time, we’re ignoring the things about JavaScript that make it not Scheme. It’s got a much richer syntax including a great notation for data. I’m not a huge fan of prototypes (anymore), but it’s an interesting dispatch model that Scheme doesn’t have. "

-- http://journal.stuffwithstuff.com/2013/07/18/javascript-isnt-scheme/

    "JavaScript's C-like syntax, including curly braces and the clunky for statement, makes it appear to be an ordinary procedural language. This is misleading because JavaScript has more in common with functional languages like Lisp or Scheme than with C or Java. It has arrays instead of lists and objects instead of property lists. Functions are first class. It has closures. You get lambdas without having to balance all those parens." -- http://www.crockford.com/javascript/javascript.html

"

danielparks: Could you expand on your contention that JavaScript isn’t lexically scoped?

Calvin Metcalf: It is functionally scoped instead of block scoped and while it is mostly lexically scoped 'this' is dynamically scoped.

munificent: Not just that, but thanks to with and the global object, you always have dynamically scoped variables.
"

---

let's try to think of a way to use capitalization more usefully than (a) distinguishing a certain type (the Type type), or (b) scoping.

note: many of these may deserve some other special syntax, if not capitalization

so far i like the last one the best. e.g. capitalized names can optionally break through the hygiene of macros. But how to encourage ppl not to use metaprogramming tricks like ruby's method_missing to do the same thing for lowercase names? perhaps a version of method_missing that only works on capitalized names should be provided, and use of the full (original) method_missing that works on all names should be discouraged (at a higher level of the metaprogramming hierarchy).

--

coq's 'notations' syntax

" We can make numerical expressions a little easier to read and write by introducing "notations" for addition, multiplication, and subtraction.

Notation "x + y" := (plus x y) (at level 50, left associativity) : nat_scope.
Notation "x - y" := (minus x y) (at level 50, left associativity) : nat_scope.
Notation "x * y" := (mult x y) (at level 40, left associativity) : nat_scope. "

"at level x" is precedence, "nat_scope" is which namespace the notation is declared in

" Notation "( x , y )" := (pair x y). "

" Notation "x :: l" := (cons x l) (at level 60, right associativity).
Notation "[ ]" := nil.
Notation "[ x ; .. ; y ]" := (cons x .. (cons y nil) ..).

"

"For example, since we defined + as infix notation for the plus function at level 50, ... The + operator will bind tighter than ::, so 1 + 2 :: [3] will be parsed, as we'd expect, as (1 + 2) :: [3] rather than 1 + (2 :: [3]). "

"

The right associativity annotation tells Coq how to parenthesize expressions involving several uses of :: so that, for example, the next three declarations mean exactly the same thing:

Definition mylist1 := 1 :: (2 :: (3 :: nil)).
Definition mylist2 := 1 :: 2 :: 3 :: nil.
Definition mylist3 := [1;2;3]. "

" Notation "x ++ y" := (app x y) (right associativity, at level 60). "

--

homoiconicity: at some point need to take this more seriously. No "graph data constructor". Grouping constructs, etc, are the same in graphs as in code. Graph node labels are used in code. Etc.

--

list of syntactic/semantic universals:

x > 3}; the pipe in Haskell guards) (note: "implies" (arrow) is related; "b, given a" is the same as "a -> b", so maybe just use that? e.g. instead of Haskell guard syntax "f x x > 3 = 2" we'd use "x > 3 --> f x = 2"; this unification is also needed to unify Haskell pattern guards, and typeclass contexts)

--

--- http://www.haskell.org/haskellwiki/GADTs_for_dummies ( http://web.archive.org/web/20130702221947/http://www.haskell.org/haskellwiki/GADTs_for_dummies ) brings up an excellent point: type classes are like arbitrary functions on types, with normal Haskell stuff like pattern matching, algebraic data types (multiple constructors), guards, etc, except with a confusing relational (rather than functional) syntax.

this brings up the obvious point: Oot could be like this but use normal syntax

seems like there is really a lot of mileage to be had just by delivering a uniform notation for data, code, patterns, types, typeclasses. Seems like you could get one just by reading through that article and using the 'basic' (base level functions) one.

note: Haskell's data syntax is a little confusing if thought of as just pattern matching; from the above:

"
data Either a b = Left a
Either a b = Right b

we write just

data Either a b = Left a
               | Right b
"

but really we meant "the type of Left a is Either a b, AND the type of Right a is Either a b AND nothing else is Either a b"

note that this is like a Coq match statement.

---

it can be annoying if functions are variadic:

http://stackoverflow.com/questions/7823516/why-are-many-clojure-functions-variadic

---

if we have optional return arguments, then need a way to get all the return arguments sometimes

perhaps similar to the way that we have variadic *args, **kw formal parameters at the end of fn declarations

---

so perhaps we have 2 syntactical argument conventions:

---

it's useful to have clojure-style variadic map:

"Returns a lazy sequence consisting of the result of applying f to the set of first items of each coll, followed by applying f to the set of second items in each coll, until any one of the colls is exhausted. Any remaining items in other colls are ignored. Function f should accept number-of-colls arguments."

user=> (map + [1 2 3] [4 5 6])
(5 7 9)

user=> (apply map vector [[:a :b :c] [:d :e :f] [:g :h :i]])
([:a :d :g] [:b :e :h] [:c :f :i])

and

(defn numbered-lines [lines] (map vector (iterate inc 0) lines))

but you also want to do simple maps without enclosing them in []s:

user=> (map inc [1 2 3 4 5])
(2 3 4 5 6)

if you don't have variadicity, and the last positional argument of 'map' is 'args', then you'd have to do:

(map inc [[1 2 3 4 5]])

and if this could be transformed into a variadic form but it wasn't the default, you'd have to variadicize it explicitly:

(map* inc [1 2 3 4 5])
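(for comparison, Python's built-in map is variadic in the same way, so the Clojure examples above translate directly:)

```python
# variadic map: f receives one element from each iterable per step
sums = list(map(lambda a, b: a + b, [1, 2, 3], [4, 5, 6]))    # [5, 7, 9]

# the transpose trick: splat the rows in as separate arguments
rows = [[1, 2], [3, 4], [5, 6]]
transposed = list(map(lambda *xs: list(xs), *rows))           # [[1, 3, 5], [2, 4, 6]]

# and the simple one-collection case needs no extra brackets
incremented = list(map(lambda x: x + 1, [1, 2, 3, 4, 5]))     # [2, 3, 4, 5, 6]
```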

--

i like clojure's super lightweight lambda syntax: #(blah %)

--

could have precedence for custom operators, but defined by their ASCII value or something (to match arith, i guess)

later: i think Scala did this; but i think ppl still get confused by precedence in scala, which has 10 levels: http://stackoverflow.com/questions/7618923/actual-precedence-for-infix-operators-in-scala

---

clojure has a weird notation with slashes, i guess for accessing members of Java classes:

https://speakerd.s3.amazonaws.com/presentations/2471a370b3610130440476a0f7eede16/2013-05-17-ClojureOOP-Geecon.pdf

(defn make-id [prefix id]
  (join "-" [prefix (Long/toString id 16)]))

--

clojure has an interesting pattern matching syntax:

(defn id-generator
  ([prefix] (id-generator prefix 0))
  ([prefix v] (let [cnt (atom v)]
                (fn [] (make-id prefix (swap! cnt inc))))))

--

i guess compared to clojure, we want to omit parens (and indentation levels) for things like enclosing a 'let' statement or a fn defn. Wouldn't the previous have been nicer if it were:

defn id-generator:
  [prefix] id-generator prefix 0
  [prefix v]
    cnt = atom v
    fn [] (make-id prefix (swap! cnt inc))

notice how it looks nicer with that colon in the first line, too. mb i'm wrong and python is right and we should have that. the colon would mean a special form that introduces a block and goes until the end of the block.

without the colon its not too bad tho

defn id-generator
  [prefix] id-generator prefix 0
  [prefix v]
    cnt = atom v
    fn [] (make-id prefix (swap! cnt inc))

--

another example: clojure:

(defn make-fsm "creates an fsm with initial state s0, a reset event, and a map of transitions. [state-transitions] must be a map of state->[[f1 f2 ...] {e0->s0, e1->s2, ...}]" [s0 reset-event state-transitions]
  (let [s (atom s0)]
    (fn [evt]
      (if (= evt reset-event)
        (do
          (println "Reset event, returning to " s0)
          (swap! s (fn [_] s0)))
        (let [[actions transitions] (state-transitions @s)]
          (if-let [new-state (transitions evt)]
            (do
              (println "Event" evt "causes transition from" @s "to" new-state)
              (doseq [f actions] (f))
              (swap! s (fn [_] new-state)))
            (println "Unexpected/unhandled event" evt "in state" @s)))))))

suggested:

defn make-fsm "creates an fsm with initial state s0, a reset event, and a map of transitions. [state-transitions] must be a map of state->[[f1 f2 ...] {e0->s0, e1->s2, ...}]" [s0 reset-event state-transitions]
  s = atom s0
  fn [evt]
    (if (= evt reset-event)
      (do
        (println "Reset event, returning to " s0)
        (swap! s (fn [_] s0)))
      [actions transitions] = state-transitions @s
      (if-let [new-state (transitions evt)]
        (do
          (println "Event" evt "causes transition from" @s "to" new-state)
          (doseq [f actions] (f))
          (swap! s (fn [_] new-state)))
        (println "Unexpected/unhandled event" evt "in state" @s)))

better, yes?

but the implicit grouping of the assignment with the statement below it to form only one branch of the 'if' is confusing. so how about an explicit block:

defn make-fsm "creates an fsm with initial state s0, a reset event, and a map of transitions. [state-transitions] must be a map of state->[[f1 f2 ...] {e0->s0, e1->s2, ...}]" [s0 reset-event state-transitions]
  s = atom s0
  fn [evt]
    (if (= evt reset-event)
      (do
        (println "Reset event, returning to " s0)
        (swap! s (fn [_] s0)))
      {
        [actions transitions] = state-transitions @s
        (if-let [new-state (transitions evt)]
          (do
            (println "Event" evt "causes transition from" @s "to" new-state)
            (doseq [f actions] (f))
            (swap! s (fn [_] new-state)))
          (println "Unexpected/unhandled event" evt "in state" @s))
      })

but this change makes it a little HARDER to read in another way: now you have to think about the interleaving of } and ) at the end

but what about those if's and do's? this is tough because in algol you'd have if/then/fi, so the construct could end before the end of the block, unlike lets. And this is necessary because if you did assignment within the if, you'd want it to stick until later. (i guess in clojure if you had a bunch of other stuff to do after the if that depends on variable assignment done within the if, you'd have to make another function and call it at the end of each branch of the if?)

to make the language homoiconic, rather than having the parser worry about the special case of how many parts 'if' has, it encloses all three parts of the 'if' in a list. now if you want to do more than one side-effecty thing inside one of the branches of the 'if', you need a 'do', which, to be uniform, is also inside a list.

this is already seriously annoying. one small change we can make is replace the do's by blocks:

defn make-fsm "creates an fsm with initial state s0, a reset event, and a map of transitions. [state-transitions] must be a map of state->[[f1 f2 ...] {e0->s0, e1->s2, ...}]" [s0 reset-event state-transitions]
  s = atom s0
  fn [evt]
    (if (= evt reset-event)
      { (println "Reset event, returning to " s0)
        (swap! s (fn [_] s0)) }
      { [actions transitions] = state-transitions @s
        (if-let [new-state (transitions evt)]
          { (println "Event" evt "causes transition from" @s "to" new-state)
            (doseq [f actions] (f))
            (swap! s (fn [_] new-state)) }
          (println "Unexpected/unhandled event" evt "in state" @s)) })

is this really better than using an algol format for those ifs? at least then you'd get to say 'else':

defn make-fsm "creates an fsm with initial state s0, a reset event, and a map of transitions. [state-transitions] must be a map of state->[[f1 f2 ...] {e0->s0, e1->s2, ...}]" [s0 reset-event state-transitions]
  s = atom s0
  fn [evt]
    (if (= evt reset-event)
      THEN {
        (println "Reset event, returning to " s0)
        (swap! s (fn [_] s0)) }
      ELSE {
        [actions transitions] = state-transitions @s
        (if-let [new-state (transitions evt)]
          THEN {
            (println "Event" evt "causes transition from" @s "to" new-state)
            (doseq [f actions] (f))
            (swap! s (fn [_] new-state)) }
          ELSE (println "Unexpected/unhandled event" evt "in state" @s)) })

you could make this non-syntax by making ELSE = 'else' i guess...

but now how would it look like if you accepted grammar and got rid of the parens around if:

defn make-fsm "creates an fsm with initial state s0, a reset event, and a map of transitions. [state-transitions] must be a map of state->[[f1 f2 ...] {e0->s0, e1->s2, ...}]" [s0 reset-event state-transitions]
  s = atom s0
  fn [evt]
    if (= evt reset-event)
      THEN {
        (println "Reset event, returning to " s0)
        (swap! s (fn [_] s0)) }
      ELSE {
        [actions transitions] = state-transitions @s
        if-let [new-state (transitions evt)]
          THEN {
            (println "Event" evt "causes transition from" @s "to" new-state)
            (doseq [f actions] (f))
            (swap! s (fn [_] new-state)) }
          ELSE (println "Unexpected/unhandled event" evt "in state" @s)
      }

less pretty, certainly, but perhaps easier to follow? i'm not sure. and a little more to type, what with those THENs and ELSEs. without the THENs:

defn make-fsm "creates an fsm with initial state s0, a reset event, and a map of transitions. [state-transitions] must be a map of state->[[f1 f2 ...] {e0->s0, e1->s2, ...}]" [s0 reset-event state-transitions]
  s = atom s0
  fn [evt]
    if (= evt reset-event) {
      (println "Reset event, returning to " s0)
      (swap! s (fn [_] s0)) }
    ELSE {
      [actions transitions] = state-transitions @s
      if-let [new-state (transitions evt)] {
        (println "Event" evt "causes transition from" @s "to" new-state)
        (doseq [f actions] (f))
        (swap! s (fn [_] new-state)) }
      ELSE (println "Unexpected/unhandled event" evt "in state" @s)
    }

and what does it look like without the ELSEs:

defn make-fsm "creates an fsm with initial state s0, a reset event, and a map of transitions. [state-transitions] must be a map of state->[[f1 f2 ...] {e0->s0, e1->s2, ...}]" [s0 reset-event state-transitions]
  s = atom s0
  fn [evt]
    if (= evt reset-event) {
      (println "Reset event, returning to " s0)
      (swap! s (fn [_] s0)) }
    {
      [actions transitions] = state-transitions @s
      if-let [new-state (transitions evt)] {
        (println "Event" evt "causes transition from" @s "to" new-state)
        (doseq [f actions] (f))
        (swap! s (fn [_] new-state)) }
      (println "Unexpected/unhandled event" evt "in state" @s)
    }

hmm.. a little prettier and not terribly harder to follow.

leaving in the parens around 'if' is not terrible though:

defn make-fsm "creates an fsm with initial state s0, a reset event, and a map of transitions. [state-transitions] must be a map of state->[[f1 f2 ...] {e0->s0, e1->s2, ...}]" [s0 reset-event state-transitions]
  s = atom s0
  fn [evt]
    (if (= evt reset-event)
      { (println "Reset event, returning to " s0)
        (swap! s (fn [_] s0)) }
      { [actions transitions] = state-transitions @s
        (if-let [new-state (transitions evt)]
          { (println "Event" evt "causes transition from" @s "to" new-state)
            (doseq [f actions] (f))
            (swap! s (fn [_] new-state)) }
          (println "Unexpected/unhandled event" evt "in state" @s)) })

alternately, the programmer could choose to use the implicit block syntax for the ifs, forcing them to use explicit blocks for the fn (which might be a good idea anyway; i get having syntax for 'let', but not so much for 'fn'):

defn make-fsm "creates an fsm with initial state s0, a reset event, and a map of transitions. [state-transitions] must be a map of state->[[f1 f2 ...] {e0->s0, e1->s2, ...}]" [s0 reset-event state-transitions]
  s = atom s0
  {fn [evt]

    if (= evt reset-event) { 
        (println "Reset event, returning to " s0) 
        (swap! s (fn [_] s0))
      }
      {
        [actions transitions] = state-transitions @s
        (if-let [new-state (transitions evt)] {
            (println "Event" evt "causes transition from" @s "to" new-state)
            (doseq [f actions] (f)) 
            (swap! s (fn [_] new-state))
          }
          (println "Unexpected/unhandled event" evt "in state" @s))
      }
   }

hmm.. alternately.. we could use ':' to open a new implicit block within the current one (which will be closed, without closing the enclosing one, upon an empty line):

defn make-fsm "creates an fsm with initial state s0, a reset event, and a map of transitions. [state-transitions] must be a map of state->[[f1 f2 ...] {e0->s0, e1->s2, ...}]" [s0 reset-event state-transitions]
  s = atom s0
  {fn [evt]
    if (= evt reset-event):
      (println "Reset event, returning to " s0)
      (swap! s (fn [_] s0))

      :
        [actions transitions] = state-transitions @s
        (if-let [new-state (transitions evt)] {
            (println "Event" evt "causes transition from" @s "to" new-state)
            (doseq [f actions] (f)) 
            (swap! s (fn [_] new-state))
          }
          (println "Unexpected/unhandled event" evt "in state" @s))
   }

no, because at the end, because you have two nested :s, you have to have two empty lines to close them, violating the "don't measure quantity of whitespace" principle.

how would we like it to look? maybe like this:

defn make-fsm "creates an fsm with initial state s0, a reset event, and a map of transitions. [state-transitions] must be a map of state->[[f1 f2 ...] {e0->s0, e1->s2, ...}]" [s0 reset-event state-transitions]
  s = atom s0
  {fn [evt]
    if (= evt reset-event):
      (println "Reset event, returning to " s0)
      (swap! s (fn [_] s0))
    :
      [actions transitions] = state-transitions @s
      (if-let [new-state (transitions evt)]
        { (println "Event" evt "causes transition from" @s "to" new-state)
          (doseq [f actions] (f))
          (swap! s (fn [_] new-state)) }
        (println "Unexpected/unhandled event" evt "in state" @s))
  }

so, maybe a block started by a ':' must be 'held open' by another ':' at each implicit block closure; and a ':' on a blank line doubles as an implicit block closure, just like an empty line.

but now does nesting those get confusing?

defn make-fsm "creates an fsm with initial state s0, a reset event, and a map of transitions. [state-transitions] must be a map of state->[[f1 f2 ...] {e0->s0, e1->s2, ...}]" [s0 reset-event state-transitions]
  s = atom s0
  {fn [evt]
    if (= evt reset-event):
      (println "Reset event, returning to " s0)
      (swap! s (fn [_] s0))
    :
      [actions transitions] = state-transitions @s
      if-let [new-state (transitions evt)]:
        (println "Event" evt "causes transition from" @s "to" new-state)
        (doseq [f actions] (f))
        (swap! s (fn [_] new-state))
      :
        println "Unexpected/unhandled event" evt "in state" @s
  }

yes; because indentation is ignored, how do you know that the second lone ':' isn't for the first level of 'if'?

you could require '::' for a nested one, but this violates the principle that you should be able to cut and paste inner code.

in this particular case, however, we could just make the rule that a lone ':' also belongs to the innermost :-block.

but now if the first branch itself ended with an 'if', you wouldn't be able to use : on it. this seems like it would be prone to error.

how about if you use :s to specify how many parts the construct has? so instead of introducing grammar for 'if', you just say 'if::' if you want it to eat two blocks (in addition to whatever other arguments you give it outside of those blocks)?

defn make-fsm "creates an fsm with initial state s0, a reset event, and a map of transitions. [state-transitions] must be a map of state->[[f1 f2 ...] {e0->s0, e1->s2, ...}]" [s0 reset-event state-transitions]
  s = atom s0
  {fn [evt]
    if (= evt reset-event)::
      (println "Reset event, returning to " s0)
      (swap! s (fn [_] s0))
    :
      [actions transitions] = state-transitions @s
      if-let [new-state (transitions evt)]::
        (println "Event" evt "causes transition from" @s "to" new-state)
        (doseq [f actions] (f))
        (swap! s (fn [_] new-state))
      :
        println "Unexpected/unhandled event" evt "in state" @s
  }

ahh.. that seems right!

--

and i forgot to take out the needless parens around individual lines (since this is implicit); and let's put the == back in as an infix operator:

defn make-fsm "creates an fsm with initial state s0, a reset event, and a map of transitions. [state-transitions] must be a map of state->[[f1 f2 ...] {e0->s0, e1->s2, ...}]" [s0 reset-event state-transitions]
  s = atom s0
  {fn [evt]
    if (evt == reset-event)::
      println "Reset event, returning to " s0
      swap! s (fn [_] s0)
    :
      [actions transitions] = state-transitions @s
      if-let [new-state (transitions evt)]::
        println "Event" evt "causes transition from" @s "to" new-state
        doseq [f actions] (f)
        swap! s (fn [_] new-state)
      :
        println "Unexpected/unhandled event" evt "in state" @s
  }

and we can use : for the fn too since there are no top-level empty lines in there:

defn make-fsm "creates an fsm with initial state s0, a reset event, and a map of transitions. [state-transitions] must be a map of state->[[f1 f2 ...] {e0->s0, e1->s2, ...}]" [s0 reset-event state-transitions]
  s = atom s0
  fn [evt]:
    if (evt == reset-event)::
      println "Reset event, returning to " s0
      swap! s (fn [_] s0)
    :
      [actions transitions] = state-transitions @s
      if-let [new-state (transitions evt)]::
        println "Event" evt "causes transition from" @s "to" new-state
        doseq [f actions] (f)
        swap! s (fn [_] new-state)
      :
        println "Unexpected/unhandled event" evt "in state" @s

compare to the original clojure:

(defn make-fsm "creates an fsm with initial state s0, a reset event, and a map of transitions. [state-transitions] must be a map of state->[[f1 f2 ...] {e0->s0, e1->s2, ...}]" [s0 reset-event state-transitions]
  (let [s (atom s0)]
    (fn [evt]
      (if (= evt reset-event)
        (do
          (println "Reset event, returning to " s0)
          (swap! s (fn [_] s0)))
        (let [[actions transitions] (state-transitions @s)]
          (if-let [new-state (transitions evt)]
            (do
              (println "Event" evt "causes transition from" @s "to" new-state)
              (doseq [f actions] (f))
              (swap! s (fn [_] new-state)))
            (println "Unexpected/unhandled event" evt "in state" @s)))))))

suggested:

defn make-fsm "creates an fsm with initial state s0, a reset event, and a map of transitions. [state-transitions] must be a map of state->[[f1 f2 ...] {e0->s0, e1->s2, ...}]" [s0 reset-event state-transitions]
  s = atom s0
  fn [evt]:
    if (evt == reset-event)::
      println "Reset event, returning to " s0
      swap! s (fn [_] s0)
    :
      [actions transitions] = state-transitions @s
      if-let [new-state (transitions evt)]::
        println "Event" evt "causes transition from" @s "to" new-state
        doseq [f actions] (f)
        swap! s (fn [_] new-state)
      :
        println "Unexpected/unhandled event" evt "in state" @s

both are 14 lines long. The clojure is 91 words and 746 characters according to wc, including whitespace. suggested is 91 words and 676 characters, including whitespace.

perl -e 'while ($s = <STDIN>) {$s =~ /(\s*)/; $whitespaceCount += length($1)}; print $whitespaceCount ' < /tmp/t.txt

clojure indentation whitespace is 124 chars. suggested indentation whitespace is 102 chars.

so suggested saves 22 indentation whitespace characters and ((746 - 676) - 22) = 48 non-indentation characters. 48/(746 - 124) = 7.7% of the Clojure non-indentation characters. (i bet the saved characters were parens, although there were a couple of 'let's, and maybe i forgot something else)

and i think suggested is easier to read.

--

now let's go back and tackle the bigger question.

if you have a homoiconic language and immutable variables, like Clojure, then i think it must be more verbose to do this (Python:)

def f(w):
    if w_condition(w):
        x = g(w)
    else:
        x = h(w)
    y = transformFn(x)
    if y_condition(y):
        z = j(y)
    else:
        z = k(y)
    return z

because 'x' can only hold its value within a 'let'. But this 'let' must itself be inside a conditional branch that determines what value you are assigning (g(w) or h(w)). So this 'let' must be repeated twice, once within the w_condition(w) == True branch, and once within the w_condition(w) == False branch. Anything else that depends on x must also be repeated. So all the code involving y and z must be duplicated. Instead of literally duplicating it, what i guess you would do is split this into two functions, kinda like the CPS transform, as if you did this in Python:

def f(w):
    if w_condition(w):
        return z_from_x(g(w))
    else:
        return z_from_x(h(w))

def z_from_x(x):
    y = transformFn(x)
    if y_condition(y):
        z = j(y)
    else:
        z = k(y)
    return z

which i guess is why you see so many small fns in clojure.

i dont like this, because (a) it increases the number of things (in this case, separate functions) in the programmer's mind that they have to keep track of/understand when first reading the program, and (b) it forces you to make up a name for the second part of the original function.

now one solution would be to make x a locally mutable variable. This is probably not 'the Clojure way' but this is the Oot way. Another solution is if Clojure allows you to declare the variable in the f scope, and then not assign to it until farther down, but have that assignment be valid in the entire f scope. i dunno if that's the Clojure way, or even if its possible in Clojure, i'd have to ask.

i'm thinking it's not. (goes and looks it up). ok, it's sort of possible but it's not the clojure way.

http://clojuredocs.org/clojure_core/1.2.0/clojure.core/with-local-vars , http://stackoverflow.com/questions/940712/redefining-a-letd-variable-in-clojure-loop . also, i think these 'vars' are 'local' in the sense of thread local, but NOT limited by lexical scope (not positive..): http://stuartsierra.com/2013/03/29/perils-of-dynamic-scope

--

"prologue directive"s like strict mode and asm in javascript:

function MyAsmModule() { "use asm"; ...module body... }

--

i want to not have as many parens as lisp. lisp fans and programming language guys i talk to tend to laugh when i suggest that part of the lack of popularity is that ppl dont like the parens, but i think it's true.

it's a factor for me personally, too. mb that makes me a wimp

---

some syntax principles:

--

if x is of type T, should be able to say

x.y

to disambiguate between type T's y and type Q's y.

see http://ghc.haskell.org/trac/haskell-prime/wiki/TypeDirectedNameResolution for detailed discussion of this sort of thing in Haskell, and http://www.yesodweb.com/blog/2011/09/limitations-of-haskell , "Namespace clashing, particularly for record fields" for the need

--

actually the if:: syntax isnt very ambiguous; since the number of clauses is also mandatorily delimited by :, you can count whether the number of clauses matches the number of expected clauses (when there is a syntax error of this sort, there is some ambiguity about which clauses belong to which operator, but there's still an error because there is no ambiguity about the count)

--

with if:: you could even have 'if then else' by treating 'then' and 'else' as KEYWORD arguments, rather than positional arguments, to the if.

if (x == 3):: pr 'x was 3' else: pr 'x was not 3'

note that the number of colons is the total number of block args, not the number of colons to go. with this syntax you could also do:

if (x == 3):: then: pr 'x was 3' else: pr 'x was not 3'

that's just the difference between {if (x == 3) {pr 'x was 3'} else/{pr 'x was not 3'}} and {if (x == 3) then/{pr 'x was 3'} else/{pr 'x was not 3'}}

--

hmm if there is more code after such an 'if' how do we delimit the end of the if? right now the only mechanism provided is to make it the end of a block, either implicitly by putting an empty line at the end, or explicitly, by surrounding the whole if in {}s.

should we give any other options? i dont think so; whatever else we can think of won't be much better than braces.

--

compile-time conditionals

like C's #ifdef, Nimrod's "when" http://nimrod-code.org/tut1.html#when-statement

--

so i guess the : and the , are slightly similar in that 'a , b' means '(a) (b)' and 'a; : b;' means '(a;) (b;)', except that the , is delimited by EOL whereas the : must be on its own line and otherwise it takes a label

i'm having second thoughts about the 'if::' thing though. why not just use braces if you have to type it twice anyways

if { condition then: first else: second }

that's like:

if((condition), then=(first), else=(second))

in other words it's a node labeled 'if' which has an edge labeled 0 going to the block 'condition', an edge labeled '1' and 'then' going to the block 'first', and an edge labeled '2' and 'else' going to the block 'second'

another way to say that is that the above compiles to a node with label 'if' and edges:

{condition} 'then'/{first} 'else'/{second}

later (is this right? not sure, doublecheck) so i guess then we have:

defn make-fsm "creates an fsm with initial state s0, a reset event, and a map of transitions. [state-transitions] must be a map of state->[[f1 f2 ...] {e0->s0, e1->s2, ...}]"
  [s0 reset-event state-transitions]
  s = atom s0
  fn [evt]:
    if (evt == reset-event) {
      println "Reset event, returning to " s0
      swap! s (fn [_] s0)
    :
      [actions transitions] = state-transitions @s
      if-let [new-state (transitions evt)]::
        println "Event" evt "causes transition from" @s "to" new-state
        doseq [f actions] (f)
        swap! s (fn [_] new-state)
      :
        println "Unexpected/unhandled event" evt "in state" @s
    }

--

maybe have one sigil for 'edge label' and another for 'node label'?

mb '/' for edge label (and then '\' for right associativity) and ':' for node label?

this seemingly provides for nice homoiconicity but actually within a data constructor we still want a way to label both nodes in the data, and separately label nodes in the AST, e.g. we might want a node label 'a' in the data but still want to annotate the place where we define a as "ANNOTATION1" in the source code.

--

hmm actually if we alternate between nodes labels and edge labels that helps us disambiguate... hmm..

if:: condition then: if:: condition2 then: thenThenStuff else: thenElseStuff else: elseStuff

if: condition
  then/
    if: condition2
       then/
         thenThenStuff
       else/
         thenElseStuff
  else/
    elseStuff

hmm.. but how do we know the last 'else' shouldn't be part of the inner 'if'? i guess this doesn't work with nesting

--

another mode: 'command mode'. like Tcl's commands (see the bit in the tcl notes file; todo learn more tcl and see if i'm understanding it right): first argument on each line is a command, others are autostringified

---

operator precedence proposal:

    Precedence    Operator
    8             . (note: in haskell this is looser than fn application) (right associative)
    7             function application (left associative)
    6             all unary operators
    5.5           exponentiation
    5             (and most everything else)
    4             ++ --
    3.5           +++, cons (dunno about this one; in Haskell they are a lower level than addition, also right associative)
    3             == != < <= > >= in/elem, <...> (trinary ops)
    2             && || (in haskell these are right-associative but i think i'll make them non-associative)
    1.5           $, >> (haskell's $ and >>, i mean) (in haskell these are right-associative)
    1             = (not really an operator)
    0             other stuff that is not operators

note: within these levels, stuff is non-associative and requires explicit parens, unless it is one operator repeated and it associates
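a minimal Pratt-style parser sketch over a subset of these levels (the table entries, tokenization, unary operators, and right-associativity handling are all illustrative, not the actual Oot grammar):

```python
# Binding powers following a subset of the proposed table (illustrative).
PRECEDENCE = {
    '==': 3,            # comparisons
    '++': 4,            # addition under the proposed renaming
    '**': 5,            # multiplication
    '^^': 5.5,          # exponentiation
}

def parse_expr(tokens, min_bp=0):
    """Parse a flat token list into a nested-tuple AST."""
    lhs = tokens.pop(0)                 # assume the next token is an atom
    while tokens:
        op = tokens[0]
        bp = PRECEDENCE.get(op)
        if bp is None or bp <= min_bp:  # too weak to steal lhs: stop
            break
        tokens.pop(0)
        # passing bp (not bp - 1) makes equal-precedence chains left-associative
        lhs = (op, lhs, parse_expr(tokens, bp))
    return lhs
```

so `a ++ b ** c` parses as `('++', 'a', ('**', 'b', 'c'))`, and `a ++ b ++ c` groups to the left.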

---

trinary infix operators

---

infixify is a trinary infix operator

trinary ops could be of the form <... ...>, e.g. <bob bob>, <** **>

<bob> could be the binary infixification of bob. <bob bob> could be the trinary infixification of bob.

--

if you are willing to screw with basic arithmetic you can free up all sorts of stuff:

++ for addition, -++ for subtraction, ** for multiplication, -** for division, 2 for gt, >= for ge, ^^ for exponentiation

now we've freed: +, -, *, /, <, >, ^

note that here - is a syntactic sugar metaoperator, e.g. it turns an operator into its inverse. by default the first argument of the inverted operator corresponds to the result of the normal operator, and the normal operator is inverted with respect to its first argument, but this can be overridden. One syntax for overriding it might be:

normal: bob a b c
inverse w/r/t b: -bob a result- c

and another might be:

inverse w/r/t b: -bob a _ c

in this case how do we give the result arguments, though?

it's important to have / and < free if we want to use them as matching delimiters. But if < is gt then < can't be a delimiter either (that's not exactly true; we can have e.g. <4 4>, which would be good for that, but now we're using < > for gt, lt).

we still have / \ and < > free in this scheme. Could use < > for character-based escape, e.g. <r r> for regexs, <c c> for comments (or mb <-- -->).

we could use / like Haskell's $, and use \ for the opposite, with the convention that by default, a line is left associative except for the /, but if you see \, it's right-associative except for the \. That may be confusing tho.. also if we use \ as an operator or delimiter then we have to find something else for EOL continuation, and character escape, the traditional uses of \. we could use | instead of /.

also, note that $var is interpolation, and ?var is variable (e.g. to specify a random variable in a formula). {} is code. () is parens. [] is data. - is allowed in the middle of a word, e.g. this-is-a-normal-identifier. we need a syntactic metaop for map. *var is glob expansion. we need an 'in/elem'; this should be a symbol, not alphanumeric, because it is in the precedence table. Everything in the precedence table should have a punctuation in its identifier. we need cons, and append, and should both of these be separate from each other and from addition? we need string interp like Python (probably %), and should this have precedence? probably. and we need an annotation region delimiter (perhaps | ? / \ ?). we need isa (and should this have precedence? probably). is isa the same as elem combined in some way with <=, e.g. a isa T means that a is an element of some type T1, and T1 <= T? probably. note that, somewhat similarly, cons is append composed with [] (wrap), e.g. cons(a,b) = append([a],b).

might want to use ^ for annotation.. or for arithmetical hyperoperators.

could even throw out multiplication and just use +1 for addition, +2 for multiplication, (+0 would be successor, except that's unary) etc. this would free up ++ and for other things, but would make addition, subtraction, multiplication, and division pretty annoying to type (+1, -+1, +2, -+2), and possibly hard to read.

could use + for addition, ++ for multiplication, +++ for exponentiation, etc. then either - or -+ for subtraction, etc.

the friendlier thing to do would be to leave ++, --, **, -** as addition, subtraction, multiplication, division.

otoh if what we're really interested in is --, **, then maybe just the usual +,-,*,/ is best. if other considerations are close, that is greatly preferred, b/c it won't confuse everyone.

we could still say that unary operators must be physically attached to their tokens; so f -a is 'apply f to the inverse of a' but 'f - a' is 'apply - to f,a'. this would allow us to still use * as glob expansion.

note that -3 still works, as -3 is the 'inverse' of 3. (but actually it's only the additive inverse, e.g. 0-3.. the multiplicative inverse is 1/3.. doing these substitutions automatically might be possible, you just need the compiler to know the additive and multiplicative identities.. hmm..)

could have an operator prefix that means 'bitwise', e.g. %, so %|| is bitwise OR, %&& is bitwise AND, %%- is bitwise NOT, etc.

--

mb require all symmetric-looking operators (all operators which are palindromes of only characters whose ASCII visual appearance is horizontally symmetric) to be associative?

or even commutative? an example of a common non-commutative associative operation is string concatenation. so, i guess the question is, is it intuitive to allow a symmetric-looking operator like ++ to represent string concatenation? or should we make them things like +>?

or require all operators to be associative? well, subtraction is not associative ( http://www.computerhope.com/jargon/a/assooper.htm ) and it should be an operator so so much for that. so if we adopt the above convention, then subtraction should be -+ not -.

note that we still need an 'assoc' annotation for functions which are not assigned to operators (esp. since each operator must also have a non-operator form).

my current guess is: yes, require all symmetric-looking operators to be associative (but not commutative)
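a sketch of how the 'symmetric-looking' test might work (the character set here is a guess at which ASCII glyphs read as horizontally symmetric, not a spec):

```python
# Illustrative subset of ASCII chars whose glyphs look horizontally symmetric.
SYMMETRIC_CHARS = set('+-*=^|!:.')

def looks_symmetric(op):
    """True if op is a palindrome made only of symmetric-looking chars,
    i.e. an operator that (under this convention) must be associative."""
    return op == op[::-1] and all(c in SYMMETRIC_CHARS for c in op)
```

so `++` would be required to be associative, while `+>` and `-+` (not a palindrome) carry no such requirement, which is consistent with making subtraction `-+`.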

--

could use ; for EOL, ;; for EOL plus comment-to-EOL, and , vs ,, for row vs. column in data

---

x.f = (f(x))

---

could use / and \ as -> and <- (or probably vice versa, / for <- and \ for ->), and use that for assignment, and reverse = for symmetric reference binding

e.g.

x / 3
print 3

rather than

x := 3
print 3

but then something else must be division, and so we've given up a good paired delimiter, unless we use / for boundaries instead of , as above

---

from the discussion in the middle of ootNotes3.txt:

now that we're not using =s anyhow, and we have to type `` no matter what, let's use it for arguments, rather than return args:

full: funcname = return args `args` fn_body
implicit args: funcname = return args `` fn_body_with_implicit_args
one anonymous return arg: funcname = `args` fn_body_with_implicit_args
anonymous: (`` fn_body_with_implicit_args)

  a .bob/b .harry/c = [1 2 .harry/3 .bob/4 5]  /// a==1, b==4, c==3
  f .bob/3 .harry/4

note that now we can still use a/b to denote a directed arc in data.

note: could also distinguish between attached and unattached = for keyword vs. other things

---

implicit return like ruby

---

instead of using a separate matched delimiter pair for boundaries, could use something like:

[redBoundary: ] (redBoundary: ) {redBoundary: } <redBoundary: >

--

note that <> does not trigger matching in emacs in text mode, but () [] {} do. so maybe the inner matching that doesn't have to match, which i was calling (), should be <> ; or mb boundaries, which dont have to match, should be <> ?

--

? is sort of no different from quote (and $ no different from antiquote).. but consider statistical random variables; if we say, "?X + 3", where "?X" indicates that X is a random variable, then the +3 operation applies, not to the distribution, but to whatever value ?X takes in some instance. But "mean[?X]" applies to the distribution (todo should we use the [] notation too? if not, what to replace it with? this could also be useful for vectorization and the map token operator and the like). But now if we say "?X + y", where y is a 'normal' variable, then 'var[?X + y]' can be different from 'var[?X + ?Y]', because it matters if ?Y is correlated to ?X.

so what i am getting at is that ?X is not strictly the same as quote(X), because the output of quote(X) is a variable, which is a type of expression, and you can't add 3 to an expression ("quote(X) + 3" is a type error), but you can add 3 to a random variable, even though "?X + y" has similar properties to quote(X + antiquote(Y))

--

separately, let's consider the f(x) vs. mean[X] notation and if not that, find something like it

Since we are using "f x" for f(x), and ?X for random variable X, maybe mean[X] could be "mean? X"

i like that. it seems like a better use of ? than just to mark predicate functions.

note that if we do that we are allowing prefix and postfix of the same sigil to mean different but related things. which is also useful with '-' to mark the input that the inverse is w/r/t.

--

i looked up what this is called in math

mean[X] is called the 'expectation operator'. In stats, things called 'operators' are written with square brackets.

the following pages are relevant:

https://en.wikipedia.org/wiki/Notation_in_probability_and_statistics

https://en.wikipedia.org/wiki/Differential_operator

http://quantum.phys.cmu.edu/CQT/chaps/cqt03.pdf uses '|w>' (a 'ket') to refer to 'a wave function for wave w', which is needed because actually wave w has various different wave functions depending on which variable you use (f(x) and g(p), if x is position and p is momentum, for example). i presume you use the ket when you are saying something that applies equally no matter which variable you choose for the wave functions. the <x|y> notation apparently means the inner product of two wave functions.

--

variant of operator precedence parsing that might handle the two types of - better (?not sure though?):

http://compilers.iecc.com/comparch/article/01-07-068

---

list of operators from various languages on this page might be helpful:

https://en.wikipedia.org/wiki/Operator_%28computer_programming%29

--

in Pyret, i think : is synonym for begin and ; is synonym for end

---

@ as 'from' for lifting functions on views

--

https://en.wikipedia.org/wiki/Del

del in 3D can be represented as [d/dx, d/dy, d/dz], where d/dx is a function on expressions.

gradient can be represented as del f = the application of del to f = sort of the converse of a map of f over del; instead of getting [f(d/dx), f(d/dy), f(d/dz)], we get [d/dx(f), d/dy(f), d/dz(f)].

Divergence can be represented as dot(del, v) = like the dot product except that when an expression is multiplied by d/dx, the result is the application of d/dx to the expression.

want to be able to make Oot represent this sort of thing

mb use __apply? or i guess just overload * for this sort of operator?
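the del/gradient/divergence story above can be sketched numerically with finite differences (all names and the step size are illustrative, not a proposed Oot API):

```python
H = 1e-6  # finite-difference step (illustrative)

def d(i):
    """The partial-derivative operator along axis i: maps f to ~df/dx_i."""
    def op(f):
        def df(p):
            q = list(p)
            q[i] += H
            return (f(q) - f(p)) / H
        return df
    return op

nabla = [d(0), d(1), d(2)]          # del in 3D: a vector of operators

def gradient(f):
    # apply each component of del to f (the 'converse map' described above)
    return [op(f) for op in nabla]

def divergence(v):
    # dot(del, v): 'multiplying' by d/dx_i means applying it to v_i
    return lambda p: sum(nabla[i](v[i])(p) for i in range(3))
```

e.g. for f(x,y,z) = x^2 + y, gradient(f)[0] at (1,2,3) is approximately 2, and the divergence of the identity field (x, y, z) is approximately 3.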

--

MB make Oot annotation character unshifted because it will be common

MB . Or - or '

--

clojure uses #(+ % 2) for (`x` + x 2). we probably want something like ``(+ _ 2), e.g. `` instead of + and _ instead of %

--

did i mention that source code is UTF-8 and must be ASCII, except in strings and in character metaescapes?

--

in assembly they typically use the syntax LABEL: for labels, perhaps i should think about the same thing for Oot

--

do we need a :: meaning 'to the right of this is a type annotation'? if so, does ^ mean this, instead of ::? if so, do we still need a block syntax like {} for annotations, or is ^() good enough?

could just use ^ for annotation of Oot nets, and ^^ for meta-annotation (e.g. type annotations) and ^^^ for meta-meta-annotation (e.g. comments and custom tool directives). then any other syntax for these could be regarded as syntactic sugar synonyms.

--

Parallax Propellor 'Spin' language has a useful syntax for binary literals and related array slicing:

in the following, 'a' is an array of binary I/O lines. X is a byte. Here are 3 ways to assign values to 8 of the lines in a at once:

a[16..23] := %11111111
a[16..23] := %1111_1111
a[16..23] := x

In the second one, the underscore is ignored; it is just for readability.
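for reference, Python (3.6+, PEP 515) adopted the same readability underscore in numeric literals:

```python
# The underscore is ignored; it just groups digits for readability.
assert 0b1111_1111 == 0b11111111 == 255
assert 1_000_000 == 1000000
```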

--

idea from Hoon: things connected by . should be resolved right to left, e.g. if you have an object test with a field foo, this is referenced as foo.test, not test.foo. My reason for liking this is that it is the same reason to have y = f(g(x)) instead of ((x)g)f = y; since text is left-aligned in the editor, it's nice to scroll down the page, quickly scanning the left-most things for the final operation before assignment to the variable, which is what you are usually looking for.
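a toy resolver for this right-to-left reading of '.' (the env layout and function name are hypothetical, just to make the order concrete):

```python
def resolve(path, env):
    """Resolve 'foo.test' Hoon-style: the rightmost name is the object,
    names to its left are fields peeled off leftward."""
    names = path.split('.')
    obj = env[names[-1]]               # start from the rightmost name
    for name in reversed(names[:-1]):
        obj = obj[name]                # then index fields moving left
    return obj

env = {'test': {'foo': 42}}
assert resolve('foo.test', env) == 42  # foo.test, not test.foo
```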

--

y'know the octave function defn syntax is so convenient to cut and paste, maybe its better than our `` syntax above?

--

  1. hashtags seem to be the standard for a single character type prefix for tags, and @user for usernames.

(contrast with +tag @context in the todo.txt standard)

--

maybe just put # at the beginning of 'nonessential' lines rather than have footnotes?

--

maybe try to make Oot Android-keyboard friendly with an eye towards a 'learn to program' app?

on my keyboard i have:

apparently on the Android 2.2 default keyboard, you get the letters and can also swipe up to access the numbers and:

apparently on the Android 4.1 default keyboard, you get the letters and ,.'-/ and can longpress to access:

apparently on the Android 4.4 default keyboard, you get the letters and !?,./

apparently on Windows 8.1 tablet, you get the letters and ,.':-? and can longpress for the numbers

apparently on the Android 4.4 default keyboard, you get the letters and . and can longpress for @#$%&*-+_!:;/? and maybe two others

using 'google keyboard':

^{}[] as well as some other symbols (bullet, sqrt, pi, division, multiplication, backwards Paragraph sign, delta, a bunch of currency signs, degree sign, copyright, r (reserved), TM, c/o)

---

on my Android, neither the hardware keyboard nor the first few pages of the software keyboard give me the pipe character '|'. So better not use that for anything common.

---

should look at the ASTs of languages such as Python and Haskell and ask whether these can easily be represented in our oot graphs, and if so, can we generalize this to make it easy to make up new syntaxes? mb also consider ometa

--

in Python we have grammar-supported constructs like while condition:\n body. we don't want a complicated grammar for Oot (although Python's is not that bad: https://docs.python.org/3/reference/grammar.html ), because we want something close to 'homoiconic', if not quite. But perhaps we could have some general grammar-supported constructs that can be specialized by the user? e.g. instead of only having almost no grammar (e.g. Lisp's '(while condition body)'), or hardwired grammar like Python's 'while condition:\n body' we could have something like "%constructname arg1:\n arg2", where constructname is a user-defined construct, so '%while arg1:\n arg2', where the % indicates that it is this sort of construct. E.g. predefine a small number of grammatical forms, hardwire these into the grammar, and require sigils to access them.

i'm not saying we should use Python's 'while' grammatical form in particular, or that we should use '%' as a sigil for this, that's just an example.

in this case (while), i probably prefer the more general scheme like (example for 'if', not 'while'):

if { condition then: first else: second }

e.g. we have a potentially user-defined control structure (if), which has a positional argument ('condition'), and two keyword arguments ('then' and 'else').

--

y'know, as for matched delimiters, we have three obvious ones and three obvious uses:

(): grouping within computation
[]: data constructor
{}: blocks of code

perhaps we should just use them in this fashion.

our grammar is still somewhat simple if we don't use e.g. [] for indexing.

--

haskell's convenient operator defn syntax:

a + b = ...

--

import .. hiding ...

--

single + (rather than ++) is arithmetic (addition) for readability/remember-ability? (and same for -, /, *)

--

multiple return args can be done sorta like this:

a = b can be a macro for:

setq_pattern((*a, *), [b_return1, b_return2, etc])

that is, by *a, *, we mean: if a, the lhs, is the pattern 'x' and b returns optional return args 'q,r,s', then we turn 'x' into 'x, *', and match it against 'q,r,s'; x matches q and * matches r,s, so q is assigned to x, and r and s are both thrown away. if the lhs is x,y and b returns q,r,s, then q is assigned to x and r is assigned to y
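Python's star-patterns already behave like this setq_pattern sketch (b here is a stand-in function with three return values):

```python
def b():
    return 'q', 'r', 's'

x, *_ = b()      # lhs 'x' padded to 'x, *'; r and s are thrown away
assert x == 'q'

x, y, *_ = b()   # lhs 'x, y': q is assigned to x, r to y
assert (x, y) == ('q', 'r')
```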

---

potential lightweight syntax for Option types (Haskell Maybe): 'x and x'

---

the default units should be 2-, 3-, and 4-character suffixes, e.g. 3hr, 2sec, 1mile, 1m, 1min (minute), 1yard, 1km, 1day, 1yr, 1gram, 1kg, 1np (neper), 1db (decibel; see also https://www.google.com/search?client=ubuntu&channel=fs&q=octave&ie=utf-8&oe=utf-8#channel=fs&q=octave+decibel+neper and http://nayuki.eigenstate.org/page/extending-the-use-of-logarithmic-scales), 1k (1000), 1ki (1024), 1kib (kibibyte) (note: our units are all lowercase, in contrast to the std, which has e.g. KiB for kibibyte).
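a sketch of a reader for such literals (the regex and function name are hypothetical; it accepts 1- to 4-character lowercase suffixes since the examples include 1m and 1mile):

```python
import re

# number followed by a 1-4 character lowercase unit suffix, e.g. 3hr, 1kib
UNIT_LITERAL = re.compile(r'^(\d+(?:\.\d+)?)([a-z]{1,4})$')

def parse_unit_literal(tok):
    """Split a unit-suffixed literal into (magnitude, unit)."""
    m = UNIT_LITERAL.match(tok)
    if not m:
        raise ValueError(tok)
    return float(m.group(1)), m.group(2)
```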

---

i guess we want to be case insensitive, except for the distinction between lowercase/capitalized/uppercase?

---

we're trying to NOT have too much syntax specific to the type system. We're trying to make a 'programmable' type system that re-uses syntax from elsewhere in the language.

---

unreserved capitalized words are :keyword values (:keyword is Ruby syntax), not annotations. This requires modules to export lists of 'reserved words' so the language knows which capitalized words are keywords and which are annotations (but if we're doing this, why not just have lowercase keywords? hmm.. mb b/c that's harder to read, b/c the reader will not always know ahead of time about the funky metaprogramming modules used by the program they are reading, making it hard to distinguish reserved words from ordinary identifiers? yes i think that's right, so keep annotations uppercase too)

instead of reserved words, could just have a keyword to annotation graph transform

--

_ as a prefix in an identifier denotes a private function
__ as a prefix in an identifier denotes a field with language-given special semantics (eg __get)
___ is disallowed anywhere in an identifier; this is reserved for use in mangling inside the compiler/interpreter

--

commas, when not within a data constructor, are for creating/destructure-binding optional return arguments. when in data constructors, could be delimiters or tuples depending on data constructor type.

---

could use : for 'begin', like Python.

---

some current syntax proposals:

so, use : to open a block, like 'begin'; it's like '(' except it is autoclosed at the end of a block

; is end-of-statement. a ; is usually implicitly inserted at the end of each line (eg with each newline character), EXCEPT when the current line has unbalanced (open) parens OR when the line has a : or :: in it

statements are like being surrounded by implicit {}s

;; is end-of-autoblock

a block can be opened and closed with {}

'::' opens a grouping which lasts until the end of the current autoblock
':' opens a grouping which lasts until the statement before the next ':' or until the end of the current autoblock

an 'autoblock' is opened at the beginning of each empty-line-separated-paragraph and closed at the end of each empty-line-separated-paragraph. To close an autoblock manually without an empty line, use ';;'. To open a grouping that lasts until the next autoblock close, use '::'. To open a grouping that lasts until the next autoblock close or the next ':', use ':'. The point of using autoblocks instead of ordinary blocks is that all of the remaining ':' and '::' groupings are closed at once at the autoblock close; so you can avoid seeing like six parens at the end of an autoblock like in Lisp.
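a crude sketch of the close-them-all-at-once idea (heavily simplified: it treats every ':'/'::'  as opening a grouping that closes at the blank line, ignores the rule that ':' also closes at the next ':', ignores '{}' and ';;', and drops the blank lines themselves):

```python
import re

def close_autoblock(lines):
    """Emit '}' for every pending grouping when a blank line ends the
    autoblock, instead of requiring trailing close-parens."""
    out, open_groups = [], 0
    for line in lines + ['']:                      # sentinel blank line
        if line.strip() == '':
            out.extend(['}'] * open_groups)        # close all pending groupings
            open_groups = 0
        else:
            out.append(line)
            open_groups += len(re.findall(r'::?', line))  # each ':'/'::'  opens one
    return out
```

this is the Lisp-trailing-parens point: the six closes you'd otherwise type are computed per paragraph.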

'/' goes in between a label and the value being assigned to the node with that label, e.g. in function calls, keyword_argument_label/argument_value (like '=' in Python)

so eg

if a::
  print 'a'
  if b:
    print 'b'
    print ('in fact, ' + 'a AND b')
  if c then/:
    print 'c'
    print 'in fact, a AND c (but not necessarily b)' stream/stdin

print 'hi'

note that that can be written as a one-liner:

{if a:: print 'a'; if b: print 'b'; print 'in fact, ' + 'a AND b'; if c then/: print 'c'; print 'in fact, a AND c (but not necessarily b)' stream/stdin; ;; print 'hi'}

let's make the block grouping explicit:

{
  {if a {
    {print 'a'}
    {if b {
      {print 'b'}
      {print ('in fact, ' + 'a AND b')}
    }}
    {if c then/ {
      {print 'c'}
      {print 'in fact, a AND c (but not necessarily b)' stream/stdin}
    }}
  }}
}

{print 'hi'}

we could also write that as a oneliner:

{{if a {{print 'a';}{if b {{print 'b';} {print ('in fact, ' + 'a AND b');}}} {if c then/ {{print 'c';}{print 'in fact, a AND c (but not necessarily b)' stream/stdin;}}}}}} ;; {print 'hi'}

the last ;; could just be a ; there because we've already made the effect of the autoblock explicit.

(are the ';'s needed as connectives separating imperative statements? or are they just for grouping, in which case we could get rid of them?)

regarding the dangling else problem, that is, how to parse: if a then if b then c else d as "if a then (if b then c else d)" or as "if a then (if b then c) else d"?

actually it would be neither of these, "if a then/ if b then/ c else/ d" would be parsed left-associative, as if everything was an argument to a curried function 'if' (after the tighter-binding '/' was bound): (((((if a) then/if) b) then/c) else/d)

(in Python that would look like "if(a, then=if, b, then=c, else=d)")

so, how would one write "if a then (if b then c else d)" or "if a then (if b then c) else d"? Like this:

if a (then/(if b then/c else/d))
if a then/(if b then/c) else/d


current proposal for fn defn syntax:

(todo: find the previous proposal and make sure i didn't forget something better)

two forms: returning a single argument, and returning one or more arguments

returning a single argument:

  `input1 input2` 
    {code for function}
     or equivalently:
  `input1 input2`: 
     code for function ;;
  the value of the last line of 'code for function' is the return value

returning one or more named arguments:

  `[output1 output2] = input1 input2`
    {code for function}

(or the same equivalency using the colon as above)

  the value of output1 and output2 must be explicitly assigned in the function (if not assigned, what happens? autouse of nullable types and Null? default values for their types? compile-time error? run-time error?)

note: input or output arguments can have default values as follows:

  `[output1/default1 output2] = input1 input2/default2`

should we adopt the Python convention that if an input arg has a default, so must all later input args ('SyntaxError: non-default argument follows default argument')? Or is this only a problem if you try to CALL a function giving a keyword arg and then a later positional arg?
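for reference, Python rejects both at compile time, the definition site and the call site:

```python
def rejects(src, mode):
    """True if src fails to compile with a SyntaxError."""
    try:
        compile(src, '<test>', mode)
        return False
    except SyntaxError:
        return True

# non-default parameter after a default one: rejected at def-site
assert rejects('def f(a=1, b): pass', 'exec')
# positional argument after a keyword argument: rejected at call-site too
assert rejects('f(a=1, 2)', 'eval')
# the usual order is fine
assert not rejects('def f(a, b=1): pass', 'exec')
```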

so to define a named fn:

  testfn = `[output1/default1 output2] = input1 input2/default2`

when calling a function, multiple output args are optional, take them using commas:

  output1, output2 = testfn input1 input2

to take all output args into one variable:

  outputs = ,testfn input1 input2

to take only the first output:

  output1 = testfn input1 input2

to take only the first output, and destructure it into a pair:

  [output1_1, output1_2] = testfn input1 input2

to take both outputs, and destructure the first one into a pair:

  [output1_1, output1_2], output2 = testfn input1 input2

question: is

  [output1, output2] = ,testfn input1 input2

equivalent to

  output1, output2 = testfn input1 input2
  ?

i think so

note: if used inside an expression, you get either the first arg, or all of them, depending on the ',' prefix:

  f(testfn)  ;;; uses first output argument of testfn
  f(,testfn) ;;; uses list of all output arguments of testfn

---

~ is looking like a good candidate for 'near/sort of/kind of/approximate' plus mb 'weird special case' (as it's used in various Hoon odor syntactic sugars). In both cases, '~' means 'now the language is going to try to take care of some annoyance or edge case for you; usually it should Just Work but if not, be aware that there's something going on here that is not being spelled out explicitly'.

in the former meaning, '~' could be used to mark 'inexact' type representations, such as using int32 (which has a maximum) to represent the ideal of an integer.

~ could also be our universal coercion operator, e.g. if f outputs an integer and g takes a string,

g(~f(x)) could be a shorthand for g(str(f(x))) (with the compiler or runtime deducing the 'str' from the signature of g)
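a sketch of how that deduction could work, using Python signatures as a stand-in for Oot's (coerce_for is a hypothetical name; a real implementation would use declared types, not runtime annotations):

```python
import inspect

def coerce_for(fn, value):
    """Coerce value to the annotated type of fn's first parameter,
    imitating the proposed '~' operator."""
    params = list(inspect.signature(fn).parameters.values())
    target = params[0].annotation
    if target is inspect.Parameter.empty or isinstance(value, target):
        return value                      # nothing to do
    return target(value)                  # e.g. the implicit str(...) above

def g(s: str):
    return s + '!'

def f(x):
    return x + 1                          # returns an integer

assert g(coerce_for(g, f(2))) == '3!'     # the str() was deduced from g
```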


in fact, we should have a way to attach arbitrary 'laws' to typeclasses; such as saying an operator is associative, a relation is transitive, or such as the http://www.haskell.org/haskellwiki/Monad_laws . The toolchain is allowed to make the assumption that any instance of the class obeys the laws (e.g. to reorder parens around associative operators, which eg in the case of a foldr being leniently evaluated, could allow the prevention of a stack overflow by strictifying when the stack becomes too large); this threat makes the laws more than documentation (and indeed, perhaps someday we'll have theorem-proving tools to allow compliance with the laws to be proven in particular instances).
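short of theorem proving, a law can at least be spot-checked; a lightweight stand-in sketch (names illustrative):

```python
import random

def check_associative(op, gen, trials=100):
    """Spot-check the associativity law op(op(a,b),c) == op(a,op(b,c))
    on random samples from gen; a stand-in for a real proof obligation."""
    for _ in range(trials):
        a, b, c = gen(), gen(), gen()
        if op(op(a, b), c) != op(a, op(b, c)):
            return False                  # found a counterexample
    return True

gen = lambda: random.randint(-50, 50)
assert check_associative(lambda a, b: a + b, gen)       # addition: ok to reparenthesize
assert not check_associative(lambda a, b: a - b, gen)   # subtraction: not associative
```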

---

.. to access a meta node (instead of .)

and mb instead of __get and __set, just ..get and ..set

---

---

the more i think about it, the more i like having language-supported operator name conventions / operator syntax conventions for associativity and commutativity etc, even if it makes the language harder to get into.

so,

addition: ++ subtraction: -++ multiplication: ** division: -**

leaving - free for inverse, and + and * free

this also leaves '--' free for comments, because -++ is subtraction, and '--' would mean 'the inverse of the inverse', which is unnecessary (mostly; maybe with intuitionistic logic it could be useful; but i think we can say that in general its use would be rare enough that it's okay to use '--' for something else more common, and force people to write -(-) if they really want the inverse of the inverse)

for convenience, we could also define -+ to be an alias for subtraction, and -* an alias for division

---

if - is inverse, we could do:

-3 could still be the literal 'negative 3', and -x where x is a number (rather than a function) is still the negation of x.

but then:

f = -g

means f is the inverse of g, not f is negative g, as expected.

so maybe what we want is just the convention that '-' tends to mean the inverse of something but only when a prefix in the operator name, eg it's not the case that '-' is the name of the inversion operation itself, and define:

addition: ++ subtraction: -+ multiplication: ** division: -*

but f = -g could still be the negation of g.

Alternately, define a special operator for negation, maybe 0-, making use of the programming language convention where a zero prefix has a special meaning (eg 0xaa for hex) (even though - isnt a numeric literal). So:

hmmm...

i guess i sorta like the former option. So, '-' is not the syntax for the inversion operator. '-g' is unary negation.

but this is kind of a waste, because we wasted a chance to have another unshifted dual prefix/postfix operator, -a and a-.

---

all possible unshifted prefix and postfix operators:

(note: we exclude eg [a a[ because we assume we want to use matched delimiters for other things)

---

we also have a big choice; is 'a' or "a" the way we write strings? We need to reserve the other one for maybe/option/exception tower. But strings are pretty common too, right?

so far i'm leaning towards "" for strings. One reason for that is that in shell one-liners, you'd like to avoid '-quoting, because you'll be calling it as: oot -e 'code'

note that this same consideration means we need a verbose alternative to 'a and a'. I don't think we need syntax for that, though.

---

so if Capitalized means 'annotation' (e.g. Static, Const), and if UPPERCASE means label (label-type annotation), then how do we do keywords?

option (1): use Capitalized for keywords, UPPERCASE for annotation, something else for labels (mb LABEL::THELABEL; or maybe single-character uppercase, e.g. 'A')

option (2): following Ruby (:keyword), use .keyword (but this conflicts with object.attribute attribute referencing; but not really, since the . is at the beginning; but we may want to use ..object to indicate its superclass or something)

i guess i like option (2) right now

hmmm... another option is ""keyword. That's maybe even easier to remember, and leaves "a." open for something else, although it is a bit too hard to type.... hmmm, maybe that's too hard to type.

later: hmmm... so maybe now i actually like (1) better, with single-character uppercase for labels (and uppercase for annotation, LABEL::THELABEL if you need more than 26 labels). That way we save .a and a. for something else. ... but it does mean that when we want 'special' variable names, e.g. 'X' for auto-anonymous, we have no way to distinguish them just by looking at them. ... or, we COULD say that special variable names ARE keywords, keywords that are recognized by macros. I guess i like this best at this moment.

---

so if a.b is really just (a b), this does the trick: f a object.child.grandchild b == (((f a) object.child.grandchild) b) == (((f a) ((object child) grandchild)) b), which is what we want

and i still like the idea that 'a , b' is short for '(a) (b)'. And a ,, b , c is short for ((a) (b)) (c).

Note that this means that ',' substitutes for the '|' pipeline character, or '$' in Haskell:

f a , g b == (f a) (g b)

that is, first apply f to a, then apply the result to (g b); in other words, since things are curried, a is the first argument given to f and (g b) is the second argument.

so '.', when attached, brings things together, and ',' pushes things apart. which is nice. and we can still do ',,' for things that separate, but less than ','.

This might inspire us to use '..' and '...' as ways to bring things even closer together, eg

f object.a..b.grandchild == f (object.(a.b).grandchild) == f (object (a b) grandchild)

this does forbid the usage of '..' as 'parent', however. Maybe that's okay.

also, since .keyword seems okay here (it's easy to recognize that . is attached to the beginning of something), we might specify that the above behavior only occurs:

that is, the following constructs are still undefined:

hmmm i was looking for a syntactic marker for lazy arguments with a dual for strictifying, maybe ',a' and 'a,'?

---

BUT, the above use of commas conflicts with the multiple-argument-consuming syntax that i proposed earlier (e.g. a,b = max(n) means a is the max, b is the argmax)

maybe just force keyword return args when more than one arg is returned? eg

a loc/b = max(n)

where 'loc' is a keyword?

hmm, i kinda like that. i was worried about the reader forgetting which multiple return argument was which

---

Ruby's "block" syntax for anonymous functions is actually pretty good. Just enclose the function with {}, eg {blah}, and use {|argument1, argument2| blah} if it takes arguments. I guess this is the same as a 'suspension', and it covers the 0-argument case cleanly. Changes i would make:

---

in anonymous functions, use the privileged 'automatic' variables (xx, xx1, xx2, ... ? or just x, x1, x2, ...), with no need to specify the arguments taken by the function. E.g. mb:

---

if ,a and a, are used to specify strictness and non-strictness in function signatures, do they have any meaning in ordinary code, too? The 'strict' (',a', i guess) could be like 'a seq a' in Haskell, but then what's the use of the 'lazy' one? One would think it should just be a suspension, but now we already have {a} for that.

oh well, maybe it's easier to type a, than {a}

---

so, things are definitely different when attached and when unattached

should this only apply to single-character things? Or should we allow, eg, ++ to be addition when unattached, but postincrement when used as a postfix? Or should we fix ++x and x++ to be pre-whatever-with-default and post-whatever-with-default for all binary operators? Note that for arithmetic, these defaults are not the same as the operator's identities.

(i guess, just like oot knows about commutativity and associativity and inverses, it should know what the identities are for various operators?)

---

do we want Haskell-style 'sections', e.g. syntax for partial application of binary operators, or for cheaply reifying them into anonymous functions? eg i think Haskell would say

so using our syntax for the anonymous functions, we would have:

we could just leave it at that. But it's nice to at least be able to un-binary-op-ify a function. We could have:

but that is indeed a special case that makes parsing of the {} construct more complicated (unless we want to give up error messages whenever we see "++ 1" inside any expression).

We could have a character that when attached, binary-op-ifies and un-binary-op-ifies depending on if its prefix or postfix. E.g.

i guess i like that... but otoh actually giving up error messages when we try to parse "++ 1" is probably fine, too, and maybe even simpler.

So maybe stick with {++} as shorthand for {x ++ x1} and {++ 1} as shorthand for {x ++ 1}.
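For comparison, these sections are just partial applications; a Python sketch using `+` in place of the hypothetical `++`:

```python
import operator
from functools import partial

# {++}   -> the bare binary operator reified as a function, like {x ++ x1}
section_full = operator.add

# {++ 1} -> right argument fixed, left slot open, like {x ++ 1}
section_right = partial(lambda x, y: x + y, y=1)

assert section_full(2, 3) == 5
assert section_right(4) == 5
```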

Now what about ++1 and 1++? Is that pre-increment and post-increment as applied to 1? Or is that shorthand for {++ 1} and {1 ++}?

---

i guess if '++1' means 'addition, pre-increment, with its default', then if '-' was a shortcut for '-++', then '-3' would mean 'subtraction, pre-increment', and if pre-increment applied to an immutable just meant to apply the operation as normal in an expression, then the default for '-' could be 0, and '-3' is short for '0 -++ 3', which is exactly what we want. Subtraction is fairly common so it wouldn't be too big of a deal to assign '-' to it. Although this way we do lose the possibility of having -a, a-, which is kinda big.

So if we are willing to have ++ for addition, why not -- for subtraction? but that violates our symmetry-of-operator-name-implies-commutativity-and-associativity rule.

so are we reduced to -+3 for "-3"? that seems too hard to type; not only is it an extra character, but it's not a double like --.

---

really, having ++ to save '+' for other stuff isn't as important as saving '-' for other stuff, since - is unshifted.

also, btw, i like having identifiers with - inside of them, like in Lisp

also, it's too bad that + is shifted and - is not, b/c addition is more common than subtraction

now i'm wondering if we should give up simplicity and just use '+' and '-' for addition and subtraction for ease of typing and adherence to convention..

or use + for addition, but introduce some other single-shifted-character for subtraction, like $ or something... but that's even harder to remember than _a for -a, which (i forgot which) language uses.

actually i guess ML has a good idea-- use '~'. That's kind of easy to remember, because it means 'not' in some other languages.

So right now we have:

---

yknow, as beautiful as that is... addition really is very common. Perhaps we should just bite the bullet and use '+' for addition. Even if our conventions prohibit us from using ++ for cons or for concatenation.

then again, it's not THAT much harder to type ++ instead of +. And cons and concatenation are reasonably common too.

but it still seems wrong that it should be harder to type addition than subtraction.

but i guess that is inherent in the idea that symmetry in the operator name should stand for commutativity.

The problem is that commutative operators tend to be 'simple', which tends to imply commonly-used, such as addition being more common than subtraction.

To apply the operator-name-symmetry-implies-commutativity rule consistently, i guess you'd say that single-character names are symmetric. So '+' implies commutativity and associativity. Then addition would be easier to type than subtraction. But this would mean that all non-commutative operators would have to have names at least two characters long, with two different characters. That is a big burden to type, and there are some common non-commutative operators, such as cons and concatenate. So you don't want to do that.

So you are left with saying that commutative operators must be at least two characters, while non-commutative ones can be one-character. Which makes subtraction easier to type than addition.

I guess one could still make addition easy to type relative to subtraction if it were two unshifted characters. But then we're getting into breaking convention in a really hard to remember way territory again.

The other thing one could do is to just use '+' as addition; the rule doesn't say that single-character names MUST be non-commutative, only that they can be. But then we have to think of another operator for cons, and a third one for concat; whereas at least concat would be easier to remember if it were some form of '+', so then we only have to think of a third thing for cons.

if only computer keyboards had + unshifted, or if we could use another, unshifted symbol for addition instead of +

---

y'know, i think i'm maybe pursuing beauty and 'ease of typing for the expert ooter' too much over ease of remembering for the casual oot user who is just trying the language out on a once-a-month hobby project or shell command.

you can't expect the casual user to remember +, ~, **, -/ for addition, subtraction, multiplication, division (and even worse if we used another symbol in place of addition). There's no pattern. I think maybe we can expect the user to remember only one or two simple deviant rules, and from these they must be able to derive the name for each of addition, subtraction, multiplication, division, individually, in at most one step per rule.

So, e.g. +, -, *, / would be zero rules. ++, --, **, // would be one rule (double everything). ++, --, **, //, and ~ for unary negation, would be two rules (double everything, and ~ for negation). ++, --, **, div would be two rules (double everything, and 'div' for division). +, ~, *, div would be two rules (~ instead of -, and 'div' for division).

i think it's more important for the casual user to be able to do arithmetic without making a mistake than it is for the expert to be able to look at an unknown operator and know whether it's associative and commutative without looking at its definition.

One could, instead, have the convention be that binary operators with multicharacter symmetric names take two arguments of the same type. So, e.g., under that convention, ++ isn't a good choice for cons.

addition can be thought of as a special case of concatenate (Python extend), applied to things which are identical (e.g. you add two numbers by thinking of each one as a set of identical dots laid out in a line, then concatenating those two sets together). So it's no biggie if the addition operator doesn't look symmetric. So the concatenate operator and the addition operator can be the same, and can just be '+'.

wait: with vectors, you have distinct elementwise-add, and concat. hmm... i kinda like the matlab convention of '*' for matrix multiplication and '.*' for elementwise multiply. so prefix '.' is kinda like syntactic sugar for 'map' (well, not quite, actually 'elementwise', which is a generalization of map, i guess, but you see what i mean). So, '+' could be concat (which is also addition on numbers), and .+ is elementwise add.

This may suggest what the dual (postfix-., eg a.) to prefix-. (eg .a) should be; if .op lifts 'op' to an fmap of itself, then op. should somehow 'lower' an fmap to its op. OR, perhaps +. is cons? E.g. "a op. b" is shorthand for "a op [b]". hmm, i like that..
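A Python sketch of the idea in the preceding paragraph (the names `elementwise` and `wrap_right` are made up): prefix '.' lifts an operator to act elementwise, and postfix '.' makes `a op. b` mean `a op [b]`:

```python
def elementwise(op):
    """Prefix '.': lift a binary op to act pairwise over two sequences."""
    return lambda xs, ys: [op(x, y) for x, y in zip(xs, ys)]

def wrap_right(op):
    """Postfix '.': 'a op. b' as shorthand for 'a op [b]' (cons via concat)."""
    return lambda xs, y: op(xs, [y])

add = lambda a, b: a + b        # '+': concat on lists, addition on numbers

assert add([1, 2], [3, 4]) == [1, 2, 3, 4]        # a + b   (concat)
assert elementwise(add)([1, 2], [3, 4]) == [4, 6] # a .+ b  (elementwise)
assert wrap_right(add)([1, 2], 3) == [1, 2, 3]    # a +. b  (cons)
```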

The cons (Python append) operator is different, however, although cons can be written in terms of concatenate as 'x + [y]'. We probably do want syntactic sugar (an operator) for cons, especially because in many cases concat will be defined in terms of cons instead of the other way around, but we probably don't want to use ':' for that because we're using ':' elsewhere. Erlang uses '|'; if we use ',' for piping (Shell's |, Haskell's '$'), and keywords for multiple return args, then we could use | for cons, although it's too bad it's shifted.

note also that we could use '%' for division instead of '/' or '\\'. That would be somewhat memorable.

this and related pages might be of use: http://en.wikipedia.org/wiki/Plus_and_minus_signs . That page notes that J uses _ for unary negation.

As for comment syntax, i think:

  - '#' is okay: easy to see and fairly easy to type
  - '%' is okay: easy to see and fairly easy to type
  - '\\' is good: very easy to see and easy to type
  - '--' is okay: fairly easy to see and easy to type
  - ',,' is not great: too hard to see
  - '||' is not great: too hard to see
  - '/*' is not great: too hard to type

'\\' is slightly harder to parse than # or %, but i think readability and typability is more important

i guess commenting is paramount, so that decides '\\' as the comment syntax

if we want to use one of '/' or '\\' for label/keyword argument (and i think we do, because we want it to be unshifted and asymmetric, and we want ';' to be some sort of delimiter), then we must use '/' for that, and use something else for division (probably either % or 'div').


also, are we sure we want to use ',' as the dual to '.', as above, rather than using , ,, ,,, etc as multidimensional array item separators, providing sugar to eliminate the need for [] in short arrays (e.g. "a,b = f(x)" instead of "[a b] = f(x)"), and allowing one to delimit items in arrays without parens (e.g. "f a, f b" instead of "[(f a) (f b)]")? If not, then use '|'. '|' doesn't seem like a bad choice; it is easy to see (even if hard to type), and already well-known.

one way to do both would be to say that within an array context, ',' and ',,' etc do one thing, and outside of it, they do another (i used to have the idea of different array context syntaxes, but maybe it would be easier not to have that, except maybe for auto-stringify, which now could be ""[]). But then we lose the syntactic sugar of "a , b" for "[a b]". Another option would be to say that, outside of [] context, ',' is syntactic sugar for "a , b -> [a b]", and ',,' ',,,' etc are for pushing-away grouping, and inside of array context, they are all multidim separators.

Another thing to do would be to use ; ;; ;;; for multidim array separators, but use ',' for the first level, and use ,, ,,, etc for pushing-away grouping.

right now, i'm thinking it would be best to use '|' for pushing-away grouping, and , ,, ,,, for array separators. ; ;; ;;; are then open for termination.

---

note: or maybe * should be concatenate? i kinda remember seeing that choice for operations on strings in math..

---

yknow, .+ is ridiculously hard to type. Maybe .. for cons? Or ++? Or +?

... if we use +a and a+ for elementwise and cons-like, and + for concat, then ++ is cons. wait, no, ++ is either/both elementwise or cons.. we dont want that

---

if '{}' is: suspension, anonymous lambda/ruby-style block constructor (can these take variables with `var`?)

then it conflicts with {} as grouping notation, right? or does it not matter (i think it doesn't). if it does, just use {{}} for anonymous lambdas and suspensions

---

note: i don't think we need tuple constructor syntax b/c tuples are arrays are dicts are graphs to us, the immutability of tuple length is just an annotation

---

so right now, we seem to have:

/: label
\\: line comment
+: concatenation (with addition as a special case when used on numbers)
, ,,: multidim array separator characters; "a , b" is sugar for "[a , b]"
"": string
""": HERE document
[]: graph/array/dict constructor
""[]: autostringify array constructor
|: push-apart grouping
; ;; ;;;: terminators
-: open. boundary? a., .a: open. --: open. ..: open.

todo: think about a better way to do cons, see above

---

"a | b" is not just "(a) (b)", like Haskell $, but also makes the first thing be the input to the second: "(b) (a)"

but i guess this is not quite like Unix pipes. Unix pipes also connect the output stream (STDOUT) of the first thing to the input (STDIN) of the second. So if function 'a' caused a 'print' side-effect, it would not actually print to the console, but rather, it would print to 'b'. Should '|' do this?

One idea would be to have '|' just do the simpler thing (a | b == (b) (a)), and have '||' do the more complicated thing (a || b == (b) (a), and also redirect a's STDOUT to b's STDIN). Then we could also have '|arguments|' for something even more complicated; 'arguments' could specify which streams are intercepted and connected to which.

---

i guess + for addition, * for multiplication, - for subtraction are the easiest to remember. We can still use those symbols for other things in prefix or suffix (attached) forms. We could even use / for division, and maybe we should, but i am tempted to use 'div' for division since it's kinda weird, what with division by zero; or use % for division because, why not.

---

instead of having a discriminated union case construct as a built-in ('case' as in GHC Haskell core), we could just have a type() function and a normal case/switch or even just if.

you'd still need pattern matching, though. But again, if patterns are first class, that could just be a function (at least, in the core language; maybe then you'd add some syntactic sugar on top to make it concise to pattern-match in Oot proper).

---

so, we need to be able to apply __set to subfields of objects, e.g.

object1.field1.subfield2 = 3

as noted elsewhere, some ways to do this are to give object1 the entire path, or to request object1's 'setter' for its field1. This could be done using a protocol/magic method called __set, or __lvalue

---

  1. token for footnote application
  2. token to define footnote

actually, a prefix vs. postfix pair would probably be better for this, eg #token for footnote definition, token# for footnote application

note: footnotes' lexical scoping is at their place of application, not their place of definition; it's just like copy/pasting code, except that footnotes are implicitly surrounded by a grouping construct such as {}

---

need some std semantic conventions for prefix vs. postfix to let ppl be able to remember them.

one convention could be: the thing that you tend to do first in the code (in the sense that it is executed first) is one, the thing you tend to do afterwards is the other. So, for example, defining a footnote is the one, using it is the other; wrapping a value is the one, unwrapping it is the other; constructing a value is the one, deconstructing it is the other.

which should be which? '#first, second#' makes sense b/c of reading from left to right, but 'first#, #second' makes sense since function application takes arguments from the right and gives them to a function on the left.

i guess #first, second# is also slightly easier to remember because #first is reminiscent of C's #define, for those who know C.

So let's go with #first, second# for now.

So, #define_footnote, use_footnote#; and 'wrap_in_optional, unwrap_optional'

---

'.' implies that the things in between are keywords, not variables;

e.g. in Python, if you say

x.date.month

that means "go to variable x, look up the field named 'date', then look up the field named 'month' on the result". "date" and "month" refer to keywords (field names), not to variables.

if we do that, then .keyword seems like a good syntax for keywords (leaving ' for error handling)

---

really, in order to be able to have a chain of functions with default arguments that call one another, where the top/outermost function doesn't re-specify the defaults for default arguments used by the innermost function, what you need is a sentinel value that can be set as the 'default' at the outermost and passed along, such that if you pass this sentinel into a function in a parameter that has a default specified, the sentinel value is always replaced with the default. You'd think 'None' would be this way in Python, but it isn't. In any case, should probably have a special sentinel just for this.
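A Python sketch of that sentinel pattern (the name `DEFAULT` is made up): unlike `None`, the sentinel specifically means 'substitute your declared default', so outer wrappers can forward it without restating the innermost function's default:

```python
DEFAULT = object()   # unique sentinel: "use your declared default"

def inner(x, scale=DEFAULT):
    if scale is DEFAULT:
        scale = 10           # the one true default, stated exactly once
    return x * scale

def outer(x, scale=DEFAULT):
    # outer neither knows nor restates inner's default; it just forwards
    return inner(x, scale) + 1

assert outer(3) == 31          # inner's default (10) applied
assert outer(3, scale=2) == 7  # explicit value overrides
```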


is

'X is A' then has aspects of all of:

eg "That man is a human", "Every human is an animal", "Haskell integers are Haskell Nums", "That man is Bob", "1+1 is 2", "The sky is blue"

maybe 'x is A' returns True when any of those return True.

This encourages extensibility when doing tests for things like whether exception X is of a certain type.


it's still unclear what infix '.' should be used for. If structs (graphs, nodes) are also functions, such that applying them calls their --GET, then [1 2 3].1 == [1 2 3] 1, so why bother to have the period? it provides tight binding, but that's not very much reason to waste such a good, easy-to-type symbol.

one other idea is to use it to reverse/pipe things, eg:

sort (filter (x)) == x.filter.sort
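That reading of '.' is just a left-to-right pipe; a minimal Python sketch (the `pipe` helper is hypothetical):

```python
def pipe(value, *fns):
    """Feed value through fns left to right: pipe(x, f, g) == g(f(x))."""
    for fn in fns:
        value = fn(value)
    return value

evens = lambda xs: [x for x in xs if x % 2 == 0]

# sort (filter (x)) == x.filter.sort
assert pipe([3, 2, 4, 1], evens, sorted) == sorted(evens([3, 2, 4, 1])) == [2, 4]
```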

another idea is to use it as a low precedence grouping operator with no other semantics (like '$' in Haskell?), eg:

f x . g y == (f x) (g y)

(this is sort of like putting the dot BETWEEN groups instead of replacing each space WITHIN a group by a dot) (note that this is what we've proposed doing with commas, tho, see below)

note that without a dot, you can't specify subedges within a graph constructor without entering code context:

a graph with an edge whose target is another edge, rather than a node: gwet = [NODE1 NODE2 NODE1/NODE2 NODE2/(NODE1 NODE2)]; gwet.NODE2.0.--SRC == gwet.NODE1

instead of

a graph with an edge whose target is another edge, rather than a node: gwet = [NODE1 NODE2 NODE1/NODE2 NODE2/NODE1.NODE2]; gwet.NODE2.0.--SRC == gwet.NODE1

also, the meaning of double, triple . eg .. ... are unclear too

so far, i have provisionally assigned it to pipe

---

it's still unclear what the syntax for negation will be. If we want to say eg "-3" then we need to alter the syntax to accommodate it (and possibly use prefix '_' and '__' for privacy?). Otherwise we should probably use '_'

---

as of now, _ is still unclear

---

d "apple" == red
d2 = [A=[0=$B] B=[1=$C] B=[2=$D] D]
d2 A 0 1 2 == d2 D, and d2 A 0 1 2 == (((d2 A) 0) 1) 2

todo: Perhaps the dot semantics will differ from ordinary function application when used on the lhs of an assignment? Perhaps something about testing for existence at each step, although i think we've got that covered with '(d2 A 0 1 2)'. Or is this useful at all?

When on the lhs of an assignment, two dots can be used to create desired edge if one doesn't yet exist:

d3 = [D]
d3..A..0..1..2  = d3 D

Note that in this case, the nodes newly created are all labeled NIL.
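This create-on-write behavior is like autovivification; a Python sketch using recursive defaultdicts as a stand-in for Oot graphs (here missing nodes come out as empty dicts rather than NIL):

```python
from collections import defaultdict

def graph():
    """Nested dict that creates empty child nodes on first access,
    like '..' creating missing edges on the lhs of an assignment."""
    return defaultdict(graph)

d3 = graph()
d3["D"] = "leaf"

# d3..A..0..1..2 = d3 D  -- intermediate nodes spring into existence
d3["A"][0][1][2] = d3["D"]

assert d3["A"][0][1][2] == "leaf"
```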

Three dots can be used to create a first-class path object by defining the starting and ending nodes (todo really?):

d2 = [A=[0=$B] B=[1=$C] B=[2=$D] D]
path = d2 A...0 1...2
len(path) == 1

---

other things to think about:

---

Commas (convenience only; not in Oot Core)

A single comma has the lowest precedence aside from = and multiple commas, and has the effect of placing an implicit data constructor around its area of precedence, and surrounding each side of itself with implicit parens, unless this area is itself delimited on both sides by an explicit data constructor. In addition, within this implicit or explicit data constructor, it serves as a shorthand for binding each comma-delimited item to a node labeled by its ordinal position within the comma-delimited list, starting with 0. For example, "a + b, c = f x" is the same as "a + b,c = f x" is the same as "[(a + b), (c)] = f x"; and all are the same as "[0=(a+b); 1=(c);] = f x".

A string of multiple commas has different effects depending on whether it is found in code context or in data context. In code context, it does nothing except by virtue of its precedence (a double comma's precedence is one step lower than a single comma, and each additional comma is one step lower); eg "f x ,, f y == (f x) (f y)"; eg "f x ,, g z ,,, f y == ((f x) (g z)) (f y) == f x,,g z,,,f y"

In data context, a string of multiple commas creates a multidimensional array. The first dimension of the array is delimited by single commas, the next dimension by double commas, etc.

todo:

commas for arrays is shorthand for: [a,b] is [0=a; 1=b]

multidim arrays are shorthand for: [a,b,,c,d] is [0,0=a, 0,1=b, 1,0=c, 1,1=d] is [[0=0,1=0]=a, [0=0,1=1]=b, [0=1,1=0]=c, [0=1,1=1]=d]
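That expansion can be modeled directly; a Python sketch with tuples standing in for the multidimensional node names (`multidim` is a made-up helper):

```python
def multidim(rows):
    """[a,b ,, c,d] as a dict keyed by (row, col), per the expansion above."""
    return {(i, j): v
            for i, row in enumerate(rows)
            for j, v in enumerate(row)}

# [a,b,,c,d] with a..d = 1..4
arr = multidim([[1, 2], [3, 4]])
assert arr == {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}
```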

todo:

note (and think about) the idea of using node literals as nodenames for multidims

---

syntax table needs to be updated with the last 10 or so entries here

---

we want to easily generate dicts like

{'a': a, 'b': b, 'c': c}

seems like the way to go should be a macro mac used like:

mac [a b c]

but wouldn't that violate our rule that macros can't see the names of lowercase identifiers? Could try to just say, okay, but they can't branch based on these names; but that sounds hard to enforce; could that just be a convention? hmm..

this line of thinking also suggests that maybe ""xyz should return the string name of a variable VALUE, rather than the token in the source code that it is next to; eg

xyz = $abc; ""xyz == 'abc'

but that is DIRECTLY violating our desired prohibition.

however, for the task at hand, probably a simpler answer would just be to have some syntax to construct a list of strings without typing all the quotation marks, eg:

""[a b c] == ["a" "b" "c"]

now 'mac' can function without breaking the prohibition:

mac ""[a b c]
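In Python terms, `mac ""[a b c]` amounts to building the dict from a list of name strings plus the caller's scope; a hedged sketch that passes the namespace explicitly rather than via macro magic (`mac` here is just an ordinary function):

```python
def mac(names, scope):
    """Build {'a': a, 'b': b, ...} from name strings and a namespace;
    ""[a b c] corresponds to the list ["a", "b", "c"]."""
    return {name: scope[name] for name in names}

a, b, c = 1, 2, 3
result = mac(["a", "b", "c"], {"a": a, "b": b, "c": c})
assert result == {"a": 1, "b": 2, "c": 3}
```

Note this respects the prohibition discussed above: `mac` only ever sees strings, never the identifiers themselves.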

---

consider making '..' be 'traverse to parent'. This is only really useful when dealing with tree data with backlinks, but that's pretty common. This would be way useful to eg do things like JS while bypassing the 'when "this" is bound' madness (see eg https://news.ycombinator.com/item?id=9743627 )

---

oh! i know! put the privacy and system reserved dashes at the END, and put the negation and inverse dashes at the BEGINNING:

we can still use multiple dashes in front to mean something else, such as to 'mark' (annotate, distinguish) something.

Also, as for distinguishing the argument to be computed in an inverse, mb just use '_', or some other 'distinguished' identifier:

_ -* 3 = 6 == 2 (the inverse of the multiplication function: what * 3 = 6)

however, _ is maybe better used as a convention for a throwaway variable. Could use __. Or could use WHAT--:

WHAT-- -* 3 = 6 == 2


hmm... if we want to be able to do if..elif..elif..else as an ordinary (lazy) 'if' function passed blocks with keywords, then it'll get passed 'elif' multiple times, and it needs to know the ordering of them. So we have ordered multimaps.

also, with 'switch', we might have eg:

  switch: i
  2     // "two"
  1     // "one"
  0     // "zero"

but if we were going to just give everything an additional list position label, then "two" is labeled both 0 (because it is first) and also 2 (because that's what it is explicitly given).

so, we need to distinguish the list position keys from the other keys somehow, probably by having edgetypes, or possibly by using views. Note that this makes 'lists' a little more different from associative arrays.

---

need syntax to indicate lack of ordering; mb one ';', so the implicit default would be ';;'? i dunno, that could make one-liners confusing. but it fits with our multidim array syntax

i like this, but it conflicts with our current comment syntax

mb use -- for comments like SQL, Lua, HTML, etc? no, this conflicts with our desire to have - inside identifiers, like in Lisp, instead of _

mb use # for comments? but i thought that was for metaprogramming? ok how about '# ' to begin a comment; eg #comment is NOT a comment, but '# comment' is. This annoys me because then either you can't use ## for the target in footnote defns, or you can't use ##### to start a comment.

if we're going there, then how about ' -- ' starts a comment; again, it annoys me that ------ wouldn't start one though. Wait, mb it would; just say that any number of -s, AS LONG AS THEY ARE SURROUNDED BY SPACES, starts a line comment; eg the PCRE regexp expression "\s-+\s" begins a to-EOL comment. OK, it annoys me that we are allowing the comment character to serve another function, and ;; would have been cooler, but this isn't crazy.

i still like ;; better tho, and we can use ,, for multidim arrays. mb find another way to indicate a non-sequential delimiter. Like '.;'; these keys are right next to each other on my keyboard. Or still consider ' : ' (UNattached :); our previous use for : was attached; except that we had defined unattached colon as parenthesizing both the left and the right sides, eg a b : c d was (a b) (c d). Mb could drop that use. Or maybe could use that for that, and use '::' for the non-sequential delimiter. That's a little more irregular and hence harder to remember, though. We could make : the delimiter and :: the thing that parenthesizes both sides, but i dont like that because you expect you can type a delimiter twice.

so, so far my best guess is: leave ';;' as comment delimiter. Leave : as parenthesizer. Use :: for unsequenced delimiter.

But wait; harder to type and uglier to read but easier to remember is :; for unsequenced delimiter. I guess that's better then. OK, i added that to oot.txt.

---

as oot.txt says,

todo: in 'a ; b :; c ; d', does ; or :; have precedence, that is, is this like a ; (b :; c) ; d, or like (a ; b) :; (c ; d)?

this question depends on which one is the default (the one that newline turns into). If sequential is the default, then most lines (implicitly) end in ;, so it makes sense for :; to separate large blocks of ;s ((a ; b) :; (c ; d)); but if non-sequential is the default, then most lines implicitly end in :;, and then it makes sense for ; to separate large blocks of :;s (a ; (b :; c) ; d).

i guess if we are really pushing the concurrent, brain-like angle, non-sequential should be the default. But that'll make it harder for beginners, who'll write:

  print("Hello")
  print("World")

and then wonder why "World Hello" got printed.

am still considering having :, not ;, be the non-sequential delimiter. It's easier to type, but harder to visually distinguish.

---

in/elem may need punctuation, since they have precedence.

(or at least one-letter macros?)

---

or, mb '.' should be 'this token is a keyword' so you don't have to capitalize all the time to make up for things like 'args.symbol' in Python.

let's do that with one period, and use .. for pipelining

this makes sense; '.' is unshifted so we should use it for common shortcuts.

---

now that using a..b for reified edges has been displaced, we need a new notation for that. That's okay, because it's uncommon.

---

something like python's *args and **kwargs (and note that in general, *list just expands the list and **dict just expands the dict)
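the Python feature being borrowed, for reference (`show` is just an illustrative name):

```python
def show(*args, **kw):
    # *args collects extra positional arguments into a tuple;
    # **kw collects extra keyword arguments into a dict
    return args, kw

nums = [1, 2]
opts = {'sep': '-'}
# at the call site, *nums expands the list and **opts expands the dict
print(show(*nums, **opts))  # ((1, 2), {'sep': '-'})
```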

---

python's decorator syntax
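for reference, the Python form being pointed at: `@deco` above a `def` is sugar for rebinding the name, roughly `greet = shout(greet)` (the names here are just illustrative):

```python
def shout(f):
    # a decorator: takes a function, returns a wrapped function
    def wrapper(*args):
        return f(*args).upper()
    return wrapper

@shout
def greet(name):
    return f"hello {name}"

print(greet("oot"))  # HELLO OOT
```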

---

we need lightweight syntax for the task of creating an instance of an anonymous class with a given set of superclasses (see ootErrorNotes2 for motivation)

---

the more i think about it, the more i think '.' would be most natural for pipelining. mb it can ALSO be for keywords ('symbols'?).

mb .. can be pipelining without keywords.

eg in python:

import sys
sys.argv

in oot, we could do the same thing:

sys.argv

and this would mean:

start with the value of variable 'sys', and then apply attribute access (function calling) to it to access the attribute whose name is the value of keyword ARGV

note that the order of function composition is not reversed here; this is like:

sys ARGV

aside from forcing 'argv' to be read as a keyword, in another language the period syntax would also make a difference when you chain:

a.b.c means (a B)(C), rather than a(B,C)

but if Oot is a currying language, then this is not actually a difference

another difference is that '.' binds tightly, so eg

f a.b.c q == (f (a.b.c)) q == (f ((a b) c)) q, in contrast to f a b c q == (((f a) b) c) q

if you wanted to lookup the attribute whose name is the value of the VARIABLE x, maybe it could be:

sys..x

or mb we use '..' for something else (eg reified arcs) and do as Python does, and force explicit use of metafunctions for this:

getattr(sys, x)
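that is the standard Python escape hatch when the attribute name is held in a variable:

```python
import sys

name = "argv"  # attribute name held in an ordinary variable
# getattr does dynamically what the literal sys.argv does statically
print(getattr(sys, name) is sys.argv)  # True
```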

of course this would take '.' off the table for the purpose of having f(x) be represented as x.f, to allow easy pipelining, eg x.f1.f2 instead of f2 (f1 (x)) (or f2 $ f1 $ x, as Haskell puts it). hmm..

perhaps use 'pipe' but 'attached':

x|f1|f2 = f2(f1(x))

vs unattached:

f2 | f1 | x = Haskell's '$' operator: f2 $ f1 $ x == f2(f1(x))

and

x | f1 | f2 = a pipeline where stdin and stdout (or their object-y variants; or more generally, mutations to the 'outside environment' as defined by some statemask or its complement) are remapped

note:

---

ok, i just looked at the syntax table and one problem is that ';;' is listed for both:

i have an idea. Paragraphs are just for auto-closing parentheses (they are not blocks; blocks are curly braces, '{}'). Which is just a convenience. And the problem here is only caused by one-liners. So, just prohibit paragraphs in one-liners. Or, more to the point, every line is its own paragraph in one-liners; more formally, ';' is both EOL and EndOfParagraph (or even MORE formally, it's just EndOfParagraph, since EndOfParagraph implies EOL already). So there is no way to type 'EOL' in a one-liner; ';' is E.O.P.; EOL can only be typed by an actual newline.

(should we force all parens to be explicit in one-liners, or just have ; be EndOfParagraph? obviously the latter, because the interpreter shouldn't have to behave differently depending on whether it is dealing with a one-liner or not).

---

we don't have variadic functions (functions that take a variable number of arguments); if you have such a function you have to pass the variable number of stuff explicitly as a list. But to make this easier to type in the common case, there is syntactic sugar in postfix asterisk (asterisk*, postfix*):

(func* a b c); == (func [a b c]); all it does is enclose the rest of this subexpression in a list constructor
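a minimal sketch of that desugaring over a token list (hypothetical `desugar` helper operating on already-tokenized input; real Oot would do this in the parser):

```python
def desugar(tokens):
    # rewrite (func* a b c) as (func [a b c]):
    # a head ending in '*' wraps the remaining arguments in one list
    head, *rest = tokens
    if head.endswith('*'):
        return [head[:-1], rest]  # rest becomes a single list argument
    return tokens

print(desugar(['func*', 'a', 'b', 'c']))  # ['func', ['a', 'b', 'c']]
```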

---

for function definition syntax, could just do

outputs = fn x y z = fnbody

is this LL(1)? it seems not (you don't know how many equals signs there are until you look ahead), but maybe it's fine if we just parse it as a "sequence of expressions of arbitrary length, separated by =s". How does Python handle this with ordinary expressions? I think it does something similar; i think it doesn't know for sure that it is in the LHS of an = sign when it parses the LHS; LHS-specific constraints are applied after parsing, during the conversion from concrete syntax tree to AST
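CPython's behavior can be probed with the `ast` module: an assignment LHS is read as a general expression, and an illegal assignment target is then rejected with a SyntaxError (the `parses` helper here is just for illustration):

```python
import ast

def parses(src):
    # returns whether CPython accepts src; invalid assignment
    # targets surface as SyntaxError, not as a lexing failure
    try:
        ast.parse(src)
        return True
    except SyntaxError:
        return False

print(parses("x = 1"))     # True: valid assignment target
print(parses("f(x) = 1"))  # False: parsed as an expression, then rejected as a target
```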

---

i guess i should say somewhere that my idea is that the syntax of oot will be easier to remember b/c i try to consolidate related concepts and then use the same punctuation sigils (or multi-letter constructs with the sigil in them) for all of them. Eg '$' has to do with variable interpolation, substitution, evaluation strategy.

---

~ as a sigil prefix to indicate misc special literals when the literal is an unbroken alphanumeric, and x"" otherwise

---

todo reread "Erlang vs. Java syntax" in ootSyntaxNotes6, and the following note

(later: umm, there's no note here; there was nothing below this in the file when i came back here, anything below now has been added later)

---

" Bug: Renaming a function and forgetting to rename all of the overrides. Your Automobile class has a brake() function, and you decide to rename it to applyBrakePedal(). You get some compiler errors and fix up all of the callers, but you forget that the Batmobile subclass overrides brake() with logic to deploy the drag parachute, and now when Batman slams on the brakes, the parachute fails to deploy and he smashes into Macy's in spectacular fashion.

The Swift Fix: Override functions must use the override keyword. Now when you rename the superclass's function, all the subclass functions fail to compile because they're not overriding anything. Why did no one think of this before? I know you Eclipse users out there just click Refactor and go back to sleep, but for folks who use more primitive IDEs either by necessity or by choice, this is a great language feature. "

---

require/ensure sounds like good wording

---

Forth uses single backslashes for COMMENT-TO-EOL; intriguing, this might be better than our current ';;':

http://galileo.phys.virginia.edu/classes/551.jvn.fall01/fsm.html

---

some neat syntax for FSMs (finite state machines) :

" ...

>1? id.len @ 7 < DUP 1 AND SWAP NOT 2 AND + ;

...

5 WIDE FSM: (fp#)  \ input:  other     dDeE      digit       + or -    dp
\ state: -------------------------------------------------------
( 0 )  NOOP >4   NOOP >4   +mant >0?   NOOP >4   1+ >1
( 1 )  NOOP >4   ?1+ >2    +mant >1?   #err >4   #err >4
( 2 )  NOOP >4   #err >4   +exp >3     1+ >3     #err >4
( 3 )  NOOP >4   #err >4   +exp >3?    #err >4   #err >4 ;

" -- [1]

notes (these are somewhat speculative, but still useful from the point of view of how we should implement FSM syntax):
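the quoted Forth defines a table-driven FSM; the same technique in a minimal Python sketch (the states and input classes here are made up for illustration, this is not the (fp#) machine above):

```python
# transition table: (state, input class) -> next state
# states: 0 = start, 1 = in-digits; anything missing falls to 2 = error
TABLE = {
    (0, 'digit'): 1,
    (1, 'digit'): 1,
}

def classify(ch):
    return 'digit' if ch.isdigit() else 'other'

def accepts(s):
    state = 0
    for ch in s:
        state = TABLE.get((state, classify(ch)), 2)  # default: error state
    return state == 1  # accept iff we end in the in-digits state

print(accepts("123"))  # True
print(accepts("12a"))  # False
```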

--- Document quotes (our version of "here documents", inspired by Perl):

syntax for multiline quoting, preserving whitespace including newlines

two variants: """ document """

or

"""delimeter document delimiter

Implementation notes: this is a purely syntactic construct.
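as a point of comparison, Python's triple-quoted strings behave like the first variant: newlines and leading whitespace inside the quotes are preserved verbatim:

```python
doc = """first line
    indented second line"""
# the newline and the four leading spaces survive in the value
print(doc.splitlines())  # ['first line', '    indented second line']
```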

---


Footnotes:

1.

 and 

2.

 < for lt, <= for leq, 

3.

 if < and <= are not free, we can't have a single < be a delimiter even without whitespace around it (unlike the typical use of (), e.g. (a+b)*3). And if << is not free then we can't have << be lt but have <<text be a delimiter; but then what about custom lessthan-like comparison operators? you'd want to let the programmer make those start with <<)

if subtraction is -++ or -+ instead of - or --, then we could use -- as the to-EOL comment delimiter. We want to encourage commenting, so it's important that the EOL comment delimiter be easy to type. ;; is another candidate.

we could use / and \ or < and > as boundary delimiters. Note that the nesting of boundaries with lexical scopes is not straightforward; e.g.:

transaction
a = b
if a > 3:
    c = d
    doE
    transaction
else:
    transaction
doF

in this case the compiler infers that each possible lexical path out of the 'if' is covered by a boundary end. But we may instead want to let the user do:

transaction
a = b
if a > 3:
    c = d
    doE
else:
    transaction
doF
transaction

which would be especially helpful if there were a switch with many cases, instead of an if with two cases. Here the compiler has to realize that, for the 'else' branch, the transaction is already closed, so the later transaction close outside of the 'if' has no effect in that case.

also, note that with this sort of thing, we might have:

transaction1
transaction2
a = b
if a > 3:
    c = d
    doE
else:
    transaction2
doF
transaction1
transaction2

so now the compiler must track each of transaction1 and transaction2 separately.

so we want to allow the transaction to be specified dynamically?

x = transaction1
y = transaction2
$x
$y
a = b
if a > 3:
    c = d
    doE
else:
    z = y
    $z
doF
$x
$y

i have a feeling this may make standard parsing techniques not work. but maybe simple ol' recursive descent still will.

also, we still need a character-based metaregion for other languages (like shell backticks) and for regexes. I used to think that <<