Bayle Shanks's website: proj-oot-ootSyntaxThoughts

want LL(1) at the high-level parsed via recursive descent, plus Pratt parsing (or shunting yard, but i've heard Pratt parsing is supposed to fit together nicer with recursive descent?)

https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html https://matklad.github.io/2020/04/15/from-pratt-to-dijkstra.html https://matklad.github.io/2023/05/21/resilient-ll-parsing-tutorial.html

https://www.reddit.com/r/rust/comments/g0eusf/blog_post_simple_but_powerful_pratt_parsing/ https://www.reddit.com/r/rust/comments/g1p1mn/blog_post_from_pratt_to_dijkstra/ https://lobste.rs/s/o1jwxo/from_pratt_dijkstra https://internals.rust-lang.org/t/proposal-grammar-working-group/8442/46
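
a rough sketch (in Python, just for illustration) of how a Pratt loop can sit inside an otherwise ordinary recursive-descent parser. The binding-power table, the token set, and the names `BINDING_POWER`, `tokenize`, `parse_expr` and `parse` are all made up for this example; this is not Oot's grammar, only the general shape of the technique described in the matklad posts above.

```python
import re

# Hypothetical fixed binding powers (left, right); a higher number binds tighter.
# Left-associative operators have right > left; right-associative ones have right < left.
BINDING_POWER = {
    "+": (10, 11),
    "-": (10, 11),
    "*": (20, 21),
    "/": (20, 21),
    "^": (31, 30),  # right associative
}

def tokenize(text):
    """Split into numbers, identifiers, operators and parens."""
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/^()]", text)

def parse_expr(tokens, pos=0, min_bp=0):
    """Pratt loop. Returns (tree, next_pos); trees are nested tuples (op, lhs, rhs)."""
    tok = tokens[pos]
    pos += 1
    if tok == "(":
        # Parenthesized subexpression: plain recursive descent, binding power reset to 0.
        lhs, pos = parse_expr(tokens, pos, 0)
        assert tokens[pos] == ")", "expected ')'"
        pos += 1
    else:
        lhs = tok
    while pos < len(tokens):
        op = tokens[pos]
        if op not in BINDING_POWER:
            break
        left_bp, right_bp = BINDING_POWER[op]
        if left_bp < min_bp:
            break  # this operator belongs to an enclosing call
        rhs, pos = parse_expr(tokens, pos + 1, right_bp)
        lhs = (op, lhs, rhs)
    return lhs, pos

def parse(text):
    tokens = tokenize(text)
    tree, pos = parse_expr(tokens)
    assert pos == len(tokens), "trailing tokens"
    return tree
```

since the table is fixed, a reader never has to look up an operator's definition to know how an expression groups, which is the property wanted in the next section.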

---

unlike Haskell, we don't want to allow arbitrary operator precedence, because then you have to look up (or have memorized) every function in code that you read just in order to parse it.

like Haskell and Lisp and unlike Python, we separate fn arguments by spaces, not commas, and we don't require them to be surrounded by parens

like Haskell and unlike lisp, we don't require the outermost fn invocation to be surrounded by parens

like haskell and unlike lisp, we have syntax for chaining function calls without nested parens (e.g. haskell's $ for low-precedence application and . for composition)

like python and unlike haskell and lisp, we have syntax for list/array/dict access and mutation

like haskell, we have optional whitespace

like lisp and octave and unlike python, we allow space-separated list literals

like javascript, we have a type of map literal that implicitly quotes the key names

like ruby and perl (supposedly; see http://news.ycombinator.com/item?id=3068819 ), good options for quoting within literals

like octave, we have a way to quickly print out the result of each step (but ours is opt-in, not opt-out; e.g. mb if there IS a semicolon, then print)

(not syntax)

like python and unlike haskell and lisp, our fundamental data structure is dict-like, not list-like

pattern matching partial functions? seems messier than a pattern-matching switch statement to me; with a switch it's all in one place. is there any benefit to doing it the partial way? also, look at simon's advanced guards thingee -- is there some corecursion thing going on there that makes partial functions look more appropriate?

--- i just wrote:

Read left-to-right (standard orientation)

That is to say, in Oot you write "x = 3++(f y)", not "3++(y f) = x". "3++(y f) = x" (the nonstandard way that we didn't adopt) would be clearer, because when you read it from left to right, that follows the order that data actually flows thru the expression.

However, programming text is usually left-aligned, which makes it easy to scan down the screen with your eye and look at the leftmost words.. and hard to scan down looking at the rightmost words. A common use of scanning is to look for where some variable is defined. So, the lhs of the assignment operator should indeed be on the left hand side.

todo: mb mix and match? x = 3++(y f) ? "

hmm, mb should do:

anything with a side-effect, including assignment, goes on the leftmost side. So, there can only be one side-effect per line. The side-effectful thing is executed last. Everything else is 'backwards', e.g. x f instead of f x, i.e. the left-er stuff is executed earlier.

hmmm but now where/how do we put parens now that everything is backwards?

instead of:

defun (map f xs if (unit? xs unit cons f car xs,, map f, cdr xs

we have

defun ( unit xs cdr, f map,, xs car ..

no no no, the whole idea of doing it backwards is for the order of evaluation to be followed. so you can't put something after a conditional first. so i guess the order of arguments should not be reversed

defun unit? xs unit cons xs car f ,, xs cdr, f map ) map xs f

no, too irregular. since we want the side-effect to go on the left, and the side-effect is usually the last thing, we either have to put the first stuff on the right, as usual, or we do it backwards but then we have an irregularity because we switch directions between the first thing on the left and the second.

so let's stick with the usual direction.

---

i do like the rule that a side-effectful command/function may only be leftmost on each line (and hence there can be at most one per line). But this means that when you change something from pure to side-effectful, callers all the way up the call chain must rewrite the way they call you.

right associative, but left associative with , (which only changes parsing to the right of the comma); a newline with parens still open is left associative w/r/t the parens that stay open across the line boundary. In other words it's like http://chrisdone.com/z/ except using opening parens instead of indentation, and with commas to indicate multiple arguments.

If there is a newline without unbalanced opening parens on that line, then the line is implicitly surrounded with parens.

empty lines are sugar for {}s which are like 'big parens' that define 'blocks'. Blocks are like big parens which auto-close parens as needed at the end of the block. They also provide scope for block-scoped macros.

since we are using {} and () now for grouping, and we are using [] for data literals, we need another bracket grouping symbol for annotations and/or type system directives. How about <>. Use 1 for comparison.

---

todo read http://en.wikipedia.org/wiki/Bracket#Uses_of_.22.28.22_and_.22.29.22 and subsequent 'uses of' sections.

---

for punctuation used as operators and also as normal symbols, you want the languagey things for single uses and the normal symbols as repetitions, because the languagey uses are much more common, e.g. - is inverse, -- is subtraction. But for punctuation used as grouping, you want the single uses as symbols and repetitions as grouping, because once you think you see a grouping symbol you don't want to have to look ahead to see if it's not really one, and also you want weird grouping stuff to stand out when you are skimming code

mb start with right associative, and switch into left association mode when you enter parens, or start with left associative, and switch into right association mode when you enter parens

don't alternate/switch again with nested parens; that's too confusing

in the context of wondering if "a , b , c = 1 , 2 , 3" should work, and if it would that would seem to imply that a, b, c == [a b c], what is really gained from Haskell-ish currying, e.g. always having f(x)(y)(z) instead of f([x,y,z]) ?

http://www.haskell.org/haskellwiki/Currying says "The major advantage of considering all functions as curried is theoretical: formal proofs are easier when all functions are treated uniformly (one argument in, one result out). Having said that, there are Haskell idioms and techniques for which you need to understand currying."

http://www.haskell.org/haskellwiki/Composing_functions_with_multiple_values notes the imbalance between curried inputs and tuple outputs

oh i remember one thing it lets you do; it lets you not distinguish between:

a function f that takes one input and returns a function g that takes one input and returns a number
a function f that takes two inputs and returns a number

can this be easily mimicked with tuples and partial application? partial application can take the second and turn it into the first, but is there a situation where you need to go the other way somehow?

i think that's it. the benefit of currying instead of having tuple arguments is that it lets an arbitrary multiargument function be implemented as a function that takes some of its arguments and returns a function that takes the rest. i guess that's important.
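
to make the two directions concrete, here is a small Python sketch. The helper names `curry2` and `uncurry2` are invented for this note (they are not standard library functions); `functools.partial` is the real standard-library tool for the "partial application" direction. It only covers the two-argument case.

```python
from functools import partial

def add_tupled(args):
    """Takes both inputs at once, as a tuple."""
    x, y = args
    return x + y

def curry2(f):
    """Hypothetical helper: turn a function on a 2-tuple into a function returning a function."""
    return lambda x: lambda y: f((x, y))

def uncurry2(g):
    """Hypothetical helper for 'the other way': a function-returning function becomes a 2-tuple function."""
    return lambda args: g(args[0])(args[1])

add_curried = curry2(add_tupled)
add_five = add_curried(5)            # fix the first input, get a one-input function
round_trip = uncurry2(add_curried)   # back to the tupled form

def add2(x, y):
    return x + y

add_five_partial = partial(add2, 5)  # partial application on an ordinary 2-argument function
```

with tuples, both conversions need an explicit helper at the call site; with Haskell-style currying the caller can't tell the two kinds of function apart, so no helper is needed.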

remember that big blocks are also scopes for things like macros and transactions (should we allow transactions with 'dynamic boundaries' too? probably)

-- ppl seem to dislike function scope, and prefer block scope but with closures (e.g. something defined in an ancestor block is available in all descendants)

http://www.adequatelygood.com/JavaScript-Scoping-and-Hoisting.html

"Don’t tell me it’s got lexical scope, because JavaScript’s scoping is an abomination in the face of God. Guy Steele isn’t even dead and JS scope makes him pre-emptively roll in his not-yet-occupied grave.

...

At the same time, we’re ignoring the things about JavaScript that make it not Scheme. It’s got a much richer syntax including a great notation for data. I’m not a huge fan of prototypes (anymore), but it’s an interesting dispatch model that Scheme doesn’t have. "

-- http://journal.stuffwithstuff.com/2013/07/18/javascript-isnt-scheme/

    "JavaScript's C-like syntax, including curly braces and the clunky for statement, makes it appear to be an ordinary procedural language. This is misleading because JavaScript has more in common with functional languages like Lisp or Scheme than with C or Java. It has arrays instead of lists and objects instead of property lists. Functions are first class. It has closures. You get lambdas without having to balance all those parens." -- http://www.crockford.com/javascript/javascript.html

comments on that post:

"danielparks: Could you expand on your contention that JavaScript isn’t lexically scoped?

    Calvin Metcalf (reply to danielparks): It is functionally scoped instead of block scoped and while it is mostly lexically scoped 'this' is dynamically scoped.

        munificent (reply to Calvin Metcalf): Not just that, but thanks to with and the global object, you always have dynamically scoped variables."

---

let's try to think of a way to use capitalization more usefully than (a) distinguishing a certain type (the Type type), or (b) scoping.

capitalization to distinguish keywords, e.g. Bob instead of :bob
- or could have all caps for this
capitalization to distinguish one-word lowercase strings, e.g. Bob instead of "bob"
- again, should strings and keywords be equivalent, and keywords are just an internal optimization?
capitalization for grouping that is auto-terminated at EOL, e.g. +1 X / 2 instead of +1 (x / 2
- this doesn't work with uncapitalizable operators
Haskell uses case in type expressions to distinguish between words naming types, and type variables, e.g. in "a -> Bool", a is a type var and Bool is a specific type (we might use "?a" for this)
or for other sorts of grouping; in English, a capitalized word helps you pick out the beginnings of sentences (and periods pick out the end)
http://c2.com/cgi/wiki?SelfUsesCapitalizationForSyntax
capitalization as a shortcut for one-word anti-quote
some commonly used functional modifier/attribute such as strictify
map (e.g. whether a list variable should be treated as vectorized or not when an operation is applied to it)
quoting/meta
(node) labels in graphs
concurrency-related stuff like thread ownership, garbage collected-or-not, etc
mutable
foldl vs foldr
to take a function that takes a list or map argument last, and apply it variadically (as if it were variadic) (the opposite of 'apply' in Clojure, which can be used to apply a truly variadic function to a list of unknown length)
levels
capitalization to get the 'raw' version, e.g. absolute apply instead of ask-the-argument-how-to-apply
capitalization to not follow binds/links, e.g. like when you say "ls ~/aba -dl" to see where the symlink ~/aba points
customizable by user (e.g. different meaning in different scopes)
capitalization to distinguish symbols whose name is accessible/significant by/in metaprogramming frameworks, e.g. the way that table names are inferred from variables in Ruby on Rails. Uppercase to distinguish source position labels (is this different from footnotes? i guess a footnote that inserts code should be further distinguished; mb not an annotation?)

note: many of these may deserve some other special syntax, if not capitalization

so far i like the last one the best. e.g. capitalized names can optionally break through macro hygiene. But how to encourage ppl not to use metaprogramming tricks like ruby's method_missing to do the same thing for lowercase names? perhaps a version of method_missing that only works on capitalized names should be provided, and use of the full (original) method_missing that works on all names should be discouraged (at a higher level of the metaprogramming hierarchy).

coq's 'notations' syntax

" We can make numerical expressions a little easier to read and write by introducing "notations" for addition, multiplication, and subtraction.

Notation "x + y" := (plus x y) (at level 50, left associativity) : nat_scope.
Notation "x - y" := (minus x y) (at level 50, left associativity) : nat_scope.
Notation "x * y" := (mult x y) (at level 40, left associativity) : nat_scope. "

"at level x" is precedence, "nat_scope" is which namespace the notation is declared in

" Notation "( x , y )" := (pair x y). "

" Notation "x :: l" := (cons x l) (at level 60, right associativity).
Notation "[ ]" := nil.
Notation "[ x ; .. ; y ]" := (cons x .. (cons y nil) ..). "

"For example, since we defined + as infix notation for the plus function at level 50, ... The + operator will bind tighter than ::, so 1 + 2 :: [3] will be parsed, as we'd expect, as (1 + 2) :: [3] rather than 1 + (2 :: [3]). "

The right associativity annotation tells Coq how to parenthesize expressions involving several uses of :: so that, for example, the next three declarations mean exactly the same thing:

Definition mylist1 := 1 :: (2 :: (3 :: nil)).
Definition mylist2 := 1 :: 2 :: 3 :: nil.
Definition mylist3 := [1;2;3]. "

" Notation "x ++ y" := (app x y) (right associativity, at level 60). "

homoiconicity: at some point need to take this more seriously. No "graph data constructor". Grouping constructs, etc, are the same in graphs as in code. Graph node labels are used in code. Etc.

list of syntactic/semantic universals:

all
any
more generally, things that behave like forall, exists; such as obligatory, permissible
at least one, exactly one
{something in a meta language}
self?
me? (different from self?)
=
/ (arrow)
given/where/filter (e.g. the pipe in {x | x > 3}; the pipe in Haskell guards) (note: "implies" (arrow) is related; "b, given a" is the same as "a -> b", so maybe just use that? e.g. instead of Haskell guard syntax "f x | x > 3 = 2" we'd use "x > 3 --> f x = 2"; this unification is also needed to unify Haskell pattern guards, and typeclass contexts)

is a member of/isa
?x for 'variable x'
requires/provides
??what else
look at Kant?
look at OWL?

--- http://www.haskell.org/haskellwiki/GADTs_for_dummies ( http://web.archive.org/web/20130702221947/http://www.haskell.org/haskellwiki/GADTs_for_dummies ) brings up an excellent point: type classes are like arbitrary functions on types, with normal Haskell stuff like pattern matching, algebraic data types (multiple constructors), guards, etc, except with a confusing relational (rather than functional) syntax.

this brings up the obvious point: Oot could be like this but use normal syntax

seems like there is really a lot of mileage to be had just by delivering a uniform notation for data, code, patterns, types, typeclasses. Seems like you could get one just by reading through that article and using the 'basic' (base level functions) one.

note: Haskell's data syntax is a little confusing if thought of as just pattern matching; from the above:

" data Either a b = Left a
       Either a b = Right b

we write just

data Either a b = Left a
                | Right b "

but really we meant "the type of Left a is Either a b, AND the type of Right b is Either a b, AND nothing else is Either a b"

note that this is like a Coq match statement.

---

it can be annoying if functions are variadic:

http://stackoverflow.com/questions/7823516/why-are-many-clojure-functions-variadic

---

if we have optional return arguments, then need a way to get all the return arguments sometimes

perhaps similar to the way that we have variadic *args, kw formal parameters at the end of fn declarations

---

so perhaps we have 2 syntactical argument conventions:

a function can be converted to an in-place mutating function on its first argument
a function can be called variadically, in which case unbound args and keywords are assigned to formal parameters (implicit optional keyword arguments) 'args' and 'kw'
- actually mb 'args' is just the last required positional parameter, and 'kw' is an optional keyword argument
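
a Python analogy for both conventions, as a sketch only. Python's `*args` / `**kw` are real and play roughly the role of the 'args' and 'kw' formal parameters above; the `in_place` wrapper and the `describe` function are hypothetical names made up here to illustrate "convert a function into an in-place mutating function on its first argument" (here limited to list arguments).

```python
def describe(first, *args, **kw):
    """'first' is an ordinary required parameter; unbound extras land in args / kw."""
    return {"first": first, "args": list(args), "kw": dict(kw)}

def in_place(f):
    """Hypothetical helper: run f, then overwrite the first argument (a list) with the result."""
    def wrapper(target, *args, **kw):
        target[:] = f(target, *args, **kw)  # slice assignment mutates the caller's list
        return target
    return wrapper

sort_in_place = in_place(sorted)  # 'sorted' is pure; the wrapped version mutates

result = describe(1, 2, 3, color="red")
xs = [3, 1, 2]
sort_in_place(xs, reverse=True)   # xs itself is changed; keyword args pass through
```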

---

it's useful to have clojure-style variadic map:

"Returns a lazy sequence consisting of the result of applying f to the set of first items of each coll, followed by applying f to the set of second items in each coll, until any one of the colls is exhausted. Any remaining items in other colls are ignored. Function f should accept number-of-colls arguments."

user=> (map + [1 2 3] [4 5 6])

(5 7 9)

user=> (apply map vector [[:a :b :c] [:d :e :f] [:g :h :i]])

([:a :d :g] [:b :e :h] [:c :f :i])

and

(defn numbered-lines [lines] (map vector (iterate inc 0) lines))

but you also want to do simple maps without enclosing them in []s:

user=> (map inc [1 2 3 4 5])

(2 3 4 5 6)

if you don't have variadicity, and the last positional argument of 'map' is 'args', then you'd have to do:

(map inc [[1 2 3 4 5]])

and if this could be transformed into a variadic form but it wasn't the default, you'd have to variadicize it explicitly:

(map* inc [1 2 3 4 5])
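
for comparison, Python's built-in `map` is variadic in the same way as Clojure's, so the Clojure examples above translate almost directly. This is just an illustration of the behaviour being discussed, not a proposal for Oot's syntax; the variable names and `numbered_lines` are chosen to mirror the Clojure snippets.

```python
from itertools import count
from operator import add

# like (map + [1 2 3] [4 5 6])
sums = list(map(add, [1, 2, 3], [4, 5, 6]))

# like (apply map vector rows): unpack a list of rows into separate arguments
rows = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
transposed = [list(col) for col in zip(*rows)]

def numbered_lines(lines):
    """like (map vector (iterate inc 0) lines); stops when `lines` runs out."""
    return list(zip(count(0), lines))

# the simple one-collection case needs no extra brackets
incremented = list(map(lambda n: n + 1, [1, 2, 3, 4, 5]))

# as in Clojure, mapping stops when the shortest collection is exhausted
truncated = list(map(add, [1, 2, 3], [10]))
```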

i like clojure's super lightweight lambda syntax: #(blah %)

could have precedence for custom operators, but defined by their ASCII or something (to match arith, i guess)

later: i think Scala did this; but i think ppl still get confused by precedence in scala, which has 10 levels: http://stackoverflow.com/questions/7618923/actual-precedence-for-infix-operators-in-scala
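
a sketch of the "precedence from the operator's characters" idea, in Python. The level table follows my recollection of Scala's rule (precedence is decided by the first character of the operator), ignoring Scala's special case for assignment operators; treat the exact table as an assumption and check the Scala spec before relying on it. `LEVELS` and `precedence` are names made up for this example.

```python
# Levels from lowest to highest precedence, keyed by an operator's first character.
# (Assumed to mirror Scala's table; level 0 is "all letters", the top level is "everything else".)
LEVELS = ["|", "^", "&", "=!", "<>", ":", "+-", "*/%"]

def precedence(op):
    """Return a precedence level for a custom operator based only on its first character."""
    first = op[0]
    if first.isalpha() or first in "_$":
        return 0                      # named (alphanumeric) operators bind loosest
    for level, chars in enumerate(LEVELS, start=1):
        if first in chars:
            return level
    return len(LEVELS) + 1            # any other special character binds tightest
```

so a user-defined `++` groups like `+` and a user-defined `**` groups like `*`, without the reader having to look up a fixity declaration; the cost is that there are still ten levels to remember.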

---

clojure has a weird notation with slashes, i guess for accessing members of Java classes:

https://speakerd.s3.amazonaws.com/presentations/2471a370b3610130440476a0f7eede16/2013-05-17-ClojureOOP-Geecon.pdf