ideas-computer-jasper-jasperWhitespaceNotes

whitespace principles:

examples:

f g q
      f g q
      r s t
  is different from
  ||| f g q
      f g q
      r s t

rationale:

WHITESPACE-DISCUSS


---

above, i said that newline carries an implicit EOL, as if ':' were placed at the end of each line.

But then what about multiline statements?

possibilities:

JavaScript has implicit semicolons, but with complex rules for semicolon insertion: http://bclary.com/2004/11/07/#a-7.9

Mb a char to switch modes so newline is not semicolon until next semicolon?

---

my fav so far for newlines is that \n -> \n; except when within (( ))

actually: \n -> \n; except when the line starts with (, except if the rest of the line is whitespace -- doesn't change the meaning of (

so e.g.

(f x = 3
g x = x + 4
)

is a syntax error, because it's a multiline statement

(
f x = 3
g x = x + 4
)

is fine, because it's not multiline (it's equiv to f x = 3 : g x = x + 4)

(
f x = x + 4
)

is the same as (f x = x + 4)

mb generalize this to any environment; in [] : is ignored (except for interrupting grouping), so that's fine, and now we can have e.g. matrix constructors that make use of it.

in that case we want to make the rule a little more complex:

\n -> \n; except when the line starts with an environment opener and the environment hasn't ended by the end of the line, except if the rest of the line is whitespace -- doesn't change the meaning of (

so, within each single line that starts with any environment opener, each environment must be parsed, in order to determine if it ends on that line; if not, and if there is something other than whitespace, then it's a multiline; otherwise, add implicit colons

that makes the parser a bit more complex; but then, so do literal strings (which may contain the comment delimiter within them without opening a comment).
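a rough sketch of the per-line check described above, in Python rather than Jasper. The names PAIRS and starts_multiline are made up for illustration, and string literals and comments are deliberately ignored (which, as noted, a real parser could not do):

```python
# Hypothetical helper: does this line begin a multiline statement under the
# rule "line starts with an environment opener that has not closed by EOL,
# unless the rest of the line is whitespace"?
PAIRS = {"(": ")", "[": "]", "{": "}"}

def starts_multiline(line):
    text = line.strip()
    # A lone opener (rest of line is whitespace) does not change anything.
    if len(text) < 2 or text[0] not in PAIRS:
        return False
    stack = []
    for ch in text:
        if ch in PAIRS:
            stack.append(PAIRS[ch])
        elif stack and ch == stack[-1]:
            stack.pop()
            if not stack:
                # The environment opened at the start of the line has ended.
                return False
    return True
```

lines for which this returns False would get the implicit separator at their end; lines for which it returns True would not.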

ok, even simpler:

you are either in a '\n -> \n;' environment or a multiline environment. You are in a '\n -> \n;' environment by default. You enter a multiline environment when you see a line that opens more environments than it closes, not counting an unbroken line of openers at the end of the line. You exit the multiline environment when the first environment on that line that was unclosed at the end of that line closes.

The multiline processing stage happens after tokenization. Might want to check for unbalanced parens before doing this.

The multiline processing stage goes thru each line and counts the depth of environments on that line (e.g. incrementing upon open, decrementing upon close), but storing increments temporarily and not actually incrementing until a non-opener is encountered (because you want to skip a string of unbroken openers at the end of the line), remembering the positions of the openers. Decrements below 0 are ignored.

If EOL is reached with the permanent count = 0, zero the temporary count, then convert the EOL to ':'. If a line ends with the permanent count greater than zero, then switch to multiline mode, in which EOLs are simply removed. Now count openings and closings absolutely. When the count reaches zero, end multiline mode and reenter normal mode, starting at that position.

in fact, may as well initially parse via parens, with BOL and EOL symbols which contain the line number they are on, keeping the tree in order. Then, traverse in order (ie depth-first), and when you hit a BOL, if the EOL is not a sister, then turn on multiline (e.g. just throw out all BOL and EOLs underneath), otherwise (if the EOL is a sister), convert the EOL to a ';'.
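a minimal Python sketch of the counting version of this pass (the first variant, not the paren-tree one). Everything named here (OPENERS, CLOSERS, tokenize, insert_colons) is hypothetical; the tokenizer is a naive stand-in that ignores string literals, blank lines are skipped as an assumption, and the positions of the openers are not remembered:

```python
OPENERS = {"(", "[", "{"}
CLOSERS = {")", "]", "}"}

def tokenize(line):
    # Naive stand-in tokenizer: isolate brackets, split on whitespace.
    for ch in OPENERS | CLOSERS:
        line = line.replace(ch, f" {ch} ")
    return line.split()

def insert_colons(lines):
    out = []
    multiline = False
    depth = 0  # the "permanent" count
    for line in lines:
        tokens = tokenize(line)
        if not tokens:
            continue  # assumption: blank lines produce nothing
        pending = 0  # the "temporary" count of openers not yet committed
        for tok in tokens:
            out.append(tok)
            if multiline:
                # In multiline mode, count absolutely.
                if tok in OPENERS:
                    depth += 1
                elif tok in CLOSERS:
                    depth -= 1
                    if depth == 0:
                        multiline = False  # back to normal mode from here
            elif tok in OPENERS:
                pending += 1
            else:
                # A non-opener commits the openers seen so far.
                depth += pending
                pending = 0
                if tok in CLOSERS and depth > 0:
                    depth -= 1  # decrements below 0 are ignored
        if multiline:
            continue  # EOL is simply removed
        if depth > 0:
            multiline = True
            depth += pending  # trailing openers now count as well
        else:
            out.append(":")  # trailing openers are dropped from the count
    return out
```

with this sketch, `insert_colons(["(f x = 3", "g x = x + 4", ")"])` joins the three lines into one statement with a single ':' at the end, while putting the '(' alone on its own line gives every line its own ':'.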

---

if a==b ( if c == d print 'a == b and c == d' ( if j==k print 'a == b and c != d and j == k' print 'a == b and c != d and j != k' )) print 'a != b'

if a==b (if c == d : print 'a == b and c == d' : (if j==k : print 'a == b and c != d and j == k' : print 'a == b and c != d and j != k' : )) print 'a != b'

if a==b ((if c == d) (print 'a == b and c == d') ((if j==k) (print 'a == b and c != d and j == k') (print 'a == b and c != d and j != k') )) (print 'a != b')

the confusing part is: if you want to put one thing on multiple lines, but still have autocolons in those lines, you must put the opening parens at the end of the previous line, NOT at the beginning of the line of the construct; e.g.

if a==b ( if c == d print 'a == b and c == d' ( if j==k print 'a == b and c != d and j == k' print 'a == b and c != d and j != k' )) print 'a != b'

is right but

if a==b (if c == d print 'a == b and c == d' ( if j==k print 'a == b and c != d and j == k' print 'a == b and c != d and j != k' )) print 'a != b'

is wrong

we would like to be able to do something more like:

if a==b (if c == d print 'a == b and c == d' (if j==k print 'a == b and c != d and j == k' print 'a == b and c != d and j != k' )) print 'a != b'

indeed, if we didn't allow multilines at all, this would work fine, due to left associativity

in this language, f x y can indeed be written f x y

problems arise if you want to use = or / though:

f x = y -> (f x = y)

is not the same as

(f x =) (y)

and

h / g / f / x -> h (g ( f ( x)))

is not the same as

h / g / f / x

(h / g /) (f / x)

an easy fix is to say:

\n -> :\n except when the last non-whitespace symbol on the line is =, /, or an environment opener

(so, we forget about the 'or if more parens were opened than closed')

but then we add: if more parens were opened than closed between two colons, the left paren that the colon inserts is added only after the excess opening parens are skipped

so:

if a==b (if c == d print 'a == b and c == d' (if j==k print 'a == b and c != d and j == k' print 'a == b and c != d and j != k' )) print 'a != b'

if a==b : (if c == d : print 'a == b and c == d' : (if j==k : print 'a == b and c != d and j == k' : print 'a == b and c != d and j != k' : )) : print 'a != b' :

(if a==b) ((if c == d) (print 'a == b and c == d') ((if j==k) (print 'a == b and c != d and j == k') (print 'a == b and c != d and j != k') )) (print 'a != b')

which still seems right
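a Python sketch of these two steps, under some assumptions: only '(' counts as an environment opener, strings are single-quoted, and only excess parens at the very start or end of a colon-delimited segment are skipped. TOKEN, NO_COLON_AFTER, autocolon, wrap and colons_to_parens are made-up names:

```python
import re

# Quoted strings, single parens, or runs of other non-space characters.
TOKEN = re.compile(r"'[^']*'|[()]|[^\s()']+")
NO_COLON_AFTER = {"=", "/", "("}

def autocolon(lines):
    """\\n -> :\\n unless the last symbol on the line is =, / or an opener."""
    tokens = []
    for line in lines:
        toks = TOKEN.findall(line)
        if not toks:
            continue
        tokens += toks
        if toks[-1] not in NO_COLON_AFTER:
            tokens.append(":")
    return tokens

def wrap(segment):
    """Parenthesize one segment, skipping its excess leading/trailing parens."""
    stack, loose = [], set()
    for i, tok in enumerate(segment):
        if tok == "(":
            stack.append(i)
        elif tok == ")":
            if stack:
                stack.pop()
            else:
                loose.add(i)
    loose |= set(stack)  # indices of parens with no partner in this segment
    start, end = 0, len(segment)
    while start in loose:
        start += 1
    while end > start and end - 1 in loose:
        end -= 1
    body = segment[start:end]
    return segment[:start] + (["(", *body, ")"] if body else []) + segment[end:]

def colons_to_parens(tokens):
    out, segment = [], []
    for tok in tokens + [":"]:
        if tok == ":":
            out += wrap(segment)
            segment = []
        else:
            segment.append(tok)
    # Cosmetic only: tighten spaces next to parens.
    return " ".join(out).replace("( ", "(").replace(" )", ")")
```

for instance, the lines `if a==b` / `(if c == d` / `print 'yes'` / `print 'no'` / `)` / `print 'a != b'` come out as `(if a==b) ((if c == d) (print 'yes') (print 'no')) (print 'a != b')`, and `f x =` followed by `y` comes out as `(f x = y)`.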

---

a 'block' is a piece of text that begins and ends with at least two newlines with nothing in between them except maybe whitespace, eg.

this is the first line of block 1
blah
blah
this is the last line of block 1

this is the first line of block 2
blah
blah
this is the last line of block 2
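a small Python sketch of this definition (split_blocks is a hypothetical name). Starting with an empty accumulator and flushing it at the end is one way of treating the beginning and end of the file as empty lines:

```python
def split_blocks(text):
    """Split text into blocks separated by one or more whitespace-only lines."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            # A blank (or whitespace-only) line ends the current block.
            blocks.append(current)
            current = []
    if current:
        # End of file acts like a trailing empty line.
        blocks.append(current)
    return blocks
```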


ooo, idea: often in Lisp you see a bunch of closing parens all at once at the end of blocks. e.g. from http://en.wikipedia.org/wiki/Lisp_%28programming_language%29#Examples:

 (defun factorial (n)
   (if (<= n 1)
       1
       (* n (factorial (- n 1)))))

Perhaps have two kinds of parens:

e.g. either of the following compile to the same factorial fn defn as above:

 (defun factorial (n)
   (if (<= n 1)
       1
       (* n (factorial (- n 1

 (defun factorial (n)
   {if (<= n 1)
       1
       (* n (factorial (- n 1
   }
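one possible reading of the two examples above, as a Python sketch: a '{' opens a group like '(' does, a '}' closes every paren opened inside its group plus the group itself, and the end of the block closes whatever is still open. expand_closers is a made-up name, and this reading is an assumption rather than a settled rule:

```python
def expand_closers(block):
    out, stack = [], []
    for ch in block:
        if ch == "(":
            stack.append("(")
            out.append("(")
        elif ch == "{":
            stack.append("{")
            out.append("(")  # a curly group is emitted as an ordinary paren
        elif ch == ")":
            if not stack or stack.pop() != "(":
                raise ValueError("unbalanced ')'")
            out.append(")")
        elif ch == "}":
            # Close everything opened since the matching '{', then the '{'.
            while stack:
                opener = stack.pop()
                out.append(")")
                if opener == "{":
                    break
        else:
            out.append(ch)
    out.append(")" * len(stack))  # end of block closes the rest
    return "".join(out)
```

up to whitespace, both variants expand to the fully parenthesized Lisp definition.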

could also specify that blocks are automatically surrounded by parens, except for blocks without a newline between their beginning and their enclosing bigblock, or those without a newline between their end and their enclosing bigblock. Now you can do:

 (defun factorial (n)
   {if (<= n 1)
       1

(the parens at the beginning are still needed so that our auto-colon-adding doesn't add a colon at the end of the first line)

note that in this case the auto-line-delimiting doesn't help us. However, we could also say that, within a big block, colons and semicolons only group within the big block. hmmm... maybe also make the rule about things inside parens not being autocoloned only for big blocks, and only up to the scope of the big block... and say that autocoloning does happen inside parens that begin on a line starting with an opening paren (or curly), just not for that line itself. in other words we don't need curly braces for this example anymore:

(hmm, just call a bigblock a curly-delimited block, and call the other thing a whitespace-delimited block; they are both blocks, identical syntactically)

Now we can do:

 (defun factorial (n)
   (if (<= n 1)
       1
       (* n (factorial (- n 1
   

the transformation looks like this:

 (defun factorial (n)
   (if (<= n 1)
       1
       (* n (factorial (- n 1

--> (transform whitespace-delimited blocks into curly-delimited)

 {(defun factorial (n)
   (if (<= n 1)
       1
       (* n (factorial (- n 1
 }

--> (autocoloning)

 {(defun factorial (n)
   (if (<= n 1)
       1 : 
       (* n (factorial (- n 1
 }

--> (each colon surrounds the line it is on with parens)

 {(defun factorial (n)
   (if (<= n 1) 
    (1)
    (* n (factorial (- n 1
 }

--> (block creates as many closing parens as needed)

 (defun factorial (n)
   (if (<= n 1) 
    (1)
    (* n (factorial (- n 1)))))
 
hmm... we still needed the parens before the *, i see. how to get rid of that? mb say that the last line of a bigblock is autocoloned, regardless of whether it leaves parens open. So we'd have:

 (defun factorial (n)
   (if (<= n 1)
       1
       * n (factorial (- n 1

-->

 {(defun factorial (n)
   (if (<= n 1)
       1 : 
       * n (factorial (- n 1 :
 }

-->

 (defun factorial (n)
   (if (<= n 1)
       (1)
       (* n (factorial (- n 1)))))

note the implications of closing all parens at the end of each block: if you want to leave any parens open over blocks, you must enclose those lines in curlies, and if you want the contents of all this to be fed to a function, the curlies must start at the beginning of the line containing that function, e.g.:

 (defun factorial (n)
   {if (<= n 1)
       1

this way, not so many parens are closed after the 1 so as to close off the 'if'.

note: now that colons only autoclose parens up to the enclosing grouping level, we can remove the special case about a parens at the end of a line not counting: now there is no concept of 'multiline mode' to remember; every line is coloned unless it opens unclosed parens (and is not the last line of a block).
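a Python sketch of this simplified rule applied to one whitespace-delimited block: a line is left alone if it opens parens it does not close (unless it is the last line), any other line is wrapped in parens with its own open parens closed, and the end of the block closes the rest. process_block is a hypothetical name; the sketch assumes the block contains no blank lines and no parens inside string literals:

```python
def process_block(lines):
    out = []
    open_parens = 0  # parens left open by earlier lines of the block
    for i, line in enumerate(lines):
        text = line.strip()
        opened = text.count("(") - text.count(")")
        is_last = i == len(lines) - 1
        if opened > 0 and not is_last:
            # Opens unclosed parens: no colon, keep the line as it is.
            out.append(text)
            open_parens += opened
        elif opened >= 0:
            # Coloned line: wrap it and close what it opened itself.
            out.append("(" + text + ")" * (opened + 1))
        else:
            # Line closes parens from earlier lines: leave it alone.
            out.append(text)
            open_parens += opened
    # The end of the block closes everything still open.
    return " ".join(out) + ")" * max(open_parens, 0)
```

run on the four-line factorial block with the bare `* n (factorial (- n 1` last line, this yields `(defun factorial (n) (if (<= n 1) (1) (* n (factorial (- n 1)))))`.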

---

blocks are used to delimit some other things too:


the more i look at the "Defining some functions" section of http://chrisdone.com/z/ , the more i like it. In Jasper the significant indentation would be replaced by opening parens, but this is still fewer parens than Lisp:

Z:

defun map f xs if unit? xs unit cons f car xs map f cdr xs

(is defun parsed differently?!?)

Jasper:

defun (map f xs if (unit? xs unit cons (f car xs map f cdr xs

(no closing parens are needed at the end b/c the block end closes them)

maybe the default of Jasper should be right-associative, not left. Could use commas if you want to put multiple args on one line sometimes:

mb if there is a comma then the leftmost thing in the enclosing grouping is taken to be the fn? e.g. "map f, cdr xs" is like map(f, cdr(xs)) in other languages

defun (map f xs if (unit? xs unit cons (f car xs map f, cdr xs

defun (map f xs if (unit? xs unit cons f car xs,, map f, cdr xs
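a very tentative Python sketch of how the comma idea might desugar, assuming right-associative application by default, ',' separating arguments of the leftmost word, and ',,' doing the same one level further out. to_calls is a made-up name, parens are not handled at all, and the output uses the f(x, y) call notation of other languages purely for illustration:

```python
def to_calls(expr, commas=2):
    expr = expr.strip()
    if commas == 0:
        # Right-associative application: "f car xs" -> f(car(xs))
        words = expr.split()
        result = words[-1]
        for word in reversed(words[:-1]):
            result = f"{word}({result})"
        return result
    parts = expr.split("," * commas)
    if len(parts) == 1:
        return to_calls(expr, commas - 1)
    # The leftmost word of the grouping is taken to be the function.
    fn, _, first = parts[0].strip().partition(" ")
    args = [a for a in [first] + parts[1:] if a.strip()]
    inner = ", ".join(to_calls(a, commas - 1) for a in args)
    return f"{fn}({inner})"
```

under these assumptions "map f, cdr xs" becomes map(f, cdr(xs)), and "cons f car xs,, map f, cdr xs" becomes cons(f(car(xs)), map(f, cdr(xs))).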

i guess left vs. right associativity in languages without parens/comma fn application comes down to whether nesting or multiple arguments is more common.

---

hmm, how do we disambiguate 0-ary functions from their values? how does Haskell do this? or should we identify them?

   apparently Haskell identifies them: http://stackoverflow.com/questions/5655133/languages-with-immutable-variables-by-default-like-haskell

---

i guess there is one case in which this sort of significant whitespace still makes manual editing a pain: if you take a whitespace-delimited block and copy-and-paste it into the middle of another block, you must prepend and append curly braces to the block that you are pasting, to avoid breaking up the block you are pasting it into

---

note that beginning-of-file and end-of-file must be treated as EMPTYLINES for the purpose of block delimiting