ideas-computer-jasper-jasper


Project status

I am just writing down ideas. I have many pages of notes on things I may or may not want to put into Jasper.

At this point, the design is unfinished; there is no coherent concept of a language called 'Jasper', just a bunch of ideas towards such a proposal. There is no timeframe to actually implement this, and it most likely will never get finished.

To be clear, I repeat myself: there is no such thing as Jasper. Jasper has neither been designed nor implemented. These notes about 'Jasper' contain many conflicting proposals.

This document is very out of date and no longer serves as a good introduction to the evolving language proposal.

You may want to look at ideas-computer-jasper-whyJasper.

There are a bunch of other files in this folder with more details and notes on various parts of Jasper: ideas-computer-jasper.

The name Jasper refers to my favorite type of rock, Picture Jasper.

the jasper programming language (under construction)

Lisp is a homoiconic language built upon lists. Jasper is a homoiconic language built upon labeled (hyper multi reified) graphs.

Why Jasper?

 -- built around a powerful data structure: labeled graphs
 -- referential transparency "by default"
 -- interface-based (or "attribute-based") type discipline
 -- constraint satisfaction
 -- readable, concise syntax
 -- memory managed; static typing; type inference; lazy evaluation; higher-order functions
 -- goal: the referential transparency, higher-order functions, and type system of Haskell, with the readability and convenience of Python

see [1] for more detail.

audience: general purpose. programmers.

statically typed. memory managed. type inference. lazy evaluation. higher order functions.

support for the following paradigms: imperative, oop, functional (referential transparency), logic, macros

priorities: power, readability, conciseness

syntax that compiles to cleaner core

primary data structure(s): labeled (reifiable) graph. this also provides a syntax for lists, association tables, arrays, trees, graphs, relations, structs, objects with managed attributes.

some goals:

tasks that could drive development

some features:

anti features:

some other design decisions:

A single

Syntax


General note on syntax

In some languages punctuation characters are also separators, so you aren't required to put a space before or after them, but in Jasper, many punctuation characters mean something different if they are attached to another token as opposed to being surrounded by whitespace. For example, in Jasper,

  x = 3

is not interchangeable with

  x=3

The '=' in "x = 3" is called "freestanding". The '=' in "x=3" is called "attached".

Basic function calling syntax

Put the function arguments to the LEFT of the function, separated by spaces.

Example: if f is a function that takes arguments x and y: y x f

You can pass keyword arguments using keyword=value. The order of keyword arguments doesn't matter. All keyword arguments must go to the right of all positional arguments.

Example:

  x = 3 [] lb="apple" ins
  x == ["apple"=3]

G-constructor syntax

G-constructors are literals used to construct directed graphs (for technical notes on what we mean by directed graph, see [2]).

A graph in Jasper is something that you might draw by drawing a bunch of circles and then drawing some arrows between some of the circles, and then attaching notes to some of the arrows. To non-CS people, a graph might be called a "network". We call the circles "nodes", the arrows "arcs", and the notes on the arrows "edge labels".

G-constructors are constructs surrounded by [] delimiters (they can be implicitly closed by indentation).

The outermost node in the constructor is called the "root node". The constructor can only construct one connected component, although other unconnected nodes can be added later by merging graphs.

Example: a graph with only the root node:

  []

Nodes can contain a list of objects, separated by spaces. These objects are considered to be other nodes such that directed edges go from this node to those. These edges are labeled by their position in the list (zero-indexed).

Example: a node that contains a list of the first two even numbers:

  [2 4]

This represents the following directed graph:

   root
    /\
 0 /  \ 1
  /    \
 2      4

If a variable X holds a node, then that node's edges can be traversed using the "." operator.

Example: after putting the above node into "x", x.1 == 4:

  x = [2 4]
  x.1 == 4    # evaluates to t (true)

Edges can have multiple labels. To assign an additional label to an edge, use "label=destination".

Example: same as above, but the edge to 2 is labeled "yellow" as well as 0:

  ["yellow"=2 4]

This represents the following directed graph (note: should we use '/' instead of '=' to make the directedness clearer?):

            root
             /\
 0,"yellow" /  \ 1
           /    \
          2      4

Any of the labels may be used in traversal:

  x = ["yellow"=2 4]
  x."yellow" == 2

An object pointed to by a node may itself be a node that points to other objects. However, primitive values, such as "2", cannot point to anything.

Example: A node that points to two nodes, one that points to strings describing some fruits, and one that points to strings describing some vegetables.

  ["fruits"=["apple" "pear"] "vegetables"=["carrot" "brocoli"]]

                 root
                  /\
      0,"fruits" /  \ 1,"vegetables"
                /    \
               /      \
              /        \
             /          \
            /            \
           /              \
          /\              /\
         /  \            /  \
        /    \          /    \
  "apple"  "pear"  "carrot"  "brocoli"

todo num labels

"." is left-associative and may be used to traverse: x = ["fruits"=["apple" "pear"] "vegetables"=["carrot" "brocoli"]] x."fruits".1 == "pear"

 [name | value1, value2, etc] is notation for a node with a node id (global label) name. It is an error (compile-time if possible, but runtime in general) to assign the same name to two different nodes.

Example:

 ["x" | fruits=["apple" "pear"] vegetables=["y" | "carrot" "brocoli"]]


                 root, "x"
                  /\
      0,"fruits" /  \ 1,"vegetables"
                /    \
               /      \
              /        \
             /          \
            /            \
           /              \
          *               "y"
          /\              /\
         /  \            /  \
        /    \          /    \
  "apple"  "pear"  "carrot"  "brocoli"

(the * represents the unnamed node in the example)

Labeled nodes may be pointed to using the notation x..name, where x is a variable that refers to the G-constructor.

  x = ["x" | fruits=["apple" "pear"] vegetables=["y" | "carrot" "brocoli"]]
  x.."y".0 == "carrot"

Within the constructor, ".." may be used with no variable to its left. Using node labels, cycles may be created:

 Example: ["s" | .."s"]
 "s"
 / \
 \_/
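Node labels are what make cycles expressible in a constructor. A minimal Python sketch of the same idea, assuming nodes are dicts and a side registry maps labels to nodes:

```python
# Model of ["s" | .."s"]: a node labeled "s" whose edge 0 points back to itself.
nodes = {}            # hypothetical label registry (the "index" of the constructor)
s = {}                # the node that will carry label "s"
nodes["s"] = s
s[0] = nodes["s"]     # .."s" resolves through the registry, closing the cycle

assert s[0] is s      # the node's edge 0 is the node itself
```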
 

By referencing parts of other G-constructors, one can create nodes with multiple parents without cycles.

Example:

  x = ["a" | ["b" | 2], 3]
  y = ["c" | x.."b"]

Now y looks like:

 root,"c"  "a"
        \  / \
        "b"   3
         |
         2

Note: the graph in "x" is unchanged! The values of x and y have not been "linked" together in any way; it is as if a copy was made of some of the contents of x and put into y. If you don't want time and memory to be spent making a copy, and you have no more use for the graph in x, then don't use the variable "x" afterwards; the compiler will probably notice and optimize under the hood by reusing the memory it used to call "x" for y, adding node "c" in-place. If you want to make sure, reuse the variable "x" throughout, "x = ["a" | ["b" | 2], 3]; x = ["c" | x.."b"]", which guarantees that the compiler will mutate x in-place rather than copying. In any case, the compiler may choose to defer the copy unless and until it needs it. Unless x is a mutable variable, the compiler will probably simply refer internally to x when "b" is called for.

Note: only the part of the graph that is reachable by a directed path from x.."b" is included in y.
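The copy-not-link semantics above can be sketched in Python, modeling nodes as dicts and the "copy of x's reachable subgraph" as a deep copy. (The dict representation is an illustrative assumption.)

```python
import copy

# Model of x = ["a" | ["b" | 2], 3]; y = ["c" | x.."b"]
x = {"b": {0: 2}, 1: 3}          # "a"'s edge to the node labeled "b", plus 3
y = {0: copy.deepcopy(x["b"])}   # y gets a *copy* of the subgraph under "b"

y[0][0] = 99                     # mutate y's copy
assert x["b"][0] == 2            # x is unchanged: the values were never linked
assert y[0][0] == 99
```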

Note: the node labels are only available from variables bound to the G-constructor in which they were created:

Example:

  x = ["a" | ["b" | 2], 3]
  y = ["c" | x.."b"]

  x.."b"    # succeeds
  y.."c"    # succeeds 
  x.."c"    # error
  y.."b"    # error

The reason it works this way is that a G-constructor actually returns not the root node, but rather an "index node object": a node that contains an assoc table from node labels to nodes. One of the nodes is labeled "root". For convenience, "." is actually a shortcut for "..root."; ".." is used to directly access the index node.

To create a variable with all the labels, use the midx ("merge index") command, which takes a list and returns a list of the same length. Each item in the return list contains the same node as the corresponding item in the input list; but all items in the return list can access any of the node names anywhere in the input list:

  [x y] = ([x y] midx)
  x.."c"    # succeeds

Again, however, it should be noted that the values of the nodes are not "linked".

It is an error if there is a name clash between the labels of any of the nodes being merged (except for the label "root"). This can be avoided, when names are hardcoded, by using symbols for names (todo). When labels are not symbols, use of midx generates a compiler warning. When labels are dynamically computed at runtime, errors can be avoided in several ways: use the midxrn function, which renames clashing names and returns a table listing the new names (todo); or use midxic ("merge index ignore clash"), which resolves name clashes differently for each variable in the list by preferring that index's names over incoming names, and then preferring name bindings from the beginning of the list over names from the end; or use midxo ("merge index overwrite"), which resolves name clashes uniformly for all variables in the list (i.e. all variables end up with copies of the same index) by preferring name bindings from the beginning of the list.

As you might guess from the compiler warning, my recommendation is that you avoid midx except for when you are using hardcoded labels (preferably symbols). Usually midxrn is the right choice otherwise.
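A hypothetical Python sketch of midx itself, modeling each index as a dict from labels to nodes (names and representation are assumptions for illustration):

```python
def midx(indexes):
    """Sketch of midx: merge every index's labels so each result can
    reach every name; clashes (other than "root") are errors."""
    merged = {}
    for idx in indexes:
        for name, node in idx.items():
            if name == "root":
                continue              # each variable keeps its own root
            if name in merged:
                raise ValueError("label clash: " + name)
            merged[name] = node
    return [{**merged, "root": idx["root"]} for idx in indexes]

x_index = {"root": "x-root", "a": "node-a", "b": "node-b"}
y_index = {"root": "y-root", "c": "node-c"}
mx, my = midx([x_index, y_index])
assert mx["c"] == "node-c"      # x can now reach "c"
assert my["b"] == "node-b"      # y can now reach "b"
assert mx["root"] == "x-root"   # roots stay distinct
```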

Within a graph, the reserved word "this" always refers to the current node.

Example: a node with a self-loop may also be created as follows: [this]

Within a G-constructor, the reserved word "root" refers to the root node. Example:

[1, [2, root]]


    root _
     /\   |
    /  \  |
   1   /\_|
      2  

Using the "unshadow" token operator, "^", on "this", the (lexical) parent node may be referred to. Repeating the unshadow operator allows reference to the (lexical) grandparent.

Example:

 [1, [2, ^this]]
    root _
     /\   |
    /  \  |
   1   /\_|
      2  
 [1, [2, ^this.0]]  
    root 
     /\   
    /  \  
   1   /\
      2  1
  (note: the ^this.0 is evaluated at the time of graph construction; if one of the 1s is changed later, the other won't be affected)
 [1, [2, [3, ^^this]]]
    root ___
     /\     |
    /  \    |
   1   /\   |
      2 /\  |
       3  \_|

Not all edges of a node have to have integer labels (in fact, none of them have to). To separate those edges with integer labels from those without, use the "--" symbol. Edges with integer labels go on the left. Edges on the right must have an explicit label. The integer labels are no different from other labels; the -- syntax just tells the constructor not to automatically add them.

  [1 -- "yellow"=2]
            root
             /\
          0 /  \ "yellow"
           /    \
          1      2
  [-- "yellow"=2]
            root
             /
   "yellow" /  
           /  
          2   

Many different edges from the same node may all point to the same destination object:

  [1, ["red"=root, "yellow"=root]]
        <-------------|
    root<--           |
     /\   |           |
    /  \  |2, "yellow"|
   1   /\_|           |
      |               |
      |_______________|  
           1, "red"

(btw the only reason I'm not drawing arrows on most edges here is that it's hard to do -- all edges are directed)

A node may have multiple edges with the same label, but in this case . will only follow the "first" one (the first is the one with the lowest integer label; if all the edges with the given label are non-integer, todo)

To add an edge programmatically, use "ins":

  x = [1 2]
  x = 3 x ins
  x == [1 2 3]
  x = x "red"="apple" ins
  x == [1 2 3 "red"="apple"]
  # alternative
  x = [1 2 3]
  x = "apple" x lb="red" ins
  x == [1 2 3 "red"="apple"]
  # use the i=n keyword arg so ins does not assign the next integer label to the new edge
  x = "banana" x lb="yellow" i=n ins
  x == [1 2 3 "red"="apple" -- "yellow"="banana"]
  # or, alternatively, if x.__ordered = f, ins will not assign an integer:
  x = [1 2 3]
  x.__ordered = f
  x = x "yellow"="banana" ins
  x == [1 2 3 -- "yellow"="banana"]
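The behavior of ins can be sketched in Python over the dict model of a node. The signature and the auto_int flag (standing in for i=n) are hypothetical illustrations, not the proposed API:

```python
def ins(node, value, lb=None, auto_int=True):
    """Sketch of ins: add an edge from `node` to `value`, labeled with the
    next unused integer (unless auto_int is off) and optionally with lb."""
    ints = [k for k in node if isinstance(k, int)]
    if auto_int:
        node[max(ints) + 1 if ints else 0] = value
    if lb is not None:
        node[lb] = value
    return node

x = {0: 1, 1: 2}
ins(x, 3)                                    # x = 3 x ins      -> [1 2 3]
ins(x, "apple", lb="red")                    # labeled 3 and "red"
ins(x, "banana", lb="yellow", auto_int=False)  # i=n: no integer label
assert x[2] == 3 and x[3] == "apple" and x["red"] == "apple"
assert x["yellow"] == "banana" and 4 not in x
```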

To get or set a node's label use its "lb" edge:

 x = [1 [2]]
 x.lb = "apple"
 x == ["apple" | 1 [2]]

To operate on nodes other than root, use assignment syntax:

 x = [1 [2]]
 x.1.lb = "apple"
 x == [1 ["apple" | 2]]

To access a node which represents an edge (this is called "reifying" the edge), use the '^.' operator in place of '.'. The reified edge has labels "src", "dst", and "lb". Note that lb contains a list:

 x = [1 "yellow"=2]
 x^.1.dst == 2
 x^."yellow".lb == [1, "yellow"]

You can change an edge's labels or even its source or destination:

 x = [1 "yellow"=2]
 x^."yellow".lb = [3 "red"]
 x == [1 -- [3,"red"]=2]
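A reified edge can be pictured as a small record with src, dst, and a list of labels. A minimal Python sketch (the Edge class is an illustrative stand-in):

```python
from dataclasses import dataclass, field

@dataclass
class Edge:
    src: object
    dst: object
    lb: list = field(default_factory=list)  # an edge can carry several labels

# Model of the edge reached by x^."yellow" in x = [1 "yellow"=2]
e = Edge(src="root", dst=2, lb=[1, "yellow"])
e.lb = [3, "red"]      # relabel the reified edge
e.dst = 5              # even the destination can be reassigned
assert e.lb == [3, "red"] and e.dst == 5
```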

To insert the contents of one list into another, you can use the @ operator:

  x = [2 3]
  y = [1 @x 4]
  y == [1 2 3 4]
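Python's iterable unpacking happens to behave just like @ here, which makes a direct illustration:

```python
x = [2, 3]
y = [1, *x, 4]   # *x plays the role of Jasper's @x: splice contents in place
assert y == [1, 2, 3, 4]
```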

example: a single node with value "10":

 ex = [10]
 ex.n == 10

example: a single node that points to itself:

ex = ['s | ..'s]
ex = [this]

todotodo

example: a list

ex = ["apple" "banana" "cherry"]
ex = ["apple" "banana" "cherry"]
ex.0 == "apple"
ex.1:2 == ["banana" "cherry"]
ex = $[apple banana cherry]
ex = "apple", "banana", "cherry"
ex2 = ["grapefruit", @ex]
ex2 = ["grapefruit","apple","banana","cherry"]
fruit2 = "banana"; ex = $[apple `fruit2 cherry]
fruit23 = $[banana cherry]; ex = $[apple `@fruit23]

example: an association table
ex = [
         apple = red
	 banana = yellow
	 cherry = red
ex = assoc $[[apple red] [banana yellow] [cherry red]]
ex = assoc $[
	apple red
	banana yellow
	cherry red
ex = assoc [
	"apple" "red"
	"banana" "yellow"
	"cherry" "red"

todotodo

idea: when multiple edges have the same name, allow to get the set, but also "any" and "all" operators. "any" picks an arbitrary edge out of the set (like, the first one), and "all" does something to all of them (parallel programming). actually, better: the std get is short for getSet composed with any (or should it be all? these are the same in the degenerate case of unique names). use the same mechanism to deal with multiple nodes with the same name. so, when a node is linked to a named node, it can be linked to "any" node with that name (creating one edge), or to "all" of them (creating possibly more than one edge). now you don't have the ugly problem of namespace conflicts when you merge graphs.


Function definition

Just use an '=' sign to write an equation. The function on the left will be defined in the current namespace.

To make f work like "+" (although f won't be infix):

 y x f = x y +

Partial function application

If function f takes 2 arguments, and you want to call it with its first argument x and its second one y, then

  result = y x f

is the way to do that. However, you can also do

  g = x f

In this case, the value of "g" is a (partially applied) function: the function f with its first argument fixed to x. That leaves one argument to be specified, so g is a function that takes one argument. You can call g just like any function. The result of g is defined by

  y g == y x f

Jasper's expressions are right-associative, for example,

  y x f == y (x f)

So, in Jasper, passing multiple arguments to a function is equivalent to passing it just one argument and creating a partial function, then passing that function a second argument to make another function, etc.
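This one-argument-at-a-time behavior is ordinary currying, which can be sketched in Python with closures (the function names are illustrative):

```python
# Curried model of f: "y x f" groups as "y (x f)", so f consumes x first
# and returns a function awaiting y.
def f(x):
    return lambda y: x + y

g = f(3)               # "g = x f": a partially applied function
assert g(4) == 7       # "y g == y x f"
assert f(3)(4) == 7    # passing both args is just two single applications
```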

Code inside G-constructors

Children of G-constructors are whitespace-delimited. That means that, in order to put code inside a G-constructor, the code must contain no spaces. So

 [y x f]

isn't interpreted as y applied to the result of x applied to f, but rather as three separate children, y, then x, then f. To get y applied to the result of x applied to f, you'd do:

 [(y)(x)(f)]

or

 [y,x,f]

Another example:

 [y,x,f b,a,g]

has two children: y applied to the result of x applied to f, and b applied to the result of a applied to g.

Infix operators

An "operator" is any non-reserved symbol which is not alphanumeric. (todo)

Operators are used just like functions, but an operator that is at least binary may be used as a tight-binding binary infix operator by attaching it to its left and right operands, when those operands are themselves alphanumeric or surrounded by parens (todo). This has the effect of putting parens around the attached tokens, putting a space in between them, and moving the operator to the right. For example,

 a*b c +
 ->
 (a b *) c +

 (a d +)*(b) c +
 ->
 ((a d +) b *) c +

Operators may be attached on one side, which has the effect of making the attached chain the result of partially applying the operator to the attached operand:

 ((a d +) *b) c +
 ->
 ((a d +) (b *)) c +

In addition, there are "loose binding infix operators". The predefined loose-binding infix ops include '==', '!=', '||', '&&', '=~', '>', '<', '>=', '<=', '<==>' ("spaceship operator"), 'in' (todo: find others to include). When freestanding, a loose-binding infix operator has the effect of putting parentheses around everything to its left (within its containing subexpression) and of putting parentheses around everything to its right (within its containing subexpression). Example:

  2 1 + <= (2 5 +) 5 -
  -> 
  (2 1 +) <= ((2 5 +) 5 -)

Loose-infix operators may not be attached on just one side. To partially apply a loose infix to one operand, and present the result as an ordinary function which is waiting for the other side, use parens, e.g.:

 3 (5 <=) == (5 <= 3)

To use a loose-infix like an ordinary function, put parens around it alone, e.g.:

 (5 3 (<=)) == (3 <= 5)

(note: this is like what haskell calls "sections") (todo: reverse the ordering there?)

Surrounding a non-operator with parens and attaching the result to something has the effect of putting parens around the whole thing and putting spaces in between the components:

 (y)(x)(f) == (y x f)

(these spaces do not make this into separate items in a list)

Users can define new loose infix operators, but they must be named by symbols starting or ending with at least one of '=|&~<>' and which don't contain any characters besides those and '!-'. User-defined non-loose-infix operators may not start or end with '=|&~<>'. This makes it easy for readers to classify unfamiliar operators as loose infix or not by sight. No special declaration is needed to make an operator loose infix; these rules are sufficient to determine that.

If there are multiple distinct loose infix operators in one expression, or multiple distinct tight infix operators in one attached chain, they must be disambiguated with another grouping construct; an expression like "a || b == t" is illegal, as is "a*b+c".

If there are multiple instances of the same loose infix operator in one expression, or multiple instances of the same tight infix op in one attached chain, then by default they must be disambiguated with another grouping construct; but an infix operator may be declared associative, or it may be declared to have an associative combination operator by which its results are combined. Examples:

 && infixassoc
 && <= infixcombine

the former means that (a && b) && c == a && (b && c), which renders an expression like a && b && c unambiguous. The latter means that an expression like a <= b <= c is translated to (a <= b) && (b <= c). With infixcombine, the combination operator (the second, or leftmost argument to infixcombine) must itself be declared infixassoc.
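Python happens to build exactly this infixcombine behavior into its comparison operators, which makes a concrete illustration of the a <= b <= c translation:

```python
# Python chains comparisons the way infixcombine would:
# a <= b <= c means (a <= b) and (b <= c), with b evaluated once.
a, b, c = 1, 2, 3
assert (a <= b <= c) == ((a <= b) and (b <= c))
assert not (1 <= 3 <= 2)   # the chain is false if either link fails
```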

Note: subeq and sub (subset containment) are expressed by <= and <.

Note: if you know other languages with various different "precedence levels" and associativity rules for operators, Jasper can be roughly characterized as having three levels: attached (tight-binding) operators, ordinary function application, and freestanding loose infix operators (those starting or ending with characters from '=|&~<>'). A loose infix op may have no attached args or 2, but not 1.

Subtraction

Negative number literals are written with a '-' attached on the left. A freestanding '-' denotes the subtraction function, which is a normal postfix function.

Fancy grouping syntax

Parentheses group in the usual way.

Without any grouping, expressions are right associative. So

  z y x f == z (y (x f))
  

(actually, maybe z y x f should be like z(y,x,f) in other langs; one-way auto-currying, rather than using Haskell's convention of never having tuple arguments)

(Freestanding) '//' is a grouping construct that has the effect of putting parentheses around everything to its left (until it hits an infix operator or the boundary of the containing subexpression). If there are multiple ones on a line, the one on the left "acts first". It is the equivalent of Haskell's '$'. So, for example,

   x (y g // f) h == x ((y g) f) h
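A chain of '//'s is essentially a left-to-right pipeline: compute everything so far, then feed the result to the next stage as its last argument. A Python sketch of that pipeline shape (the helper name is hypothetical):

```python
from functools import reduce

def pipeline(seed, *stages):
    """Feed `seed` through each stage in turn, like chained '//'s."""
    return reduce(lambda acc, fn: fn(acc), stages, seed)

# Analog of: 3 (// +1) (// *2) -- each '//' hands the running result onward
assert pipeline(3, lambda v: v + 1, lambda v: v * 2) == 8
```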

',' is a grouping construct that has the effect of putting parentheses around the clause to its left, and also putting parentheses around the clause to its right, where a clause is everything until you hit the beginning or the end of the containing subexpression, or an infix operator, or one of ',', '//', skipping over subexpressions.

(after ditching right-associativity, aren't '//' and ',' the same then? not sure)

','s may be attached or freestanding.

For example,

 (2 1 f  , 4 3 g  , h)   (2 1 f  , 4 3 g  , j)  a // 5 b // 6 c
 ==
 ((2 1 f ,  4 3 g  , h)   (2 1 f  , 4 3 g  , j)  a) 5 b // 6 c
 ==
 (((2 1 f ,  4 3 g  , h)   (2 1 f  , 4 3 g  , j)  a) 5 b) 6 c
 ==
 ((((2 1 f)   (4 3 g) (h))   ((2 1 f) (4 3 g) (j))  a) 5 b) 6 c
 ==
 ((((2 (1 f))   ((4 (3 g)) h))   (((2 (1 f)) ((4 (3 g)) j))  a)) (5 b)) (6 c)

(Attached) '.' is a weird grouping construct and a special infix operator. It "binds tightly", like an infix operator, and has the effect of putting a '(' to the left of the subexpression or function immediately to its left and a ')' to the right of the subexpression or function immediately to its right. '.'s are left-associative with other '.'s. It must be attached to both its left and right operands. What it does is to evaluate the thing on its left, and then feed it to the thing on its right.

So,

 3 + x.y.z.w + 2 == 3 + (w (z (y x))) + 2

So it reverses the things embedded in the sequence. In addition, '.' has a special meaning when it is found on the left hand side ("lhs") of a freestanding '='. We'll see later that '.' is useful for things that work like attribute access (struct field access), or things that work like indexing into a list, or things that work like looking up a value from an association table based on a key, or things that work like tree or graph traversal.

Variadic functions start with *

todo: use a non-chorded prefix like - instead of *?

If a function's name starts with *, for example "*func", then it is a variadic function, meaning that it takes an arbitrary number of arguments. An expression whose head is a variadic function, and where the rest of the arguments do not contain an uncontained loose infix operator, is parsed differently: it is not right-associative; instead, each item is passed as a separate argument to the function. If the rest of the arguments on the line do contain an uncontained loose infix operator, then it is as if everything but the *func on the line is surrounded by parens, and constitutes a single argument to the *func. Subexpressions within parens (or on subsequent, indented lines, which amount to the same thing) are parsed normally -- but subsequent, indented lines each constitute a separate argument to the *func. This is all easier to understand by examples:

 a b c d *blah
   -> (a) (b) (c) (d) *blah
 c d *blah
  b
  a
   -> (a) (b) (c) (d) *blah
 a <= b c d *blah
   -> (a <= (b (c d))) *blah
 e (a <= b c d) f *blah
  -> (e) (a <= (b (c d))) (f) *blah
 f *blah
  a <= b c d
  e
  -> (e) (a <= (b (c d))) (f) *blah
 e f (a <= b c d) *blah
  -> (e) (f) (a <= (b (c d))) *blah
 a <= b c d *blah
  f
  e
  -> (e) (f) (a <= (b (c d))) *blah

Some variadic functions that you should know:

The variadic forms of conditionals. Example:

 a <= b c d *if t e

is like "if a <= b c d then t else e" in some other languages, and equivalent to "e t (a <= b c d) if" in Jasper.

   a <= b c d *if
     t
     e

optional convention: the "then" is indented extra, to make it easier to distinguish from the "else", so you should write the above like:

   a <= b c d *if
       t
     e
 defaultFoo *cd
  a < b
   foo1
  a == b
   foo2
  a > b
   foo3

->

 [[a<b foo1] [a==b foo2] [a>b foo3]] defaultFoo cd

(remember, cd is like "cond" in lisp)
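A sketch of what cd could do, in Python, over the desugared list-of-pairs form shown above (the function and its eager-test representation are illustrative assumptions):

```python
def cd(pairs, default):
    """Sketch of cd (cond): return the action of the first pair whose
    test is true, else the default."""
    for test, action in pairs:
        if test:
            return action
    return default

a, b = 1, 2
# Analog of [[a<b foo1] [a==b foo2] [a>b foo3]] defaultFoo cd
result = cd([(a < b, "foo1"), (a == b, "foo2"), (a > b, "foo3")], "defaultFoo")
assert result == "foo1"
assert cd([], "defaultFoo") == "defaultFoo"   # no pair matches
```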

optional convention: the actions associated with each condition in "cond" are indented an extra step

 defaultFoo x *sw
  'a'
    foo1
  'b'
    foo2
  ->

[['a' foo1] ['b' foo2]] defaultFoo x sw

(sw is "switch" -- it's like a cond where each conditional is an == comparison of the switch expression (x) to the relevant item ('a', 'b'))

also, many basic operators, such as arithmetic operators, have a variadic sister form in the standard library that does a fold over the arguments with that operator.

 1 2 3 4 *+ == 1+2+3+4
 1 2 3 4 ** == 1*2*3*4

if you have a graph node, and you want to pass its edges into a function as arguments, use @:

 l = [1 2 3 4]
 @l ** == 1*2*3*4
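Both ideas map directly onto Python: a fold over arguments via functools.reduce, and @-style spreading of a node's edges via argument unpacking. (The product function is an illustrative stand-in for a variadic sister form.)

```python
from functools import reduce
from operator import add, mul

# The variadic sister of '+' folds '+' over its arguments:
assert reduce(add, [1, 2, 3, 4]) == 1 + 2 + 3 + 4   # like 1 2 3 4 *+

def product(*args):            # a variadic function, like the '*' sister form
    return reduce(mul, args)

l = [1, 2, 3, 4]
assert product(*l) == 24       # '@l' passes the edges as separate arguments
```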

the region of arguments to which a variadic function applies extends to the left until '//', ',', or the edge of the containing subexpression.

Exceptions: the following predefined functions are not variadic: '*' (fold) and '*' (multiplication).

How to read Jasper expressions

If there is just one symbol then you're done, if there's just one subexpression then recurse :)

First, if there is an infix operator, then it divides everything to its left from everything to its right. todo: update

For the most part, the computation proceeds from the left to the right.

Next, note each ')' or '//'. The things immediately to the left of these, and also the last thing on the right side of the line, are the functions that will be acting on arguments, the "active functions". These things may be symbols denoting functions, or they may be subexpressions which evaluate to functions.

Look at the rightmost active function on the line -- this is the last function that will be applied, taking as arguments all of the other things to its left, up to any //s (if there is a //, then everything to its left is the last argument of the current active function). If there is a //, then look to its left to find the active expression, and look over it. Repeat. Now you have a general, top-down sense of what functions are demanding the various arguments in the expression.

Now start at the left and read left-to-right, to understand how each subexpression is calculated. This is a bottom-up viewpoint, in the sense that you will start out by seeing how each subexpression is calculated, and then after progressing through each set of arguments, you will see the active function that eats those arguments.

When there are ,s, they separate two subexpressions which are each arguments of the same active function.

When there are //s, the things to the left of the // are computed first, and then they are fed to the active function controlling the // as its final argument. So //s represent a pipeline (or composition of functions) which can be read from left to right.

Sequences of things separated by '.'s are, like '//'s, pipelines where the thing on the left is calculated first, and then the result is fed to the thing on the right. The difference is that the "thing on the left" is as narrow as possible, rather than as wide as possible as with '//'. Technically, '.' is an active function, but TODO prob here

The intent of '.' is for it to be used for things where it is easier to think about what is going on by reading the things in between the '.'s from left to right. For example, imagine that lookup in a 2-D array is done like this: pass the row number to a row lookup function, and then, representing that row, you get back a "column lookup function" for that row. Now pass the column number to the column lookup function, and it returns the value at that location. So, to look up row 5 column 2, you do "2 (5 row_lookup)"; "(5 row_lookup)" evaluates to the column lookup function for row 5, and you pass 2 to that, and get back the value stored at row 5, col 2. You could write this "row_lookup.5.2", which is shorter and seems to be easier to read.
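The curried 2-D lookup reads naturally in Python too; a minimal sketch, with grid and row_lookup as hypothetical names:

```python
# row_lookup.5.2 in Jasper ~ row_lookup(5)(2) in Python: each '.' step
# feeds the result so far to the next stage, left to right.
grid = {5: {2: "value at row 5, col 2"}}

def row_lookup(row):
    return lambda col: grid[row][col]   # returns the column-lookup function

assert row_lookup(5)(2) == "value at row 5, col 2"
```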

Example:

 (2 1 f  , 4 3 g  , h)   (1+1 1 f  , 4 x.y.z g  , j)  a // 5 b // 6 c

Here we have 3-stage pipeline with active functions a,b,c. First "(2 1 f , 4 3 g , h) (1+1 1 f , 4 x.y.z g , j) a" will be computed, then its result will be used as the last argument of b (with 5 being the other argument to b), and finally the result of that will be used as the last argument of c.

a takes 2 arguments, each a subexpression. The active function in the subexpression which computes a's first argument is j, and the active function in the subexpression for a's last argument is h.

h itself takes two arguments, given by (4 3 g) and (2 1 f). j also takes two arguments. j's first argument, "4 x.y.z g", is a subexpression which has an active function g which takes two arguments, x.y.z and 4. "x.y.z" means "z (y x)". j's last argument is "(1+1) 1 f".

Why?

Jasper is "backwards" from most programming languages so that the main flow of the computation within an expression goes left-to-right rather than right to left. This means that to get a top-down sense of what is going on, you must read Jasper from right to left, and to get a bottom-up sense, read it from left to right.

I find that for short or easy to understand expressions, I want to read it once top-down and then I understand. For long or hard to understand expressions, I find that I first glance through it to get a top-down overview, and then I slowly read through the expression bottom-up. It is easier to read from left-to-right as compared to right-to-left, because the spelling of each word is left-to-right. For a short expression, the difference doesn't really matter, because it's easy to move your eyes to either the leftmost or the rightmost end of the expression.

So, traditional programming languages are good for the top-down phase but bad for the bottom-up phase, whereas Jasper is good for the bottom-up phase but bad for the top-down phase. Although the top-down phase is more common, this is due to the prevalence of expressions that are short and easy to read in any case; it seems that the long, difficult expressions will be easier in Jasper.

These are just personal observations which have not been thought about much or checked extensively or objectively.

The // operator is inspired by Haskell's $ and is a concise, readable alternative to the common "pipeline" situation described above. The . operator mimics how object attribute access reads in other languages such as Python. The ',' acts much like it does in languages with a f(x,y) syntax; it allows you to separate subexpressions without typing a zillion parentheses (also, it is easier to read something with parens at a high nesting level with commas inside them than it is to read something using nested parens, i.e. parens on both nesting levels).

The approach to infix operators is motivated by a desire to allow users to define infix ops but to avoid having user-defined precedence levels which cannot be immediately inferred by sight, as this makes code hard to read. Although there are in effect three precedence levels, users can tell whether an operator is infix, and what its precedence is if so, by sight (are there attached args? does the operator start or end with the loose infix chars?). If there are multiple infix operators present on the same precedence level, then either their grouping doesn't matter, or it must be specified explicitly; so the reader doesn't have to apply associativity rules to determine how infix operators group. If an operator is not infix, then it is right-associative. So all the reader has to do is to identify operators with attached arguments, identify operators that start or end with the loose infix chars, and apply right-associativity to the rest.

In my view, there are three reasons to want tight-binding infix operators: (1) saving a few keystrokes ("a*b" vs "a b *"); (2) removing parens via associativity ("1*2*3*4" vs "((1 2 *) 3 *) 4 *"); (3) removing parens via tight binding ("a*b c +" vs "(a*b) c +"). #1 is trivial, and #2 can be accomplished in a concise manner using the * (fold) token operator.

In my view, the main reason to want loose-binding infix operators is that it seems easier to read "blah blah blah <op> blah blah blah" than "(blah blah blah) <op> (blah blah blah)", probably because the operator in the middle reminds you what the relationship between the arguments is as you pass from one to the other. Another reason is that symmetric comparison operators compare two things, so it's visually nice to be able to have them symmetrically in between the two things being compared.

Keyword arguments

 exponent base g = base^exponent
 1 2 g # == 2
 base=2 exponent=1 g # == 2
 exponent=1 2 g # == 2
 exponent=1 base=2 g # == 2
 2 exponent=1 g # == 2
 1 base=2 g # == 2

As each keyword argument is encountered, if it matches a keyword slot in the function definition, it is applied to that slot. If it does not match any slot, and the function in question has a *more argument (see below), then the function can access that keyword argument using *more. Otherwise, if it does not match any slot, a compile-time error is emitted.

Note that if you use a keyword to assign to an argument, then that argument is "skipped over" when its position comes up, requiring care in reading:

 toMultiplyBy toSubtract toStartWith h = (toStartWith toSubtract -)*toMultiplyBy
 
 3 2 1 h # == -3
 3 1 toSubtract=2 h # == -3; the same, because toSubtract was skipped
 # that is to say, toMultiplyBy toSubtract toStartWith h == toMultiplyBy toStartWith (toSubtract=toSubtract h)

It is possible to give a keyword argument which is discarded if it does not match; to do this, use keyword?=value

e.g.

 exponent=1 2 unused?="hi" g # == 2
 2 exponent=1 unused?="hi" g # == 2

It is possible to give a keyword argument which, if it does not match, is treated as a positional argument, as if there were no keyword; to do this, use keyword-=value

e.g. exponent=1 badKeyword-=2 g # == 2

If no value is specified for a keyword argument in a function call, then 't' is assumed to be the value. This saves you one character when sending optional arguments to functions which are really just binary switches that turn on some non-default behavior, e.g. word ignorecase= find
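As a sketch of this binding rule (keyword arguments claim their named slot first; positional arguments then fill whatever slots remain), here is a hypothetical Python analogue. The function name `bind_args`, and the convention of writing the slots left-to-right, are my own illustration, not part of Jasper:

```python
def bind_args(slots, args, kwargs):
    """Sketch of Jasper-style application: keyword arguments claim
    their named slot; positional arguments then fill the remaining
    (skipped-over) slots in order."""
    bound = dict(kwargs)
    for name in bound:
        if name not in slots:
            raise TypeError(f"unknown keyword {name!r}")
    free = [s for s in slots if s not in bound]
    if len(args) > len(free):
        raise TypeError("too many positional arguments")
    bound.update(zip(free, args))
    return bound

# h from the example above, with its slots written left to right:
slots = ["toMultiplyBy", "toSubtract", "toStartWith"]
b = bind_args(slots, [3, 1], {"toSubtract": 2})
result = (b["toStartWith"] - b["toSubtract"]) * b["toMultiplyBy"]
print(result)  # -3, matching "3 1 toSubtract=2 h"
```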

Usually, once you apply an argument to a function, you can't change it later with a keyword. If you want to do that, use the function applyArgsAsDefault:

 1 base=3 base=2 g # error; base assigned to twice
 1 base=2 base=2 g # error; base assigned to twice
 a = (base=2 g); 1 base=2 a # error; base assigned to twice
 a = (2 g); 1 base=2 a # error; base assigned to twice
 a = [2] g applyArgsAsDefault; 1 base=3 a # == 3
 a = base=2 g applyArgsAsDefault; 1 base=3 a # == 3
 1 2 blah?=4 g # == 2

A "*more" argument must be the last (leftmost) in the function definition, and signals that the function does something with arbitrary keyword arguments. Within the function body, the unasked-for keywords and their values are edges and children of a node assigned to *more.

 "unasked-for keyword"="val" "hi" printFirstKeyword # hi\nunasked-for keyword

Default arguments

In function definitions, default arguments must be the "first", or rightmost, arguments. Arguments which have defaults ARE NOT positional, and can only be given by keyword.

 x=1 f = x + 3
 f == 4
 5 f # is an error; the x argument is not positional, only keyword
 x=5 f == 8
 exponent base=e g = base^exponent
 
 1 g == 2.718...
 1 base=2 g == 2
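This rule (arguments with defaults are keyword-only) maps directly onto Python's keyword-only parameters; a sketch of the g above:

```python
import math

def g(exponent, *, base=math.e):
    # 'base' has a default, so it is keyword-only, as in the Jasper rule
    return base ** exponent

print(g(1, base=2))  # 2
print(g(1))          # 2.718...
# g(1, 2) would raise TypeError: 'base' is not positional
```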

Neutralizing a function

If you have given a function all its required arguments (i.e. those without defaults), but rather than evaluating it, you want it to sit as a partially applied function waiting for possible default overrides, you can "neutralize" it with the "neu" higher-order function. A neutralized function eats its arguments like normal, but when all its required args have been supplied, it yields itself (with the arguments applied) rather than whatever value it is supposed to yield. Then, to make it "go" later, use "unneu".

 exponent base=e g = base^exponent
 1 (g neu) #== 1 (g neu)
 (1 (g neu)) unneu # == 2.718...
 a = 1 (g neu); b = base=2 a; b unneu # == 2
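One way to model neu/unneu in Python is a small wrapper that accumulates arguments and always returns a new wrapper until explicitly forced; the class name `Neutral` and its method names are my own sketch:

```python
import math

class Neutral:
    """Sketch of a neutralized function: applying arguments yields a
    new Neutral rather than a value; unneu() forces the actual call."""
    def __init__(self, fn, kwargs=None):
        self.fn = fn
        self.kwargs = dict(kwargs or {})
    def apply(self, **kw):
        return Neutral(self.fn, {**self.kwargs, **kw})
    def unneu(self):
        return self.fn(**self.kwargs)

def g(exponent, base=math.e):
    return base ** exponent

a = Neutral(g).apply(exponent=1)   # like: 1 (g neu)
print(a.unneu())                   # 2.718...
b = a.apply(base=2)                # like: base=2 a
print(b.unneu())                   # 2
```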

sq

todo: NO, this stuff should be always, sq is for strict eval

sq (short for "sequence") labels a body of consecutive lines that are written as if they will be executed in sequence. For example,

x f = sq
  y = x + 3
  z = y * 2
  z + 1

This looks like an imperative sequence of steps, but actually the compiler will translate it into a single expression: the value of the last line is returned as the value of the expression, with the variable binding from the line before substituted into it, and the binding from the line before that substituted into that, etc.

The effect is similar to this Haskell code (todo: check for mistakes):

f x = 
  let y = x + 3 in
    let z = y * 2 in
      z + 1

sq is similar to Haskell's "do" except that it doesn't use monads or produce a monadic type; it's the let-binding syntactic sugar of Haskell's do without the monads. sq also has some extra syntactic sugar, see below.

Convenience mutation

 x = 5
 x = x + 1
 x pr

->

 x = 5
 x2 = x + 1
 x2 pr

(so, this is basically saying that symbols act like normal "variables", even when statements are otherwise unordered)
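The rewrite above is essentially single-assignment renaming. A toy Python sketch of such a pass, over statements represented as (target, token-list) pairs (this representation is my own illustration, not anything Jasper specifies):

```python
def rename_reassignments(stmts):
    """Rewrite repeated assignments to a variable as fresh names
    (x, x2, x3, ...), substituting the current name into later uses."""
    counts = {}   # how many times each base name has been assigned
    current = {}  # base name -> current renamed name
    out = []
    for lhs, rhs in stmts:
        rhs = [current.get(tok, tok) for tok in rhs]
        if lhs is not None:
            counts[lhs] = counts.get(lhs, 0) + 1
            new = lhs if counts[lhs] == 1 else f"{lhs}{counts[lhs]}"
            current[lhs] = new
            lhs = new
        out.append((lhs, rhs))
    return out

# x = 5; x = x + 1; x pr
prog = [("x", ["5"]), ("x", ["x", "+", "1"]), (None, ["x", "pr"])]
print(rename_reassignments(prog))
# [('x', ['5']), ('x2', ['x', '+', '1']), (None, ['x2', 'pr'])]
```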

In place convenience mutation

If you want to use a variable within an expression, and then replace the value of that variable with the result of the expression, then attach a postfix '=' to the variable:

 x= sort
 ->
 x = x sort
 -> (reducing using convenience mutation, as described above)
 x2 = x sort
 .. ('x's below replaced by 'x2's) ..

Another example:

 x = 0
 x= ++1
 ->
 x = x ++1
 ->
 x2 = x ++1
 .. ('x's below replaced by 'x2's) ..

In place convenience mutation shortcuts for ++1 and --1

'x+++' is a shortcut for "x= ++1" and "x---" is a shortcut for "x= --1"

assignment to graph objects

'.' must be the operator on the top level of the l-value

 g = [3 4]
 g.0 = 1
 -> g = 1 0 g __set
 -> ... (convenience mutation)
 g.0 == 1

 g = [3 4]
 g.1+++
 ->
 g = [3 4]
 g.1 = g.1 ++1
 -> ... (assignment to graph objects)
 g == [3 5]
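Under referential transparency, the __set desugaring produces a new graph rather than mutating in place. A rough Python analogue, using a dict for the graph node (the name `graph_set` is hypothetical):

```python
def graph_set(value, key, g):
    """Functional update: return a copy of g with g[key] replaced,
    leaving the original untouched (cf. 'g = 1 0 g __set')."""
    g2 = dict(g)
    g2[key] = value
    return g2

g = {0: 3, 1: 4}                 # like g = [3 4]
g2 = graph_set(1, 0, g)          # like g.0 = 1
print(g2[0])                     # 1
g3 = graph_set(g[1] + 1, 1, g)   # like g.1+++
print(g3)                        # {0: 3, 1: 5}
```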

end of sq section


Pronouns

"it"

When you call a function (or use an infix operator) inside an sq, the symbol "it" may (or may not) be rebound by that function. That is, the function gets to reach out of its lexical context and change the value of "it" in your lexical context.

For example, if you match a string against a regular expression using the =~ comparison operator, "it" is bound to a list of matches.

 "hi there" =~ r<<th..e>>
 it.0.txt == "there"

This is similar to how Perl binds $0, $1, $2, etc when you match a regex.

"it" may be rebound once each line (using convenience mutation).

 "hi there" =~ r<<th..e>>
 "hi dude" =~ r<<d..e>>
 it.0.txt == "dude"

If multiple functions or operators on a single line try to rebind "it", then the outermost one wins.

To convert any function that binds something to "it" into a function that returns an ordered table, where the first element (labeled x') is the value that the function returns, and the second element (labeled it') is the value that the function binds to "it", use the "giveit" higher-order function. This allows you to get those "it" values when you are not in an sq block:

 g = "hi dude" r<<d..e>> ((=~) giveit)
 g.0 == t
 g.1.0.txt == "dude"
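A Python approximation of giveit for the regex case, returning the match result together with what would be bound to "it". The function name `match_giveit` and the list-of-dicts shape for the matches are my own sketch; Jasper's r<<...>> corresponds roughly to a Python regex:

```python
import re

def match_giveit(s, pattern):
    """Return (matched?, it), where 'it' is the list of matches that
    the =~ operator would bind to the pronoun."""
    m = re.search(pattern, s)
    if m is None:
        return (False, [])
    return (True, [{"txt": m.group(0)}])

g = match_giveit("hi dude", "d..e")
print(g[0])             # True   (cf. g.0 == t)
print(g[1][0]["txt"])   # dude   (cf. g.1.0.txt == "dude")
```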

$$

todo: how can this work for fns to call both $$ and non-$$ unless keywords are also positional? mb use -- for keyword only?

$$, outside of a list, is replaced with:

  fn i=n s=n z=n y=n x=n :

i.e.

 f ($$ x g) 
 ->
 f (fn i=n s=n z=n y=n x=n : x g)

In order to make functions passed to higher-order functions easy to read, functions that require functional arguments (call these 'subfunctions') are encouraged to call the subfunctions that are passed in with keywords. The following conventions obtain:

For example, foldr is defined like this:

foldr  { [b] a [a b a] = a}
xs0 z0 f foldr = xs0 z0 rgo
  [] s rgo     =  s
  (xs x ins) s rgo = xs (s=s x=x f) rgo

You can write a fold that finds if any item in a list is even like this:

  ($$ s || x 2 mod // !) foldr

(todo: check for mistakes) the 's' and the 'x' help the reader see which argument (s) is the boolean state which is being passed between invocations of the anonymous function, and which argument (x) is the list item.
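The convention can be mimicked in Python by having the fold call its subfunction with keyword arguments s (state) and x (item). A sketch, simplified to a left-to-right traversal, which suffices for the "any even" example:

```python
def fold_kw(xs, z, f):
    """Fold that calls its subfunction with keywords, so call sites
    can name which argument is the state (s) and which the item (x)."""
    s = z
    for x in xs:
        s = f(s=s, x=x)
    return s

# "is any item even?" -- cf. ($$ s || x 2 mod // !) foldr
any_even = lambda xs: fold_kw(xs, False, lambda s, x: s or x % 2 == 0)
print(any_even([1, 3, 4]))  # True
print(any_even([1, 3, 5]))  # False
```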

List comprehensions

<<[1,2,3] | x>=2 >> == [2,3]

<<x*2 : [1,2,3]>> == [2,4,6]

<<x*2 : [1,2,3] | x>=2 >> == [4,6]

Uses x pronoun.
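These correspond directly to Python comprehensions, with the x pronoun becoming an explicit variable:

```python
# cf. <<[1,2,3] | x>=2 >>, <<x*2 : [1,2,3]>>, <<x*2 : [1,2,3] | x>=2 >>
print([x for x in [1, 2, 3] if x >= 2])      # [2, 3]
print([x * 2 for x in [1, 2, 3]])            # [2, 4, 6]
print([x * 2 for x in [1, 2, 3] if x >= 2])  # [4, 6]
```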

Graph comprehensions

<<< : list comprehension over leaves, any boundary
<flavor<< : list comprehension over leaves, flavor boundary
<!flavor<< : list comprehension over leaves, non-flavor boundary

<<<< : same but all nodes
...
...

<<<<< : same but interior nodes

todo: how to work in graph patterns?

Assertions

To make an assertion, put '!' at the end of the expression to be asserted:

 1 1 ++ == 2 !

another example:

 x printAge
  x > 0 !
  prr "i see that you are $$x years old!"
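In Python terms the printAge example is roughly the following sketch (prr corresponds to printing an interpolated string; whether the assertion runs depends on debug mode, much like Python's -O flag stripping asserts):

```python
def print_age(x):
    assert x > 0  # cf. the postfix '!' assertion
    print(f"i see that you are {x} years old!")

print_age(30)
# print_age(-1) would raise AssertionError
```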

Depending on the debug mode, the assertion may or may not be evaluated. If the assertion fails, an AssertExc? (short for AssertException?