proj-oot-ootSyntaxNotes7

from Rust:

pub(crate) bar;

means that 'bar' is public, but only within the current crate.

" You can also specify a path, like this:

pub(in a::b::c) foo;

This means “usable within the hierarchy of a::b::c, but not elsewhere.” "

---

dnautics 129 days ago [-]

you should see Julia. There's a lot of syntactic sugar that borrows from the best of other languages, for example the pipe |> operator from elixir and do...end block syntax from ruby.

---

golang seems to use backticks for annotations?

" var d struct { Main struct { Kelvin float64 `json:"temp"` } `json:"main"` } " -- [1]

---

i had said we wanted LL(1) syntax, but then i decided LL(k) is good enough; but some ppl like LALR (eg [2]). LALR(1) (LALR without an index refers to LALR(1)) is a subset of LR(1), and LL(k) and LALR(j) are incomparable [3], although every LL(1) grammar is also LR(1) [4]. This book says "almost all LL(1) grammars excepting some contrived examples are also LALR(1);" [5]. And this one says "class of LALR(1) grammars is large enough to include grammars for most programming languages. (This does not mean that the reference grammars for most programming languages are LALR(1): they are often ambiguous.)" [6].

There's a slide in [7] labeled "Hierarchy of grammar classes" that shows some of this as a diagram.

So i guess we want our language to be both LL(k) and LALR(k); preferably LL(1) and LALR(1). [8] has some comments on how to detect this; notably, a sufficient but not necessary condition for an LL(1) language to be LALR(1) is "if all symbols with empty derivations have non-empty derivations".

this page says "there is no way to decide if a grammar can be converted to LL(1) or to LR(1) except by trying to do it - if you succeed, then it was". [www.cs.man.ac.uk/~pjj/complang/grammar.html]

these posts talk about how to convert LALR(1) grammars to LL(k):

here's a possibly related very technical post: http://etymon.blogspot.com/2006/09/why-i-prefer-lalr-parsers.html

and another: https://compilers.iecc.com/comparch/article/01-10-069

we might also consider SLR instead of LALR(1): " LALR(1) is used in most parser generators like Yacc/Bison

We will nevertheless only see SLR in details: ((i think they mean, in this slide presentation))

---

this thread gives various examples of type-dependency in C++ parsing: https://news.ycombinator.com/item?id=11148436

---

some languages have something called 'using static CLASSNAME' which is kinda like Python's 'import * from FILENAME', except that instead of importing top-level stuff from a module, any static methods within class CLASSNAME are all imported to the top-level namespace.

sounds to me like a great way of giving the benefits of top-level functions (eg for defining new arithmetic operators) while also doing the OOP way of having everything defined inside some class

---

"#region" for documentation, and IDE expansion/collapse

---

" upper-casing exported identifiers in Go packages "

---

anywhere you can have an expression or statement, you can have a block enclosed by {}s

---

" Now, I won’t claim that C has a great syntax. If we wanted something elegant, we’d probably mimic Pascal or Smalltalk. If we wanted to go full Scandinavian-furniture-minimalism, we’d do a Scheme. Those all have their virtues.

I’m surely biased, but I think Lox’s syntax is pretty clean. C’s most egregious grammar problems are around types. Dennis Ritchie had this idea called “declaration reflects use” where variable declarations mirror the operations you would have to perform on the variable to get to a value of the base type. Clever idea, but I don’t think it worked out great in practice.

Lox doesn’t have static types, so we avoid that.

What C-like syntax has instead is something you’ll find is often more valuable in a language: familiarity "

---

" Is a language built all out of parens simple... is it free of interleaving and braiding? And the answer is no; Common Lisp and Scheme are not simple is this sense, in their use of parens. Because the use of parentheses in those languages is overloaded; parens wrap calls, they wrap grouping, they wrap data structures; and that overloading is a form of complexity... We can fix that; we can just add another data structure, it doesn't make Lisp not Lisp to have more data structures. It's still a language defined in terms of its own data structures, but having more datastructures in play means that we can get rid of this overloading in this case... " -- Rich Hickey, https://www.infoq.com/presentations/Simple-Made-Easy 0:26:03. Note: the slides say more specifically how Clojure addresses this by adding another data structure; they say, "Adding a data structure for grouping, e.g. vectors, makes each simpler"

---

complaints about C:

" For starters, hiding identifiers after arbitrarily long type expressions, instead of starting a line/block/expression with a name, is an un-fixable PITA. (Sorry, AT&T, Algol had it right)

Requiring a “break” in a case statement is a botch, instead of some kind of “or” / “set” / “range” test. "

---

[10]

if Conditional

Extend if conditional with declaration, similar to for conditional.

if ( int x = f() ) ...

case Clause

Extend case clause with list and subrange.

    switch ( i ) {
      case 1, 3, 5: ...               // list
      case 6~9: ...                   // subrange: 6, 7, 8, 9
      case 12~17, 21~26, 32~35: ...   // list of subranges
    }

switch Statement

Extend switch statement declarations and remove anomalies.

    switch ( x ) {
        int i = 0;          // allow declarations only at start, local to switch body
      case 0:
        ...
        int j = 0;          // disallow, unsafe initialization
      case 1: {
        int k = 0;          // allow at lower nesting levels
        ...
        case 2:             // disallow, case in nested statements
      }
      ...
    }

choose Statement

Alternative switch statement with default break from a case clause.

    choose ( i ) {
      case 1, 2, 3:
        ...
        fallthrough;        // explicit fall through
      case 5:
        ...                 // implicit end of choose (switch break)
      case 7:
        ...
        break               // explicit end of choose (redundant)
      default:
        j = 3;
    }

Non-terminating and Labelled fallthrough

Allow fallthrough to be non-terminating in case clause or have target label providing common code. non-terminator labelled

choose ( ... ) { case 3: if ( ... ) { ... fallthru; goto case 4 } else { ... } implicit break case 4:

choose ( ... ) { case 3: ... fallthrough common; case 4: ... fallthrough common; common: below fallthrough at case level ... common code for cases 3 and 4 implicit break case 4:

choose ( ... ) { case 3: choose ( ... ) { case 4: for ( ... ) { ... fallthru common; multi-level transfer } ... } ... common: below fallthrough at case-clause level

Labelled continue / break

Extend break/continue with a target label to support static multi-level exit (like Java).

LS: switch ( ... ) { ...

					... break LS; ...          // terminate switch

Exception Handling

Exception handling provides dynamic name look-up and non-local transfer of control.

    exception_t E {};                               // exception type
    void f(...) {
        ... throw E{}; ...                          // termination
        ... throwResume E{}; ...                    // resumption
    }
    try {
        f(...);
    } catch( E e ; boolean-predicate ) {            // termination handler
        // recover and continue
    } catchResume( E e ; boolean-predicate ) {      // resumption handler
        // repair and return
    } finally {
        // always executed
    }

with Clause/Statement

Open an aggregate scope making its fields directly accessible (like Pascal).

    struct S { int i, j; };
    int mem( S & this ) with( this ) {      // with clause
        i = 1;                              // this->i
        j = 2;                              // this->j
    }
    int foo() {
        struct S1 { ... } s1;
        struct S2 { ... } s2;
        with( s1 ) {                        // with statement
            // access fields of s1 without qualification
            with( s2 ) {                    // nesting
                // access fields of s1 and s2 without qualification
            }
        }
        with( s1, s2 ) {                    // scopes open in parallel
            // access unambiguous fields of s1 and s2 without qualification
        }
    }

waitfor Statement

Dynamic selection of calls to mutex type.

void main() { waitfor( r1, c ) ...; waitfor( r1, c ) ...; or waitfor( r2, c ) ...; waitfor( r2, c ) { ... } or timeout( 1 ) ...; waitfor( r3, c1, c2 ) ...; or else ...; when( a > b ) waitfor( r4, c ) ...; or when ( c > d ) timeout( 2 ) ...; when ( c > 5 ) or else ...; when( a > b ) waitfor( r5, c1, c2 ) ...; or waitfor( r6, c1 ) ...; or else ...; when( a > b ) waitfor( r7, c ) ...; or waitfor( r8, c ) ...; or timeout( 2 ) ...; when( a > b ) waitfor( r8, c ) ...; or waitfor( r9, c1, c2 ) ...; or when ( c > d ) timeout( 1 ) ...; or else ...; }

Tuple

Formalized lists of elements, denoted by [ ], with parallel semantics.

    int i;
    double x, y;
    int f( int, int, int );
    f( 2, x, 3 + i );               // technically ambiguous: argument list or comma expression?
    f( [ 2, x, 3 + i ] );           // formalized (tuple) element list
    [ i, x, y ] = 3.5;              // i = 3.5, x = 3.5, y = 3.5
    [ x, y ] = [ y, x ];            // swap values
    [ int, double, double ] t;      // tuple variable

Alternative Declaration Syntax

Left-to-right declaration syntax, except bit fields. C∀: * int p; C: int * p;

References

Multi-level rebindable references, as an alternative to pointers, which significantly reduces syntactic noise.

int x = 1, y = 2, z = 3; int * p1 = &x, p2 = &p1, * p3 = &p2, pointers to x & r1 = x, && r2 = r1, &&& r3 = r2; references to x int * p4 = &z, & r4 = z;

A reference is a handle to an object, like a pointer, but is automatically dereferenced the specified number of levels. Referencing (address-of &) a reference variable cancels one of the implicit dereferences, until there are no more implicit references, after which normal expression behaviour applies.

0 / 1

Literals 0 and 1 are special in C: conditional ⇒ expr != 0 and ++/-- operators require 1.

    struct S { int i, j; };
    void ?{}( S * s, zero_t ) with( s ) { i = j = 0; }      // zero_t, no parameter name required
    void ?{}( S * s, one_t ) with( s ) { i = j = 1; }       // one_t, no parameter name required
    int ?!=?( S * op1, S * op2 ) { return op1->i != op2->i || op1->j != op2->j; }
    S ?+?( S op1, S op2 ) { return ?{}( op1->i + op2->i, op1->j + op2->j; }

    S s0 = { 0, 1 }, s1 = { 3, 4 };     // implicit call: ?{}( s0, zero_t ), ?{}( s1, one_t )
    if ( s0 )                           // rewrite: s != 0 ⇒ S temp = { 0 }; ?!=?( s, temp )
    s0 = s0 + 1;                        // rewrite: S temp = { 1 }; ?+?( s0, temp );

(my note: in the above, ?{} is syntax for constructor definition)

Postfix Function/Call

Alternative call syntax (postfix: literal argument before routine name) to convert basic literals into user literals, where ?` denotes a postfix-function name and ` denotes a postfix-function call.. postfix function constant argument call variable argument call postfix routine pointer

int ?`h( int s ); int ?`h( double s ); int ?`m( char c ); int ?`m( const char * s ); int ?`t( int a, int b, int c );

0 `h; 3.5`h; '1'`m; "123" "456"`m; [1,2,3]`t;

int i = 7; i`h; (i + 3)`h; (i + 3.5)`h;

int (* ?`p)( int i ); ?`p = ?`h; 3`p; i`p; (i + 3)`p;

Routine

Routine names within a block may be overloaded depending on the number and type of parameters and returns.

selection based on type and number of parameters void f( void ); (1) void f( char ); (2) void f( int, double ); (3) f(); select (1) f( 'a' ); select (2) f( 3, 5.2 ); select (3)

selection based on type and number of returns char f( int ); (1) double f( int ); (2) [ int, double ] f( int ); (3)

Extending Types

Existing types can be augmented; object-oriented languages require inheritance. (@ ⇒ use C-style initialization.)

    #include <time.h>
    void ?{}( timespec & t ) {}
    void ?{}( timespec & t, time_t sec ) { t.tv_sec = sec; t.tv_nsec = 0; }
    void ?{}( timespec & t, time_t sec, time_t nsec ) { t.tv_sec = sec; t.tv_nsec = nsec; }
    void ?{}( timespec & t, zero_t ) { t.tv_sec = 0; t.tv_nsec = 0; }
    timespec ?+?( timespec & lhs, timespec rhs ) {
        return (timespec)@{ lhs.tv_sec + rhs.tv_sec, lhs.tv_nsec + rhs.tv_nsec };
    }
    _Bool ?==?( timespec lhs, timespec rhs ) {
        return lhs.tv_sec == rhs.tv_sec && lhs.tv_nsec == rhs.tv_nsec;
    }
    timespec ?=?( timespec & t, zero_t ) { return t{ 0 }; }

    timespec tv0, tv1 = 0, tv2 = { 3 }, tv3 = { 5, 100000 };
    tv0 = tv1;
    tv1 = tv2 + tv3;
    if ( tv2 == tv3 ) ...
    tv3 = tv2 = 0;

Trait

Named collection of constraints.

    trait sumable( otype T ) {
        void ?{}( T &, zero_t );    // constructor from 0 literal
        T ?+?( T, T );              // assortment of additions
        T ?+=?( T &, T );
        T ++?( T & );
        T ?++( T & );
    };

Polymorphic Routine

Routines may have multiple type parameters each with constraints.

    forall( otype T | sumable( T ) )        // polymorphic, use trait
    T sum( T a[ ], size_t size ) {
        T total = 0;                        // instantiate T from 0 by calling its constructor
        for ( size_t i = 0; i < size; i += 1 )
            total = total + a[i];           // select appropriate +
        return total;
    }
    int sa[ 5 ];
    int i = sum( sa, 5 );                   // use int 0 and +
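
For comparison, a rough Python analogue of the trait-constrained sum above. This is only a sketch: the names `Sumable` and `sum_all` are made up for illustration, and since Python has no zero_t-style constructor from the literal 0, the zero value is passed in explicitly (an assumption, not how C∀ does it).

```python
from typing import Protocol, Sequence, TypeVar


class Sumable(Protocol):
    """Hypothetical stand-in for the 'sumable' trait: anything supporting +."""

    def __add__(self, other): ...


T = TypeVar("T", bound=Sumable)


def sum_all(items: Sequence[T], zero: T) -> T:
    # 'zero' plays the role of "T total = 0" in the C-forall version.
    total = zero
    for item in items:
        total = total + item  # dispatches to the element type's own +
    return total
```

The constraint is only checked by a static type checker; at runtime any type with `+` works.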

Polymorphic Type

Aggregate types may have multiple type parameters each with constraints.

    forall( otype T | sumable( T ) )        // polymorphic, use trait
    struct Foo {
        T * x, * y;
    };
    Foo( int ) foo;
    int i = sum( foo.x, 5 );

(my note: i like the <T> syntax better for polymorphism)

Remove Definition Keyword

Keywords struct and enum are not required in a definition (like C++).

    struct S { ... };
    enum E { ... };
    S s;        // "struct" before S unnecessary
    E e;        // "enum" before E unnecessary

---

"Of course, if you go the forth-lite route and have nearly completely consistent tokenization along a small set of special characters, this is much easier. Forth-lite languages can be split by whitespace and occasionally re-joined when grouping symbols like double quote are found; lisp-lite languages with only a few paired grouping symbols can easily be parsed into their own ASTs." [11]

---

"

Another important C feature we can see in this example is the presence of preprocessor macros. Macros are part of a pre-compilation step. With them it is possible to #define global variables and do some basic conditional operation (with #ifdef and #endif). All the macro comands begin with a hashtag (#). Pre-compilation happens right before compiling and copies all the calls to #defines and check #ifdef (is defined) and #ifndef (is not defined) conditionals. In our "hello world!" example above, we only insert the line 2 if GL_ES is defined, which mostly happens when the code is compiled on mobile devices and browsers.

"

---

the table "Key Features Lisp Features Python Features" [12] is great. Reread it after you have a bit of an implementation of a bit of candidate syntax.

---

CJefferson on Aug 7, 2013 [-]

Most methods in C read like an assignment,

so if you are trying to remember what order the arguments to strcpy go, it's

    strcpy(x,y) is like x = y

Then remember specially that typedef is the wrong way around to the way you would like it to be :)

belovedeagle on Aug 7, 2013 [-]

The trick with typedef is it is exactly the same syntax, in all cases, as declaring a variable of that type. This is essentially the only way you're going to ever remember how to do function pointer typedefs: typedef int (function_t)(int,int); or something to that effect

---

" 1. Coding JS quite a bit these days, I greatly miss every form returning a value. 2. Python's lambda is just sad, and GvR? keeps threatening to remove even that. "

"in Lisp, you can (and it's normal to) give docstrings to variables"

---

" Donald Knuth reports that

    The lack of operator priority (often called precedence or hierarchy) in the IT language was the most frequent single cause of errors by the users of that compiler.[26]"

---

todo [13] seems to claim that recursive descent cannot parse operator expressions with precedence and associativity? But [14] claims that Python is LL(1); i thought recursive descent can efficiently parse LL(1) [15]? Note that [16] suggests that it's just that recursive descent cannot EFFICIENTLY parse associativity? But [17] claims .

so which is it? Is Python LL(1) and recursive descent CAN parse operator expressions with precedence and associativity?

ah, i think i got it, see https://jeffreykegler.github.io/personal/timeline_v3#h1-the_operator_issue_as_of_1968 . I bet Python used the BASIC-OP->LIST-OP type transformation, and parses expressions as lists, and adds associativity later.

Note also that http://garethrees.org/2011/07/17/grammar/ says that Python grammar cannot be expressed as an operator precedence grammar.
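
A minimal sketch of the "parse as a list, add associativity later" idea guessed at above. This is not CPython's actual parser; the helper names (`tokenize`, `Parser`, `fold_left`, `parse`) and the tiny two-level grammar are assumptions for illustration. Each precedence level is the LL(1)-friendly rule `operand (op operand)*`, which recursive descent handles with a plain loop; the flat list is then folded into a left-associative tree.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split text into integer and operator tokens (hypothetical helper)."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"bad input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def fold_left(items):
    # Turn [a, op1, b, op2, c] into (op2, (op1, a, b), c): left associativity.
    tree = items[0]
    for i in range(1, len(items), 2):
        tree = (items[i], tree, items[i + 1])
    return tree


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def operand_list(self, parse_operand, ops):
        # operand (op operand)*  -- one token of lookahead, no left recursion.
        items = [parse_operand()]
        while self.peek() in ops:
            items.append(self.next())
            items.append(parse_operand())
        return items

    def expr(self):  # lowest precedence: + and -
        return fold_left(self.operand_list(self.term, ("+", "-")))

    def term(self):  # higher precedence: * and /
        return fold_left(self.operand_list(self.atom, ("*", "/")))

    def atom(self):
        tok = self.next()
        if tok == "(":
            node = self.expr()
            if self.next() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return int(tok)


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return tree
```

Here precedence comes from the nesting of the grammar rules (`expr` calls `term` calls `atom`) and associativity comes only from the direction of the fold, so a right-associative operator would just need a fold from the other end.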

---

so we probably want our grammar to be expressible as an operator precedence grammar; see http://garethrees.org/2011/07/17/grammar/

---

this regex syntax from JS looks good:

  const parseExpr = () => /\d/.test(peek()) ? parseNum() : parseOp();

(from [18])

---

effect name ideas: reads, modifies, requires, ensures

---

https://www.python.org/dev/peps/pep-0572/

" In most contexts where arbitrary Python expressions can be used, a named expression can appear. This is of the form NAME := expr ...

    # Share a subexpression between a comprehension filter clause and its output
    filtered_data = [y for x in data if (y := f(x)) is not None] "

sametmax 23 hours ago [-]

I will be happy to be able to do:

    while (bytes := io.get(x)): 

and:

    [bar(x) for z in stuff if (x := foo(z))] 

Every time Python adds an expression counterpart to an existing statement (lambdas, intensions, ternary...) there is a (legit) fear it will be abused.

But experience tells that the slow and gradual pace of the language evolution combined with the readability culture of the community don't lead that way.

While we will see code review breaking materials in the wild, I believe that the syntax will mostly be used sparingly, as other features, when the specific needs arise for it.

After all, it's been, as usual, designed with this in mind: "=" and ":=" are mutually exclusive. You don't use them in the same context.

The grammar makes sure of it most of the time, and for the rare ambiguities like:

    a = b

vs

    (a := b)

The parenthesis will discourage pointless usage.

gshulegaard 12 hours ago [-]

I agree that I would have preferred "as"...but that said I am struggling to think of a reason this is needed.

    while (bytes := io.get(x)):

Would currently be written:

    bytes = io.get(x)
    while bytes:

And likewise:

    [bar(x) for z in stuff if (x := foo(z))]

is equivalently:

    [bar(foo(z)) for z in stuff if foo(z)]

Perhaps this is just my personal opinion but I don't really think the ":=" (or "as" for that matter) adds much in the way of clarity or functionality. I guess at the end of the day I am neutral about this addition...but if there isn't a clear upside I usually think it's better to have less rather than add more.

reply

Dunnorandom 12 hours ago [-]

The first example would actually be equivalent to something like

    while True:
        bytes = io.get(x)
        if not bytes:
            break
        ...
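
A small sketch to check the equivalences discussed in this thread (needs Python 3.8+ for ":="). The function names, the `counted` wrapper and the sample `foo`/`bar` are all made up for illustration. It shows that the walrus loop matches the `while True` / `break` form, and that the comprehension rewritten without ":=" gives the same result but calls foo more often.

```python
import io


def read_chunks_walrus(stream, size):
    chunks = []
    while (chunk := stream.read(size)):  # assign and test in one expression
        chunks.append(chunk)
    return chunks


def read_chunks_loop(stream, size):
    chunks = []
    while True:  # the pre-3.8 spelling of the same loop
        chunk = stream.read(size)
        if not chunk:
            break
        chunks.append(chunk)
    return chunks


def counted(fn):
    """Wrap fn so that the number of calls is recorded in wrapper.count."""
    def wrapper(*args):
        wrapper.count += 1
        return fn(*args)
    wrapper.count = 0
    return wrapper


def compare_comprehensions(stuff):
    foo = counted(lambda z: z * 10 if z % 2 else 0)  # hypothetical foo
    bar = lambda x: x + 1                            # hypothetical bar
    with_walrus = [bar(x) for z in stuff if (x := foo(z))]
    calls_with_walrus = foo.count
    foo.count = 0
    without_walrus = [bar(foo(z)) for z in stuff if foo(z)]
    return with_walrus, calls_with_walrus, without_walrus, foo.count
```

For example, `read_chunks_walrus(io.BytesIO(b"abcdefg"), 3)` and `read_chunks_loop` on the same input both return the three chunks; by contrast, assigning once before a plain `while bytes:` never re-reads, so it is not the same loop. The comprehension without ":=" evaluates foo twice for every element that passes the filter, which matters if foo is expensive or has side effects.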

---

" Parametric type constructors are now always called with the same syntax as they are declared. This eliminates an obscure but confusing corner of language syntax. " [19]

---

mb .symbol instead of :symbol like in clojure, or instead of SYMBOL like i currently have (on the one hand, capitalized words take two extra keypresses; on the other hand, they are not chorded and they are easy to read; on the other hand, my colemak seems to remap 'caps lock' to 'delete', making it hard for me to type capitalized words (except by holding down shift, which is too much chording); on the other hand, there's probably some way to change that; on the other hand, i could use caps lock for escape from viper mode in emacs; on the other hand, i don't do that now, so why worry about it)

---

if we automatically put a ';' at the end of each line, then we can't do stuff like:

x.a() .b();

alternatives:

x.a(). b()

(Python does not do this; a line like x.a(). ending with '.' is a syntax error)

i think i like the third option. Keep lines open for unmatched grouping constructs like parens, but also if it ends in a '.'.

---

in Rust, 'break' can return a value

(also, like many langs, in Rust 'if' is an expression)

---

in Rust, when a variable name is the same as a field name, you can construct a struct succinctly, eg:

" fn build_user(email: String, username: String) -> User { User { email: email, username: username, active: true, sign_in_count: 1, } } "

" fn build_user(email: String, username: String) -> User { User { email, username, active: true, sign_in_count: 1, } } " -- [20]

---

Rust's struct update syntax (that is, creating a second immutable struct instance by copying some of the fields of an older instance and then changing other of the fields) is succinct:

" let user2 = User { email: String::from("another@example.com"), username: String::from("anotherusername567"), ..user1 }; " -- [21]

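For comparison, roughly the same "copy with some fields changed" operation in Python, sketched with a frozen dataclass and `dataclasses.replace`. The `User` class mirrors the Rust example; the field values for user1 are made up, since the notes above do not show them.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class User:
    email: str
    username: str
    active: bool
    sign_in_count: int


# user1's values are hypothetical placeholders.
user1 = User("someone@example.com", "someusername123", True, 1)

# Like Rust's `..user1`: fields not named here are copied from user1.
user2 = replace(
    user1,
    email="another@example.com",
    username="anotherusername567",
)
```

The Rust form is a bit more succinct because it is syntax rather than a library call, but the shape is the same: name the changed fields, inherit the rest.
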
---

Rust has 'tuple structs' to allow creating distinct named types for tuples (nominative typing), e.g.:

" struct Color(i32, i32, i32); struct Point(i32, i32, i32);

let black = Color(0, 0, 0); let origin = Point(0, 0, 0); " -- [22]

---

" I had some extended notes here about "less-mainstream paradigms" and/or "things I wouldn't even recommend pursuing", but on reflection, I think it's kinda a bummer to draw too much attention to them. So I'll just leave it at a short list: actors, software transactional memory, lazy evaluation, backtracking, memoizing, "graphical" and/or two-dimensional languages, and user-extensible syntax. If someone's considering basing a language on those, I'd .. somewhat warn against it. Not because I didn't want them to work -- heck, I've tried to make a few work quite hard! -- but in practice, the cost:benefit ratio doesn't seem to turn out really well. Or hasn't when I've tried, or in (most) languages I've seen. " [23]

---

i don't think we want variadic functions in Oot, but if i change my mind, see the section 'Arity' in [24].

---

" Another neat feature is positional destructuring, commonly used with vectors....head and tail destructuring of sequences...

[x & xs] [1 2 3]

Here, x is 1 and xs is [2 3]. ... You can also destructure more than one binding before the tail like so:

[x y & zs] [1 2 3]

Here, x is 1, y is 2, and zs is [3]. " [25]
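
For comparison, Python's starred assignment gives about the same head/tail destructuring as the Clojure forms quoted above. A minimal sketch; the function names are made up for illustration.

```python
def head_tail(seq):
    # Analogue of [x & xs]: first element, then the rest as a list.
    head, *tail = seq
    return head, tail


def first_two_and_rest(seq):
    # Analogue of [x y & zs]: two leading bindings before the tail.
    first, second, *rest = seq
    return first, second, rest
```

One difference: Python raises ValueError if the sequence is too short for the non-starred names, so this is stricter than a form that would bind missing elements to nil.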

---

" {:keys [foo] :or {foo 1} :as m}

foo will have the map’s value for key :foo, or 1 if the map doesn’t have a :foo key, and m is bound to the entire map. " [26]

---

hamstergene on Jan 12, 2015 [-]

Certainly not for that reason. Actually, as a non-native English speaker, I consider unicode identifiers one of the most useless features for a language.

There are reasons why unicode identifiers have never been widely used, and probably will never be, even though they've been around for more than a decade:

1. You're unlikely to be allowed to use them. Unicode identifiers stagger international collaboration. Do you want your Chinese colleague to commit some hieroglyphical identifiers into your code, and then an Arabic colleague to add right-to-left curvatures? I doubt that :) They are unwanted in outsourcing, freelancing, open-source, any company which does foreign hire, they are a problem when asking on StackOverflow?, etc.

2. There is just no problem to use English in the first place. Programmers already use English every day. 80-100% of information (documentation, QA, discussion forums) on any programming topic is in English; native-language sources are lacking at best, often nonexistent. If person is a professional programmer, it's way too late for them to have a couple of words translated.

3. They are not as appealing as you might think. Many languages don't work the same way as English. For example, in expression `print(line)` the word `line` may have to have different morphological form than in `var line = ...`, and yet another form for `if '_' in line`; one form for all uses is very unnatural and requires getting used to (and if you're going to adjust anyway, why not adjust to English then).

4. Mixed-language text is simply harder to type when second alphabet is not latin. It's like 3x harder when foreign words need to be inserted. I have even seen colleagues discussing code in chat using English just for that reason. And since at least language keywords and standard libraries are already using English, mixed is what it gonna be. A person must have some real serious trouble with English to tolerate that.

Even in strictly local projects where all comments are in native language, unicode identifiers are rarely used.

P.S. When I was reading Swift book about unicode identifiers, I immediately thought: "first paragraph of every Swift coding conventions on the planet is going to be, don't use unicode identifiers".

---

ausjke 30 days ago [-]

I'm learning Go, just a naive question, why does Go put the variable type at the end of declaration, is this an absolute need? no other widely usage language does that, and it just feels odd to me.

klodolph 30 days ago [-]

C++ is the weird one out

mmastrac 30 days ago [-]

Pascal did it:

     procedure SetColor
         (const NewColor: TColor; const NewFPColor: TFPColor); virtual;

Rust does it too:

   fn do_twice(f: fn(i32) -> i32, arg: i32) -> i32

Go is similar, but less symbols:

   f func(func(int,int) int, int) func(int, int) int

matt_kantor 30 days ago [-]

Also Scala, Haskell, TypeScript?, Swift, Kotlin, Visual Basic, Python (PEP 526), and many others.

---

interesting struct initializer syntax from C (i think):

" struct kvm_userspace_memory_region region = { .slot = 0, .flags = 0, .guest_phys_addr = 0, .memory_size = mem_size, .userspace_addr = (size_t)mem }; "

---

barbecue_sauce 14 hours ago [-]

There is one thing I really dislike about Ruby (and other languages like Scala that have the same feature), and it's non-parenthetical function invocation. I get that it's good for writing DSLs and similar endeavors, and you're only supposed to use them where its implicit what you're doing, but in my opinion, it really hurts readability (though Scala is much worse when it comes to flexible yet opaque syntax).

reply

jaggederest 12 hours ago [-]

Unfortunately you really can't get the refactoring capabilities without being able to substitute a method for a local variable transparently, which is extremely useful over the long run.

---

what would it take to make a programming language easier to program in on a mobile phone?

i think the first problem is that mobile phone touchscreen keyboards are so small that they are hard to type on. This is a consequence of trying to make a keyboard such that every key is reachable by a thumb while that hand is holding a phone. I think 26 letters are just too many to fit in that area without making each virtual key small.

So how about a restricted alphabet for language keywords?

from conLangsCommonSyllablesAndPhones.txt, we come up with:

a,b,d,e,g,h,i,j,k,l,m,n,o,p,r,s,t,u,w

which letters are missing from the normal alphabet?

c,f,q,v,x,y,z

We still have a lot of letters left. But it's about a 25% reduction in alphabet size. Also since we got under 20, this might be more amenable to future 'air typing' input methods than the standard alphabet.
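
A quick check of the letter arithmetic above, as a sketch (the names `KEPT`, `removed_letters` and `reduction_percent` are made up here):

```python
import string

# The restricted alphabet proposed above.
KEPT = "a,b,d,e,g,h,i,j,k,l,m,n,o,p,r,s,t,u,w".split(",")


def removed_letters(kept):
    """Letters of the normal alphabet that are not in the kept set."""
    return sorted(set(string.ascii_lowercase) - set(kept))


def reduction_percent(kept):
    """How much smaller the kept alphabet is than the full 26 letters."""
    return 100 * (26 - len(set(kept))) / 26
```

With this list, 19 letters are kept and 7 removed (c, f, q, v, x, y, z), so the reduction is 7/26, a bit under 27%.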

You can see how to deal with these removed letters; replace 'c' with 'k' or 's' phonetically; replace 'q' with 'kw' when phonetic; etc. Just leave out the others and deal with the ambiguity when there are collisions. The only one that will be sorely missed is probably 'f'.

So we should not use the removed letters in language keywords/reserved words or in the language libraries (persuading other programmers not to demand to use certain letters in their own programs and third-party libraries seems like a hopeless task though; let them use all ASCII). In order to allow phone IDEs that let you easily select a language keyword, let's keep the number of such keywords to <= 19. Also let's use <= 19 punctuation symbols (~`!@#$%^&*()-=_+[]{};':",./\|<>? is 32, so that means that 13 of those have got to go; let's go take another look at that analysis of which symbols are most hard to find/hard to type/least common on phone and international keyboards, and the analysis of which symbols are shifted on QWERTY, to help decide which ones to eliminate; a first pass would be to eliminate the shifted symbols on QWERTY symbol-only keys, that is, ~_+{}|:"<>?, plus two more).

Note that if we had <= 19 keywords, plus <= 19 symbols, that would be amenable to an encoding of the tokenized program text with <= 32 token classes.

---

on my current Android phone's default keyboard ('Gboard'), the primary screen (with the lowercase letters) has the following symbols:

/.

and the secondary number/symbol screen has:

@#$_&-+()/*"':;!?,.

the others are all on the tertiary symbol screen. (note: on the same level as the tertiary symbol screen in Gboard is the 'calculator' screen, which has +-*/=% plus the numbers and space)

so, the unshifted symbols on QWERTY are (11 symbols):

`-=[];',./\

so if we keep all of those, then we can add 8 more. We should choose those from the Android secondary number/symbol screen, which is, after removing those above:

@#$_&+()*":!?

now what is available on the iphone keyboard? i think the iphone default keyboard secondary symbol page is (todo check):

-/:;()$&@".,?!'

on the ipad it is:

-/:;()$&@".,'#%+=

removing those which we already have (b/c they are unshifted on QWERTY):

:()$&@" ?!

#%+

taking the intersection of those and the remaining Android secondary number/symbol keyboard page (from above, @#$_&+()*":!?):

:()$&@" ?!

#+

(the first row plus ?! is the same; that means, everything in the iPhone secondary number/symbol keyboard page is also in either or both of unshifted QWERTY and the Android secondary number/symbol keyboard page)

that's 9 (+4) characters, but if we want to have 19 total, we can only add 8 to those already available as unshifted QWERTY. Unless -- we want to remove something from unshifted QWERTY. Here are the unshifted QWERTYs not available on either the Android or iPhone secondary number/symbol keyboard page: `=[]\

` is the obvious choice to remove -- = is common in programming, we want some matched grouping characters, and \ is useful as an escape.

So we end up with the following 19 symbols (the top row is the remaining unshifted QWERTY, and the bottom is the remaining iphone/android intersection):

-=[];',./\ :()$&@"?!
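
The set arithmetic above can be double-checked with a few lines of Python. A sketch only: the constant and function names are made up, and the keyboard contents are just the strings listed in these notes (the iPhone one is still marked "todo check").

```python
# Symbol sets as listed in the notes above.
UNSHIFTED_QWERTY = set("`-=[];',./\\")
ANDROID_SECONDARY = set("@#$_&-+()/*\"':;!?,.")
IPHONE_SECONDARY = set("-/:;()$&@\".,?!'")


def candidate_symbols():
    """Symbols on both phone secondary pages that are not unshifted QWERTY."""
    android_extra = ANDROID_SECONDARY - UNSHIFTED_QWERTY
    iphone_extra = IPHONE_SECONDARY - UNSHIFTED_QWERTY
    return android_extra & iphone_extra


def not_on_phones():
    """Unshifted QWERTY symbols missing from both phone secondary pages."""
    return UNSHIFTED_QWERTY - (ANDROID_SECONDARY | IPHONE_SECONDARY)


def final_symbols():
    """Drop the backtick from unshifted QWERTY, then add the phone candidates."""
    return (UNSHIFTED_QWERTY - {"`"}) | candidate_symbols()
```

This gives 11 unshifted symbols, 9 phone candidates, and 10 + 9 = 19 symbols in the final set, matching the counts in the text.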

i think it's kind of absurd to leave out + and *. We can instead remove the parens (which are really nice to read but also really annoying to type a lot b/c they are shifted), to get:

-=[];',./\ :+*$&@"?!

i'm not sure if that's ideal though; parens are probably really useful. It may be better to put the parens back in and remove some of the other 'weird' symbols, the ones often used as variable sigils $&@?!. So maybe something like:

-=[];',./\ :+*@()"?!

we should maybe also consult ASCII character frequencies:

https://reusablesec.blogspot.com/2009/05/character-frequency-analysis-info.html thinks the most frequent symbols in passwords are:

.!*@-_$#,/+?;^ %~=&'\)[]:<(

so the most frequent ones we've left out there are _$#. _ and # aren't on the iPhone secondary keyboard page, and @ is more frequent than $. So seems like we're on the right track.

http://xahlee.info/comp/computer_language_char_distribution.html has (spaces to show big frequency dropoffs)

golang: . (), =/{}:", *[]+;%&!<->_|\?#^

js: = )(.,; /{}'_:*[]+-"$!&|?`<>%@\#^'~
    or
  = .';/)(, {} :_*-!|&`+[]$?@%"<>\#^
    or
  ()=,.; {}[]"+)$:/-*?<&|!>\%^
    or
  .)(/=,;"{}':*|-][&0+1!$_`?<>\@23#

C: _ )( *,; =-/.0>"1#&{}'+2|!:[]\<384%65?97`$~@^

C++: _ \ : )(,.><-;*=#{}&012!\~"@3'+46|5[]87%9$?^`

Java: . () ; * ,/"=}{ -0><_@2+1:[]'\!&3|?84#6$957%^~`

(there were a bunch more, including Python, but i omitted them)

.... all languages combined (~20% C, ~18.5% Python, ~13.5% PHP, etc): _ )( ., '=-/";*:>$0#{}1<&\2[]@|+3!4%586?97`~^

observations:
 + is really uncommon
 ?!@ are really uncommon
 only one matched separator pair is really really needed, and it is very common
 _ is really common (but i dont mind, we'll use -)
 ., are really common

 in general, things that we left out that are most common are:
  _ ><$#
 things that we left in that are least common are:
 []\+@?!

what do i think of that?

 _ i dont mind leaving out, we'll use -
 <> we may want to put back in
 $# we may want to put back in, although not that important
 we are switching () and [], but the second one could still be useful
 \ isnt used much but when it is, you need it
 + seems silly to leave out
 ?! we could take out, although i kind of like having those rather than $#
 @ we could leave out

so mb:

things to leave in for sure: -=[];',./\ (from unshifted qwerty) :+*" (from phone keyboards)

so that's 14. So we have 5 spaces left. Candidates are: ` (unshifted QWERTY, but very uncommon, and probably hard to type on some international keyboards) ()$&@?! (phone keyboards)

probably we want 5 out of the 7: ()$&@?! so we need to take out 2 of those.

hmmm... let's allow 20, not 19.

Also, on phone keyboards, {} is as easy as [], and we would really like {}, so let's have that.

and we want ()

so the remaining choice is to select 2 out of: $&@?!

i think we want ?!. Out of the other 3, on my vertical qwerty keyboard, it's easier for me to find @ than $ or &, and & is the hardest for me to locate for some reason. But i think ? and ! are most important. But <> are even more important. So let's choose:

{}()<>

So in total we have:

all letters except c, q, v, x, y, z (20 letters)

20 punctuation characters: -=[];',./\ :+*" {}()<>
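A quick sanity check of the counts, as a sketch (the variable names are just for illustration): 10 unshifted QWERTY symbols plus 4 phone-keyboard symbols plus the 6 brackets should give 20 distinct punctuation characters, and dropping cqvxyz should leave 20 letters.

```python
import string

SURE = "-=[];',./\\" + ":+*\""   # 10 unshifted QWERTY symbols + 4 from phone keyboards
EXTRA = "{}()<>"                 # the six bracket characters chosen last
PUNCTUATION = SURE + EXTRA
LETTERS = [c for c in string.ascii_lowercase if c not in "cqvxyz"]

# No symbol is listed twice, and both groups have exactly 20 members.
assert len(set(PUNCTUATION)) == len(PUNCTUATION) == 20
assert len(LETTERS) == 20
print(len(LETTERS), "letters +", len(PUNCTUATION), "symbols")
```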

hmm, i dunno about that. I think we really want at least: ?!

but those would definitely take us over 20.

and maybe also: @#$

hmm...

and what about one of our favorites, the pipe character, |? also ~ seems useful sometimes.

hmm... i might have to give up on this idea of having only 20 letters and 20 symbols.

i guess since we have 10 numerical digits, we have 10 spots left on a third screen right? But note that my android phone has 10 numbers plus 19 symbols, and my ipad has 10 numbers plus 18 symbols on the second screen (plus 2 symbols on the primary screen with the letters). So if anything we already have too many.

on the other hand, realistically no one is probably going to make a keyboard with cqvxyz on a second screen (and certainly not one without those letters at all), and both android and ipad already relegate <>{}[] to the third screen. So maybe i should just give up with this stuff, and think of this as a prioritization rather than an absolute limit.

so... i guess the prioritization is:

don't use letters cqvxyz very much

use this punctuation the most:

., (unshifted on qwerty, and on primary screen of both android and ipad)

and then:

-;'/ (unshifted on qwerty, and on primary or secondary screen of both android and (ipad or iphone))

and then:

[]\ (unshifted on qwerty, but on the tertiary screen of either android or ipad or iphone)

:+*"() (shifted on qwerty, and on the secondary screen of both android and (ipad or iphone))

and then:

?! (shifted on qwerty, and on the secondary screen of both android and (ipad or iphone), but lower priority for me. Note: due to their use in English i think these seem less 'weird' than @#$&, and they are also easier to find on a qwerty keyboard, so i am putting them first, even though they are on the tertiary screen of the ipad)

@#$& (shifted on qwerty, and on the secondary screen of android, ipad and iphone, but lower priority for me)

and don't use this punctuation as often:

%_ (shifted on qwerty, and on the secondary screen of either android or (ipad or iphone) but on the tertiary screen of the other)

and use this punctuation the least:

`~^{}<> (shifted on qwerty, and on the tertiary screen of android, ipad and iphone)
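One way to make the prioritization concrete is to score a candidate piece of syntax by which tier each of its symbols falls in. This is only a sketch: the tier groups are copied from the list above, but the numeric weights (0 for the best tier, 5 for the worst) and the names `TIERS` and `punctuation_cost` are assumptions for illustration. Symbols that the list does not place in any tier (e.g. =) are reported separately rather than guessed at.

```python
# Priority tiers as listed above, most preferred first. The numeric weights
# (the list index: 0, 1, 2, ...) are an illustrative assumption.
TIERS = [
    ".,",
    "-;'/",
    "[]\\" + ":+*\"()",
    "?!" + "@#$&",
    "%_",
    "`~^{}<>",
]
TIER_OF = {ch: rank for rank, group in enumerate(TIERS) for ch in group}


def punctuation_cost(snippet):
    """Sum tier ranks over the snippet; also return symbols that have no tier."""
    cost = 0
    unranked = set()
    for ch in snippet:
        if ch.isalnum() or ch.isspace():
            continue  # only punctuation is scored
        if ch in TIER_OF:
            cost += TIER_OF[ch]
        else:
            unranked.add(ch)
    return cost, unranked


print(punctuation_cost("f(x), g[y]"))
print(punctuation_cost("a < b { c }"))
```

A lower total means the snippet leans on easier-to-type symbols; it says nothing about readability.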

some difficulties here are: {} <> are very useful for programming languages for scoping and less-than/greater-than

| is useful as a 'pipe' because of its visual character

we could use '$' as a pipe, like Haskell, but i feel that's ugly/hard to remember. We could use something other than <> but i feel that's hard to remember. We could use more than one symbol for scoping but i feel that takes up too much space and is hard to remember.

so if we had to choose 20 they would probably be:

., -;'/

[]\

:+*"() {}<>

leaving out: ?! @#$& %_ `~^

note that that choice of 20 leaves us with no 'sigils' (eg to mark pointers/references/non-referentially-transparent-data), and nothing for 'pipes' (you could do double-characters like -> though), and nothing for our ideas like random variables/metavariables (? is the obvious choice) and syntactic sugar for default anonymous parameters (? or _ is the obvious choice). Otoh maybe we SHOULD forget all that stuff; as cool as it is, it's probably harder to remember, and it does make code look more like 'line noise'.

i kinda feel like ' could be used for 'marking' (attaching metadata to one item within a data structure; e.g. (1 2'mark1 3)). In that case it's hard to also use it for my Maybe stuff though? Well not really, could just use for that i guess.

it burns up a useful punctuation character, but i kinda feel that identifiers should treat '-' just like an alphabetic character (the way _ usually works). In that case, however, we have to come up with something else for unary negation, right? well maybe not. but either - or -- is a subtraction operator, right? which is not the same as an alphanumeric. So maybe not. Otoh if we are omitting _ then - is our Pythonic 'privacy prefix', so it should be able to prefix alphanumeric identifiers. So i guess the rule is just, don't mix operators and alphanumeric identifiers (operators can't have alphabetic letters in them, and identifiers must have alphabetic letters in them, but either can start with -)
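The "don't mix" rule can be sketched as two token shapes. This is an assumption-laden illustration, not a spec: the regexes, the treatment of digits (allowed inside identifiers, not in operators), and the names `IDENTIFIER`, `OPERATOR` and `classify` are all made up here.

```python
import re

# Hypothetical token shapes, assumed for illustration:
#   identifier: letters, digits and '-' (so it may start with '-'),
#               containing at least one letter
#   operator:   one or more punctuation characters, no letters or digits
IDENTIFIER = re.compile(r"-?[A-Za-z0-9-]*[A-Za-z][A-Za-z0-9-]*")
OPERATOR = re.compile(r"[^A-Za-z0-9\s]+")


def classify(token):
    """Classify a whitespace-delimited token under the no-mixing rule."""
    if IDENTIFIER.fullmatch(token):
        return "identifier"
    if OPERATOR.fullmatch(token):
        return "operator"
    return "neither"  # e.g. a number, or a mix such as a+b


for tok in ["-private", "foo-bar", "-", "--", "->", "a+b"]:
    print(tok, classify(tok))
```

One consequence of this shape: foo-bar is a single identifier, so subtraction between two names would need surrounding whitespace (foo - bar).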

? would be useful as the ternary conditional operator, though (like in C). i guess that doesn't need to be punctuation tho? could use a single capital for that?

---

[27]

" def find(xs, fun):
      for(xs) x, i:
          if fun(x):
              return i
      return -1

  var r = 2
  var i = find [ 1, 2, 3 ]: _ > r "

" Blocks / anonymous function arguments are always written directly after the call they are part of, and generally have the syntax of a (possibly empty) list of arguments (separated by commas), separated from the body by a :. The body may either follow directly, or start a new indentation block on the next line. Additionally, if you don’t feel like declaring arguments, you may use variable names starting with an _ inside the body that are automatically declared. ... for and if look like language features, but they have no special syntactical status compared to find. Any such functions you add will work with the same syntax. ... blocks/functions may refer to “free variables”, i.e. variables declared outside of their own scope, like r. This is essential to utilize the full potential of blocks. ... return returns from find, not from the enclosing function (which would be the block passed to if). In fact, it can be made to return from any named function, which makes it powerful enough to implement exception handling in Lobster code, instead of part of the language. "