proj-oot-old-150618-ootNotes

---

handy list of symbols convenient for freq usage:

unshifted, double unshifted, shifted, double shifted

`-=()\;',./
`` -- == (( )) ;; ,, ..
~!@#$%^&*[]_+{}:"<>?
~~ !! @@ ## $$ %% ^^ && [[ ]] __ ++ {{ }} :: "" 1 ??

i am wondering which of these are hard to type on non-US keyboards. the second post here http://www.cpptalk.net/5-vt10808.html?postdays=0&postorder=asc&start=60 opines that C would have been better off sticking to portable characters: "I don't know. Had the problem been addressed from the start, if for example, Kernighan and Richie had refused to use any character which wasn't in the invariant part of ISO 646, I think it would have been a good thing. I've had to develop C on terminal which only supported ISO 646-DE." A quoted comment on that page also gives some examples of common characters which are hard to type in italy: "I've to admit that it's difficult to find PCs in italy with an US keyboard; looks like italians are not considered as potential programmers (it's hard to type "{") or internet citizens, for that matter (it's hard to type "@" or "~" too, with no standard for it)."

So maybe i should look at ISO 646? according to http://en.wikipedia.org/wiki/ISO/IEC_646 , there is the invariant subset, but there is also T.61, which gives you more punctuation but leaves out { and ~, which the italian guy found hard (T.61 does have @; and i've gotta believe that @ at least will be changing in italy soon tho! that post was from 2004 btw). the punctuation still not in T.61 is: \ ^ ` {} ~

the ones in T.61 but not INV are #$@[]
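a small python sketch of how the candidate symbols above could be checked against these two lists. the two sets are copied from these notes (not from the ISO 646 or T.61 standards themselves), and the names `portability` and `risky_chars` are made up for illustration. anything in neither list is only reported as "unflagged", which is weaker than saying it is in the invariant subset.

```python
# Both sets are copied from the notes above; they are assumptions, not the standards.
NOT_IN_T61 = set("\\^`{}~")        # punctuation the notes list as missing from T.61
T61_NOT_INVARIANT = set("#$@[]")   # in T.61 but outside the ISO 646 invariant subset


def portability(ch):
    """Classify one character according to the two lists in the notes."""
    if ch in NOT_IN_T61:
        return "not in T.61"
    if ch in T61_NOT_INVARIANT:
        return "T.61 only"
    return "unflagged"


def risky_chars(symbols):
    """Return the flagged characters of `symbols`, in order, without duplicates."""
    seen = []
    for ch in symbols:
        if portability(ch) != "unflagged" and ch not in seen:
            seen.append(ch)
    return seen


unshifted = "`-=()\\;',./"
shifted = "~!@#$%^&*[]_+{}:\"<>?"
print("unshifted:", risky_chars(unshifted))
print("shifted:  ", risky_chars(shifted))
```

on these lists only backtick and backslash are flagged in the unshifted row, which fits the later guess that the shifted symbols are where most of the trouble is.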

C deals with this with http://en.wikipedia.org/wiki/C_Trigraph

http://stackoverflow.com/questions/1234582/purpose-of-trigraph-sequences-in-c :

"It may happen that some terminals and/or virtualization doesn't let you access easily to some characters. In my experience the main offender is the tilde. – Francesco Nov 3 at 19:24"

see also http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2910.pdf , although it mostly talks about backwards-compatibility and doesn't give much info useful for someone designing a new language
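for reference, a python sketch of what the trigraph mechanism amounts to: a fixed table of nine three-character spellings that stand in for the characters missing from the ISO 646 invariant subset. the table is the standard C one; the helper names `to_trigraphs` and `from_trigraphs` are made up here, and this is plain text substitution, not a model of a real preprocessor.

```python
# The nine trigraph sequences defined by the C standard.
TRIGRAPHS = {
    "#": "??=", "[": "??(", "\\": "??/", "]": "??)", "^": "??'",
    "{": "??<", "|": "??!", "}": "??>", "~": "??-",
}


def to_trigraphs(src):
    """Spell every trigraph-able character with its three-character sequence."""
    return "".join(TRIGRAPHS.get(ch, ch) for ch in src)


def from_trigraphs(src):
    """Replace every trigraph sequence with the character it stands for."""
    for ch, tri in TRIGRAPHS.items():
        src = src.replace(tri, ch)
    return src


print(to_trigraphs("a[i] = ~b;"))
```

a new language could avoid needing such a table by keeping its core syntax inside the portable subset in the first place.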

http://www.wikicreole.org/wiki/Talk.EscapeCharacterProposal says that tilde is difficult on italian and german keyboards

i searched some more but didn't find much else. i guess i'll assume that mainly ~ is the problem. mb curly braces, too.

backslash isn't very common so that just prevents me from treating it like an easily-typed unshifted character.

as for the ones in T.61, the only one of those that i expect to be real common is []. but i cant very well leave out both [] and {}.

"Exploring Regularity in Source Code: Software Science and Zipf's Law", Hongyu Zhang

lists the most common tokens and identifiers in some java-related situations:

Table 4. Top twelve most common tokens

Rank       1    2    3    4     5        6    7       8       9       10      11       12
Jena       ()   .    ;    ,     {}       =    public  new     return  if      +        String
Tomcat     ()   .    ;    {}    =        ,    public  if      String  null    +        return
Ant        ()   .    ;    {}    =        ,    public  String  if      new     +        void
Swing      ()   ;    .    ,     {}       =    if      int     public  return  null     0
jEdit      ()   ;    .    ,     =        {}   if      int     return  public  new      i
Jetty      ()   ;    .    =     {}       ,    public  if      String  null    return   import
jHotdraw   ()   .    ;    {}    ,        =    public  void    int     return  new      if
DrJava     ()   .    ;    ,     {}       =    public  new     void    String  return   +
Protégé    ()   ;    .    ,     {}       =    public  return  slot    void    private  String
Cocoon     .    ()   ;    {}    ,        =    this    String  import  if      org      null
JavaCC     ()   .    ;    ostr  println  =    ,       {}      +       if      i        []
jUnit      ()   ;    .    ,     {}       =    public  new     void    return  String   0
Table 5. Top ten most common identifiers

Rank       1       2             3                   4       5       6          7       8               9        10
Jena       String  i             jena                om      hp      hp1        n       m               node     resource
Tomcat     String  i             org                 apache  name    log        java    javax           request  append
Ant        String  org           apache              tools   ant     i          File    buildException  java     project
Swing      i       g             c                   x       y       String     e       java            a        width
jEdit      i       String        jEdit               name    buffer  length     log     Object          e        path
Jetty      String  i             log                 java    org     e          name    IOException     length   mortbay
jHotdraw   x       y             draw                r       CH      ifa        point   Figure          java     i
DrJava     String  assertEquals  doc                 File    cs      edu        rice    i               e        drjava
Protégé    slot    String        cls                 Slot    i       frame      Cls     Collection      edu      Stanford
Cocoon     String  org           apache              cocoon  i       getLogger  java    name            avalon   framework
JavaCC     ostr    println       i                   0       i       j          String  java            Vector   Options
jUnit      String  e             GridBagConstraints  test    Test    i          junit   expected        result   message

and from "CSteg: Talking in C code"

Table 1: Frequency of C tokens in cryptographic software.

Token type                Appearance in %
Punctuator                51.59
Identifier                30.02
Numerical literal         11.63
Reserved word              4.77
String literal             1.29
Preprocessor directive     0.7

Measures have been made with tools taken from (?). Comments have not been counted. Frequency distribution of C tokens gathered in our tests is described in Table 1.

Table 2: Freq. of punctuator tokens in analyzed software.

Token   Frequency    Token   Frequency
,       21.52        ->      2.05
;       13.21        .       1.82
(       12           *       1.73
)       12           (?)     1.34
=       5.41         #       1.19
]       4.8          v++     1.11
[       4.8          +       1
{       2.21         *v      0.92
}       2.21         Other   11.68

Most used punctuator tokens are described in Table 2. Reserved words frequency is more homogeneous (Table 3). We have found that inside each group of possible tokens (punctuators and reserved words) there are only a few tokens which are commonly used. The rest of the

Table 3: Freq. of reserved words in cryptographic software.

Word       Frequency    Word       Frequency
if         14.84        static     2.93
int        13.79        register   2.84
unsigned   9.25         case       2.83
char       8.84         while      2.60
for        8.30         break      2.54
void       5.85         sizeof     1.54
else       5.09         extern     1.21
return     5.02         short      1.14
long       3.74         struct     0.98
const      3.49         Other      3.15
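a rough sketch of how a punctuator table like the ones above could be gathered. it uses python's stdlib `tokenize` module, so it counts python operator tokens rather than java or C ones (an assumption made only to keep the example self-contained), and `punctuator_frequencies` is a made-up name. as in the CSteg numbers, comments are not counted.

```python
import io
import tokenize
from collections import Counter


def punctuator_frequencies(source):
    """Map each operator/punctuation token in `source` to its share in percent."""
    counts = Counter(
        tok.string
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.OP  # skips names, numbers, strings, comments
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders the result from most to least frequent
    return {tok: 100.0 * n / total for tok, n in counts.most_common()}


sample = "def f(x, y):\n    return (x + y) * 2\n"
for tok, pct in punctuator_frequencies(sample).items():
    print(tok, round(pct, 2))
```

running something like this over code written in a candidate syntax would show which symbols really need to be on the easy-to-type keys.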

---

todo, read http://stackoverflow.com/questions/tagged/language-design


newtype vs data with a single strict field: newtype is just type coercion, takes no time at runtime. how to do in oot? compiler that recognizes when constant fields are only referred to in types?
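for comparison, python's `typing.NewType` is a rough analogue of the newtype idea: the distinction exists only for the static type checker, and at runtime the "constructor" hands back its argument unchanged (it is still a function call, so not literally free). `ProjectId` and `describe` are made-up names for illustration.

```python
from typing import NewType

# Distinct from int for a static checker, but no wrapper object exists at runtime.
ProjectId = NewType("ProjectId", int)


def describe(pid: ProjectId) -> str:
    """Accepts only ProjectId as far as a type checker is concerned."""
    return f"project #{pid}"


pid = ProjectId(42)
print(type(pid).__name__)  # the runtime value is a plain int
print(describe(pid))
```

this is the "only referred to in types" situation: nothing at runtime ever inspects the tag, so a compiler is free to erase it.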

"new" constructor to construct pattern with constants in pattern? or just constructor?

all caps are keywords (global symbols)

how to simplify things like this (Java Android): ((AlarmManager)context.getSystemService(Context.ALARM_SERVICE)).cancel(pendingIntentAlarm);

dependent types? i.e. context.getSystemService(Context.ALARM_SERVICE) returns something of type AlarmManager?
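one cheaper approximation than full dependent types is to let the key carry the result type, so the lookup returns the right type and no cast is needed. a python sketch with generics follows; every class and name in it (`ServiceKey`, `Context`, `register`, `get_system_service`) is hypothetical and only mimics the shape of the Android call, it is not the Android API.

```python
from typing import Dict, Generic, List, Type, TypeVar

T = TypeVar("T")


class ServiceKey(Generic[T]):
    """A lookup key that remembers the type of the service it names."""

    def __init__(self, name: str, service_type: Type[T]) -> None:
        self.name = name
        self.service_type = service_type


class AlarmManager:
    def __init__(self) -> None:
        self.cancelled: List[str] = []

    def cancel(self, intent: str) -> None:
        self.cancelled.append(intent)


class Context:
    def __init__(self) -> None:
        self._services: Dict[str, object] = {}

    def register(self, key: ServiceKey[T], service: T) -> None:
        self._services[key.name] = service

    def get_system_service(self, key: ServiceKey[T]) -> T:
        service = self._services[key.name]
        # Runtime check that the stored service matches the key's type.
        assert isinstance(service, key.service_type)
        return service


ALARM_SERVICE = ServiceKey("alarm", AlarmManager)

context = Context()
context.register(ALARM_SERVICE, AlarmManager())
# No cast: the return type follows from the type parameter of the key.
context.get_system_service(ALARM_SERVICE).cancel("pendingIntentAlarm")
```

the result type depends on the type of the key rather than on its value, so this is ordinary parametric polymorphism doing the job that the cast did.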


in haskell, = is the only symbol needed to defn a fn w/ args:

f x y = exp

in oot currently, u need : also:

f = x y : exp (or should it be f = y x : exp ?)

should we make it like in haskell, where multiple things on the lhs signifies a fn w/ inputs? or should we save that form for computed lhs's i.e. in assignments? also note that we want the fn name to be on the very left, b/c text editors left justify, as noted earlier. so

f x y = exp is a fn? or

y x f = exp

means "assign exp to the result of y x f"?

and what about graph assignment?

so far i think we should keep it as is

to avoid typing the :, we could make it so that either all of the variables have to be indented, or none of them, and that if there is a single thing at the end indented more than the vars, then there is an implicit colon in between the vars and it. so both of these are equiv to "f = x y : exp":

f = x y exp

f = x y exp

mb the general rule is stated: "after there has been at least one thing after the equals, if there is another thing indented more, then put a : in between them". of course, if there is no other thing indented more, then there is no : at all.

the following is illegal:

f = x exp y

hmm, we should probably turn that around actually, since we dont want the fn body, which is bigger, to be the thing which is indented. so

f = x y exp

is the way to go; and

f = x exp y

or

f = x exp

are both illegal (after something with a lesser indent, i.e. the body, you cannot have something of a greater indent, i.e. a variable)


do we want to replace multiplication and exponentiation with something like knuth up arrow notation or the hyperoperator? seem like it would annoy ppl. but so elegant!
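for concreteness, a python sketch of the hyperoperation sequence this refers to: level 1 is addition, level 2 multiplication, level 3 exponentiation, and knuth's up-arrow with k arrows is level k+2. the function name `hyper` is made up, and the sketch assumes non-negative integer arguments.

```python
def hyper(n, a, b):
    """Hyperoperation H_n(a, b) for non-negative integers n, a, b."""
    if n == 0:
        return b + 1   # successor
    if n == 1:
        return a + b
    if n == 2:
        return a * b
    if n == 3:
        return a ** b
    if b == 0:
        return 1       # base case for every level above exponentiation
    # H_n(a, b) = H_{n-1}(a, H_n(a, b - 1))
    return hyper(n - 1, a, hyper(n, a, b - 1))


for level in (1, 2, 3, 4):
    print(level, hyper(level, 2, 3))
```

values above level 3 explode very quickly, so in practice only the first few levels would ever see use, which may be an argument for keeping the familiar operators.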

also, as noted, could make "-" prefix denote inverse by convention, and have -+ for subtraction, -* for division.
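one possible reading of that convention, sketched in python: `-op` means "apply op to the left operand and the inverse of the right operand", so `-+` is subtraction and `-*` is division. this interpretation, and the names `BASE`, `INVERT` and `apply_op`, are assumptions for illustration only.

```python
import operator
from fractions import Fraction

BASE = {"+": operator.add, "*": operator.mul}
# Inverse element of each base operation: negation for +, reciprocal for *.
INVERT = {"+": operator.neg, "*": lambda x: 1 / Fraction(x)}


def apply_op(op, a, b):
    """Evaluate `a op b`, where a leading '-' on op inverts the right operand."""
    if op.startswith("-"):
        base = op[1:]
        return BASE[base](a, INVERT[base](b))
    return BASE[op](a, b)


print(apply_op("-+", 5, 3))  # 5 + (-3)
print(apply_op("-*", 6, 3))  # 6 * (1/3)
```

the appeal is that only the base operators need definitions; the inverse forms come from the prefix.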


lazy or strict patterns?

strict patterns let you define case-statements in an ad-hoc way for ad-hoc polymorphism. lazy patterns seem more natural, but arent conditional

haskell "pattern bindings" are like "graph assignment". gentle intro points out the need for some laziness there:

" fib@(1:tfib) = 1 : 1 : [ a+b | (a,b) <- zip fib tfib ]

This version of fib has the (small) advantage of not using tail on the right-hand side, since it is available in "destructured" form on the left-hand side as tfib.

[This kind of equation is called a pattern binding because it is a top-level equation in which the entire left-hand side is a pattern; i.e. both fib and tfib become bound within the scope of the declaration.]

Now, using the same reasoning as earlier, we should be led to believe that this program will not generate any output. Curiously, however, it does, and the reason is simple: in Haskell, pattern bindings are assumed to have an implicit ~ in front of them, reflecting the most common behavior expected of pattern bindings, and avoiding some anomalous situations which are beyond the scope of this tutorial. Thus we see that lazy patterns play an important role in Haskell, if only implicitly."


thought about typing syntax from an attempt to write down the abstract data structures used in a todo list program i saw:

activeness = ['active 'inactive]

projectp.actions.# is action          (actions are in projects)
categoryp.projects.# is project       (projects are in categories)

projectp.name is str
categoryp.name is str
projectp.status is activeness
categoryp.status is activeness

project = projectp proto
category = categoryp proto


is single-line comment

convention: capitalized identifier means the set of things that fit the prototype assigned to the corresponding lowercase identifier

convention: when you say "x is a", this means tell the type system to prove that "x in A"

Activeness = ['active 'inactive]      (could also have said $'[active, inactive])

project.actions.# is action           (actions are in projects)
category.projects.# is project        (projects are in categories)

project.name :: str
category.name :: str
project.status :: activeness
category.status :: activeness


old:

convention: "is" means "isa", and is like : in haskell (mb should be ::)

project.name is str
category.name is str
project.status is activeness
category.status is activeness


want quick way to represent one-to-many relations; equiv to these type statements :

1-to-many relation b/t projects and actions:
project.actions.# is action
action.project is project

1-to-many relation b/t categories and projects:
category.projects.# is project
project.category is category

but also the assumption that:

&
  a in p.action if          (hmmm dont i mean "implies" instead of "if"?)
    action.project == p
  action.project == p if    (hmmm dont i mean "implies" instead of "if"?)
    a in p.action

hmm, rewrite:

&
  (a in p.action)- or action.project == p
  (action.project == p)- or a in p.action

heck, i guess that stuff might be common in assertions. let's have "implies" and "biconditional" logical operators, "->" and "<->" (note: see if this conflicts with whatever notation we come up with for special uses of '-'; i dont think it does b/c - is only negation at the end):

a in p.action <-> (action.project == p)
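a python sketch of what these two operators and the one-to-many invariant amount to. `implies` and `iff` are written as plain functions, and the invariant is checked by brute force over every (project, action) pair. the classes and the name `relation_consistent` are made up, and the sketch reads "action" in the formula as the same thing as "a".

```python
def implies(p, q):
    """p -> q, i.e. (not p) or q, matching the "(p)- or q" rewrite."""
    return (not p) or q


def iff(p, q):
    """p <-> q: implication in both directions."""
    return implies(p, q) and implies(q, p)


class Project:
    def __init__(self, name):
        self.name = name
        self.actions = []


class Action:
    def __init__(self, name, project=None):
        self.name = name
        self.project = project


def relation_consistent(projects, actions):
    """Check `a in p.actions <-> a.project == p` for every pair (a, p)."""
    return all(
        iff(a in p.actions, a.project is p)
        for p in projects
        for a in actions
    )


oot = Project("oot")
notes = Action("write notes", oot)
oot.actions.append(notes)
print(relation_consistent([oot], [notes]))

stray = Action("stray", oot)  # points at oot but is not listed in oot.actions
print(relation_consistent([oot], [notes, stray]))
```

the second check fails in only one direction of the biconditional, which is the case that the single "if" version above would have missed.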

anyhow, so need a notation for that, and for 1-1 and many-many. how about

many-1 project.actions action.project
1-1 wife.husband husband.wife
many-many fan.idols celebrity.fans

and also variants to indicate if the corresponding sets can be empty:

many-1 project.actions action.project
    0 < #project.actions (action.project is a single value of type "project")
many-1' project.actions action.project
    (action.project is a single value of type "project")
many'-1' project.actions action.project
    (action.project is a single value of type "project'")

(note: #x = x len)

i guess these are shortcuts for graph pattern assertions augmented with some assertions using "in" (if those aren't allowed anyway; i guess they should be; dude so now graph patterns express FOL and set theory?!? seems too expressive):

many-1 project.actions action.project

-->

project.actions.# is action
0 < #project.actions
action.project is project
!a in !p.action <-> action.project == !p

many-1' project.actions action.project

-->

project.actions.# is action
action.project is project
!a in !p.action <-> action.project == !p

many'-1' project.actions action.project

-->

project.actions.# is action
action.project' is project'
    (two conflicting uses of ' syntax -- if types are values, then is project' a value which is of type "set of projects, unioned with null", or of type "either null, or a set" -- TODO)
!a in !p.action <-> action.project == !p
    (which, because action.project is used, not action.project', actually compiles to:)
    action.project' -> !a in !p.action <-> action.project == !p

so these many-1 guys are just macros, is that it?
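if they are macros, the expansion is mechanical. a python sketch that turns one relation declaration into the assertion lines written out above follows. `expand_relation` is a made-up name, the output is plain strings rather than real oot syntax trees, and the quantified line is generated as `!a in !p.<field> <-> !a.<field> == !p`, which tidies up the hand-written version slightly (an assumption on my part). the extra `action.project' ->` guard mentioned for the nullable case is not generated.

```python
KINDS = ("many-1", "many-1'", "many'-1'")


def expand_relation(kind, many_path, one_path):
    """Expand e.g. `many-1 project.actions action.project` into assertion lines."""
    if kind not in KINDS:
        raise ValueError(f"unknown relation kind: {kind}")
    owner, many_field = many_path.split(".")   # project, actions
    member, one_field = one_path.split(".")    # action, project
    lines = [f"{many_path}.# is {member}"]
    if kind == "many-1":
        # unprimed "many": the set may not be empty
        lines.append(f"0 < #{many_path}")
    if kind == "many'-1'":
        # primed on both sides: the single value is nullable
        lines.append(f"{one_path}' is {owner}'")
    else:
        lines.append(f"{one_path} is {owner}")
    lines.append(f"!a in !p.{many_field} <-> !a.{one_field} == !p")
    return lines


for line in expand_relation("many-1", "project.actions", "action.project"):
    print(line)
```

1-1 and many-many would need their own templates; they are left out because the notes do not spell out their expansions.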

syntax to add:
  postfix ' on types for nullable type
  postfix ' on values for a value of a nullable type
  # alone at end for "len"
  # prefix for what i used to call @
  <->, ->
  ! prefix for universally quantified variable
  ? prefix for variable or existentially quantified variable

--

note inspired by the above: are assertions and predicates the same? that is, can the above be interpreted as an assertion? no, something is a predicate if it's the last nonindented thing in a definition, but an assertion if it's not the last. it's easy enuf to distinguish booleans being returned from boolean assertions; the last nonindented line, and the lines below it, are "real", the ones above are assertions. of course, an assertion can "call" a predicate, so the same subroutine could be used to calculate boolean return values and to calculate an assertion's bool, e.g.

greaterthan5 = x : x > 5
plus6 = x : x 6 +

or in point-free notation:

greaterthan5 = > 5
plus6 =
    (how to assert something about the result of the fn? how to indicate variables? some alternatives:)
    x plus6 > 5
    ?x plus6 > 5
    (won't this cause a constraint check? or should we disallow constraint satisfaction within assertions?)
    (the return value)
    6 +

or..

plus6 =
    (the return value)
    6 +
    ! ?x plus6 > 5
    (seems redundant; from the use of plus6, we KNOW it's a postcondition)

main = 8 greaterthan5 seq 8 plus6 pr

but then how to do assertions within sequences? or assertions that hold during or after the seq? hmmm mb should indicate assertions somehow after all? ! on every assertion? or only within sequences? or "ass" or "asr" or something? i guess ! would be easiest. at the beginning of the line. optional outside seq?

so far, i think the best option is: !! at beginning of line in sequences, optional without. assertions before seqs mean conditions that hold eternally/persistently. assertions themselves can't have sequences (i.e. they cant do logging) -- but they can do nondeterminism and input i.e. they are not ref trans. ?-variables within assertions don't mean "solve this constraint", they mean, "if you should happen to apply this fn enuf times to bind these variables, then this assertion must hold" (i.e. assertions are lazy). that is, within normal program logic, ?-variables are implicitly existentially quantified, i.e. ?x f > 5 means "find an x such that f(x) is greater than 5, and set ?x to that", but in an assertion it means "for every value of x, f(x) must be greater than 5". no wait, i was using assertion syntax to write constraints, huh. so how to write lazy assertions, i.e. "?x plus6 > 5"? or should we use syntax to disambiguate constraints and assertions? mb "?x" is existentially quantified x and "!x" is universally quantified x? so

plus6 =
    !x plus6 > 5
    6 +

means "for every x, plus6(x) must be > 5", and
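a python sketch of the "lazy assertion" reading: the universally quantified condition is not proved up front, it is checked each time the function happens to be applied to a value. the decorator name `postcondition` is made up, and python has no constraint solving, so only the checking half of the idea is shown.

```python
import functools


def postcondition(check):
    """Attach a condition that must hold for every result the function returns."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(x):
            result = fn(x)
            # Checked lazily: only for the x values that are actually used.
            if not check(result):
                raise AssertionError(
                    f"{fn.__name__}({x!r}) = {result!r} violates its postcondition"
                )
            return result
        return wrapper
    return decorate


@postcondition(lambda res: res > 5)   # plays the role of "!x plus6 > 5"
def plus6(x):
    return x + 6


print(plus6(0))
try:
    plus6(-10)
except AssertionError as err:
    print("assertion failed:", err)
```

the failing call shows that "for every x, plus6(x) > 5" is false as stated, and that a lazy check only finds this out when a bad x turns up.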

plus6ButEven = x :
    !x plus6ButEven 2 mod == 0
    evener in [0 1]
    x plus6 evener +

but this could be rewritten

plus6ButEven = x :
    res 2 mod == 0
    evener in [0 1]
    res = x plus6 evener +
    res

mb we could provide "res" as a keyword so it would look like:

plus6ButEven