Bayle Shanks's website: proj-oot-old-150618-ootNotes

---

handy list of symbols convenient for freq usage:

unshifted, double unshifted, shifted, double shifted:

unshifted: `-=()\;',./

double unshifted: `` -- == (( )) ;; ,, ..

shifted: ~!@#$%^&*[]_+{} :"<>?

double shifted: ~~ !! @@ ## $$ %% ^^ && [[ ]] __ ++ {{ }} :: "" 1 ??

i am wondering which of these are hard to type on non-US keyboards. the second post here http://www.cpptalk.net/5-vt10808.html?postdays=0&postorder=asc&start=60 opines that it would have been better if C had avoided characters outside the invariant part of ISO 646: "I don't know. Had the problem been addressed from the start, if for example, Kernighan and Richie had refused to use any character which wasn't in the invariant part of ISO 646, I think it would have been a good thing. I've had to develop C on terminal which only supported ISO 646-DE. ". A quoted comment on that page also gives some examples of common characters which are hard to type in italy: " I've to admit that it's difficult to find PCs in italy with an US keyboard; looks like italians are not considered as potential programmers (it's hard to type "{") or internet citizens, for that matter (it's hard to type "@" or "~" too, with no standard for it). "

So maybe i should look at ISO 646? according to http://en.wikipedia.org/wiki/ISO/IEC_646 , there is the invariant subset, but there is also T.61, which gives you more punctuation but still leaves out { and ~, which the italian guy found hard (T.61 does have @, and i've gotta believe that @ at least will be changing in italy soon; that post was from 2004 btw). the punctuation still not in T.61 is: \ ^ ` {} ~

the ones in T.61 but not INV are #$@[]
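to make the three tiers concrete, here is a minimal sketch that sorts the candidate symbols into "invariant", "T.61 only" and "not in T.61". the two sets are copied straight from the notes above rather than re-derived from the standards, and the names (`portability`, `group`, `CANDIDATES`) are made up for this illustration. it assumes every other candidate symbol is in the invariant subset.

```python
# Sets taken from the notes above, not checked against the standards here.
MISSING_FROM_T61 = set("\\^`{}~")   # "punctuation still not in T.61"
T61_NOT_INVARIANT = set("#$@[]")    # "in T.61 but not INV"

# The unshifted and shifted symbols from the list at the top of these notes.
CANDIDATES = "`-=()\\;',./~!@#$%^&*[]_+{}:\"<>?"


def portability(symbol: str) -> str:
    """Classify one punctuation character into one of three tiers."""
    if symbol in MISSING_FROM_T61:
        return "not in T.61"
    if symbol in T61_NOT_INVARIANT:
        return "T.61 only"
    # Assumption: anything not listed in the two sets is invariant.
    return "invariant"


def group(symbols: str) -> dict:
    """Group symbols by tier, keeping their original order."""
    groups = {"invariant": [], "T.61 only": [], "not in T.61": []}
    for symbol in symbols:
        groups[portability(symbol)].append(symbol)
    return groups


if __name__ == "__main__":
    for tier, symbols in group(CANDIDATES).items():
        print(tier, " ".join(symbols))
```

the "invariant" tier is the safest pool for the most frequently typed operators, and the "not in T.61" tier is the one to avoid for them.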

C deals with this with trigraphs: http://en.wikipedia.org/wiki/C_Trigraph

http://stackoverflow.com/questions/1234582/purpose-of-trigraph-sequences-in-c :

"It may happen that some terminals and/or virtualization doesn't let you access easily to some characters. In my experience the main offender is the tilde. – Francesco Nov 3 at 19:24"

see also http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2910.pdf , although it mostly talks about backwards-compatibility and doesn't give much info useful for someone designing a new language
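for reference, a small sketch of how the C trigraph substitution works. the nine mappings are the standard C ones; the function names are made up for this illustration, and this is only a toy text rewriter, not how a real preprocessor is structured.

```python
# The nine C trigraphs and the characters they stand for.
TRIGRAPHS = {
    "??=": "#", "??(": "[", "??/": "\\", "??)": "]", "??'": "^",
    "??<": "{", "??!": "|", "??>": "}", "??-": "~",
}

# Reverse table: hard-to-type character -> trigraph.
TO_TRIGRAPH = {char: tri for tri, char in TRIGRAPHS.items()}


def decode_trigraphs(text: str) -> str:
    """Replace each trigraph with its character, scanning left to right."""
    out = []
    i = 0
    while i < len(text):
        chunk = text[i:i + 3]
        if chunk in TRIGRAPHS:
            out.append(TRIGRAPHS[chunk])
            i += 3
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def encode_trigraphs(text: str) -> str:
    """Spell every character that has a trigraph using that trigraph."""
    return "".join(TO_TRIGRAPH.get(char, char) for char in text)


if __name__ == "__main__":
    print(decode_trigraphs("??=define FIRST(a) a??(0??)"))
    print(encode_trigraphs("x = ~y;"))
```

the cost is obvious from the encoded form: every affected character takes three keystrokes and the code gets much harder to read, which is a reason for a new language to pick its common punctuation from the safe set instead.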

http://www.wikicreole.org/wiki/Talk.EscapeCharacterProposal says that tilde is difficult on italian and german keyboards

i searched some more but didn't find much else. i guess i'll assume that mainly ~ is the problem. maybe curly braces, too.

backslash isn't very common so that just prevents me from treating it like an easily-typed unshifted character.

as for the ones in T.61, the only one of those that i expect to be really common is []. but i can't very well leave out both [] and {}.

"Exploring Regularity in Source Code: Software Science and Zipf's Law", Hongyu Zhang

lists the most common tokens and identifiers in some java-related situations:

    Table 4. Top twelve most common tokens

    Rank      1    2    3    4     5        6    7       8       9       10      11       12
    Jena      ()   .    ;    ,     {}       =    public  new     return  if      +        String
    Tomcat    ()   .    ;    {}    =        ,    public  if      String  null    +        return
    Ant       ()   .    ;    {}    =        ,    public  String  if      new     +        void
    Swing     ()   ;    .    ,     {}       =    if      int     public  return  null     0
    jEdit     ()   ;    .    ,     =        {}   if      int     return  public  new      i
    Jetty     ()   ;    .    =     {}       ,    public  if      String  null    return   import
    jHotdraw  ()   .    ;    {}    ,        =    public  void    int     return  new      if
    DrJava    ()   .    ;    ,     {}       =    public  new     void    String  return   +
    Protégé   ()   ;    .    ,     {}       =    public  return  slot    void    private  String
    Cocoon    .    ()   ;    {}    ,        =    this    String  import  if      org      null
    JavaCC    ()   .    ;    ostr  println  =    ,       {}      +       if      i        []
    jUnit     ()   ;    .    ,     {}       =    public  new     void    return  String   0
    Table 5. Top ten most common identifiers

    Rank      1       2             3                   4       5       6          7       8               9        10
    Jena      String  i             jena                om      hp      hp1        n       m               node     resource
    Tomcat    String  i             org                 apache  name    log        java    javax           request  append
    Ant       String  org           apache              tools   ant     i          File    buildException  java     project
    Swing     i       g             c                   x       y       String     e       java            a        width
    jEdit     i       String        jEdit               name    buffer  length     log     Object          e        path
    Jetty     String  i             log                 java    org     e          name    IOException     length   mortbay
    jHotdraw  x       y             draw                r       CH      ifa        point   Figure          java     i
    DrJava    String  assertEquals  doc                 File    cs      edu        rice    i               e        drjava
    Protégé   slot    String        cls                 Slot    i       frame      Cls     Collection      edu      Stanford
    Cocoon    String  org           apache              cocoon  i       getLogger  java    name            avalon   framework
    JavaCC    ostr    println       i                   0       i       j          String  java            Vector   Options
    jUnit     String  e             GridBagConstraints  test    Test    i          junit   expected        result   message

and from "CSteg: Talking in C code"

    Table 1: Frequency of C tokens in cryptographic software.

    Token type               Appearance in %
    Punctuator                         51.59
    Identifier                         30.02
    Numerical literal                  11.63
    Reserved word                       4.77
    String literal                      1.29
    Preprocessor directive              0.7

"Measures have been made with tools taken from (?). Comments have not been counted. Frequency distribution of C tokens gathered in our tests is described in Table 1."

    Table 2: Freq. of punctuator tokens in analyzed software.

    Token   Frequency    Token   Frequency
    ,           21.52    ->           2.05
    ;           13.21    .            1.82
    (           12       *            1.73
    )           12       (?)          1.34
    =            5.41    #            1.19
    ]            4.8     v++          1.11
    [            4.8     +            1
    {            2.21    *v           0.92
    }            2.21    Other       11.68

"Most used punctuator tokens are described in Table 2. Reserved words frequencies are more homogeneous (Table 3). We have found that inside each group of possible tokens (punctuators and reserved words) there are only a few tokens which are commonly used. The rest of the [...]"

    Table 3: Freq. of reserved words in cryptographic software.

    Word       Frequency    Word       Frequency
    if             14.84    static          2.93
    int            13.79    register        2.84
    unsigned        9.25    case            2.83
    char            8.84    while           2.60
    for             8.30    break           2.54
    void            5.85    sizeof          1.54
    else            5.09    extern          1.21
    return          5.02    short           1.14
    long            3.74    struct          0.98
    const           3.49    Other           3.15
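a rough way to get comparable numbers for some other body of code is to count punctuation characters directly. the sketch below does that with the standard library only. note that it counts single characters, not tokens, so `->` is counted as `-` plus `>`, and punctuation inside strings and comments is counted too; it is not the method used in either paper above. the function names are made up for this illustration.

```python
from collections import Counter
import string


def punctuation_counts(source: str) -> Counter:
    """Count every ASCII punctuation character in a piece of source text."""
    return Counter(ch for ch in source if ch in string.punctuation)


def as_percentages(counts: Counter) -> dict:
    """Convert raw counts to a percentage share of all punctuation seen."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: 100.0 * n / total for ch, n in counts.items()}


if __name__ == "__main__":
    sample = "if (a[i] == b) { f(a, b); g(c); }"
    counts = punctuation_counts(sample)
    for ch, share in sorted(as_percentages(counts).items(),
                            key=lambda item: -item[1]):
        print(f"{ch}  {counts[ch]:3d}  {share:5.1f}%")
```

running this over a corpus written in a style close to what oot code might look like would give a first guess at which symbols deserve the easiest keys.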

---

todo, read http://stackoverflow.com/questions/tagged/language-design

newtype vs data with a single strict field: newtype is just type coercion, takes no time at runtime. how to do in oot? compiler that recognizes when constant fields are only referred to in types?
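a loose analogy in Python, since it has both flavours: `typing.NewType` makes a distinct name for the static type checker but hands back the underlying value unchanged at runtime, while a single-field dataclass really allocates a wrapper object. this is only an illustration of the distinction, not Haskell's semantics; `Meters` and `MetersBox` are made-up names.

```python
from dataclasses import dataclass
from typing import NewType

# Distinct type for a static checker; no wrapper object exists at runtime.
Meters = NewType("Meters", float)


# A real wrapper: one field, but every value is a separate boxed object.
@dataclass(frozen=True)
class MetersBox:
    value: float


unboxed = Meters(3.5)
boxed = MetersBox(3.5)

if __name__ == "__main__":
    print(type(unboxed))   # the underlying float type
    print(type(boxed))     # the MetersBox wrapper class
```

the NewType version is the behaviour the note is after: the distinction lives only in the types, so a compiler that can see a constant tag is never inspected at runtime could erase it the same way.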

"new" constructor to construct pattern with constants in pattern? or just constructor?

all caps are keywords (global symbols)

how to simplify things like this (Java Android): ((AlarmManager)context.getSystemService(Context.ALARM_SERVICE)).cancel(pendingIntentAlarm);

dependent types? i.e. context.getSystemService(Context.ALARM_SERVICE) returns something of type AlarmManager?
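short of real dependent types, one way to drop the cast is to let the lookup key carry the result type as a type parameter, so the return type follows from which key is passed. a minimal sketch of that typed-key idea follows. everything here (`ServiceKey`, `Context.register`, this `AlarmManager`) is a hypothetical stand-in and not the Android API, and the typing is only checked by a static checker such as mypy, not at runtime.

```python
from dataclasses import dataclass
from typing import Any, Dict, Generic, List, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ServiceKey(Generic[T]):
    """A lookup key whose type parameter records the service type."""
    name: str


class AlarmManager:
    """Hypothetical stand-in for the real service class."""

    def __init__(self) -> None:
        self.cancelled: List[str] = []

    def cancel(self, pending_intent: str) -> None:
        self.cancelled.append(pending_intent)


# The key itself says what type the lookup will return.
ALARM_SERVICE: ServiceKey[AlarmManager] = ServiceKey("alarm")


class Context:
    def __init__(self) -> None:
        self._services: Dict[str, Any] = {}

    def register(self, key: ServiceKey[T], service: T) -> None:
        self._services[key.name] = service

    def get_system_service(self, key: ServiceKey[T]) -> T:
        # Return type is tied to the key's parameter, so no cast at call sites.
        return self._services[key.name]


if __name__ == "__main__":
    context = Context()
    context.register(ALARM_SERVICE, AlarmManager())
    context.get_system_service(ALARM_SERVICE).cancel("pendingIntentAlarm")
```

with this shape the call site is just `context.get_system_service(ALARM_SERVICE).cancel(...)`. full dependent types would go further and let the return type depend on an arbitrary runtime value rather than on a key that was typed ahead of time.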