---
handy list of symbols convenient for frequent usage:
unshifted, double unshifted, shifted, double shifted
`-=()\;',./
`` -- == (( )) \\ ;; '' ,, .. //
~!@#$%^&*[]_+{}|:"<>?
~~ !! @@ ## $$ %% ^^ && ** [[ ]] __ ++ {{ }} || :: "" << >> ??
i am wondering which of these are hard to type on non-US keyboards. the second post here http://www.cpptalk.net/5-vt10808.html?postdays=0&postorder=asc&start=60 opines: "I don't know. Had the problem been addressed from the start, if for example, Kernighan and Richie had refused to use any character which wasn't in the invariant part of ISO 646, I think it would have been a good thing. I've had to develop C on terminal which only supported ISO 646-DE." a quoted comment on that page also gives some examples of common characters which are hard to type in italy: "I've to admit that it's difficult to find PCs in italy with an US keyboard; looks like italians are not considered as potential programmers (it's hard to type "{") or internet citizens, for that matter (it's hard to type "@" or "~" too, with no standard for it)."
so maybe i should look at ISO 646? according to http://en.wikipedia.org/wiki/ISO/IEC_646 , there is the invariant subset, but there is also T.61, which gives you more punctuation but leaves out { and ~, which the italian guy found hard (T.61 does have @; and i've gotta believe that @ at least will be changing in italy soon tho! that post was from 2004 btw). the punctuation still not in T.61 is: \ ^ ` {} ~
the ones in T.61 but not INV are #$@[]
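a minimal sketch (python, not from any of the sources above) for checking a candidate symbol list against the invariant subset. it assumes the twelve ASCII code points that the wikipedia article lists as replaceable in national variants; `risky_chars` is a made-up helper name.

```python
import string

# Code points that national ISO 646 variants may replace, as listed in the
# Wikipedia article cited above (assumption: this list of 12 is complete).
ISO646_VARIANT = set("#$@[\\]^`{|}~")

# ASCII punctuation left over once the variant positions are removed.
ISO646_INVARIANT_PUNCT = set(string.punctuation) - ISO646_VARIANT


def risky_chars(symbols):
    """Return, in first-seen order, the characters of `symbols` that fall
    outside the invariant subset."""
    seen = []
    for ch in symbols:
        if ch in ISO646_VARIANT and ch not in seen:
            seen.append(ch)
    return "".join(seen)
```

e.g. `risky_chars("x[i] = {a | b}")` gives `[]{|}`, so square brackets, curly braces and the pipe all land outside the invariant subset.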
C deals with this with http://en.wikipedia.org/wiki/C_Trigraph
http://stackoverflow.com/questions/1234582/purpose-of-trigraph-sequences-in-c :
"It may happen that some terminals and/or virtualization doesn't let you access easily to some characters. In my experience the main offender is the tilde. – Francesco Nov 3 at 19:24"
see also http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2910.pdf , although it mostly talks about backwards-compatibility and doesn't give much info useful for someone designing a new language
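for reference, the nine trigraphs and what they stand for, as a small python sketch. the table is the standard C one; `replace_trigraphs` is a made-up helper that only mimics the substitution step and ignores everything else a real preprocessor does.

```python
# The nine trigraph sequences of standard C and their replacements.
TRIGRAPHS = {
    "??=": "#",
    "??(": "[",
    "??/": "\\",
    "??)": "]",
    "??'": "^",
    "??<": "{",
    "??!": "|",
    "??>": "}",
    "??-": "~",
}


def replace_trigraphs(source):
    """Replace trigraphs in a single left-to-right pass over `source`."""
    out = []
    i = 0
    while i < len(source):
        chunk = source[i:i + 3]
        if chunk in TRIGRAPHS:
            out.append(TRIGRAPHS[chunk])
            i += 3
        else:
            out.append(source[i])
            i += 1
    return "".join(out)
```

e.g. `replace_trigraphs("??=define A(x) x??(0??)")` gives `#define A(x) x[0]`. the nine replaced characters are the ISO 646 variant positions that C's syntax needs; $, @ and ` have no trigraph because C's syntax doesn't use them.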
http://www.wikicreole.org/wiki/Talk.EscapeCharacterProposal says that tilde is difficult on italian and german keyboards
i searched some more but didn't find much else. i guess i'll assume that mainly ~ is the problem. mb curly braces, too.
backslash isn't used very often anyway, so its being hard to type just means i can't treat it like an easily-typed unshifted character.
as for the ones in T.61, the only one of those that i expect to be real common is []. but i can't very well leave out both [] and {}.
"Exploring Regularity in Source Code: Software Science and Zipf's Law" by Hongyu Zhang lists the most common tokens and identifiers in some java-related situations:
Table 4. Top twelve most common tokens
Rank 1 2 3 4 5 6 7 8 9 10 11 12
Jena () . ; , {} = public new return if + String
Tomcat () . ; {} = , public if String null + return
Ant () . ; {} = , public String if new + void
Swing () ; . , {} = if int public return null 0
jEdit () ; . , = {} if int return public new i
Jetty () ; . = {} , public if String null return import
jHotdraw () . ; {} , = public void int return new if
DrJava () . ; , {} = public new void String return +
Protégé () ; . , {} = public return slot void private String
Cocoon . () ; {} , = this String import if org null
JavaCC () . ; ostr println = , {} + if i []
jUnit () ; . , {} = public new void return String 0
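counts like those in Table 4 could be approximated with a few lines of python. this is only a rough sketch: the regex tokenizer is an assumption, not the paper's method; it ignores comments, string literals and multi-character operators, and it counts each bracket pair once via its opener because the table lists `()`, `{}` and `[]` as single tokens.

```python
import re
from collections import Counter

# Hypothetical tokenizer: identifiers/keywords, integer literals, then any
# other single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")

# Openers are reported as the whole pair; closers are skipped.
PAIRS = {"(": "()", "[": "[]", "{": "{}"}
CLOSERS = set(")]}")


def token_counts(source):
    """Count tokens in `source`, folding each bracket pair into one token."""
    counts = Counter()
    for tok in TOKEN_RE.findall(source):
        if tok in CLOSERS:
            continue  # already counted together with its opener
        counts[PAIRS.get(tok, tok)] += 1
    return counts
```

`token_counts(src).most_common(12)` then gives a ranking in the same shape as one row of the table.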
Table 5. Top ten most common identifiers
Rank 1 2 3 4 5 6 7 8 9 10
Jena String i jena om hp hp1 n m node resource
Tomcat String i org apache name log java javax request append
Ant String org apache tools ant i File buildException java project
Swing i g c x y String e java a width
jEdit i String jEdit name buffer length log Object e path
Jetty String i log java org e name IOException length mortbay
jHotdraw x y draw r CH ifa point Figure java i
DrJava String assertEquals doc File cs edu rice i e drjava
Protégé slot String cls Slot i frame Cls Collection edu Stanford
Cocoon String org apache cocoon i getLogger java name avalon framework
JavaCC ostr println i 0 i j String java Vector Options
jUnit String e GridBagConstraints test Test i junit expected result message
and from "CSteg: Talking in C code":
Table 1: Frequency of C tokens in cryptographic software.
Token type              Appearance in %
Punctuator              51.59
Identifier              30.02
Numerical literal       11.63
Reserved word           4.77
String literal          1.29
Preprocessor directive  0.7
Measures have been made with tools taken from (?). Comments have not been counted. Frequency distribution of C tokens gathered in our tests is described in Table 1.
Table 2: Freq. of punctuator tokens in analyzed software.
Token  Frequency  Token  Frequency
,      21.52      ->     2.05
;      13.21      .      1.82
(      12         *      1.73
)      12         |      1.34
=      5.41       #      1.19
]      4.8        v++    1.11
[      4.8        +      1
{      2.21       *v     0.92
}      2.21       Other  11.68
Most used punctuator tokens are described in Table 2. Reserved words frequency is more homogeneous (Table 3). We have found that inside each group of possible tokens (punctuators and reserved words) there are only a few tokens which are commonly used. The rest of the ...
Table 3: Freq. of reserved words in cryptographic software.
Word      Frequency  Word      Frequency
if        14.84      static    2.93
int       13.79      register  2.84
unsigned  9.25       case      2.83
char      8.84       while     2.60
for       8.30       break     2.54
void      5.85       sizeof    1.54
else      5.09       extern    1.21
return    5.02       short     1.14
long      3.74       struct    0.98
const     3.49       Other     3.15
---
todo, read http://stackoverflow.com/questions/tagged/language-design
newtype vs data with a single strict field: newtype is just type coercion, takes no time at runtime. how to do in jasper? compiler that recognizes when constant fields are only referred to in types?
"new" constructor to construct pattern with constants in pattern? or just constructor?
all caps are keywords (global symbols)
how to simplify things like this (Java Android): ((AlarmManager)context.getSystemService(Context.ALARM_SERVICE)).cancel(pendingIntentAlarm);
dependent types? i.e. context.getSystemService(Context.ALARM_SERVICE) returns something of type AlarmManager?
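one way to get rid of the cast without full dependent types might be to let the key itself carry the result type. a minimal python sketch, assuming a hypothetical `ServiceKey` type and stand-in `Context` and `AlarmManager` classes (none of this is the real Android API):

```python
from dataclasses import dataclass
from typing import Any, Dict, Generic, List, Type, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ServiceKey(Generic[T]):
    """Hypothetical key that carries the type of the service it names."""
    name: str
    service_type: Type[T]


class AlarmManager:
    """Stand-in for the Android class; it only records what was cancelled."""

    def __init__(self) -> None:
        self.cancelled: List[str] = []

    def cancel(self, pending_intent: str) -> None:
        self.cancelled.append(pending_intent)


class Context:
    """Stand-in service registry keyed by typed keys."""

    def __init__(self) -> None:
        self._services: Dict[str, Any] = {}

    def register(self, key: ServiceKey[T], service: T) -> None:
        self._services[key.name] = service

    def get_system_service(self, key: ServiceKey[T]) -> T:
        service = self._services[key.name]
        # Runtime check standing in for the static guarantee.
        assert isinstance(service, key.service_type)
        return service


ALARM_SERVICE = ServiceKey("alarm", AlarmManager)

context = Context()
context.register(ALARM_SERVICE, AlarmManager())
# No cast needed: a type checker can infer AlarmManager from the key.
context.get_system_service(ALARM_SERVICE).cancel("pendingIntentAlarm")
```

the return type here depends on the type parameter of the key rather than on its value, so this is ordinary generics, not dependent typing; whether that is enough for jasper is an open question.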