---
handy list of symbols convenient for freq usage:
unshifted: `-=()\;',./
double unshifted: `` -- == (( )) ;; ,, ..
shifted: ~!@#$%^&*[]_+{}|:"<>?
double shifted: ~~ !! @@ ## $$ %% ^^ && [[ ]] __ ++ {{ }} || :: "" << >> ??
i am wondering which of these are hard to type on non-US keyboards. the second post here http://www.cpptalk.net/5-vt10808.html?postdays=0&postorder=asc&start=60 opines: "I don't know. Had the problem been addressed from the start, if for example, Kernighan and Richie had refused to use any character which wasn't in the invariant part of ISO 646, I think it would have been a good thing. I've had to develop C on terminal which only supported ISO 646-DE." a quoted comment on that page also gives some examples of common characters which are hard to type in italy: "I've to admit that it's difficult to find PCs in italy with an US keyboard; looks like italians are not considered as potential programmers (it's hard to type "{") or internet citizens, for that matter (it's hard to type "@" or "~" too, with no standard for it)."
So maybe i should look at ISO 646? according to http://en.wikipedia.org/wiki/ISO/IEC_646 , there is the invariant subset, but there is also T.61, which gives you more punctuation but leaves out { and ~, which the italian guy found hard (T.61 does have @; i've gotta believe that @ at least will be changing in italy soon tho! that post was from 2004 btw). the punctuation still not in T.61 is: \ ^ ` {} ~
the ones in T.61 but not INV are #$@[]
C deals with this with http://en.wikipedia.org/wiki/C_Trigraph
http://stackoverflow.com/questions/1234582/purpose-of-trigraph-sequences-in-c :
"It may happen that some terminals and/or virtualization doesn't let you access easily to some characters. In my experience the main offender is the tilde. – Francesco Nov 3 at 19:24"
see also http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2910.pdf , although it mostly talks about backwards-compatibility and doesn't give much info useful for someone designing a new language
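a quick sketch (in python, not oot) of what trigraph replacement amounts to. the nine trigraphs and their replacements are the ones in the C standard; the function name expand_trigraphs is just made up for illustration. note that every replaced character sits in a national-use (non-invariant) position of ISO 646.

```python
import re

# The nine trigraphs defined by the C standard and the characters they stand for.
TRIGRAPHS = {
    "??=": "#", "??(": "[", "??/": "\\", "??)": "]", "??'": "^",
    "??<": "{", "??!": "|", "??>": "}", "??-": "~",
}

# Matches "??" followed by one of the nine trigraph selector characters.
_TRIGRAPH_RE = re.compile(r"\?\?[=(/)'<!>-]")


def expand_trigraphs(src: str) -> str:
    """Replace every trigraph in src in a single left-to-right pass.

    expand_trigraphs is a hypothetical helper name, not part of any library.
    """
    return _TRIGRAPH_RE.sub(lambda m: TRIGRAPHS[m.group(0)], src)


if __name__ == "__main__":
    print(expand_trigraphs("??=define FIRST(a) a??(0??)"))
```

the takeaway for a new language: if the core syntax avoids those characters in the first place, no such translation layer is needed.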
http://www.wikicreole.org/wiki/Talk.EscapeCharacterProposal says that tilde is difficult on italian and german keyboards
i searched some more but didn't find much else. i guess i'll assume that mainly ~ is the problem. mb curly braces, too.
backslash isn't very common so that just prevents me from treating it like an easily-typed unshifted character.
as for the ones in T.61, the only one of those that i expect to be real common is []. but i cant very well leave out both [] and {}.
Exploring Regularity in Source Code: Software Science and Zipf's Law Hongyu Zhang
lists the most common tokens and identifiers in some java-related situations:
Table 4. Top twelve most common tokens

Rank      1   2   3   4     5        6   7       8       9       10      11       12
Jena      ()  .   ;   ,     {}       =   public  new     return  if      +        String
Tomcat    ()  .   ;   {}    =        ,   public  if      String  null    +        return
Ant       ()  .   ;   {}    =        ,   public  String  if      new     +        void
Swing     ()  ;   .   ,     {}       =   if      int     public  return  null     0
jEdit     ()  ;   .   ,     =        {}  if      int     return  public  new      i
Jetty     ()  ;   .   =     {}       ,   public  if      String  null    return   import
jHotdraw  ()  .   ;   {}    ,        =   public  void    int     return  new      if
DrJava    ()  .   ;   ,     {}       =   public  new     void    String  return   +
Protégé   ()  ;   .   ,     {}       =   public  return  slot    void    private  String
Cocoon    .   ()  ;   {}    ,        =   this    String  import  if      org      null
JavaCC    ()  .   ;   ostr  println  =   ,       {}      +       if      i        []
jUnit     ()  ;   .   ,     {}       =   public  new     void    return  String   0

Table 5. Top ten most common identifiers

Rank      1       2             3                   4       5       6          7       8               9        10
Jena      String  i             jena                om      hp      hp1        n       m               node     resource
Tomcat    String  i             org                 apache  name    log        java    javax           request  append
Ant       String  org           apache              tools   ant     i          File    buildException  java     project
Swing     i       g             c                   x       y       String     e       java            a        width
jEdit     i       String        jEdit               name    buffer  length     log     Object          e        path
Jetty     String  i             log                 java    org     e          name    IOException     length   mortbay
jHotdraw  x       y             draw                r       CH      ifa        point   Figure          java     i
DrJava    String  assertEquals  doc                 File    cs      edu        rice    i               e        drjava
Protégé   slot    String        cls                 Slot    i       frame      Cls     Collection      edu      Stanford
Cocoon    String  org           apache              cocoon  i       getLogger  java    name            avalon   framework
JavaCC    ostr    println       i                   0       i       j          String  java            Vector   Options
jUnit     String  e             GridBagConstraints  test    Test    i          junit   expected        result   message
and from "CSteg: Talking in C code"
Table 1: Frequency of C tokens in cryptographic software.

Token type              Appearance in %
Punctuator              51.59
Identifier              30.02
Numerical literal       11.63
Reserved word           4.77
String literal          1.29
Preprocessor directive  0.7

Measures have been made with tools taken from (?). Comments have n[ot been c]ounted. Frequency distribution of C tokens gathered in our tests is descr[ibed in Tab]le 1.

Table 2: Freq. of punctuator tokens in analyzed software.

Token  Frequency    Token  Frequency
,      21.52        ->     2.05
;      13.21        .      1.82
(      12           *      1.73
)      12           |      1.34
=      5.41         #      1.19
]      4.8          v++    1.11
[      4.8          +      1
{      2.21         *v     0.92
}      2.21         Other  11.68

Most used punctuator tokens are described in Table 2. Reserved words fre[quencies are m]ore homogeneous (Table 3). We have found that inside each group of possible tokens (punctuators and r[eserved wor]ds) there are only a few tokens which are commonly used. The rest of the [...]

Table 3: Freq. of reserved words in cryptographic software.

Word      Frequency    Word      Frequency
if        14.84        static    2.93
int       13.79        register  2.84
unsigned  9.25         case      2.83
char      8.84         while     2.60
for       8.30         break     2.54
void      5.85         sizeof    1.54
else      5.09         extern    1.21
return    5.02         short     1.14
long      3.74         struct    0.98
const     3.49         Other     3.15
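to get comparable numbers for some other corpus, a rough sketch in python follows. big assumption: it counts single punctuation characters, not real tokens, so "->" or "++" get split into their characters and the percentages won't line up exactly with the tables above. punctuation_frequency is a made-up helper name.

```python
from collections import Counter
import string


def punctuation_frequency(source: str) -> list[tuple[str, float]]:
    """Return (character, percent) pairs for ASCII punctuation, most common first.

    Character-level only: multi-character tokens such as "->" are not recognized.
    """
    counts = Counter(ch for ch in source if ch in string.punctuation)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(ch, 100.0 * n / total) for ch, n in counts.most_common()]


if __name__ == "__main__":
    sample = "f(a, b); g(c);"
    for ch, pct in punctuation_frequency(sample):
        print(f"{ch!r}: {pct:.1f}%")
```

a proper comparison would need a real lexer for the language being measured; this is only good enough for a first look at which symbols dominate.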
---
todo, read http://stackoverflow.com/questions/tagged/language-design
newtype vs data with a single strict field: newtype is just type coercion, takes no time at runtime. how to do in oot? compiler that recognizes when constant fields are only referred to in types?
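for comparison, python's typing.NewType is a loose analogue (not the same thing as haskell's newtype): the static type checker sees a distinct type, but at runtime the "constructor" just hands back its argument, so there is no wrapper object. UserId and lookup are made-up names for illustration.

```python
from typing import NewType

# A distinct type for the static checker; at runtime UserId(x) returns x unchanged.
UserId = NewType("UserId", int)


def lookup(uid: UserId) -> str:
    """Hypothetical function that should only accept UserId, not a bare int."""
    return f"user-{uid}"


uid = UserId(42)

if __name__ == "__main__":
    print(type(uid))    # the underlying int, no wrapper class
    print(lookup(uid))
```

unlike haskell, there is still a (trivial) function call at runtime here, so this only illustrates the "distinct in the types, same representation" idea.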
"new" constructor to construct pattern with constants in pattern? or just constructor?
all caps are keywords (global symbols)
how to simplify things like this (Java Android): ((AlarmManager)context.getSystemService(Context.ALARM_SERVICE)).cancel(pendingIntentAlarm);
dependent types? i.e. context.getSystemService(Context.ALARM_SERVICE) returns something of type AlarmManager?
in haskell, = is the only symbol needed to defn a fn w/ args:
f x y = exp
in oot currently, u need : also:
f = x y : exp (or should it be f = y x : exp ?)
should we make it like in haskell, where multiple things on the lhs signifies a fn w/ inputs? or should we save that form for computed lhs's i.e. in assignments? also note that we want the fn name to be on the very left, b/c text editors left justify, as noted earlier. so
f x y = exp is a fn? or
y x f = exp
means "assign exp to the result of y x f"?
and what about graph assignment?
so far i think we should keep it as is
to avoid typing the :, we could make it so that either all of the variables have to be indented, or none of them, and that if there is a single thing at the end indented more than the vars, then there is an implicit colon in between the vars and it. so both of these are equiv to "f = x y : exp":
f = x y exp
f = x y exp
mb the general rule is stated: "after there has been at least one thing after the equals, if there is another thing indented more, then put a : in between them". of course, if there is no other thing indented more, then there is no : at all.
the following is illegal:
f = x exp y
hmm, we should probably turn that around actually, since we dont want the fn body, which is bigger, to be the thing which is indented. so
f = x y exp
is the way to go; and
f = x exp y
or
f = x exp
are both illegal (after something with a lesser indent, i.e. the body, you cannot have something of a greater indent, i.e. a variable)
do we want to replace multiplication and exponentiation with something like knuth up arrow notation or the hyperoperator? seems like it would annoy ppl. but so elegant!
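for reference, a small python sketch of the hyperoperator sequence (level 0 = successor, 1 = addition, 2 = multiplication, 3 = exponentiation, 4 = tetration, ...). it assumes non-negative integer arguments, and shortcuts levels 0 to 3 with the built-in operators so the recursion only kicks in from tetration upward. the name hyper is made up.

```python
def hyper(n: int, a: int, b: int) -> int:
    """Hyperoperation H_n(a, b) for non-negative integers.

    Levels 0..3 use the built-in operators directly; higher levels use
    H_n(a, 0) = 1 and H_n(a, b) = H_{n-1}(a, H_n(a, b - 1)).
    """
    if n == 0:
        return b + 1          # successor
    if n == 1:
        return a + b          # addition
    if n == 2:
        return a * b          # multiplication
    if n == 3:
        return a ** b         # exponentiation
    if b == 0:
        return 1
    return hyper(n - 1, a, hyper(n, a, b - 1))


if __name__ == "__main__":
    for level in range(1, 5):
        print(level, hyper(level, 2, 3))
```

so a single indexed operator family would cover +, * and ^, at the cost of making the common cases look unfamiliar.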
also, as noted, could make "-" prefix denote inverse by convention, and have -+ for subtraction, -* for division.
lazy or strict patterns?
strict patterns let you define case-statements in an ad-hoc way for ad-hoc polymorphism. lazy patterns seem more natural, but arent conditional
haskell "pattern bindings" are like "graph assignment". gentle intro points out the need for some laziness there:
" fib@(1:tfib) = 1 : 1 : [ a+b | (a,b) <- zip fib tfib ]
This version of fib has the (small) advantage of not using tail on the right-hand side, since it is available in "destructured" form on the left-hand side as tfib.
[This kind of equation is called a pattern binding because it is a top-level equation in which the entire left-hand side is a pattern; i.e. both fib and tfib become bound within the scope of the declaration.]
Now, using the same reasoning as earlier, we should be led to believe that this program will not generate any output. Curiously, however, it does, and the reason is simple: in Haskell, pattern bindings are assumed to have an implicit ~ in front of them, reflecting the most common behavior expected of pattern bindings, and avoiding some anomalous situations which are beyond the scope of this tutorial. Thus we see that lazy patterns play an important role in Haskell, if only implicitly."
thought about typing syntax from an attempt to write down the abstract data structures used in a todo list program i saw:
activeness = ['active 'inactive]
projectp.actions.# is action actions are in projects categoryp.projects.# is project projects are in categories
projectp.name is str
categoryp.name is str
projectp.status is activeness
categoryp.status is activeness
project = projectp proto
category = categoryp proto
is single-line comment
convention: capitalized identifier means the set of things that fit the prototype assigned to the corresponding lowercase identifier
convention: when you say "x is a", this means tell the type system to prove that "x in A"
Activeness = ['active 'inactive] could also have said $'[active, inactive]
project.actions.# is action actions are in projects category.projects.# is project projects are in categories
project.name :: str
category.name :: str
project.status :: activeness
category.status :: activeness
old:
convention: "is" means "isa", and is like : in haskell (mb should be ::)
project.name is str
category.name is str
project.status is activeness
category.status is activeness
want quick way to represent one-to-many relations; equiv to these type statements :
1-to-many relation b/t projects and actions project.actions.# is action action.project is project
1-to-many relation b/t categories and projects category.projects.# is project project.category is category
but also the assumption that:
& a in p.action if hmmm dont i mean "implies" instead of "if"? action.project == p action.project == p if hmmm dont i mean "implies" instead of "if"? a in p.action
hmm, rewrite:
& (a in p.action)- or action.project == p (action.project == p)- or a in p.action
heck, i guess that stuff might be common in assertions. let's have "implies" and "biconditional" logical operators, "->" and "<->" (note: see if this conflicts with whatever notation we come up with for special uses of '-'; i dont think it does b/c - is only negation at the end):
a in p.action <-> (action.project == p)
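a python sketch of the two operators and of checking the biconditional above over some sample data. everything here (implies, iff, the Project and Action classes, relation_consistent) is made up for illustration; oot itself would presumably state this as an assertion rather than run a check.

```python
from dataclasses import dataclass, field


def implies(p: bool, q: bool) -> bool:
    """Material implication, the proposed "->" operator."""
    return (not p) or q


def iff(p: bool, q: bool) -> bool:
    """Biconditional, the proposed "<->" operator."""
    return p == q


@dataclass(eq=False)  # eq=False: compare by identity, like distinct graph nodes
class Project:
    name: str
    actions: list = field(default_factory=list)


@dataclass(eq=False)
class Action:
    name: str
    project: "Project | None" = None


def relation_consistent(projects, actions) -> bool:
    """Check: a in p.actions <-> a.project is p, for every pair (p, a)."""
    return all(
        iff(a in p.actions, a.project is p)
        for p in projects
        for a in actions
    )


if __name__ == "__main__":
    p = Project("oot")
    a = Action("write parser", p)
    p.actions.append(a)
    print(relation_consistent([p], [a]))
```

note that the biconditional is just implies in both directions, so "<->" could be defined in terms of "->" and "&".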
anyhow, so need a notation for that, and for 1-1 and many-many. how about
many-1 project.actions action.project
1-1 wife.husband husband.wife
many-many fan.idols celebrity.fans
and also variants to indicate if the corresponding sets can be empty:
many-1 project.actions action.project 0 < #project.actions (action.project is a single value of type "project") many-1' project.actions action.project (action.project is a single value of type "project") many'-1' project.actions action.project (action.project is a single value of type "project'") (note: #x = x len)
i guess these are shortcuts for graph pattern assertions augmented with some assertions using "in" (if those aren't allowed anyway; i guess they should be; dude so now graph patterns express FOL and set theory?!? seems too expressive):
many-1 project.actions action.project
-->
project.actions.# is action
0 < #project.actions
action.project is project
!a in !p.action <-> action.project == !p
many-1' project.actions action.project
-->
project.actions.# is action
action.project is project
!a in !p.action <-> action.project == !p
many'-1' project.actions action.project
-->
project.actions.# is action action.project' is project' two conflicting uses of ' syntax -- if types are values, then is project' a value which is of type "set of projects, unioned with null", or of type "either null, or a set" -- TODO !a in !p.action <-> action.project == !p which, because action.project is used, not action.project, actually compiles to action.project' -> !a in !p.action <-> action.project == !p
so these many-1 guys are just macros, is that it?
syntax to add:
postfix ' on types for nullable type
postfix ' on values for a value of a nullable type
--
note inspired by the above: are assertions and predicates the same? that is, can the above be interpreted as an assertion? no, something is a predicate if it's the last nonindented thing in a definition, but an assertion if it's not the last. it's easy enuf to distinguish booleans being returned from boolean assertions; the last nonindented line, and the lines below it, are "real", the ones above are assertions. of course, an assertion can "call" a predicate, so the same subroutine could be used to calculate boolean return values and to calculate an assertion's bool, e.g.
greaterthan5 = x : x > 5
plus6 = x : x 6 +
or in point-free notation: greaterthan5 = > 5 plus6 = how to assert something about the result of the fn? how to indicate variables? some alternatives: x plus6 > 5 ?x plus6 > 5 won't this cause a constraint check? or should we disallow constraint satisfaction within assertions? the return value 6 +
or.. plus6 = the return value 6 + ! ?x plus6 > 5 seems redundant; from the use of plus6, we KNOW it's a postcondition
main = 8 greaterthan5 seq 8 plus6 pr
but then how to do assertions within sequences? or assertions that hold during or after the seq? hmmm mb should indicate assertions somehow after all? ! on every assertion? or only within sequences? or "ass" or "asr" or something? i guess ! would be easiest. at the beginning of the line. optional outside seq?
so far, i think the best option is: !! at beginning of line in sequences, optional without. assertions before seqs mean conditions that hold eternally/persistently. assertions themselves can't have sequences (i.e. they cant do logging) -- but they can do nondeterminism and input i.e. they are not ref trans. ?-variables within assertions don't mean "solve this constraint", they mean, "if you should happen to apply this fn enuf times to bind these variables, then this assertion must hold" (i.e. assertions are lazy). that is, within normal program logic, ?-variables are implicitly existentially quantified, i.e. ?x f > 5 means "find an x such that f(x) is greater than 5, and set ?x to that", but in an assertion it means "for every value of x, f(x) must be greater than 5". no wait, i was using assertion syntax to write constraints, huh. so how to write lazy assertions, i.e. "?x plus6 > 5"? or should we use syntax to disambiguate constraints and assertions? mb "?x" is existentially quantified x and "!x" is universally quantified x? so
plus6 = !x plus6 > 5 6 +
means "for every x, plus6(x) must be > 5", and
plus6ButEven? = x : !x plus6ButEven? 2 mod == 0 evener in [0 1] x plus6 evener +
but this could be rewritten
plus6ButEven? = x : res 2 mod == 0 evener in [0 1] res = x plus6 evener + res
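to pin down the ?x vs !x reading proposed above, here's a python sketch over a small finite domain. the finite domain is purely an assumption so the check can actually run; the notes don't say how oot would check a universally quantified assertion (the "lazy" reading suggests only on values that actually show up). exists and forall are made-up helper names.

```python
def plus6(x):
    return x + 6


def exists(domain, pred):
    """?x reading: find some x in domain satisfying pred, or None if there is none."""
    return next((x for x in domain if pred(x)), None)


def forall(domain, pred):
    """!x reading: pred must hold for every x in domain."""
    return all(pred(x) for x in domain)


if __name__ == "__main__":
    domain = range(-3, 4)
    # "?x plus6 > 5": a constraint to solve, binds x to a witness
    print(exists(domain, lambda x: plus6(x) > 5))
    # "!x plus6 > 5": an assertion about every x, fails if any x is a counterexample
    print(forall(domain, lambda x: plus6(x) > 5))
```

the asymmetry is the point: the existential form produces a binding, the universal form only produces pass/fail.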
mb we could provide "res" as a keyword so it would look like:
plus6ButEven?