---
One thing to do is to look at frequent words and patterns in existing code corpora to see what is common. Whatever turns out to be common is a candidate for treating as fundamental, and, more pedestrianly, we can optimize oot to make those things easy to read and write:
Text mining on Java source code, most frequent lexemes: "The most commonly occurring word is the + operator (305,685) followed by the scoping block (295,726) and the = operator (124,813). If we exclude operators and scoping blocks from our analysis, the most frequent words are public (124,399), if (119,787), and int (108,709). The most common identifier (the programming language equivalent of a lexical word in natural language as discussed in Section 3.2), is String. It is the ninth most frequently occurring word overall with 71,504 occurrences. This pseudo-primitive type in Java is a special case of a non-primitive that has nearly achieved primitive status in the language and may well do so in either a future version of Java or a derivative language it spawns. The next three most frequent lexical words are length (19,312), Object (18,506), and IOException (11,322)." -- http://flosshub.org/sites/flosshub.org/files/21st-delorey.pdf
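To repeat this kind of count on another corpus (or later on oot code), a rough lexeme-frequency pass is easy to sketch in Java. This is a minimal sketch, not the study's method: the regex tokenizer is an assumption, it skips comments, pools all string and character literals into one `<literal>` token, and counts `{` and `}` separately instead of treating a scoping block as one word, so its numbers will not line up exactly with the figures quoted above. `LexemeCounter`, `count`, and `top` are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Rough lexeme-frequency counter for Java source text.
// Assumption: a regex tokenizer is good enough for a sketch; it is NOT the
// tokenizer used in the cited study.
class LexemeCounter {
    // Alternatives are tried left to right, so comments and literals are
    // consumed before their contents could be read as words.
    private static final Pattern TOKEN = Pattern.compile(
        "//[^\\n]*"                            // line comment
        + "|/\\*.*?\\*/"                       // block comment (DOTALL spans lines)
        + "|\"(?:\\\\.|[^\"\\\\])*\""          // string literal
        + "|'(?:\\\\.|[^'\\\\])*'"             // char literal
        + "|[A-Za-z_$][A-Za-z0-9_$]*"          // identifier or keyword
        + "|\\d+"                              // integer literal (crude)
        + "|==|!=|<=|>=|&&|\\|\\||\\+\\+|--"   // common two-char operators
        + "|\\S",                              // any other single non-space char
        Pattern.DOTALL);

    // Counts every lexeme except comments; literals are pooled as "<literal>".
    static Map<String, Integer> count(String source) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            String tok = m.group();
            if (tok.startsWith("//") || tok.startsWith("/*")) {
                continue; // skip comments
            }
            if (tok.startsWith("\"") || tok.startsWith("'")) {
                tok = "<literal>"; // pool all literals into one lexeme
            }
            counts.merge(tok, 1, Integer::sum);
        }
        return counts;
    }

    // Most frequent first; ties broken alphabetically so output is stable.
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int n) {
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
            .limit(n)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String src = "public class A { int n = 0; public int inc(int k) { n = n + k; return n; } }";
        for (Map.Entry<String, Integer> e : top(count(src), 5)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

For the sample string in `main`, the five most frequent lexemes come out as `n` (4), `;` (3), `int` (3), `=` (2), and `k` (2). Pointing `count` at the concatenated files of a real corpus gives the same kind of table as the one quoted, modulo the counting conventions noted above.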
"Top Idioms
Figure 6 shows the top idioms mined in the Library data set, ranked by the number of files in the test sets where each idiom has appeared in. The reader will observe their immediate usefulness. Some idioms capture how to retrieve or instantiate an object. For example, in Figure 6, the idiom 6a captures the instantiation of a message channel in RabbitMQ, 6q retrieves a handle for the Hadoop file system, 6e builds a SearchSourceBuilder in Elasticsearch and 6l retrieves a URL using JSoup. Other idioms capture important transactional properties of code: idiom 6h demonstrates proper use of the memory-hungry RevWalk object in JGit and 6i is a transaction idiom in Neo4J. Other idioms capture common error handling, such as 6d for Neo4J and 6p for a Hibernate transaction. Finally, some idioms capture common operations, such as closing a connection in Netty (6m), traversing through the database nodes (6n), visiting all AST nodes in a JavaScript file in Rhino (6k) and computing the distance between two locations (6g) in Android. The reader may observe that these idioms provide a meaningful set of coding patterns for each library, capturing semantically consistent actions that a developer is likely to need when using these libraries. In Figure 7 we present a small set of general Java idioms mined across all data sets by Haggis. These idioms represent frequently used patterns that could be included by default in tools such as Eclipse’s SnipMatch [43] and IntelliJ’s live templates [23]. These include idioms for defining constants (Figure 7c), creating loggers (Figure 7b) and iterating through an iterable (Figure 7a).
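The ranking criterion in the quoted passage (the number of distinct files an idiom appears in, not its raw occurrence count) is simple enough to sketch in Java. The sketch assumes a hypothetical input in which each file has already been reduced to the IDs of the idioms found in it; the mining step itself (done by Haggis in the paper) is not shown, and `IdiomRanking`, `rank`, and the idiom IDs are made up for illustration. The code is also written using the three general idioms named above (a `static final` constant, a logger, and a for-each loop); the logger uses `java.util.logging` as an assumption, and the exact forms in the paper's Figure 7 are not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.logging.Logger;

// Ranks idioms by the number of DISTINCT files they occur in, so an idiom
// repeated several times in one file still counts once for that file.
// Hypothetical input: file name -> idiom IDs already found in that file.
class IdiomRanking {
    // General idiom: defining a constant.
    static final int DEFAULT_TOP_N = 3;

    // General idiom: creating a logger (java.util.logging assumed here).
    private static final Logger LOG = Logger.getLogger(IdiomRanking.class.getName());

    static List<String> rank(Map<String, List<String>> idiomsByFile, int topN) {
        Map<String, Set<String>> filesByIdiom = new HashMap<>();
        // General idiom: iterating through an iterable.
        for (Map.Entry<String, List<String>> file : idiomsByFile.entrySet()) {
            for (String idiom : file.getValue()) {
                // A Set of file names dedupes repeats within one file.
                filesByIdiom.computeIfAbsent(idiom, k -> new HashSet<>()).add(file.getKey());
            }
        }
        List<String> ranked = new ArrayList<>(filesByIdiom.keySet());
        // Most files first; ties broken by idiom ID so the order is deterministic.
        ranked.sort((a, b) -> {
            int byFiles = Integer.compare(filesByIdiom.get(b).size(), filesByIdiom.get(a).size());
            return byFiles != 0 ? byFiles : a.compareTo(b);
        });
        LOG.fine(() -> "ranked " + ranked.size() + " idioms");
        return new ArrayList<>(ranked.subList(0, Math.min(topN, ranked.size())));
    }

    public static void main(String[] args) {
        Map<String, List<String>> input = new HashMap<>();
        input.put("A.java", Arrays.asList("try-finally-release", "logger", "logger"));
        input.put("B.java", Arrays.asList("logger", "for-each"));
        input.put("C.java", Arrays.asList("for-each", "logger"));
        System.out.println(rank(input, DEFAULT_TOP_N));
    }
}
```

With the sample input in `main`, `logger` ranks first because it appears in three distinct files (its second occurrence in `A.java` adds nothing), so the program prints `[logger, for-each, try-finally-release]`. Counting files rather than occurrences keeps one file that repeats a pattern many times from dominating the ranking.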
Figure 6: Top cross-project idioms for Library projects (Figure 4). Here we include idioms that appear in the test set files. We rank them by the number of distinct files they appear in and restrict into presenting idioms that contain at least one library-specific (i.e. API-specific) identifier. The special notation $(TypeName) denotes the presence of a variable whose name is undefined. $BODY$ denotes a user-defined code block of one or more statements, $name a freely defined (variable) name, $methodInvoc a single method invocation statement and $ifstatement a single if statement. All the idioms have been automatically identified by Haggis.
channel = connection.createChannel();
Elements $name = $(Element).select($StringLit);
Transaction tx = ConnectionFactory.getDatabase().beginTx();
catch (Exception e) { $(Transaction).failure(); }
SearchSourceBuilder builder = getQueryTranslator().build($(ContentIndexQuery));
LocationManager $name = (LocationManager) getSystemService(Context.LOCATION_SERVICE);
Location.distanceBetween($(Location).getLatitude(), $(Location).getLongitude(), $...);
try { $BODY$ } finally { $(RevWalk).release(); }
try { Node $name = $methodInvoc(); $BODY$ } finally { $(Transaction).finish(); }
ConnectionFactory factory = new ConnectionFactory(); $methodInvoc(); Connection connection = factory.newConnection();
while ($(ModelNode) != null) { if ($(ModelNode) == limit) break; $ifstatement $(ModelNode)=$(ModelNode