One thing to do is to look at frequent words and patterns in existing code corpora to see what is common, so that we can consider treating these things as fundamental and also, more pedestrianly, optimize Oot to make them easy to read and write:
Text mining on Java source code, most frequent lexemes: "The most commonly occurring word is the + operator (305,685) followed by the scoping block (295,726) and the = operator (124,813). If we exclude operators and scoping blocks from our analysis, the most frequent words are public (124,399), if (119,787), and int (108,709). The most common identifier (the programming language equivalent of a lexical word in natural language as discussed in Section 3.2) is String. It is the ninth most frequently occurring word overall with 71,504 occurrences. This pseudo-primitive type in Java is a special case of a non-primitive that has nearly achieved primitive status in the language and may well do so in either a future version of Java or a derivative language it spawns. The next three most frequent lexical words are length (19,312), Object (18,506), and IOException (11,322)." -- http://flosshub.org/sites/flosshub.org/files/21st-delorey.pdf
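Counting of the kind quoted above can be sketched in a few lines. This is a minimal sketch, assuming a crude regex tokenizer (identifiers, integers, and single punctuation characters) rather than a real Java lexer; the `TOKEN` pattern and the `top_lexemes` helper are hypothetical, and unlike the study it does not skip comments or string literals.

```python
import re
from collections import Counter

# Crude token pattern (an assumption, not a real Java lexer): identifiers/keywords,
# integer literals, or any single non-space character such as + { } =
TOKEN = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+|\S")

def top_lexemes(source: str, n: int = 5, skip: frozenset = frozenset()) -> list:
    """Return the n most frequent tokens in `source` as (token, count) pairs,
    ignoring any token listed in `skip` (e.g. operators and braces)."""
    counts = Counter(t for t in TOKEN.findall(source) if t not in skip)
    return counts.most_common(n)
```

Passing `skip=frozenset("{}();.=")` mimics the quote's second ranking, which excludes operators and scoping blocks; note that this tokenizer splits `==` into two `=` tokens.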
" Top Idioms
Figure 6 shows the top idioms mined in the Library data set, ranked by the number of files in the test sets where each idiom has appeared in. The reader will observe their immediate usefulness. Some idioms capture how to retrieve or instantiate an object. For example, in Figure 6, the idiom 6a captures the instantiation of a message channel in RabbitMQ, 6q retrieves a handle for the Hadoop file system, 6e builds a SearchSourceBuilder in Elasticsearch and 6l retrieves a URL using JSoup. Other idioms capture important transactional properties of code: idiom 6h demonstrates proper use of the memory-hungry RevWalk object in JGit and 6i is a transaction idiom in Neo4J. Other idioms capture common error handling, such as 6d for Neo4J and 6p for a Hibernate transaction. Finally, some idioms capture common operations, such as closing a connection in Netty (6m), traversing through the database nodes (6n), visiting all AST nodes in a JavaScript file in Rhino (6k) and computing the distance between two locations (6g) in Android. The reader may observe that these idioms provide a meaningful set of coding patterns for each library, capturing semantically consistent actions that a developer is likely to need when using these libraries. In Figure 7 we present a small set of general Java idioms mined across all data sets by Haggis. These idioms represent frequently used patterns that could be included by default in tools such as Eclipse’s SnipMatch [43] and IntelliJ’s live templates [23]. These include idioms for defining constants (Figure 7c), creating loggers (Figure 7b) and iterating through an iterable (Figure 7a).
Figure 6: Top cross-project idioms for Library projects (Figure 4). Here we include idioms that appear in the test set files. We rank them by the number of distinct files they appear in and restrict into presenting idioms that contain at least one library-specific (i.e. API-specific) identifier. The special notation $(TypeName) denotes the presence of a variable whose name is undefined. $BODY$ denotes a user-defined code block of one or more statements, $name a freely defined (variable) name, $methodInvoc a single method invocation statement and $ifstatement a single if statement. All the idioms have been automatically identified by Haggis.
channel=connection.createChannel();
Elements $name=$(Element).select($StringLit);
Transaction tx=ConnectionFactory.getDatabase().beginTx();
catch (Exception e){ $(Transaction).failure(); }
SearchSourceBuilder builder=getQueryTranslator().build($(ContentIndexQuery));
LocationManager $name = (LocationManager)getSystemService(Context.LOCATION_SERVICE);
Location.distanceBetween($(Location).getLatitude(), $(Location).getLongitude(), $...);
try { $BODY$ } finally { $(RevWalk).release(); }
try { Node $name=$methodInvoc(); $BODY$ } finally { $(Transaction).finish(); }
ConnectionFactory factory = new ConnectionFactory(); $methodInvoc(); Connection connection = factory.newConnection();
while ($(ModelNode) != null){ if ($(ModelNode) == limit) break; $ifstatement $(ModelNode)=$(ModelNode).getParentModelNode(); }
Document doc=Jsoup.connect(URL).userAgent("Mozilla").header("Accept","text/html").get();
if ($(Connection) != null){ try { $(Connection).close(); } catch (Exception ignore){} }
Traverser traverser
for (Node $name : traverser){ $BODY$ }
Toast.makeText(this, $stringLit, Toast.LENGTH_SHORT).show()
try { Session session
.currentSession(); $BODY$ } catch (HibernateException e){ throw new DaoException(e); }
FileSystem $name
$(Path).toUri(),conf);
(token=$(XContentParser).nextToken()) != XContentParser.Token.END_OBJECT
Figure 7: Sample language-specific idioms. $StringLit denotes a user-defined string literal, $name a (variable) name, $methodInvoc a method invocation statement, $ifstatement an if statement and $BODY$ a code block.
(a) Iterate through the elements of an Iterator: for (Iterator iter=$methodInvoc; iter.hasNext(); ) {$BODY$}
(b) Creating a logger for a class:
private final static Log $name = LogFactory.getLog($type.class);
(c) Defining a constant String:
public static final String $name = $StringLit;
(d) Looping through lines from a BufferedReader:
while (($(String) = $(BufferedReader).readLine()) != null) {$BODY$}
-- http://homepages.inf.ed.ac.uk/csutton/publications/idioms.pdf
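The ranking rule in the Figure 6 caption (rank idioms by the number of distinct files they appear in) is easy to state as code. This is a minimal sketch, assuming each idiom has already been reduced to a regular expression over source text; Haggis itself mines idioms as fragments of syntax trees, which this does not attempt, and the `rank_idioms` helper is hypothetical.

```python
import re

def rank_idioms(idioms: dict, files: dict) -> list:
    """Rank idiom names by how many distinct files contain at least one match.

    idioms: name -> regex (a hypothetical stand-in for a mined AST fragment)
    files:  filename -> source text
    """
    counts = {
        name: sum(1 for text in files.values() if re.search(pattern, text))
        for name, pattern in idioms.items()
    }
    # Highest file count first; ties broken alphabetically for a stable order
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Counting files rather than occurrences keeps one file that repeats an idiom many times from dominating the ranking.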
One interesting observation is that 50% of Java methods are 3 lines or less. Manually inspecting these methods we find accessors (setters and getters) or empty methods (e.g. constructors).
-- http://homepages.inf.ed.ac.uk/csutton/publications/msr2013.pdf
Table 2: The attribute catalogue
Name | Formal definition
Returns void | The return descriptor is V.
No parameters | The list of parameter descriptors is empty.
Field reader | GETFIELD or GETSTATIC instruction.
Field writer | PUTFIELD or PUTSTATIC instruction.
Contains loop | Jump instructions that allow for instructions to be executed more than once in the same method invocation.
Creates object | NEW instruction.
Throws exception | ATHROW instruction.
Type manipulator | INSTANCEOF or CHECKCAST instruction.
Local assignment | One of the STORE instructions (for instance, ISTORE).
Same name call | Calls a method of the same name.
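The catalogue's definitions are mechanical enough to compute from a method's descriptor and instruction list. This is a minimal sketch under several assumptions: instructions arrive already decoded as `(mnemonic, operand)` pairs (no class-file parsing), jump operands are target indices into that list, invoke operands are just the callee's method name, and "contains loop" is approximated as "has a backward jump". The attribute names follow Table 2; the `attributes` function is hypothetical.

```python
JUMPS = {"GOTO", "GOTO_W", "IFEQ", "IFNE", "IFLT", "IFGE", "IFGT", "IFLE",
         "IF_ICMPEQ", "IF_ICMPNE", "IF_ICMPLT", "IF_ICMPGE", "IF_ICMPGT", "IF_ICMPLE",
         "IF_ACMPEQ", "IF_ACMPNE", "IFNULL", "IFNONNULL"}
INVOKES = {"INVOKEVIRTUAL", "INVOKESPECIAL", "INVOKESTATIC", "INVOKEINTERFACE"}
# Local-variable stores only; array stores such as IASTORE are deliberately excluded
LOCAL_STORES = {"ISTORE", "LSTORE", "FSTORE", "DSTORE", "ASTORE"}

def attributes(name: str, descriptor: str, code: list) -> set:
    """Compute the Table 2 attributes of one method.

    name:       the method's own name (for "Same name call")
    descriptor: JVM method descriptor, e.g. "(I)V"
    code:       list of (mnemonic, operand) pairs, in instruction order
    """
    attrs = set()
    if descriptor.endswith(")V"):
        attrs.add("Returns void")
    if descriptor.startswith("()"):
        attrs.add("No parameters")
    for i, (op, arg) in enumerate(code):
        if op in ("GETFIELD", "GETSTATIC"):
            attrs.add("Field reader")
        elif op in ("PUTFIELD", "PUTSTATIC"):
            attrs.add("Field writer")
        elif op in JUMPS and arg is not None and arg <= i:
            attrs.add("Contains loop")  # backward jump: some code can run again
        elif op == "NEW":
            attrs.add("Creates object")
        elif op == "ATHROW":
            attrs.add("Throws exception")
        elif op in ("INSTANCEOF", "CHECKCAST"):
            attrs.add("Type manipulator")
        elif op.split("_")[0] in LOCAL_STORES:  # ISTORE, ISTORE_1, ASTORE_0, ...
            attrs.add("Local assignment")
        elif op in INVOKES and arg == name:
            attrs.add("Same name call")
    return attrs
```

A typical getter such as `("ALOAD_0", None), ("GETFIELD", "x"), ("IRETURN", None)` with descriptor `()I` comes out as just "No parameters" and "Field reader", which matches the GET entry's profile below.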
The name get is interesting because it is by far the most common one; nearly a third of all Java methods in the corpus are get-methods.
Lexicon Entries.
ACCEPT. Methods named accept very seldom read state. Furthermore, they rarely throw exceptions, call methods of the same name, create objects, manipulate state, use local variables, have no parameters, perform type-checking or contain loops. The name accept has a precise use. A similar name is visit. Generalisations of accept are handle and initialize. Somewhat related names are set, end, is and insert.
ACTION. Methods named action never call methods of the same name. Furthermore, they very often read state. Finally, they often return void, and rarely throw exceptions, have no parameters or contain loops. The name action has a precise use. Similar names are remove and add.
ADD. Among the most common method names. Methods named add often read state. Similar names are remove and action.
CHECK. Methods named check very often throw exceptions. Furthermore, they often create objects and contain loops, and rarely call methods of the same name. Unfortunately, check is an imprecise name for a method.
CLEAR. Methods named clear very often have no parameters. Furthermore, they often return void, call methods of the same name and manipulate state, and rarely create objects, use local variables or perform type-checking. A generalisation of clear is reset. A somewhat related name is close.
CLOSE. Methods named close often return void, call methods of the same name, manipulate state, read state and have no parameters, and rarely create objects or perform type-checking. A generalisation of close is validate. A somewhat related name is clear.
CREATE. Among the most common method names. Methods named create very often create objects. Furthermore, they rarely call methods of the same name, read state or contain loops.
DO. Methods named do often throw exceptions and perform type-checking, and rarely call methods of the same name. Unfortunately, do is an imprecise name for a method.
DUMP. Methods named dump never throw exceptions. Furthermore, they very often create objects and use local variables, and very seldom read state. Finally, they often call methods of the same name and contain loops, and rarely manipulate state. The name dump has a precise use.
END. Methods named end often return void, and rarely create objects, use local variables, read state or contain loops. Generalisations of end are handle and initialize. A specialisation of end is insert. Somewhat related names are accept, set, visit and write.
EQUALS. Methods named equals never return void, throw exceptions, create objects, manipulate state or have no parameters. Furthermore, they very often call methods of the same name and perform type-checking. Finally, they often use local variables and read state. The name equals has a precise use.
FIND. Methods named find very often use local variables and contain loops. Furthermore, they often perform type-checking, and rarely return void.
GENERATE. Methods named generate often create objects, use local variables and contain loops, and rarely call methods of the same name. Unfortunately, generate is an imprecise name for a method.
GET. The most common method name. Methods named get often read state and have no parameters, and rarely return void, call methods of the same name, manipulate state, use local variables or contain loops. A similar name is has. Specialisations of get are is and size. A somewhat related name is hash.
HANDLE. Methods named handle often read state, and rarely call methods of the same name. A similar name is initialize. Specialisations of handle are accept, set, visit, end and insert.
HAS. Methods named has often have no parameters, and rarely return void, throw exceptions, create objects, manipulate state, use local variables or perform type-checking. The name has has a precise use. A similar name is get. Specialisations of has are is and size. A somewhat related name is hash.
HASH. Methods named hash always have no parameters, and never return void, throw exceptions, create objects or perform type-checking. Furthermore, they very often call methods of the same name. Finally, they often read state, and rarely manipulate state or use local variables. The name hash has a precise use. Somewhat related names are has, is, get and size.
INIT. Methods named init very often manipulate state. Furthermore, they often return void, create objects and have no parameters, and rarely call methods of the same name.
INITIALIZE. Methods named initialize often return void and manipulate state, and rarely call methods of the same name or read state. A similar name is handle. Specialisations of initialize are accept, set, visit, end and insert.
INSERT. Methods named insert often throw exceptions, and rarely create objects, read state, have no parameters or contain loops. Generalisations of insert are handle, end and initialize. Somewhat related names are accept, set, visit and write.
IS. The third most common method name. Methods named is often have no parameters, and rarely return void, throw exceptions, call methods of the same name, create objects, manipulate state, use local variables, perform type-checking or contain loops. The name is has a precise use. Generalisations of is are has and get. Somewhat related names are accept, visit, hash and size.
LOAD. Methods named load very often use local variables. Furthermore, they often throw exceptions, create objects, manipulate state, perform type-checking and contain loops. Unfortunately, load is an imprecise name for a method.
MAKE. Methods named make very often create objects. Furthermore, they rarely return void, throw exceptions, call methods of the same name or contain loops.
NEW. Methods named new never contain loops. Furthermore, they very seldom use local variables. Finally, they often call methods of the same name and create objects, and rarely return void, manipulate state or read state.
NEXT. Methods named next very often manipulate state and read state. Furthermore, they often throw exceptions and have no parameters, and rarely return void.
PARSE. Among the most common method names. Methods named parse very often call methods of the same name, read state and perform type-checking. Furthermore, they rarely use local variables. The name parse has a precise use.
PRINT. Methods named print often call methods of the same name and contain loops, and rarely throw exceptions or manipulate state.
PROCESS. Methods named process very often use local variables and contain loops. Furthermore, they often throw exceptions, create objects, read state and perform type-checking, and rarely call methods of the same name. Unfortunately, process is an imprecise name for a method.
READ. Methods named read often throw exceptions, call methods of the same name, create objects, manipulate state, use local variables and contain loops. Unfortunately, read is an imprecise name for a method.
REMOVE. Among the most common method names. Methods named remove often throw exceptions. Similar names are add and action.
RESET. Methods named reset very often manipulate state. Furthermore, they often return void and have no parameters, and rarely create objects, use local variables or perform type-checking. A specialisation of reset is clear.
RUN. Among the most common method names. Methods named run very often read state. Furthermore, they often have no parameters, and rarely call methods of the same name.
SET. The second most common method name. Methods named set very often manipulate state, and very seldom use local variables or read state. Furthermore, they often return void, and rarely call methods of the same name, create objects, have no parameters, perform type-checking or contain loops. The name set has a precise use. Generalisations of set are handle and initialize. Somewhat related names are accept, visit, end and insert.
SIZE. Methods named size always have no parameters, and never return void, create objects, manipulate state, perform type-checking or contain loops. Furthermore, they very seldom use local variables. Finally, they rarely read state. The name size has a precise use. Generalisations of size are has and get. Somewhat related names are is and hash.
START. Methods named start often return void, manipulate state and read state.
TO. Among the most common method names. Methods named to very often call methods of the same name and create objects. Furthermore, they often have no parameters, and rarely return void, throw exceptions, manipulate state or perform type-checking.
UPDATE. Methods named update often return void and read state.
VALIDATE. Methods named validate very often throw exceptions. Furthermore, they often create objects and have no parameters, and rarely manipulate state. A specialisation of validate is close.
VISIT. Methods named visit rarely throw exceptions, use local variables, read state or have no parameters. A similar name is accept. Generalisations of visit are handle and initialize. Somewhat related names are set, end, is and insert.
WRITE. Among the most common method names. Methods named write often return void and call methods of the same name, and rarely have no parameters. Somewhat related names are end and insert.
-- The Programmer’s Lexicon, Volume I: The Verbs
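Entries that say "always" or "never" read like hard rules, so they can be checked mechanically against a method's attributes. This is a minimal sketch, assuming attribute sets named as in Table 2 and assuming that "manipulate state" corresponds to "Field writer"; the `RULES` table transcribes only the equals, hash and size entries above, and the `naming_violations` checker is hypothetical, not a tool from the lexicon's authors.

```python
# "always"/"never" rules transcribed from the equals, hash and size entries.
# Assumption: "manipulate state" maps to Table 2's "Field writer".
RULES = {
    "equals": {"always": set(),
               "never": {"Returns void", "Throws exception", "Creates object",
                         "Field writer", "No parameters"}},
    "hash": {"always": {"No parameters"},
             "never": {"Returns void", "Throws exception", "Creates object",
                       "Type manipulator"}},
    "size": {"always": {"No parameters"},
             "never": {"Returns void", "Creates object", "Field writer",
                       "Type manipulator", "Contains loop"}},
}

def naming_violations(verb: str, attrs: set) -> list:
    """Describe how a method whose name starts with `verb` breaks that verb's
    always/never rules; an empty list means no hard rule is broken."""
    rule = RULES.get(verb)
    if rule is None:
        return []  # no hard rules recorded for this verb
    missing = [f"lacks '{a}'" for a in sorted(rule["always"] - attrs)]
    present = [f"has '{a}'" for a in sorted(rule["never"] & attrs)]
    return missing + present
```

The "often"/"rarely" entries are statistical tendencies rather than rules, so they would need a scoring approach instead of this pass/fail check.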
---
(At least) three ways to loop: while, jump, and a higher-order search function (but is that the same as while?). Also, collection-oriented looping is not universal for control, but it does cover most of it. Also, for long-lasting things where you just want to 'loop forever and respond to events until I decide to terminate', instead of having that loop in your program you could register with a manager that calls you upon each iteration (but the manager still has to loop) (is this really that different from a while loop?).
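Three of these styles, each computing the same thing (the smallest power of two at or above a limit), make the comparison concrete. This is a minimal sketch: Python has no goto, so the "jump" style is omitted, and `iterate_until` and `EventLoop` are hypothetical names for the higher-order search function and the manager that calls you on each iteration.

```python
from typing import Callable, List

def with_while(limit: int) -> int:
    """Style 1, a plain while loop: the program owns the loop."""
    x = 1
    while x < limit:
        x *= 2
    return x

def iterate_until(step: Callable[[int], int], stop: Callable[[int], bool], x: int) -> int:
    """Style 2, a higher-order search function: the loop lives in a reusable combinator."""
    while not stop(x):
        x = step(x)
    return x

class EventLoop:
    """Style 3, a manager that calls each registered handler once per iteration.
    A handler returns False to unregister; run() ends when no handlers remain."""

    def __init__(self) -> None:
        self.handlers: List[Callable[[], bool]] = []

    def register(self, handler: Callable[[], bool]) -> None:
        self.handlers.append(handler)

    def run(self) -> None:
        while self.handlers:  # the manager still has to loop
            self.handlers = [h for h in self.handlers if h()]

def with_manager(limit: int) -> int:
    state = {"x": 1}

    def tick() -> bool:
        if state["x"] >= limit:
            return False  # finished: ask the manager to stop calling us
        state["x"] *= 2
        return True

    loop = EventLoop()
    loop.register(tick)
    loop.run()
    return state["x"]
```

Both `iterate_until` and `EventLoop.run` still contain a `while`; they relocate the loop rather than eliminate it, which is the question raised in the note.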
---
Examples of things that cannot directly be 'inlined' in some languages:
---
Fleshing out a little the idea of a general, fundamental Search operator (a generalization of a fixpoint operator):
The search operator takes parameters in two stages; that is, it is a higher-order function that takes two arguments, each of which is a 'package' of functions and parameters (or it could just take a bunch of arguments, with no 'packaging'). First, it takes a group of parameters that specify a search strategy. This includes functions that say how to initialize the search's internal state, how to choose the next search position given some internal state (perhaps the previous search position and its score), and when to terminate the search. For example, by giving different functions for these inputs, you can create a depth-first search, a breadth-first search, an A* search, a fixpoint operator (terminate upon idempotency of "next search position"), or a search that quits when it hits a plateau (a near-fixpoint) even if the objective function is still moving by some tiny amount.
Second, it takes another group of parameters that choose the objective function, set the items to be searched through, and choose the initial location of the search.
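The two-stage operator can be sketched as a curried higher-order function: `search(strategy)` returns a function that takes the problem. This is a minimal sketch, not a definitive design; the `Strategy`/`Problem` field names, the convention that an objective of 0 marks a goal for the frontier searches, and the use of `neighbors` as the single successor for the fixpoint strategy are all assumptions made for illustration. A* is left out to keep it short.

```python
from collections import deque
from typing import Any, Callable, NamedTuple

class Strategy(NamedTuple):
    """Stage 1: how to search."""
    init: Callable[[Any], Any]          # problem -> initial internal state
    step: Callable[[Any, Any], Any]     # (state, problem) -> next state
    done: Callable[[Any, Any], bool]    # (state, problem) -> terminate?
    result: Callable[[Any], Any]        # final state -> answer

class Problem(NamedTuple):
    """Stage 2: what to search."""
    initial: Any                        # initial location
    neighbors: Callable[[Any], list]    # items reachable from a position
    objective: Callable[[Any], float]   # score of a position (lower is better)

def search(strategy: Strategy) -> Callable[[Problem], Any]:
    def run(problem: Problem) -> Any:
        state = strategy.init(problem)
        while not strategy.done(state, problem):
            state = strategy.step(state, problem)
        return strategy.result(state)
    return run

def frontier(lifo: bool) -> Strategy:
    """Depth-first (lifo=True) or breadth-first (lifo=False) search;
    stops at the first position whose objective is 0."""
    def init(p):
        return deque([p.initial]), {p.initial}, None
    def step(state, p):
        todo, seen, _ = state
        pos = todo.pop() if lifo else todo.popleft()
        if p.objective(pos) == 0:
            return todo, seen, pos
        for n in p.neighbors(pos):
            if n not in seen:
                seen.add(n)
                todo.append(n)
        return todo, seen, None
    def done(state, p):
        todo, _, found = state
        return found is not None or not todo
    return Strategy(init, step, done, lambda s: s[2])

def fixpoint() -> Strategy:
    """Stop when the next position equals the current one (idempotency)."""
    def init(p):
        return p.initial, p.neighbors(p.initial)[0]
    def step(state, p):
        return state[1], p.neighbors(state[1])[0]
    return Strategy(init, step, lambda s, p: s[0] == s[1], lambda s: s[0])

def plateau(eps: float) -> Strategy:
    """Greedy descent on the objective that quits once an improvement is below eps."""
    def init(p):
        return p.initial, float("inf")  # (position, previous score)
    def step(state, p):
        pos, _ = state
        best = min(list(p.neighbors(pos)) + [pos], key=p.objective)
        return best, p.objective(pos)
    def done(state, p):
        pos, prev = state
        return prev - p.objective(pos) < eps
    return Strategy(init, step, done, lambda s: s[0])
```

Swapping `frontier(lifo=False)` for `frontier(lifo=True)` turns breadth-first into depth-first search without touching the `Problem`, which is the point of separating the two stages.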
Note that this is equivalent to an OOP system where there is an abstract base class (the Search operator) and concrete subclasses (breadth-first search, depth-first search, A*, fixpoint search, a search that quits when it hits a plateau). The class defines/satisfies an interface that has one method (do_search or the like).
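The same idea in class form, as a sketch with hypothetical names: the abstract base owns the loop (a template method) and each concrete subclass supplies the strategy. Only fixpoint and plateau subclasses are shown, and the problem is reduced to an initial position plus a successor function to keep it short.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable

class Search(ABC):
    """Abstract base class; do_search is the single method of its interface."""

    def do_search(self, initial: Any, successor: Callable[[Any], Any]) -> Any:
        state = self.init(initial, successor)
        while not self.done(state):
            state = self.step(state, successor)
        return self.result(state)

    # Subclasses fill in the strategy, like the first argument package of the HOF.
    @abstractmethod
    def init(self, initial: Any, successor: Callable[[Any], Any]) -> Any: ...

    @abstractmethod
    def step(self, state: Any, successor: Callable[[Any], Any]) -> Any: ...

    @abstractmethod
    def done(self, state: Any) -> bool: ...

    @abstractmethod
    def result(self, state: Any) -> Any: ...

class FixpointSearch(Search):
    """Terminate when the next position equals the current one."""
    def init(self, initial, successor):
        return initial, successor(initial)

    def step(self, state, successor):
        return state[1], successor(state[1])

    def done(self, state):
        return state[0] == state[1]

    def result(self, state):
        return state[0]

class PlateauSearch(FixpointSearch):
    """Near-fixpoint: terminate once a step moves less than eps."""
    def __init__(self, eps: float) -> None:
        self.eps = eps

    def done(self, state):
        return abs(state[1] - state[0]) < self.eps
```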
---
The example of a general fundamental Search operator shows us that, when an OOP base class's purpose is just to be called through one primary method, it corresponds to a higher-order function.
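The correspondence runs both ways: a bound method of such an object already is a function, and any function can be wrapped back into a one-method object. This is a minimal sketch; `fixpoint`, `FunctionSearch` and `as_function` are hypothetical names, not part of any library.

```python
from typing import Any, Callable

def fixpoint(successor: Callable[[Any], Any], x: Any) -> Any:
    """Higher-order function: iterate successor until x stops changing."""
    nxt = successor(x)
    while nxt != x:
        x, nxt = nxt, successor(nxt)
    return x

class FunctionSearch:
    """HOF -> object: a one-method object that just delegates to a function."""
    def __init__(self, fn: Callable[..., Any]) -> None:
        self._fn = fn

    def do_search(self, *args: Any) -> Any:
        return self._fn(*args)

def as_function(searcher: Any) -> Callable[..., Any]:
    """Object -> HOF: the bound method is the object's whole interface."""
    return searcher.do_search
```

Round-tripping `fixpoint` through `FunctionSearch` and `as_function` gives back something that behaves exactly like `fixpoint`, because the object carries no behavior beyond its one method.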
---
I was talking about goal-oriented programming with my friend DR. I explained my idea that you could specify a goal in terms of preconditions and postconditions (e.g. defining a sort) and the compiler could find a subroutine to satisfy them. And then you could add time and space complexity requirements, e.g. "cannot require more than O(n^2) space". And then you could add time and space complexity hints, e.g. "I am going to read this data structure a lot but rarely write to it". DR pointed out that this means giving priorities. I said the trouble with priorities is that, in a formal mathematical sense, if you say "maximize x at the expense of y" you might end up with some solution that is EXTREMELY costly in y for a tiny gain in x, which is usually not what humans mean when they say to another human "prioritize a over b"; DR noted that this doesn't mean priority is not useful here, it just means that we are looking to explore this fuzzier definition of priority. I also mentioned Alan Kay's phrase "policy-oriented programming" (or something like that). DR then pointed out that so far in this conversation we had three concepts to think about for goal-oriented programming:
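A toy version of "specify pre/postconditions plus a complexity requirement, and let the system find a subroutine" can be sketched as selection from a library. This is a minimal sketch under strong assumptions: the candidate library, its complexity labels (declared by hand, roughly, and not verified by anything), and the `find_routine` helper are all hypothetical, and checking the postcondition on sample inputs is testing, not the proof a real compiler would need. Hints and priorities are not modeled.

```python
from collections import Counter
from itertools import permutations
from typing import Callable, List, Optional

def pre_sort(xs: List[int]) -> bool:
    """Precondition: a list of mutually comparable items (here: ints)."""
    return isinstance(xs, list) and all(isinstance(x, int) for x in xs)

def post_sort(inp: List[int], out: List[int]) -> bool:
    """Postcondition: output is ordered and is a permutation of the input."""
    return all(a <= b for a, b in zip(out, out[1:])) and Counter(inp) == Counter(out)

def reverse_copy(xs):        # wrong candidate: should be rejected
    return list(reversed(xs))

def permutation_sort(xs):    # correct but hopelessly slow
    for p in permutations(xs):
        if all(a <= b for a, b in zip(p, p[1:])):
            return list(p)
    return []

def insertion_sort(xs):
    out = list(xs)
    for i in range(1, len(out)):
        j = i
        while j > 0 and out[j - 1] > out[j]:
            out[j - 1], out[j] = out[j], out[j - 1]
            j -= 1
    return out

# Complexity classes in increasing order; each candidate carries a declared time bound.
ORDER = ["O(n)", "O(n log n)", "O(n^2)", "O(n!)"]
LIBRARY = [
    ("reverse_copy", reverse_copy, "O(n)"),
    ("permutation_sort", permutation_sort, "O(n!)"),
    ("insertion_sort", insertion_sort, "O(n^2)"),
    ("builtin_sorted", sorted, "O(n log n)"),
]

def find_routine(pre: Callable, post: Callable, max_time: str,
                 samples: List[List[int]]) -> Optional[str]:
    """Name of the first routine within the time bound whose output satisfies
    `post` on every sample that satisfies `pre`, or None if there is none."""
    for name, fn, time in LIBRARY:
        if ORDER.index(time) > ORDER.index(max_time):
            continue  # violates the complexity requirement
        if all(post(s, fn(s)) for s in samples if pre(s)):
            return name
    return None
```

Loosening the bound to "O(n!)" lets the absurd `permutation_sort` through, which illustrates why the complexity requirement belongs in the specification alongside the pre/postconditions.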
---
Another interesting goal for a simple language is to think about what would be desired in a post-apocalyptic scenario. I think this is unlikely (even conditional upon an apocalypse, which is already unlikely), but imagine that only a few individuals have working computers, and so much has been lost that no one has the complete toolchain, e.g. whatever is required to cross-compile gcc onto a new architecture; or (even less likely) imagine that gcc is available, but not the gcc source code; and (even less likely) imagine that there is no comprehensive C specification or even documentation around. In such a situation, programming language implementations would have to be re-implemented by ordinary programmers (not compiler specialists) based on what they remember about the language (they could perhaps refer to some code samples from a few personal projects they happened to have on their personal machines at the time of the apocalypse). Assume that communication is initially spotty enough that there are multiple re-implementors who do not know of each other until much later. Most likely these different re-implementors would misremember different things about the language definition, and we'd get a family of mutually incompatible, C-like new languages.
Contrast this with, e.g., BASIC; I bet everyone would remember BASIC pretty well, and the result would be a family of real BASIC dialects, not just vaguely related new languages.
If you substitute 'Oot' for 'C' here, we would want Oot to be easier to remember than C; more like BASIC.
As noted above, I think this is unlikely to happen in the real world, even if there were an apocalypse, but it's a good thought experiment to push the language to be 'simple' and to think about whether a language 'fits in your head'.
---
http://stackoverflow.com/questions/10858787/what-are-the-uses-for-tags-in-go
---
http://www.geeksforgeeks.org/write-a-function-to-reverse-the-nodes-of-a-linked-list/
---
Assertion of fact (opposite: querying a fact, then matching a pattern to multi-assign the results) (you could assert an equation rather than a special value assignment, but you could assert a value assignment too) vs. command vs. assignment (which is like pronouns); and also (although maybe not separate from the above) RESTful interactions like GET address and SET (PUT) document=value; CRUD; VERBs applied to NOUNs, possibly with other arguments too (e.g. the value being assigned to a document in PUT).
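The assert/query pair can be made concrete with a tiny fact store in which a query pattern's variables are bound by matching, and the bindings are then multi-assigned. This is a minimal sketch; the `FactStore` class and the `?x` variable syntax are assumptions for illustration, not a proposed Oot design.

```python
from typing import Dict, List, Tuple

Fact = Tuple[str, ...]

class FactStore:
    """Assert facts as tuples; query with patterns whose '?'-prefixed strings
    are variables, getting back one dict of variable bindings per match."""

    def __init__(self) -> None:
        self.facts: List[Fact] = []

    def assert_fact(self, *fact: str) -> None:
        self.facts.append(tuple(fact))

    def query(self, *pattern: str) -> List[Dict[str, str]]:
        results = []
        for fact in self.facts:
            if len(fact) != len(pattern):
                continue
            bindings: Dict[str, str] = {}
            for p, v in zip(pattern, fact):
                if p.startswith("?"):
                    if bindings.setdefault(p, v) != v:
                        break  # same variable would need two different values
                elif p != v:
                    break      # constant in the pattern does not match
            else:
                results.append(bindings)
        return results
```

After `s.assert_fact("parent", "alice", "bob")`, the query `s.query("parent", "?x", "bob")` returns one binding, and `[b] = s.query(...)` followed by `who = b["?x"]` is the "match pattern to multi-assign results" step. Unlike assignment, asserting a fact adds knowledge without naming a slot to overwrite.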
---
Evaluation strategy relates to variable substitution, but also to time, as variable substitution is an analog of time (the sequence of computation) within the timeless realm of purity (referential transparency).
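In a pure expression the evaluation strategy cannot change the answer, only when, and how often, the substitution of an argument happens. This is a minimal sketch contrasting call-by-value with call-by-name (simulated by passing a zero-argument function, a thunk); the `trace` list is added only to make that hidden sequence visible and is not part of the pure computation. Call-by-need (lazy evaluation with memoization) would evaluate the argument once, on first use.

```python
from typing import Callable, List, Tuple

def evaluate(strategy: str) -> Tuple[int, List[str]]:
    """Evaluate square(arg()) under "value" (call-by-value) or "name" (call-by-name),
    returning the result plus the order in which things happened."""
    trace: List[str] = []

    def arg() -> int:
        trace.append("eval arg")
        return 3

    def square_by_value(x: int) -> int:
        trace.append("enter square")
        return x * x

    def square_by_name(thunk: Callable[[], int]) -> int:
        trace.append("enter square")
        return thunk() * thunk()  # the unevaluated argument is substituted at each use

    if strategy == "value":
        result = square_by_value(arg())  # argument evaluated before the call
    elif strategy == "name":
        result = square_by_name(arg)     # the expression itself is passed, as a thunk
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return result, trace
```

Both strategies return 9; only the trace, the "time" of the computation, differs.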
---
jcrites:
The article is discussing documentation for the AWS Flow Framework specifically. The Flow Framework is a Java framework built on top of the SWF API, and it provides a completely different programming model than the SWF API.
The C# example being discussed, as well as the Java example at the end, are examples of using that SWF API directly. The SWF API is indeed simpler for trivial examples. Flow is a power tool that handles complex workflows better than any alternative I've seen, but the framework itself is complex and incurs cost to set up and use. The documentation could do a better job of explaining this, and of providing Java API examples.
The Flow framework provides something that's a mix of Java code and a domain-specific language for building SWF applications that's expressed as Java code. (For an analogy, consider EasyMock