Lua closures vs OOP object instances vs Haskell partially applied functions (vs clojure closures)



thinking about data constructors as a type-within-a-type:

yeah, we could abolish them and replace with typeclasses, and prohibit switches (case statements) on them, but then how would we implement destructuring binds?

this suggests three things:

1) when we say we want a language based around labeled graphs/dicts/trees instead of lists, perhaps a lot of what we mean is: we want a structural type system that supports labeled arcs and that supports destructuring bind based on these

2) if there is a structural type system which is fixed/special/privileged (as opposed to ADTs), then this provides further support for the need for something like Views

3) but perhaps we also still want to be able to do stuff like:

  f(list):
    case list is NIL: do something
    case list is CONS(x): do something with x

  even if we have typeclasses (interface-like open data types) instead of ADTs with a closed (final) list of constructors for each type defined when the type is defined. How to do that? Do we allow the implementations of the typeclass to expose an interface to say that a given object 'acts like CONS(x) in case statements/destructuring binds, and to get x call a specific function on it...'?
  if this is feasible it seems preferable to hardcoding a structural type system and allowing only that for destructuring bind
  hmm, yeah, i guess you can see the data constructors as constructors for a typeclass, that form part of its interface. then you could have a __which function for each object that asks it which constructor it wants to be (or even __areYou that allows it to say True or False to each constructor)
    if you use __which instead of __areYou then you can statically guarantee that each object is exactly one of the allowed constructors at any time, which allows for 'closed' data types and probably exhaustive case statements.
  how do ADTs and destructuring bind and induction over a sequence mesh with dicts?
    maybe just like lists, in that we extract an iterator by pretending that dicts have two constructors, addItem and emptyDict, and each dict is constructed by addItem(key, value, oldDict), starting with emptyDict()?
       but if we have __which is that similar to imposing an ordering on the keys? it doesn't have to but it seems uncomfortably close..
           we don't have tail() as a constructor? i guess not, b/c we don't for lists
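a rough sketch of the __which idea above, in Python. everything here is hypothetical: the method name `which`, the tags "NIL" and "CONS", and the two implementing classes are made up for illustration, and CONS carries a tail as well as x so that the loop can advance. the point is only that a case statement can dispatch on whatever constructor an object claims to be, without knowing its concrete type:

```python
class ListLike:
    """Hypothetical open interface: implementers say which constructor they act like."""

    def which(self):
        # Returns ("NIL",) or ("CONS", head, tail); exactly one tag per call.
        raise NotImplementedError


class PyListView(ListLike):
    """Wraps a Python list without copying it."""

    def __init__(self, items, start=0):
        self.items = items
        self.start = start

    def which(self):
        if self.start >= len(self.items):
            return ("NIL",)
        return ("CONS", self.items[self.start], PyListView(self.items, self.start + 1))


class Countdown(ListLike):
    """Not backed by any stored sequence, yet it acts like n, n-1, ..., 1."""

    def __init__(self, n):
        self.n = n

    def which(self):
        return ("NIL",) if self.n == 0 else ("CONS", self.n, Countdown(self.n - 1))


def total(lst):
    """A 'case statement' over the constructors reported by which()."""
    acc = 0
    while True:
        tag, *fields = lst.which()
        if tag == "NIL":
            return acc
        if tag == "CONS":
            head, lst = fields  # destructuring bind on the CONS fields
            acc += head
        else:
            raise ValueError("unknown constructor: %r" % (tag,))
```

because which() returns exactly one tag, the set of cases stays closed even though the set of implementing classes is open.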



Date: Wed, 24 Sep 2014 18:40:30 -0700
Subject: [h-dev] Proposal: Best Practices for Modules
From: Randall Leeds


The guiding document I favor at this point and from which I'm taking my cues is the "Best Practices for App Structure" guidelines [1] Google seems to be using internally.

These are the desired properties I have for our best practices:

1. Simple, familiar, memorable, guidelines for descriptive file names and hierarchies
2. DRY [2]
3. server-side module system ready
4. Bower friendly
5. Isolation of components
6. Smooth transition from in-tree to separated component repos as we grow
7. Less build maintenance
8. Simpler dependency graph for angular imports through component opacity.

These drove me to select the above referenced document. The example at the bottom is most comprehensive.

Steps I'd like to take:

1. Tear down /vendor and add /components

Our main application routing and configuration can still live at the top level, but most of our directives, services, filters, etc. should be broken into components and placed in /components. It's important that no .coffee files sit directly in /components, but that everything has its own directory. The motivation is that by doing it this way there is a subtree that contains the changes to each component, making it much easier to break components out into their own repo when that inevitably becomes useful. It also means that we can treat this same directory as our bower components directory and use bower to fetch, install, and update our vendorized libs right alongside the ones in tree.

2. Adopt naming conventions for angular component files

I'm starting to warm up to and It makes it super easy to know where things are when you're looking at the files.

3. Work on our inter-component dependency granularity

This is easiest to see by example. Right now we have a directory scripts/helpers. Inside this is "form-helpers" and it defines the angular module "h.helpers.formHelpers".

That module name is fine if we were simply using it to obviate the need for a particular order of concatenation when exporting the "h.helpers" module by having it depend on each of its subcomponents. However, our other modules outside helpers should never refer to this, but only to "h.helpers"; otherwise the public/private line for a module gets really blurry.

4. Prepare to move to a server-side module system and build step

Our assets.yaml is getting unwieldy. Even if we were doing our build in JS, as we've talked about previously, whatever system we use would have to list all these components anyway. Totally separate from the language issue, we can vastly improve the situation by using a module system. I'm extremely partial to browserify, so unless there's a strong objection for some reason, I'm going to eventually bring that into our stack.

This point relates to number 3, above. Having an angular module like "h.helpers" declare "h.helpers.formHelpers" as a dependency is quite redundant if its file also had "require('./form-helpers')". Using a module system here means we can declare the angular module at the top of the file using the 3-arity call signature of angular.module, then require() the rest of the files, each of which can append to the module using the 1-arity form.

No:

    angular.module('foo', ['foo.barDirective'], [optional configure function])
    angular.module('foo.barDirective', ['ui.bootstrap'], [optional configure function])
      .directive('bar', -> ...)

Yes:

    angular.module('foo', ['ui.bootstrap'], [optional configure function])
    require('./barDirective')
    angular.module('foo').directive('bar', -> ...)

Notice how there's no circular import because can use the retrieval, 1-arity form of angular.module since it was defined already by before barDirective was required and there's no need for a separate angular module for barDirective.

This also forces all configuration into a single configure callback for the angular module and lifts all the dependencies into its declaration ('ui.bootstrap' in this example).

The result is that our build system need only point to and the rest is handled by browserify and one can know all the dependencies of the foo module and see all its bootstrap configuration code by looking at one file,

5. Use bower for our vendor components.

Enough said.

6. Consistent namespaces, prefix injection names, and opaque components

No:

    angular.module('').service('foo')
    angular.module('h.auth.session').service('session')
    angular.module('h', ['h.helpers.formHelpers'])

Yes:

    angular.module('h').service('hFoo')
    angular.module('h.auth').service('hSession')
    angular.module('h', ['h.helpers'])

Three guidelines, one for each of the above lines:

The last of these three guidelines does not apply to situations where the namespace is actually split across separately distributed modules that relate to one another but do not necessarily have dependency relationships. For instance, we might have 'h.auth.persona' and 'h.auth.openid' which are separate components where 'auth' is used as a namespace only and there is no 'h.auth' component.



the Haskell GHC STG core language, as mostly described in and section 3 ('language') of the paper Simon Marlow and Simon Peyton Jones, How to make a fast curry: push/enter vs eval/apply, made an impact on me. This appears to be 'the roots of Haskell' and it's pretty simple.

i woke up today feeling like there is a thought buried in my mind regarding the guile VM, but i don't know what it is. Maybe it is just an obsessive goal of inventorying the remaining VM instructions. For reference, here is what my instruction looks like as of this time (i guess i should look over this and see if i have any thoughts):

Lexical Environment Instructions:

These instructions access and mutate the lexical environment of a compiled procedure—its free and bound variables.

Top-Level Environment Instructions:

These instructions access values in the top-level environment: bindings that were not lexically apparent at the time that the code in question was compiled. The location in which a toplevel binding is stored can be looked up once and cached for later. The binding itself may change over time, but its location will stay constant. Currently only toplevel references within procedures are cached, as only procedures have a place to cache them, in their object tables.

Procedure Call and Return Instructions: todo

Function Prologue Instructions:

Trampoline Instructions:

Branch Instructions:

Data Constructor Instructions:

Loading Instructions:

Dynamic Environment Instructions:

Miscellaneous Instructions:

Inlined Scheme Instructions:

Inlined Mathematical Instructions:

Inlined Bytevector Instructions:

thoughts: types of instructions are:

"inlined scheme instructions" catches my eye; i glanced at them yesterday but didn't read them; they are:

" Inlined Scheme Instructions

The Scheme compiler can recognize the application of standard Scheme procedures. It tries to inline these small operations to avoid the overhead of creating new stack frames.

Since most of these operations are historically implemented as C primitives, not inlining them would entail constantly calling out from the VM to the interpreter, which has some costs—registers must be saved, the interpreter has to dispatch, called procedures have to do much type checking, etc. It’s much more efficient to inline these operations in the virtual machine itself.

All of these instructions pop their arguments from the stack and push their results, and take no parameters from the instruction stream. Thus, unlike in the previous sections, these instruction definitions show stack parameters instead of parameters from the instruction stream. "

Instruction: not x
Instruction: not-not x: what is this?
Instruction: eq? x y: "#t if x and y are the same object, except for numbers and characters...Numbers and characters are not equal to any other object, but the problem is they’re not necessarily eq? to themselves either. This is even so when the number comes directly from a variable...(let ((n (+ 2 3))) (eq? n n)) ⇒ *unspecified*...Generally eqv? below should be used when comparing numbers or characters...It’s worth noting that end-of-list (), #t, #f, a symbol of a given name, and a keyword of a given name, are unique objects."
Instruction: not-eq? x y
Instruction: null?
Instruction: not-null?
Instruction: eqv? x y: "Return #t if x and y are the same object, or for characters and numbers the same value....On objects except characters and numbers, eqv? is the same as eq? above, it’s true if x and y are the same object."
Instruction: equal? x y: "Return #t if x and y are the same type, and their contents or value are equal. For a pair, string, vector, array or structure, equal? compares the contents, and does so using the same equal? recursively, so a deep structure can be traversed....For other objects, equal? compares as per eqv? above, which means characters and numbers are compared by type and value (and like eqv?, exact and inexact numbers are not equal?, even if their value is the same)."
Instruction: pair? x y
Instruction: list? x
Instruction: set-car! pair x: "Stores value in the car field of pair."
Instruction: set-cdr! pair x
Instruction: cons x y
Instruction: car x
Instruction: cdr x
Instruction: vector-ref x y
Instruction: vector-set x n y
Instruction: struct? x
Instruction: struct-ref x n
Instruction: struct-set x n v
Instruction: struct-vtable x: i think more than just the methods: "The vtable is effectively the type of the structure...Every vtable has a field for the layout of their instances, a field for the procedure used to print its instances, and a field for the name of the vtable itself"
Instruction: class-of x: "Return the GOOPS class of any Scheme value."
Instruction: slot-ref struct n
Instruction: slot-set struct n x

" Inlined implementations of their Scheme equivalents.

Note that caddr and friends compile to a series of car and cdr instructions. "

my reduced version of those:

later i also looked at Data Constructors, here is my reduced version:

also note:


rnrs base

The (rnrs base (6)) library exports the procedures and syntactic forms described in the main section of the Report (see R6RS Base library in The Revised^6 Report on the Algorithmic Language Scheme). They are grouped below by the existing manual sections to which they correspond.

Scheme Procedure: boolean? obj Scheme Procedure: not x

    See Booleans, for documentation. 

Scheme Procedure: symbol? obj Scheme Procedure: symbol->string sym Scheme Procedure: string->symbol str

    See Symbol Primitives, for documentation. 

Scheme Procedure: char? obj Scheme Procedure: char=? Scheme Procedure: char<? Scheme Procedure: char>? Scheme Procedure: char<=? Scheme Procedure: char>=? Scheme Procedure: integer->char n Scheme Procedure: char->integer chr

    See Characters, for documentation. 

Scheme Procedure: list? x Scheme Procedure: null? x

    See List Predicates, for documentation. 

Scheme Procedure: pair? x Scheme Procedure: cons x y Scheme Procedure: car pair Scheme Procedure: cdr pair Scheme Procedure: caar pair Scheme Procedure: cadr pair Scheme Procedure: cdar pair Scheme Procedure: cddr pair Scheme Procedure: caaar pair Scheme Procedure: caadr pair Scheme Procedure: cadar pair Scheme Procedure: cdaar pair Scheme Procedure: caddr pair Scheme Procedure: cdadr pair Scheme Procedure: cddar pair Scheme Procedure: cdddr pair Scheme Procedure: caaaar pair Scheme Procedure: caaadr pair Scheme Procedure: caadar pair Scheme Procedure: cadaar pair Scheme Procedure: cdaaar pair Scheme Procedure: cddaar pair Scheme Procedure: cdadar pair Scheme Procedure: cdaadr pair Scheme Procedure: cadadr pair Scheme Procedure: caaddr pair Scheme Procedure: caddar pair Scheme Procedure: cadddr pair Scheme Procedure: cdaddr pair Scheme Procedure: cddadr pair Scheme Procedure: cdddar pair Scheme Procedure: cddddr pair

    See Pairs, for documentation. 

Scheme Procedure: number? obj

    See Numerical Tower, for documentation. 

Scheme Procedure: string? obj

    See String Predicates, for documentation. 

Scheme Procedure: procedure? obj

    See Procedure Properties, for documentation. 

Scheme Syntax: define name value Scheme Syntax: set! variable-name value

    See Definition, for documentation. 

Scheme Syntax: define-syntax keyword expression Scheme Syntax: let-syntax ((keyword transformer) …) exp1 exp2 … Scheme Syntax: letrec-syntax ((keyword transformer) …) exp1 exp2 …

    See Defining Macros, for documentation. 

Scheme Syntax: identifier-syntax exp

    See Identifier Macros, for documentation. 

Scheme Syntax: syntax-rules literals (pattern template) ...

    See Syntax Rules, for documentation. 

Scheme Syntax: lambda formals body

    See Lambda, for documentation. 

Scheme Syntax: let bindings body Scheme Syntax: let* bindings body Scheme Syntax: letrec bindings body Scheme Syntax: letrec* bindings body

    See Local Bindings, for documentation. 

Scheme Syntax: let-values bindings body Scheme Syntax: let*-values bindings body

    See SRFI-11, for documentation. 

Scheme Syntax: begin expr1 expr2 ...

    See begin, for documentation. 

Scheme Syntax: quote expr Scheme Syntax: quasiquote expr Scheme Syntax: unquote expr Scheme Syntax: unquote-splicing expr

    See Expression Syntax, for documentation. 

Scheme Syntax: if test consequence [alternate] Scheme Syntax: cond clause1 clause2 ... Scheme Syntax: case key clause1 clause2 ...

    See Conditionals, for documentation. 

Scheme Syntax: and expr ... Scheme Syntax: or expr ...

    See and or, for documentation. 

Scheme Procedure: eq? x y Scheme Procedure: eqv? x y Scheme Procedure: equal? x y Scheme Procedure: symbol=? symbol1 symbol2 ...

    See Equality, for documentation.
    symbol=? is identical to eq?. 

Scheme Procedure: complex? z

    See Complex Numbers, for documentation. 

Scheme Procedure: real-part z Scheme Procedure: imag-part z Scheme Procedure: make-rectangular real_part imaginary_part Scheme Procedure: make-polar x y Scheme Procedure: magnitude z Scheme Procedure: angle z

    See Complex, for documentation. 

Scheme Procedure: sqrt z Scheme Procedure: exp z Scheme Procedure: expt z1 z2 Scheme Procedure: log z Scheme Procedure: sin z Scheme Procedure: cos z Scheme Procedure: tan z Scheme Procedure: asin z Scheme Procedure: acos z Scheme Procedure: atan z

    See Scientific, for documentation. 

Scheme Procedure: real? x Scheme Procedure: rational? x Scheme Procedure: numerator x Scheme Procedure: denominator x Scheme Procedure: rationalize x eps

    See Reals and Rationals, for documentation. 

Scheme Procedure: exact? x Scheme Procedure: inexact? x Scheme Procedure: exact z Scheme Procedure: inexact z

    See Exactness, for documentation. The exact and inexact procedures are identical to the inexact->exact and exact->inexact procedures provided by Guile’s code library. 

Scheme Procedure: integer? x

    See Integers, for documentation. 

Scheme Procedure: odd? n Scheme Procedure: even? n Scheme Procedure: gcd x ... Scheme Procedure: lcm x ... Scheme Procedure: exact-integer-sqrt k

    See Integer Operations, for documentation. 

Scheme Procedure: = Scheme Procedure: < Scheme Procedure: > Scheme Procedure: <= Scheme Procedure: >= Scheme Procedure: zero? x Scheme Procedure: positive? x Scheme Procedure: negative? x

    See Comparison, for documentation. 

Scheme Procedure: for-each f lst1 lst2 ...

    See SRFI-1 Fold and Map, for documentation. 

Scheme Procedure: list elem …

    See List Constructors, for documentation. 

Scheme Procedure: length lst Scheme Procedure: list-ref lst k Scheme Procedure: list-tail lst k

    See List Selection, for documentation. 

Scheme Procedure: append lst … obj Scheme Procedure: append Scheme Procedure: reverse lst

    See Append/Reverse, for documentation. 

Scheme Procedure: number->string n [radix] Scheme Procedure: string->number str [radix]

    See Conversion, for documentation. 

Scheme Procedure: string char ... Scheme Procedure: make-string k [chr] Scheme Procedure: list->string lst

    See String Constructors, for documentation. 

Scheme Procedure: string->list str [start [end]]

    See List/String Conversion, for documentation. 

Scheme Procedure: string-length str Scheme Procedure: string-ref str k Scheme Procedure: string-copy str [start [end]] Scheme Procedure: substring str start [end]

    See String Selection, for documentation. 

Scheme Procedure: string=? s1 s2 s3 … Scheme Procedure: string<? s1 s2 s3 … Scheme Procedure: string>? s1 s2 s3 … Scheme Procedure: string<=? s1 s2 s3 … Scheme Procedure: string>=? s1 s2 s3 …

    See String Comparison, for documentation. 

Scheme Procedure: string-append arg …

    See Reversing and Appending Strings, for documentation. 

Scheme Procedure: string-for-each proc s [start [end]]

    See Mapping Folding and Unfolding, for documentation. 

Scheme Procedure: + z1 ... Scheme Procedure: - z1 z2 ... Scheme Procedure: * z1 ... Scheme Procedure: / z1 z2 ... Scheme Procedure: max x1 x2 ... Scheme Procedure: min x1 x2 ... Scheme Procedure: abs x Scheme Procedure: truncate x Scheme Procedure: floor x Scheme Procedure: ceiling x Scheme Procedure: round x

    See Arithmetic, for documentation. 

Scheme Procedure: div x y Scheme Procedure: mod x y Scheme Procedure: div-and-mod x y

    These procedures accept two real numbers x and y, where the divisor y must be non-zero. div returns the integer q and mod returns the real number r such that x = q*y + r and 0 <= r < abs(y). div-and-mod returns both q and r, and is more efficient than computing each separately. Note that when y > 0, div returns floor(x/y), otherwise it returns ceiling(x/y).
    (div 123 10) ⇒ 12
    (mod 123 10) ⇒ 3
    (div-and-mod 123 10) ⇒ 12 and 3
    (div-and-mod 123 -10) ⇒ -12 and 3
    (div-and-mod -123 10) ⇒ -13 and 7
    (div-and-mod -123 -10) ⇒ 13 and 7
    (div-and-mod -123.2 -63.5) ⇒ 2.0 and 3.8
    (div-and-mod 16/3 -10/7) ⇒ -3 and 22/21

Scheme Procedure: div0 x y Scheme Procedure: mod0 x y Scheme Procedure: div0-and-mod0 x y

    These procedures accept two real numbers x and y, where the divisor y must be non-zero. div0 returns the integer q and mod0 returns the real number r such that x = q*y + r and -abs(y/2) <= r < abs(y/2). div0-and-mod0 returns both q and r, and is more efficient than computing each separately.
    Note that div0 returns x/y rounded to the nearest integer. When x/y lies exactly half-way between two integers, the tie is broken according to the sign of y. If y > 0, ties are rounded toward positive infinity, otherwise they are rounded toward negative infinity. This is a consequence of the requirement that -abs(y/2) <= r < abs(y/2).
    (div0 123 10) ⇒ 12
    (mod0 123 10) ⇒ 3
    (div0-and-mod0 123 10) ⇒ 12 and 3
    (div0-and-mod0 123 -10) ⇒ -12 and 3
    (div0-and-mod0 -123 10) ⇒ -12 and -3
    (div0-and-mod0 -123 -10) ⇒ 12 and -3
    (div0-and-mod0 -123.2 -63.5) ⇒ 2.0 and 3.8
    (div0-and-mod0 16/3 -10/7) ⇒ -4 and -8/21
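note to self: the two definitions above are easy to mix up, so here is a small Python sketch of them. This is not Guile's implementation; the function names `div_and_mod` and `div0_and_mod0` are made up for illustration, and exact rationals (`Fraction`) are assumed as inputs so that the checks are exact.

```python
import math
from fractions import Fraction


def div_and_mod(x, y):
    """Return (q, r) with x = q*y + r and 0 <= r < abs(y)."""
    x, y = Fraction(x), Fraction(y)
    if y == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    # floor for a positive divisor, ceiling for a negative one
    q = math.floor(x / y) if y > 0 else math.ceil(x / y)
    return q, x - q * y


def div0_and_mod0(x, y):
    """Return (q, r) with x = q*y + r and -abs(y)/2 <= r < abs(y)/2."""
    q, r = div_and_mod(x, y)
    y = Fraction(y)
    if r >= abs(y) / 2:
        # shift the remainder down by abs(y) and adjust q so x = q*y + r still holds
        if y > 0:
            q, r = q + 1, r - y
        else:
            q, r = q - 1, r + y
    return q, r
```

for example, `div_and_mod(-123, 10)` gives q = -13 and r = 7, while `div0_and_mod0(-123, 10)` gives q = -12 and r = -3, matching the manual's examples.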

Scheme Procedure: real-valued? obj Scheme Procedure: rational-valued? obj Scheme Procedure: integer-valued? obj

    These procedures return #t if and only if their arguments can, respectively, be coerced to a real, rational, or integer value without a loss of numerical precision.
    real-valued? will return #t for complex numbers whose imaginary parts are zero. 

Scheme Procedure: nan? x Scheme Procedure: infinite? x Scheme Procedure: finite? x

    nan? returns #t if x is a NaN value, #f otherwise. infinite? returns #t if x is an infinite value, #f otherwise. finite? returns #t if x is neither infinite nor a NaN value, otherwise it returns #f. Every real number satisfies exactly one of these predicates. An exception is raised if x is not real. 

Scheme Syntax: assert expr

    Raises an &assertion condition if expr evaluates to #f; otherwise evaluates to the value of expr. 

Scheme Procedure: error who message irritant1 ... Scheme Procedure: assertion-violation who message irritant1 ...

    These procedures raise compound conditions based on their arguments: If who is not #f, the condition will include a &who condition whose who field is set to who; a &message condition will be included with a message field equal to message; an &irritants condition will be included with its irritants list given by irritant1 ....
    error produces a compound condition with the simple conditions described above, as well as an &error condition; assertion-violation produces one that includes an &assertion condition. 

Scheme Procedure: vector-map proc v Scheme Procedure: vector-for-each proc v

    These procedures implement the map and for-each contracts over vectors. 

Scheme Procedure: vector arg … Scheme Procedure: vector? obj Scheme Procedure: make-vector len Scheme Procedure: make-vector len fill Scheme Procedure: list->vector l Scheme Procedure: vector->list v

    See Vector Creation, for documentation. 

Scheme Procedure: vector-length vector Scheme Procedure: vector-ref vector k Scheme Procedure: vector-set! vector k obj Scheme Procedure: vector-fill! v fill

    See Vector Accessors, for documentation. 

Scheme Procedure: call-with-current-continuation proc Scheme Procedure: call/cc proc

    See Continuations, for documentation. 

Scheme Procedure: values arg … Scheme Procedure: call-with-values producer consumer

    See Multiple Values, for documentation. 

Scheme Procedure: dynamic-wind in_guard thunk out_guard

    See Dynamic Wind, for documentation. 

Scheme Procedure: apply proc arg … arglst

    See Fly Evaluation, for documentation. "


if we are going to interpret by walking the AST, how to efficiently represent the AST in memory? how to efficiently represent S-exprs in memory? but Oot has dicts (labeled arcs), not just lists (sequences). So how to efficiently represent a nested dict/labeled tree in memory? do we want to assume that each node has only a few children, so just enumerate them and search them all (or maybe sort alphabetically and then binary search?), rather than making a hash table for each dict?
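a minimal sketch of the 'sort alphabetically and then binary search' option, in Python. The class name `Node` and its fields are hypothetical, and this says nothing about what Oot would actually do; it only shows that a node with a handful of labeled children can get by with two parallel sorted lists instead of a hash table per dict.

```python
import bisect


class Node:
    """Labeled-tree node whose children live in two parallel lists sorted by label."""

    def __init__(self, value=None):
        self.value = value
        self.labels = []    # sorted labels of the outgoing arcs
        self.children = []  # children[i] is the target of the arc labeled labels[i]

    def set(self, label, child):
        i = bisect.bisect_left(self.labels, label)
        if i < len(self.labels) and self.labels[i] == label:
            self.children[i] = child  # overwrite an existing arc
        else:
            # O(outdegree) insertion, which is cheap if the outdegree is small
            self.labels.insert(i, label)
            self.children.insert(i, child)

    def get(self, label):
        i = bisect.bisect_left(self.labels, label)  # O(log outdegree) lookup
        if i < len(self.labels) and self.labels[i] == label:
            return self.children[i]
        raise KeyError(label)
```

an 'if' node would then have arcs labeled cond, then, and else, and iterating over labels always yields them in the same sorted order.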

according to , Lisp S-exprs are:

so i guess S-exprs fundamentally use linked lists. This gives O(1) insertion and deletion in the middle, but we may only care about pushing and popping at head, esp. if we just want unordered dicts. But maybe we want ordered dicts. But if we assume the outdegree of each node is small (for the AST use-case), then it's not too costly to rearrange the list when needed.

sometimes backspace backspace backspace fixChar retype retype is easier than relocateCursor backspace relocateCursor. any analogies in data structure manipulation notation?

as i look in programming language implementation documentation, there is a lot of defining operations and abstract machines by drawing a diagram of what's on the stack, what's in memory, and what's in registers before the operation and after the operation. e.g. DUP might be written "a -> a a" (which is assumed to mean "a ... -> a a ..."); POP might be written:

op     before: stack   acc        after: stack   acc
POP    x ...           ...   ->   ...            x

An alternative might be to use slice notation, but that requires more explanation and more characters per operation: pop: acc = stack[0]; stack = stack[1:]

it seems like reading these 'what the stack, accumulator, etc look like' diagrams is just a matter of doing a destructuring bind and assignment in your head. Should Oot support a syntax for coding using these sorts of diagrams?
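to make the 'destructuring bind in your head' point concrete, here is a small Python sketch. The function names `dup` and `pop` and the convention that index 0 is the top of the stack (as in the slice notation above) are assumptions for illustration; star-unpacking plays the role of the "x ..." pattern in the diagrams.

```python
def dup(stack, acc):
    """a ... -> a a ..."""
    a, *rest = stack           # bind the 'before' picture
    return [a, a, *rest], acc  # build the 'after' picture


def pop(stack, acc):
    """stack: x ... -> ...   acc: becomes x"""
    x, *rest = stack
    return rest, x
```

each function body is just the before-pattern on the left of an assignment and the after-pattern in the return, which is more or less the diagram written out.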

" As for looking upwards being inconvenient and inefficient, this would only be true if you've done a poor job of designing your AST data structure. If each AST node has a pointer to its parent node, this will be extremely efficient, and trivial to develop. "

" In the other case, we're taking an AST as input, and perhaps a different AST as output. Here, the situation is not nearly as clear. The mapping from one AST structure to another is quite complex (see XSLT, TXL, or TreeDL? for example), and a single BNF-like grammar doesn't capture it. "


i decided to quickly skim the question of whether making Oot suitable for genetic programming would lead to any cool ideas or important constraints.

a quick read of and a skim of notes from Koza's class at lead me to believe:

so far i haven't seen any demand for interesting language features or constraints that would push Oot one way or another.

one interesting tidbit; at least one genetic programming project used directed labeled graphs as its memory representation:


after one more skim of Google:

suggests a functional language, as opposed to imperative, defined as " Imperative programming languages, the ones most of us are more familiar with like Java and C have some computer state with variables and values and the various commands alter the computer's state. In functional programs, the idea is to take some input and simply return a value without dealing with computer state. You can think of functional programming more like mathematical expressions than instructions for somebody to follow." (the authors wrote Darwin GP which works in OCaml)

is one proposed language

looks like a great read, todo

Genetic Mapping slides implicitly suggests that it might be good to have a way for the language to skip syntax errors rather than crashing when being used in GP

Genetic Algorithms and Genetic Programming: Modern Concepts and Practical ... By Michael Affenzeller, Stefan Wagner, Stephan Winkler, Andreas Beham notes the following:

these slides also make the point that instead of working on the AST directly, the genotype could specify a sequence of grammatical operations that write the program, thereby avoiding syntactic errors. also mentions this and calls it "grammatical evolution" and cites [Ryan et al., 1998];
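a toy sketch of the grammatical-evolution idea above, in Python. The grammar, the name `derive`, and the fallback rule for an exhausted genome are all assumptions made up for illustration (they are not taken from Ryan et al.); the point is only that every genome, however it is mutated, expands into a syntactically valid program.

```python
# Hypothetical toy grammar; the last rule of each nonterminal is kept non-recursive.
GRAMMAR = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["<var>"]],
    "<op>": [["+"], ["*"]],
    "<var>": [["x"], ["1"]],
}


def derive(genome, start="<expr>"):
    """Expand the leftmost nonterminal repeatedly, letting each codon pick a rule."""
    symbols = [start]  # pending symbols, leftmost first
    output = []
    codons = iter(genome)
    while symbols:
        sym = symbols.pop(0)
        if sym not in GRAMMAR:
            output.append(sym)  # terminal: emit it
            continue
        rules = GRAMMAR[sym]
        codon = next(codons, None)
        # Once the genome runs out, always take the last (non-recursive) rule,
        # which guarantees that the derivation terminates.
        rule = rules[codon % len(rules)] if codon is not None else rules[-1]
        symbols = list(rule) + symbols
    return " ".join(output)
```

for instance, the genome [0, 1, 0, 1, 1, 1] derives the expression "x * 1", and the empty genome derives "1".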

2.2.1 Hierarchical labeled structure trees


note: a big deal is made of the terminology "function set" and "terminal set"; "function set" is just constructs that are interior nodes of the AST and "terminal set" is just constructs that are leaves of the AST

2.2.2 automatically defined functions and modular genetic programming

cites "genetic programming II: automatic discovery of reusable programs" (Koz94) for the initial inspiration, and also extensions in KIAK99 (Koza et al, Genetic Programming III), KKS+03b (Koza et al, Genetic Programming IV), and also two techniques for automatic extraction of subroutines into libraries: genetic libraries (Angeline in Ang93 and Ang94) and adaptive representation through learning (ARL) (Rosca in Ros95a and RB96 (Rosca and Ballard, Discovery of subroutines in genetic programming, in Angeline and Kinnear, ???)). Also cites "other advanced GP concepts" in this area in Gru94, KBAK99 (Koza et al, the design of analog circuits by means of genetic programming, in Bentley, Evolutionary design by computers, chapter 16), Jac99 (C. Jacob, Lindenmayer systems and growth program evolution, in Hussain, Advanced Grammar Techniques within Genetic Programming and Evolutionary Computation), WC99 (Whigham and Crapper, Time Series Modeling using Genetic Programming).

linear genetic programming

a sequence of imperative instructions. can be stack-based or register-based.

graphical genetic programming

looks like dataflow, not 100% clear though.

cites Parallel Distributed Genetic Programming (PDGP), Pol97, Pol99b (R. Poli, Parallel Distributed Genetic Programming. In Come, Dorigo, Glover, New Ideas in Optimization, Advanced topics in computer science, chapter 27)

pretty amazing that someone is even working on this: Automatic Quantum Computer Programming A Genetic Programming Approach

 6.5.1  Linguistic Approaches of also talks about:

section 6.5.2 Representational Approaches of also talks about:

Predicting Financial Time Series by Genetic Programming ... by R Schwaerzel adds the functions sin, cos, tan, log, exp, mean, standard deviation, skewness, and kurtosis. the last 4 (the stats fns) take two parameters, "LAG and the LENGTH. The LAG parameter is a value between 1 and 20 specifying how many time steps back from the prediction this statistics should be calculated on. For example, LAG=1 would include the previous data point, LAG=5 would include the data point one week prior the prediction. The LENGTH parameter specifies the number of data points to include for the statistics, starting from the LAG value backwards. Thus, Mean(1,10) would calculate the average of the last 10 trading days."
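my reading of the LAG/LENGTH convention, as a Python sketch. The function names and the assumption that the series is a list whose last element is the data point just before the prediction are mine, not Schwaerzel's; this is only meant to pin down which window Mean(1,10) refers to.

```python
from statistics import mean


def lagged_window(series, lag, length):
    """Return `length` points ending `lag` steps before the prediction.

    Assumes series[-1] is the point at LAG=1 (the previous data point).
    """
    end = len(series) - lag + 1
    start = end - length
    if lag < 1 or length < 1 or start < 0:
        raise ValueError("window does not fit inside the series")
    return series[start:end]


def lagged_mean(series, lag, length):
    """Mean(LAG, LENGTH) under the reading above."""
    return mean(lagged_window(series, lag, length))
```

so with 20 data points, `lagged_mean(series, 1, 10)` averages the last 10 of them, and `lagged_window(series, 5, 1)` picks out the single point five steps back.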

dragonwriter 10 hours ago


> But the key difference is that those who "program" spreadsheets can see all of the steps involved.

I would say that that's almost precisely backwards: compared to typical plain-text code, spreadsheets make intermediate results more obvious, while obscuring the steps. (Things like, e.g., scala worksheets or ipython notebooks bridge this gap in a way which makes both the steps and the results clear.)



easy syntax for fileglobs and calling out to other programs, as in Shell


an argument for immutable data and against OOP:

"More generally, whenever a data structure has multiple futures, such as when backtracking or exploiting parallelism, ephemeral data structures get in the way. Indeed, the bulk of object-oriented programming, with its absurd over-emphasis on the “message passing” metaphor, stress the alteration of objects as the central organizing principle, confounding parallelism and complicating simple algorithms." --


" It is well known that Haskell is not type safe. The most blatant violation is the all too necessary, but aptly named, unsafePerformIO operation. You are enjoined not to use this in an unsafe manner, and must be careful to ensure that the encapsulated computation may be executed at any time because of the inherent unpredictability of lazy evaluation. (The analogous operation in monadic ML, safePerformIO, is safe, because of the value restriction on polymorphism.) " --

interesting bit on combinator parsing from this paper:

Lenient evaluation is neither strict nor lazy, by G. Tremblay, Computer Languages 26 (2000) 43–66

4.4. Combinator parsing. An interesting approach to building parsers in functional languages, described in [29,30], is called combinator parsing. In this approach, a parser is defined as a function which, from a string, produces a list of possible results together with the unconsumed part of the input; various combining forms (higher-order functions = combinators) are then used to combine parsers in ways which mimic the different grammar constructs, e.g., sequencing, choice, repetition. For example, suppose we have some parsers p1 and p2 which recognize, respectively, the non-terminals t1 and t2. Then, the parser that recognizes t1 | t2 could be defined as follows (‘alt’ denotes the infix form of the alt function):

    > (p1 ‘alt’ p2) inp = p1 inp ++ p2 inp

Similarly, the parser that recognizes t1 t2 (sequencing) could be defined as follows:

    > (p1 ‘then’ p2) inp = [((v1, v2), out2) | (v1, out1) ← p1 inp; (v2, out2) ← p2 out1]

Here, laziness is required because of the way repetition is handled: given a parser p that recognizes the non-terminal t, t∗ would be recognized by many p defined as follows [30]:

    > many p = ((p ‘then’ many p) ‘using’ cons)
    >          ‘alt’
    >          (succeed [])

In function many, one of the subexpressions appearing as an argument expression is a call made to many p; this is bound to create an infinite loop in a language not using call-by-need, which is effectively what happens when such combinator parsers are translated directly into a strict language such as SML or a non-strict language such as Id. Instead, in a non-lazy language, eta-abstraction would need to be used or explicit continuations would have to be manipulated in order to handle backtracking [
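
to convince myself of the eta-abstraction point, here is a rough Python transcription of the combinators from the quote (Python being strict). `item` is a helper i added so there is something to parse; everything else mirrors the names in the paper's snippet, but this is my sketch, not the paper's code.

```python
def succeed(value):
    # parser that consumes nothing and yields `value`
    return lambda inp: [(value, inp)]

def item(ch):
    # hypothetical helper: accepts exactly the character ch
    return lambda inp: [(ch, inp[1:])] if inp[:1] == ch else []

def alt(p1, p2):
    return lambda inp: p1(inp) + p2(inp)

def then(p1, p2):
    return lambda inp: [((v1, v2), out2)
                        for (v1, out1) in p1(inp)
                        for (v2, out2) in p2(out1)]

def using(p, f):
    return lambda inp: [(f(v), out) for (v, out) in p(inp)]

def cons(pair):
    head, tail = pair
    return [head] + tail

def many(p):
    # eta-abstraction: the body is wrapped in a function of inp, so the
    # recursive many(p) is only built once the parser is applied to input.
    # the direct translation
    #     return alt(using(then(p, many(p)), cons), succeed([]))
    # would keep calling many(p) before any input is ever seen.
    def parser(inp):
        return alt(using(then(p, many(p)), cons), succeed([]))(inp)
    return parser

print(many(item("a"))("aab"))
```

the result list contains every way of stopping early (two a's, one a, zero a's), which is the 'list of possible results' style of backtracking the quote describes.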

" One of the reasons for the success of R is that it caters to the needs of the first group, end users. Many of its features are geared towards speeding up interactive data analysis. The syntax is intended to be concise. Default arguments and partial keyword matches reduce coding effort. The lack of typing lowers the barrier to entry, as users can start working without understanding any of the rules of the language. The calling convention reduces the number of side effects and gives R a functional flavor. But, it is also clear that these very features hamper the development of larger code bases. For robust code, one would like to have less ambiguity and would probably be willing to pay for that by more verbose specifications, perhaps going as far as full-fledged type declarations. So, R is not the ideal language for developing robust packages. Improving R will require increasing encapsulation, providing more static guarantees, while decreasing the number and reach of reflective features. Furthermore, the language specification must be divorced from its implementation and implementation-specific features must be deprecated. The balance between imperative and functional features is fascinating. We agree with the designers of R that a purely functional language whose main job is to manipulate massive numeric arrays is unlikely to be a success. It is simply too useful to be able to perform updates and have a guarantee that they are done in place rather than hope that a smart compiler will be able to optimize them. The current design is a compromise between the functional and the imperative; it allows local side effects, but enforces purity across function boundaries. It is unfortunate that this simple semantics is obscured by exceptions such as the super-assignment operator ( <<- ) which is used as a sneaky way to implement non-local side effects. One of the most glaring shortcomings of R is its lack of concurrency support. 
Instead, there are only native libraries that provide behind-the-scenes parallel execution. Concurrency is not exposed to R programmers and always requires switching to native code. Adding concurrency would be best done after removing non-local side effects, and requires inventing a suitable concurrent programming model. One intriguing idea would be to push on lazy evaluation, which, as it stands, is too weak to be of much use outside of the base libraries, but could be strengthened to support parallel execution. The object-oriented side of the language feels like an afterthought. The combination of mutable objects without references or cyclic structures is odd and cumbersome. The simplest object system provided by R is mostly used to provide printing methods for different data types. The more powerful object system is struggling to gain acceptance. The current implementation of R is massively inefficient. We believe that this can, in part, be ascribed to the combination of dynamism, lazy evaluation, and copy semantics, but it also points to major deficiencies in the implementation. Many features come at a cost even if unused. That is the case with promises and most of reflection. Promises could be replaced with special parameter declarations, making lazy evaluation the exception rather than the rule. Reflective features could be restricted to passive introspection which would allow for the dynamism needed for most uses. For the object system, it should be built-in rather than synthesized out of reflective calls. Copy semantics can be really costly and force users to use tricks to get around the copies. A limited form of references would be more efficient and lead to better code. This would allow structures like hash maps or trees to be implemented. Finally, since lazy evaluation is only used for language extensions, macro functions à la Lisp, which do not create a context and expand inline, would allow the removal of promises " -- Morandat et al.


AliceML's SEAM VM stores data in an 'abstract store', which is a graph representation of all data -- like the single graph data type that i want for oot


what do we need to add to reified labeled directed etc hypergraphs for eg. function calling?

one thing you want to be able to have is argument default values. So maybe you need to be able to read the argument graph in one of three modes (views?): (a) where arguments not given are just not there, (b) where arguments not given have an ARGUMENT-MISSING (None) sentinel, (c) where arguments not given have been assigned their default
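
a minimal sketch of the three modes, in Python, with plain dicts standing in for the argument graph. `MISSING`, `view_args` and the mode names are all hypothetical; nothing here is a real oot API.

```python
MISSING = object()   # stand-in for the ARGUMENT-MISSING sentinel

def view_args(params, given, mode):
    """params: name -> default (MISSING if there is no default); given: supplied args."""
    if mode == "sparse":      # (a) arguments not given are just not there
        return dict(given)
    if mode == "sentinel":    # (b) arguments not given show up as MISSING
        return {name: given.get(name, MISSING) for name in params}
    if mode == "defaults":    # (c) arguments not given take their default
        return {name: given.get(name, default) for name, default in params.items()}
    raise ValueError(f"unknown mode: {mode}")

params = {"x": MISSING, "y": 10}   # x is required, y defaults to 10
given = {"x": 1}

for mode in ("sparse", "sentinel", "defaults"):
    print(mode, view_args(params, given, mode))
```

note that in mode (c) a required argument that wasn't given still comes back as MISSING, so (c) degrades to (b) for parameters without defaults.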

(another related thing is my recent idea of multiple return values in FRP stream)


in the genetic programming language PUSH3, in CODE.EXTRACT, CODE.NTH, CODE.INSERT, the indexing is depth-first; Nock's breadth-first is probably better.

also: does PUSH3 not have an instruction to push the empty list?


PUSH3 has an interesting combo of stack-based and list-based


have more memorable names for delimited continuations than shift and prompt;

if stack is an ordinary graph and you can annotate it, including with terminators (terminators act like prompt), then you don't have to have special names for this, you can use ordinary stack ops

note: mentions in passing 'first-class stack frames'


so frp's streams are like multiple-value promises. could they be good for multiple return values? could events have keywords attached to them as well as a position? could they be generalized to erlang message queues, with pattern matching and possible 'faceted' types? perhaps with a go-like synchronous message send option too? could function argument lists be generalized first to dicts, with keywords, and then to message queues?


shared main memory can be represented as a read/write device in the core library (of course must optimize/inline memory accesses)



order status; in a stock trading program, consider the status of an order that has been sent to the broker. In some contexts all you care about is that the order has been sent. In others, you care about whether the order has been filled or not, and if not, perhaps you care about whether it has been submitted by the broker to the exchange and whether it is active on the exchange. Similarly, with a hierarchy of exceptions, sometimes you care about whether something is a ClientException (client's fault) or a ServerException (server's fault), and sometimes you care about the exact exception type. So, even when there is a single data variable, the values that variable can take on could be grouped in different ways.

You can deal with this by transforming the data value into a value representing which category it is in, or you can deal with this by writing a predicate function for each category, or you can deal with this by using 'isa'/'isinstance' to compare the value to the category. Can the language support this any better?

In essence what we have here is shades of meaning of a state value. This seems kinda like views, but on primitive values rather than on structures.

Note that oop inheritance may be involved here, since we are talking about grouping together values (but, unlike single inheritance, we have multiple ways to group those values; even multiple inheritance isn't quite up to the task here, because the different categories are over the same value domain, they just group it in different ways; by which i mean, with the integers you could group them into "evens" or "odds", or into "numbers below 10" and "10 and numbers above it"; these would correspond to two different 'classes' of integers, each with a different equality relation).

is there a unification of these 'data views' or 'value views' with inheritance? Perhaps use the notion of 'levels' within inheritance hierarchy, to temporarily coerce value to parents, like dog = mammal = cat. generalize values to cosets (equiv classes)? or, a value view as a subscript on equality operator (dog ==_linneusClass cat); or, as homomorphic transform applied to each value (linneusClass(dog) == linneusClass(cat)); or, as lexically scoped (or even dynamically scoped) 'reasoning mode' which redefines the meaning of '==' within its scope.

Seems to me the better way to do it, rather than a scoped 'reasoning mode', would be to attach a tag of some sort to the values that says how the == operator will be computed when it is applied to them. ideally, this tag would be orthogonal to other (dynamic) type info on the values.

= vs == vs === vs ==== could be: assign, equals or isInstance mb-or isSubclass (see above), traditional struct equals, pointer eq

  note: ppl complain about JS and PHP's multiple ==s, but afaict that's because the 'syntactically encouraged' one, ==, means 'do type coercion and then check for structural equality'. It seems to be the type coercion that bothers ppl, especially because it renders == non-transitive (in JS, '' == 0 and 0 == '0' are both true but '' == '0' is false; == is still symmetric there). My proposal involves no type coercion.

hmm instead of having '==' be 'isa' (== or isInstance mb-or isSubclass), it could be 'equals-under-equivalence-class'. The active equivalence class could be set on the values (or as a lexical or dynamic 'mode', but as noted above maybe it's better on the values). Note that changing the active equivalence class is like changing the __eq__ method in Python; which is like changing a typeclass instance in Haskell; which is presumably like what Robert Harper says that ML modules do in [1] (the part where the integers could be ordered, e.g. made an instance of a PartialOrder typeclass, by the natural ordering or by divisibility). Using my 'view' language, we might say that saying that equality on a value should be taken with respect to this or that equivalence class is like asking for a given 'view' of the object.
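
a rough Python sketch of 'equality tag on the value', using the evens/odds and below-10 groupings from above. `Viewed`, `view`, `parity` and `below_ten` are names i made up; raising an error when two values carry different tags is just one possible design choice, not something decided above.

```python
class Viewed:
    """A value plus a tag (a key function) saying how == is computed for it."""
    def __init__(self, value, key=lambda v: v):
        self.value = value
        self.key = key       # the active equivalence class, as a key function

    def view(self, key):
        # same underlying value, seen under a different equivalence class
        return Viewed(self.value, key)

    def __eq__(self, other):
        if not isinstance(other, Viewed):
            return NotImplemented
        if self.key is not other.key:
            raise TypeError("values are viewed under different equivalence classes")
        return self.key(self.value) == self.key(other.value)

    def __hash__(self):
        return hash(self.key(self.value))

def parity(n):
    return n % 2

def below_ten(n):
    return n < 10

a, b = Viewed(4), Viewed(12)
print(a == b)                                    # plain equality on the values
print(a.view(parity) == b.view(parity))          # both even
print(a.view(below_ten) == b.view(below_ten))    # 4 is below 10, 12 is not
```

this is the 'homomorphic transform' variant (compare key(x) with key(y)) packaged as a tag, so the tag stays orthogonal to the value's ordinary type.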


'isa' is not symmetric


perhaps we should be lazy in expressions and strict in imperative code?

lazy I/O seems annoying to reason about to me. But see , which seems to be saying that there are better solutions than switching to strict, eager I/O, namely using 'pipes' and 'conduits' and possibly 'Iteratees' in place of doing low-level folds over the I/O.


October 6, 2014 Greg Weber

Haskell's Prelude is changing to favor using Foldable/Traversable instead of just lists. Many Haskellers are concerned that upcoming changes to the Prelude could

    break existing code
    make maintaining code more difficult
    decrease beginner friendliness

Let's discuss these concerns



Q: What's the point of map in Haskell, when there is fmap?

Everywhere I've tried using map, fmap has worked as well. Why did the creators of Haskell feel the need for a map function? Couldn't it just be what is currently known as fmap and fmap could be removed from the language?


Historical reasons.

First came map, because, hey, there were lists.

Then someone said: "Let there be functors!". And was somewhat miffed, b/c map was already taken. So they said "screw it, call it fmap."

And it was so.

Then Functor became a part of the standard library, and everybody said "this fmap name is lame, but we don't want to change the name of map, because that might break stuff."

So they did not.

Edit: Or, as the case actually is, I'm wrong: see augustss's comment below.

-- rampion

That's not actually how it happens. What happened was that the type of map was generalized to cover Functor in Haskell 1.3. I.e., in Haskell 1.3 fmap was called map. This change was then reverted in Haskell 1.4 and fmap was introduced. The reason for this change was pedagogical; when teaching Haskell to beginners the very general type of map made error messages more difficult to understand. In my opinion this wasn't the right way to solve the problem. – augustss Jul 26 '11 at 8:47 ... augustss is Lennart Augustsson, who for all practical purposes has been part of the Haskell community since before Haskell existed, cf. A History of Haskell " --




Scheme and Elisp inter-operation

Scheme and Elisp inter-operation works surprisingly well, despite them being such different Lisps. The one obvious wart is how nil, #false, and () are being handled, but I think even that will not cause issues in practice. Details are at EmacsWiki and emacs-devel.

What amazed me most was the realization that, conceptually, hygienic macros can work not only across modules in a language, but even across similar languages! As long as you're in a language that has the concepts of expressions, identifiers, and lexical scope, you're *mostly* set:

Supporting hygiene in a language with a module system means you probably use an intermediate language (IL) that can express "raw references" to bindings from given modules. So, expressions that are input to your macro from lang X get translated to an IL snippet with the correct raw references, depending on what modules were visible at the call-site of the macro; and the body of the macro, which is in lang Y, also gets translated to an IL snippet with the correct raw references depending on the modules visible at the definition-site of the macro. Then both these IL snippets get pieced together. The snippets have their individual lexical scopes too which don't mix together, as expected with hygiene.

Your IL can probably express lexical bindings too, so if you get an identifier input to your macro from lang X, you can use that with the generic lexical binding construct of lang Y (which your macro body is written in), which just translates to a generic lexical binding in the IL, and it just works as expected, making other input expressions to your macro in lang X see that lexical binding for the identifier, even though the binding was established via lang Y's lexical binder. So macros that take identifiers and a body, and let-bind the given identifiers to something for the given body, will work fine.

(In the case of Elisp, there's conceptually two modules: the function namespace and the variable namespace. When a symbol is being compiled, it results in a raw reference to a binding in either one of those modules, depending on the usual rule that determines whether a symbol stands for a function or variable name.)

Problems remain if your macro actually relies on s-expression structure of expressions, and you want to use the macro in a language that doesn't have that. E.g. (syntax-rules ((my-let x y) (let x y))) would not be usable from JavaScript because it has no syntax equivalent to ((var1 val1) (var2 val2) ...) to be used for x. Similarly, you could pattern-match on a vector literal, which is problematic for JS where [foo, bar] is like `#(,foo ,bar) in Scheme, though it should be fine for Elisp where [foo bar] is really like #(foo bar) i.e. foo and bar aren't evaluated. By Taylan Ulrich Bayırl... at Wed, 2014-09-17 21:51


a common way to generalize things is by generalizing 'single' to 'multiple'. e.g. promises to streams, single return value to multiple return value, single inheritance to multiple inheritance, sequential execution to concurrent execution

another one is to generalize having one type of something to having many types. e.g. success/fail to success/various-types-of-exception objects. One way to do this is to add more fields to something, e.g. add a string 'message' field to an exception object. Another form of this is parameterizing stuff, e.g. by adding an extra parameter to a function's signature you generalize it.

another one is 'higher order', e.g. higher-order functions like 'map'

another one is indirection, e.g. polymorphic dispatch

another one is to take something which was exhaustively categorized, and add another category, often a 'None of the above' category, eg. nullable types

another one is to take something or some set of things which was exhaustively categorized, with each category having certain properties and behaviors, and add another category which allows the user to set these properties arbitrarily in some language, e.g. 'custom' variants of things, custom garbage collectors, custom destructors, etc

these 3 are all the same:

map toLower "FOO"

fmap toLower "FOO"

toLower <$> "FOO"

... <$> is the same as `fmap` ... map is just a less general form of fmap?


syntax: maybe code should be imperative 'normally' (e.g. without any 'do' or 'begin' block needed), and be expression-based within 'let's

otoh if "Purely Functional is the right default" as Sweeney proposes in The Next Mainstream Programming Language


" I was just curious, why did Unreal 4 decide to go with C++? "


Tim Sweeney (Unreal Engine Developer):

    The first three generations of the Unreal Engine included a sandboxed scripting language, UnrealScript, which provided a simple interface for gameplay programming that was shielded from the complexity of the C++ engine.
    The scripting approach is very welcoming to new programmers, but eventually it breaks down and becomes an obstacle to innovation and shipping. We experienced this over time as the Unreal Engine grew until finally, in 2011, we moved to a pure C++ architecture. The causative factors were both pervasive and general:
    It is these reasons, ultimately, that led to Epic's move to pure C++. And the benefits are numerous: UE4 is a unified and fully-debuggable code base, freed from Interop Hell and totally open to programmers to study, modify, and extend. There are side-benefits, too, such as increased performance in gameplay code, and ease of integrating other middleware written in C++.
    Building Unreal Engine 4 as a unified C++ codebase has been very freeing, giving engine and gameplay programmers enormous flexibility to write code without unnecessary interop barriers.
    This isn't to say that C++ is the ideal language for simple gameplay code. It is more complex and more dangerous than UnrealScript, C#, and JavaScript. But that is another way of saying that it's more powerful.
    By making peace with complexity and writing code in C++, there is absolutely no limit to what you can accomplish, whether it involves debugging your entire codebase in-context, interfacing to low-level engine systems, modifying them, or talking to the operating system or advanced third-party libraries. 


(this is interesting in light of Sweeney's old talk, The Next Mainstream Programming Language)

OK, now I've watched the video (skimmed)

OK, now I actually have watched the video. Here are some poorly formatted notes:

Can't be a 'big agenda' language, which I interpret to mean that it can't push the envelope too much in unproven directions. Examples given:

Why not:

Lower friction:

He talks a good bit about finding 85% solutions (solving the 85% case but not trying to force everything into the mold).

He doesn't like RAII or exceptions. Considers memory very different from other kinds of resources because it's ubiquitous.

He wants pointers and proposes an owned pointer syntax. Didn't see a comparison to Rust (edit: nevermind, he discusses briefly in the Q&A past the end of the talk).

He wants to get rid of constructors (just use functions for that).

Wants to build in a syntax for strings that share memory.

Get rid of header files.

Wants to support refactoring.

Likes Option types, but not a full algebraic datatype that you have to unpack to use.

When he talks about concurrency, it sounds like he wants more safety features.

Doesn't like implicit conversions.

Wants serialization support without C++ hacks.

Program should specify how to build itself.

Permissive license.

Better / no preprocessor.

His goal seems to be pragmatism / fixing pain points with C++ rather than something more ambitious. I was hoping for some gems but was disappointed by the specific proposals. EDIT: But also should say that as a former C++ games programmer, I agree with most of his general sentiment. By Matt M at Sat, 2014-10-04 00:05



" There are many areas where we’ve become a lot smarter since the last batch of main stream programming languages came into existence in the mid-90’s. Java, C# and JavaScript can’t easily be changed/improved in fundamental ways because they also need to be backwards compatible to some degree. Its now more or less “common knowledge” that we’re challenged with concurrency (and utilizing multicore), and that some aspects of functional programming is needed to make this better. We now know that distributed objects is not such a good idea; it’s better to send data explicitly around and reinterpret it when it arrives. We’ve come to realize that we need a better module system than provided in those languages. It has become apparent that it is a big advantage if it is easy to create internal DSL’s, one of Ruby’s big advantages. We’ve learned a lot of things, and if the Dart team is able to wrap up those learnings of the last 15 years and repackage it as a new main stream language, there’s a good chance for success I’d say.

One example: When I was at NeXT we often talked about how difficult it is to build apps that run things concurrently — i.e. with threads. Mind you these were smart colleagues I had, and we had no idea how to do this in a proper structured way. Everything we could come up with felt like a hack; still today in Objective-C you have to be very careful to get things processed on the main event loop. In JavaScript this problem is “solved” by programming such things entirely event driven in one thread; making you program “backwards” in many ways. In Java/Swing runOnMainThread is a horrible explicit way to have to schedule things to make them “thread safe”.

But there are other options: Erlang gets this right. With message-passing concurrency and selective receive you can do concurrent and event driven programming with a lot less headaches. As it happens, Erlang just isn’t a language that comes pre-installed on all desktops (even though it is now in standard Ubuntu distribution), and Erlang also does not have a good GUI library. Erlang was not created for doing GUI programming. But the idea would fit perfectly well to do concurrent + event driven programming in GUI programs. I’ve said this numerous times, and will do again, GUI is the next killer-app for actor programming.

So that’s just one area where there is IMHO, immense opportunity for improvement. One can only hope that message passing concurrency does, in some way, make it into Dart. I am an Erlang-head after all. But even without, there is a lot of other areas in which the programming language community has learned from our troubles with the old main stream languages, and so there is ample room for improvement. " --


" Studying Program Repositories, by Eugene Wallingford, October 06, 2010 11:24 AM

Hackage, through Cabal

Last week, Garrett Morris presented an experience report at the 2010 Haskell Symposium entitled Using Hackage to Inform Language Design, which is also available at Morris's website. Hackage is an online repository for the Haskell programming community. Morris describes how he and his team studied programs in the Hackage repository to see how Haskell programmers work with a type-system concept known as overlapping instances. (The details of this idea aren't essential to this entry, but if you'd like to learn more, check out Section 6.2.3 in this user guide.) In Morris's search, he sought answers to three questions:

        What proportion of the total code on Hackage uses overlapping instances?
        In code that uses overlapping instances, how many instances overlap each other?
        Are there common patterns among the uses of overlapping instances?

Morris and his team used what they learned from this study to design the class system for their new language. They found that their language did not need to provide the full power of overlapping instances in order to support what programmers were really doing. Instead, they developed a new idea they call "instance chains" that was sufficient to support a large majority of the uses of overlapping instances. They are confident that their design can satisfy programmers because the design decision reflects actual uses of the concept in an extensive corpus of programs. " --

Instance chains: Type-class programming without overlapping instances


it would be useful to spider Hackage by using cabal-install to find out what is there, then d/ling the source of each package in turn, and build up a frequency list of all tokens, to see which library functions are most used in Haskell.
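
the counting half of this is easy to sketch in Python. this assumes the package sources have already been downloaded and unpacked under some directory (the fetching step is not shown), and `TOKEN` is a crude regex of my own, not a real Haskell lexer: it will also count words inside comments and string literals.

```python
import re
from collections import Counter
from pathlib import Path

# crude approximation: identifiers, or runs of operator symbols
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_']*|[!#$%&*+./<=>?@\\^|~:-]+")

def count_tokens(text, counts=None):
    """Add the tokens found in `text` to `counts` (a Counter) and return it."""
    counts = Counter() if counts is None else counts
    counts.update(TOKEN.findall(text))
    return counts

def count_tree(root):
    """Tally tokens over every .hs file under root (assumed: unpacked package sources)."""
    counts = Counter()
    for path in Path(root).rglob("*.hs"):
        count_tokens(path.read_text(errors="ignore"), counts)
    return counts

sample = 'main = putStrLn (map toUpper "foo") >> putStrLn "bar"'
print(count_tokens(sample).most_common(3))
```

to get 'which library functions are most used' out of this you'd still have to filter out keywords and locally defined names, which probably needs a real parser.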

i did some google searches and couldn't find any such list.

perhaps ask these guys first?


consider again order status codes. If you have an abstract object (a stock market order) being talked about by multiple agents (such as the client, the broker, and the exchange), then you might imagine that they all obey a protocol when passing around updates (messages about this object), and this protocol defines for each of them a state machine, which changes state depending upon which messages they get from the others, and sometimes emits messages to the others.

however, programming language experience seems to show that it's more powerful to treat 'states' and 'operations' as separate primitives, e.g. for each agent to have some variables that hold values (the state), and upon the receipt of a message, to specify various sequences of functions that operate on those variables to give them new values. These programming language primitives (variables and operations) can be used here and in other paradigms, and are better than having a special restricted syntax for state machines (at least, a syntax more restrictive than e.g. 'upon receiving this message, do that to these variables').
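
a small Python sketch of the 'variables plus operations' style for one agent's view of an order. the message types ("sent", "exchange_ack", "fill") and field names are invented for illustration and don't correspond to any real broker protocol.

```python
from dataclasses import dataclass

@dataclass
class Order:
    # plain variables holding this agent's state
    sent: bool = False
    at_exchange: bool = False
    filled_qty: int = 0
    total_qty: int = 0

# 'upon receiving this message, do that to these variables'
def on_sent(order, msg):
    order.sent = True
    order.total_qty = msg["qty"]

def on_exchange_ack(order, msg):
    order.at_exchange = True

def on_fill(order, msg):
    order.filled_qty += msg["qty"]

HANDLERS = {"sent": on_sent, "exchange_ack": on_exchange_ack, "fill": on_fill}

def receive(order, msg):
    HANDLERS[msg["type"]](order, msg)

# coarse statuses are derived from the variables, not hardcoded as named states
def is_filled(order):
    return order.sent and order.filled_qty >= order.total_qty

order = Order()
for msg in [{"type": "sent", "qty": 100},
            {"type": "exchange_ack"},
            {"type": "fill", "qty": 40},
            {"type": "fill", "qty": 60}]:
    receive(order, msg)
print(order, is_filled(order))
```

the different groupings of 'order status' from earlier (sent or not, at the exchange or not, filled or not) then become different predicates over the same variables.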

now, you could try and reify/represent directly 'truths' about the object (the order) which are perhaps perceived differently by the different agents (eg. right after the exchange tells the broker that it acknowledges the order, the exchange and the broker both think of the order state as 'received by the exchange', but the client doesn't know that yet). so, the PERCEIVED truth of the order status varies by node (agent).

messages between agents communicate state changes; should we let the language handle all that, so that we only deal in 'truths'? no, because we don't want to hardcode one update protocol into the language; eg some systems use vector clocks, whereas other systems use other methods, eg the fancy bayes net cluster/clique propagation algorithm (see also ).

also, on the topic of 'truths', there is an alternative to thinking of each agent as believing in one 'truth' for each 'truth state variable'; probabilistic (perhaps bayesian) reasoning = parallel consideration of multiple possibilities (in these systems sometimes you also separately attach evidence for each hypothesis to the 'truth state variable', but in bayes nets that's taken care of just by the CPD arcs to other nodes in the net, e.g. each piece of 'evidence' resides in a separate 'truth state variable').

there's at least two ways to look at the word 'logic':

sometimes one wants to switch between considering many possibilities for a given 'truth state variable', and considering just one hypothesis.

probability could be thought of as an additional 'view' on values of truth state variable.

provenance could be thought of as another such view.

fact views could be called 'facets'; faceted classification?

perhaps a programming language could have such 'facts' (assignments to truth state variables) as additional primitives, beyond mere variable values; could also have language support for the concept of 'inference'/truth maintenance; logic programming.

should lookup oop/relational divide.

so are 'facts' states? should there be a distinction between domain states and variable values, or is that an unnecessary distinction (for example, we know that a state machine can be represented as the application of an ordinary function in ordinary variables to a 'domain state')? how are these related to what i was calling 'pieces of state' or 'state domains'?

on attaching metadata via views to a stock market order -> mb we want a relational db embedded in the language

stock market orders can be thought of as derivatives of account values. this leads to the notion of time in a db (take another look at Datomic) (but do we really want to hardcode a notion of time into the language? there are different formalizations of time, for example chu spaces) (also, this talk of formalizing time and its effect on knowledge reminds me of kant; but i have no interesting thought on that, i just wanted to say it reminds me)


should look again at my atr order status source code, since thats an example of what im trying to fix


i spoke earlier of the potential need for an __apply to let data objects specify that eg. they are secretly a container and operations applied to them should be 'autovectorized' over the container elements.

However, now i think that may be able to be unified with my desire for cheap syntactic sugar for the 'map' higher-order function. Consider the Haskell operator <$>; 'toLower <$> "FOO"' is the same as 'map toLower "FOO"' (it's the same as `fmap`). But alternately, one could perhaps think of the <$> operator as saying 'temporarily set __apply on "FOO" to autovectorize, then apply toLower to that object, then set __apply back to what it was on the result'.

actually, that isn't quite what is wanted, because you still need one operator to 'autovectorize just once' like this, and another one to permanently autovectorize (autovectorize without immediately undoing)
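
a toy Python sketch of the two operators. `Vec`, `apply` and `apply_once` are invented names; `apply` stands in for the language's function application consulting a hypothetical __apply, and a wrapper class stands in for 'having __apply set to autovectorize'.

```python
class Vec:
    """Marks a container as autovectorizing: applying f to it maps f over the items."""
    def __init__(self, items):
        self.items = list(items)

def apply(f, x):
    # stand-in for ordinary function application in the imagined language
    if isinstance(x, Vec):
        return Vec(f(item) for item in x.items)   # result stays autovectorizing
    return f(x)

def apply_once(f, container):
    # like <$>: vectorize for this one application, then drop back to a plain list
    return apply(f, Vec(container)).items

print(apply_once(str.lower, "FOO"))                       # vectorize just once
chained = apply(str.upper, apply(str.lower, Vec("FOO")))  # permanently vectorized
print(chained.items)
print(apply(len, "FOO"))                                  # no Vec: len sees the whole string
```

unlike Haskell, a Python str isn't a list of characters, so the results come back as lists of one-character strings rather than as a string.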


hmm.. as far as the idea of explicitly representing 'truth state variables' that state this agent's current beliefs about something in the world (an object, in Kant's terminology), separately from ordinary variables used in internal computations:

i guess part of what truth state variable separation would be good for is for cross-agent communication eg. language, in which different agents can (attempt to) refer to the same object, and talk about their beliefs about that object. And what you really need there is just an identifier for the object. So maybe what you really need is just some language support for doing things with identifiers, binding external identifiers to the internal beliefs they refer to, binding one external identifier to another, etc.

in atr, i do remember doing a bunch of stuff to keep track of the orderIDs, which were external identifiers used in communication between the client and the broker server. (See eg atr/, and ib_driverc.pyx section "Execution"). That stuff probably could have used some language support. Note also the throttling (eg. placeOrderThrottle in brokerc.pyx and msgThrottle in ib_driverc.pyx and getCurrentPrice in brokerc.pyx) and retry wrappers on various messages sent to the server, and the caching of beliefs whose ground truth is server messages (e.g. currentTickerFieldValue in brokerc.pyx). (eg has our internal orderID, but each order on IB must also be associated with IB's order ID (an external order ID)). This is the sort of stuff that it would be great to have language support for, somehow.
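
a minimal sketch of what 'binding external identifiers to internal ones' could look like as a library, in Python. This is not the atr code; `IdRegistry` and the example ids are invented for illustration.

```python
class IdRegistry:
    """Hypothetical two-way binding between internal and external identifiers."""

    def __init__(self):
        self._to_external = {}
        self._to_internal = {}

    def bind(self, internal_id, external_id):
        # Refuse to silently rebind either side to something different.
        if self._to_external.get(internal_id, external_id) != external_id:
            raise ValueError(f"{internal_id!r} is already bound")
        if self._to_internal.get(external_id, internal_id) != internal_id:
            raise ValueError(f"{external_id!r} is already bound")
        self._to_external[internal_id] = external_id
        self._to_internal[external_id] = internal_id

    def external(self, internal_id):
        return self._to_external[internal_id]

    def internal(self, external_id):
        return self._to_internal[external_id]


orders = IdRegistry()
orders.bind("local-1", 9001)    # made-up ids, not real broker order ids
```

Language support would presumably go further than this (eg. attaching cached beliefs and their freshness to each identifier), but even the bare mapping removes a lot of hand-written bookkeeping.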


would like optionally persistent closures:


    Arc doesn't cause the expired link issue. Arc is just a Lisp that lets
    you program however you want. The HN software is written in a style
    that uses closures to remember what a user is up to across multiple
    web requests. I'll describe a classic example of this, then how we
    changed it a few months ago. While there are still some expired link
    errors--primarily when we restart the server process--there are vastly
    fewer than there were.
    When you click "More" at the bottom of a page, HN shows you the next
    30 stories. These are different for logged-in users--they're affected
    by profile settings like "showdead"--so, if you're logged in, the
    system needs to compute the next 30 stories to show you. Which 30
    those are depends on the ones you were just looking at. So the server
    needs to know what you were just looking at in order to handle the
    "More" request correctly. How can it know that? There are many ways.
    HN's traditional way is to make a closure (a function that remembers
    any info it needs) at the time that it's generating the original page,
    which, when called, will compute the correct next 30 stories.
    The advantage is that, at the time you're making the closure, all the
    information you need to process the next request correctly is right
    there in scope. You don't have to remember it, reconstitute it, or
    anything--all of which takes extra code, and different code for each
    kind of request. You just say in the simplest way what you want the
    next request to do, if and when the user makes it. Since that's
    usually very similar to what you've just done (e.g. "these 30 stories
    instead of those 30"), it may only take a line or two of code. That's
    a huge win for simplicity.
    You can't send the closure directly to the browser, so you make a
    unique ID instead, save the closure in a table keyed by that ID, and
    put the ID in a link. When the user clicks that link, the server gets
    the ID, looks up the closure, and executes it.
    If you're saving lots of closures, eventually you'll run out of RAM if
    you don't garbage-collect some. So the HN server periodically prunes
    the oldest ones. If one of those links is still open in a browser
    window somewhere and the user clicks on it, the server will no longer
    remember what to do. That is when HN says "Unknown or expired link".
    As HN grew larger, there were many more of these closures, and the
    odds were higher that someone would try to use one after it had been
    pruned. A few months ago, since we were working on that part of the
    code anyway, we decided to eliminate the most common cases. We
    measured how often all the different types of closure were
    created--there are a few dozen different kinds, IIRC--and replaced
    the most common ones with more traditional ways of passing state back
    to the server, like query strings and hidden form fields. We did that
    until the total number of closures being created was down by an order
    of magnitude, and then--equally importantly--we stopped. Now the
    system has enough RAM to remember the vast majority of closures again,
    and the "expired link" errors have dwindled to a tiny fraction of what
    they were before. (The one unfortunate exception to this is when we
    restart the server process. Then *all* the closures get pruned,
    regardless of how much RAM you have to cache them.)
     " --

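the scheme described in the quote above (save the closure under a unique ID, put the ID in the link, prune the oldest closures) can be sketched in Python as follows. This is an illustration, not HN's Arc code; `ClosureTable` and `more_link` are hypothetical names.

```python
import secrets
from collections import OrderedDict


class ClosureTable:
    """Hypothetical in-memory table of closures keyed by an unguessable id."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._table = OrderedDict()     # insertion order == age

    def save(self, closure):
        fnid = secrets.token_urlsafe(8)
        self._table[fnid] = closure
        while len(self._table) > self.max_size:
            self._table.popitem(last=False)     # prune the oldest closure
        return fnid

    def run(self, fnid):
        closure = self._table.get(fnid)
        if closure is None:
            return "Unknown or expired link"
        return closure()


def more_link(table, stories, start):
    # Everything the next request needs is captured right here, in scope.
    return table.save(lambda: stories[start:start + 30])
```

Since the table lives in process memory, a restart empties it, which matches the exception mentioned at the end of the quote.
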
more discussion at:

including similar ideas:

drostie 4 hours ago


One thing that was exciting about Datomic to me (and still is, I just don't have any projects big enough to use it for) is that you get an immutable database state. You can emulate this in any normal database, but it gets harder to enforce the discipline among other members of your team. The basic idea is that you can serialize those same closures by just pointing to "the state of the database at revision #5968", and then, though the database moves on, you can always use that ID to compute the view of that database at that point. It does the same "heavy lifting" that storing these closures is doing, but you can easily share that ID across a distributed service with no "expired links" problems.

It's worth mentioning that you can't send a closure in a non-functional context. That is, if Alice sends a closure to Bob, it cannot any longer be the case that Alice's other operations can mutate Bob's state. So you must be serializing an "orphaned" environment tree with a bunch of closures which point at different nodes of that tree.

You could definitely do this even better by stealing some ideas from Smalltalk: encapsulate all of the states in some computational node (the original notion of "object" in OOP) which interacts with all of the other parts of the system communicate with by message-passing, and nothing else. To change the code on-the-fly, you just swap out the "code part" of some node for a new code part, and perhaps transform the state, queuing up the messages while you do so; then you can start replaying those messages to the new code. The benefit is that now at any time the nodes can move around servers arbitrarily, as long as you've got a good name-resolution service to tell you where the object is now.

In other words: (1) interpret all the things so that code and data are the same; (2) shared-state is your enemy; serialize orphaned states only; (3) you have to explicitly handle the case where someone makes a request while you are sending their closure to another server.


lysium 4 hours ago


One difficulty with storing closures is storing and transferring functions, in particular between machines. There is progress in this direction, but it is not easy.


barrkel 8 hours ago


I've occasionally mused about the possible value of introspectable and serializable closures. Rather than being memory-only, it would be nice if the weight of keeping them around could be palmed off to the browser using cookies or hidden fields.

To be practical, it would require that the activation record chain kept alive by the closure is reasonably short, that the number of live variables in the chain is fairly small, and there be a reliable way of mapping code references in and out. But I think it can be done.

One of these days, I'm going to implement a toy language with this feature combined with my other favourite, automatically recalculated data flow variables (think: "variables" that work like spreadsheet cells). These guys are highly applicable to data binding, and making them a first-order language concept makes them much more elegant to use.
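
a tiny sketch of 'variables that work like spreadsheet cells', in Python. The `Cell` class is hypothetical, and it recomputes on every read (pull-based) rather than propagating changes, which is only one of several ways to get this behaviour.

```python
class Cell:
    """Hypothetical spreadsheet-style variable: a constant or a formula."""

    def __init__(self, value=None, formula=None):
        self._value = value
        self._formula = formula     # zero-argument function reading other cells

    @property
    def value(self):
        # Pull-based: formulas are re-evaluated on every read, so they
        # always reflect the current values of the cells they depend on.
        return self._formula() if self._formula else self._value

    def set(self, value):
        self._value = value


price = Cell(10)
quantity = Cell(3)
total = Cell(formula=lambda: price.value * quantity.value)
```

After `quantity.set(5)`, reading `total.value` reflects the new quantity without any explicit update call.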


ufmace 4 hours ago


> automatically recalculated data flow variables (think: "variables" that work like spreadsheet cells)

That sounds kinda like C# Properties, which can be calculated with arbitrarily complex code at get and set time. Is that what you had in mind? It's commonly used with WPF data binding.


JoachimSchipper 43 minutes ago


Take a look at Termite for Gambit Scheme; its focus is different from yours, but it seems to be very good at serializing stuff (including continuations) and automatically proxying the rest (e.g. file descriptors).

(I don't think this is actually a good idea - intra-datacenter traffic is much faster, upgrading becomes hard in this scheme, and your security model needs to be quite complicated - but I'd be interested in learning what you find.)


brudgers 4 hours ago


I understood what you were suggesting.

What does serialization, encryption, transmission, requesting, retransmission, decryption, and deserialization gain?

  + A few bytes of memory

The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information. -- Alan J. Perlis, Epigram 34.


barrkel 1 hour ago

  + Statelessness

This is key.


barrkel 35 minutes ago


I'm not suggesting I can get a free lunch. I wasn't talking about avoiding state transitions. I wasn't even talking about pure functional programming. By talking about variables captured by closures, I'm implicitly not doing so.

In the context of server programming, by using the word 'statelessness', I meant that the request (from clicking the link) doesn't need to go back to the same server that created the link. Having an in-memory hash-table containing closures or continuations keyed by an ID in the request implies that request needs to ultimately reach back to the same machine, one way or another.

I'm talking about programming the Lisp-style multi-request process using continuations (as described by pg in his essays), except making them work in a stateless way - without the requirement for hash tables full of continuations that need periodic collecting, with the consequence that the links go stale.

A request handler generating such continuations would, of course, need to be simple and lightweight, not deep or complex. Code artifacts and stateful resources would need to be serialized using keys that contain enough information to find or recreate them. Imagine having server-side URLs for every function, for every resource, where doing a request for a function pointer or a resource could potentially open a database connection or load a library. The keys would be analogous to such URLs.

Consider an activation record. It is the storage for local variables, parameters and the function return location (aka continuation, when you twist your mind around it). The activation record for any given function has a particular signature. Let's say one of these local variables is a Customer object, mapped by an ORM from the database. That is something that can be reduced to an ID; if you know that this activation record only ever has a Customer in that slot, you can potentially serialize that object as a single number.

Keep the chain of activation records short, and they act mostly as a path through a tree to the current state of a process or interaction. Function a calls function b calls function c, which returns a result to the client, along with a continuation (or more interestingly, multiple continuations). A serialized continuation would be little more than a record of the current step in the overall process, everything needed to pick up where one left off. I think, with the right level of language abstraction, this can be made very slim.
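
a sketch of the 'slim serialized continuation' idea in Python, under the assumption that a continuation can be reduced to a named step plus a few id-sized slots. `STEPS`, `step`, `serialize_continuation` and `resume` are hypothetical names, and the step names play the role of the 'server-side URLs for every function'.

```python
import json

# Hypothetical registry mapping stable step names to handler functions.
STEPS = {}


def step(name):
    def register(fn):
        STEPS[name] = fn
        return fn
    return register


@step("show_orders_page")
def show_orders_page(customer_id, page):
    return f"orders for customer {customer_id}, page {page}"


def serialize_continuation(step_name, **slots):
    # Objects such as a Customer are reduced to ids before they get here.
    return json.dumps({"step": step_name, "slots": slots}, sort_keys=True)


def resume(blob):
    record = json.loads(blob)
    return STEPS[record["step"]](**record["slots"])


blob = serialize_continuation("show_orders_page", customer_id=42, page=2)
```

Any server that has the same `STEPS` registry can call `resume(blob)`, so the request no longer has to reach the machine that generated the link. A real version would have to authenticate the blob before trusting it.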

(PS: I detect a certain amount of exasperation in your tone. I'm not sure what I did to trigger it.)


webXL 6 hours ago


That's what ViewState does more or less. There's a lot of overhead and little gain IMO. I think it's best to transfer as little state as possible across the wire for simplicity.


barrkel 5 hours ago


It is not. viewstate includes a serialization of a set of controls that, server-side, represent the state of the page.

That's not what I'm talking about.

This is why I think I need to write a toy language example - most people, given a brief outline of what I'm talking about, think I'm talking about something else they're more familiar with, and misunderstand.


krapp 6 hours ago


Am I simply not awake yet this morning or are you more or less describing sessions?


barrkel 5 hours ago




pshc 15 hours ago


Oh! Taking that a step further, what if you mmap'd an empty file first, say 1MB of zeroes. Then, just start writing closures to it, one after another. When you hit 1MB, use mremap to add another 1MB. And just keep going! The pagefile would become a fossil record of the webserver's access history. The earlier in the file, the older the request, and the less likely it would ever be mapped into physical RAM ever again. On a 64-bit machine with a modern hard drive, you could probably go forever :)


espeed 12 hours ago


Chronicle Map would be great for that:


derefr 15 hours ago


Or you could just URLencode the serialized closure, and sign that URL. Now as long as the page exists, the closure exists!


rcfox 14 hours ago


That sounds really dangerous. It probably wouldn't be too hard to inject arbitrary data into the server.


amock 13 hours ago


Signing the URL should prevent that, but it still seems overly complicated.


derefr 3 hours ago


Note that the client is passing you a token with your signature on it, not the client's signature. This isn't PKI, or even shared-secret encryption like RSA; this is the client receiving an opaque blob and then passing it directly back to the server, and the server verifying the URL's HMAC to prove that A. the non-HMAC part of the URL is byte-for-byte identical to the one the HMAC claims it is; and B. the HMAC contains the server's secret.

I really like this approach, myself (it works really well in Erlang, where closures can be serialized like any other term), so I'll argue in favor of it for a bit:

1. OS package managers (especially those that provide automatic security updates) are, effectively, arbitrary code execution limited solely by signature verification. If you don't trust signature verification, you basically can't trust OS update infrastructure.

And these are actually less secure than URL signing, when you think about it: with a signed URL, you are the signatory, and it's very easy to know if you are you. With OS updates, you have to trust the OS manufacturer has itself granted trusts only to the right entities. (Microsoft could put an update signing key from law enforcement into Windows, letting them push automatic wiretap/rootkit "updates" to selected individuals, etc.)

In other words, the security of a system is derived from its weakest link—and there are links far weaker than URL signing.

2. People already do this a ton—deserializing an opaque blob of data signed by the server and then treating it as if it was something just sitting in the server's memory to begin with. Where? In "signed-cookie session storage", the default session mechanism of both Rails and Django. The only difference is that you're putting the information in the URL (where it belongs, in this case) instead of the session—although you could just as well store a continuation table in the session, and then reference it from the URL, if you liked.

Okay, there's also the fact that you're storing a serialized closure instead of a public route—but in business terms, that's no more dangerous to e.g. the valuable information in your database than storing the user's effective UID in signed-cookie session storage, presuming you have administrator-role users in your system with the ability to delete that data.

The one difference might be if the limited set of all your public API endpoints acts as a slapshod "sandbox" for your server, with you trusting that sandbox to protect your system. Which is to say, if your server can do more harm by executing arbitrary code than an administrator user can do by sending messages to it, you should really look into Docker/BSD jails/etc.
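
the signed-URL approach argued for above can be sketched with Python's standard `hmac` module. Python cannot serialize closures in general, so a small state dict stands in for the closure here; `SECRET`, `sign` and `verify` are names made up for this sketch.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"server-side secret"    # placeholder; a real key would be random


def sign(state):
    payload = base64.urlsafe_b64encode(
        json.dumps(state, sort_keys=True).encode("utf-8")
    )
    tag = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + tag


def verify(token):
    payload, _, tag = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison of the received and recomputed tags.
    if not hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8")):
        raise ValueError("bad signature")
    return json.loads(base64.urlsafe_b64decode(payload))


token = sign({"step": "more", "start": 30})
```

The token can be placed in a URL or hidden field; the server only acts on it after `verify` succeeds, so a client that edits the payload gets an error instead of arbitrary server-side behaviour.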


vilhelm_s 15 hours ago


I thought the same thing. There certainly have been Scheme systems developed that can serialize closures and continuations (e.g. I found [1], and the systems they cite in the related work section). But it seems tricky to get the garbage collection to work right... somehow you need to tell the program which data should be serialized, and which should be stored in RAM and somehow marked as still being live.

For the security aspect, maybe the server could have a secret key and store a MAC along with the serialized string?



" Struct literals. For the addition of features in later point releases, it may be necessary to add fields to exported structs in the API. Code that uses untagged struct literals (such as pkg.T{3, "x"}) to create values of these types would fail to compile after such a change. However, code that uses tagged literals (pkg.T{A: 3, B: "x"}) will continue to compile after such a change. We will update such data structures in a way that allows tagged struct literals to remain compatible, although untagged literals may fail to compile. (There are also more intricate cases involving nested data structures or interfaces, but they have the same resolution.) We therefore recommend that composite literals whose type is defined in a separate package should use the tagged notation. "

could just prohibit using "untagged" (e.g. non-keyword) struct references between modules -- maybe this could mean that when a type is exported across modules, it loses the part that can specify a field by number, retaining only that which specifies it by name???


some design patterns for exploratory numerical analysis:

towers of pre- and post-processing wrappers: you have a function y = doSomething(x) which does some computation; then you wrap it with y = doSomethingWithPreprocessing(x2) which applies some preprocessing before calling doSomething, and/or y2 = doSomethingWithPostprocessing(x) which applies postprocessing to the value returned by doSomething, and/or both; recurse, building a tower of functions which does more and more pre- and post-processing. Notes:

'pipeline' scripting functions: functions that do not just core computation, but also project-specific workflow, such as loading and saving temporary results, and metadata describing what they contain, into idiosyncratically-named files, in case they are unexpectedly needed in the future; and such as providing key statistical readouts printed to console, and interactive visualizations; and such as trying a bunch of specific methods which were chosen idiosyncratically for this one project, and comparing their results.
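
a minimal Python sketch of the 'tower of wrappers' pattern. The helper names `with_preprocessing` and `with_postprocessing` and the toy `do_something` are invented for illustration; keyword arguments are forwarded so that outer layers can pass options down the tower.

```python
def with_preprocessing(pre, f):
    """Return a function that applies `pre` to the input before calling f."""
    def wrapped(x, **kwargs):
        # Keyword arguments are passed straight down the tower.
        return f(pre(x), **kwargs)
    return wrapped


def with_postprocessing(post, f):
    """Return a function that applies `post` to the value returned by f."""
    def wrapped(x, **kwargs):
        return post(f(x, **kwargs))
    return wrapped


def do_something(x, scale=1):
    return [v * scale for v in x]


# Tower: drop missing values, run the core computation, then round.
tower = with_postprocessing(
    lambda ys: [round(y, 1) for y in ys],
    with_preprocessing(lambda xs: [v for v in xs if v is not None], do_something),
)
```

Calling `tower([1.26, None, 2.0], scale=2)` reaches `do_something` with `scale=2` even though neither wrapper knows about that option.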


in these 'towers of wrapped functions' scenarios, you'd really like the outer functions to be able to delegate all of the following down the tower:


    In [48]: bool(False+False)
    Out[48]: False

    In [49]: bool(False+True)
    Out[49]: True

    In [50]: bool(True+True)
    Out[50]: True


interestingly, bradley kuhn has a master's degree focused on multilingual VMs:


bradley kuhn seems to like it:


" “One thing well” misses the point: it should be “One thing well AND COMPOSES WELL”


Consume input from stdin, produce output to stdout. Put another way, your program should be a filter.


Output should be free from headers or other decoration


Output should be simple to parse and compose


Treat a tool’s output as an API. Your tool will be used in contexts beyond your own imagination. If a tool’s output format is changed, other tools that compose or otherwise build on its output will invariably break—you have broken the API contract.


Place diagnostics output on stderr. Diagnostics output includes anything that is not the primary data output of your tool. Among these are: progress indicators, debugging output, log messages, error messages, and usage information. When diagnostics output is intermingled with data, it is very difficult to parse, and thus compose, the tool’s output. What’s more, stderr makes diagnostics output more useful since, even if stdout is being filtered or redirected, stderr keeps printing to the user’s terminal—the ultimate target of diagnostics output.


Signal failure with an exit status


Make a tool’s output portable. Put another way, a tool’s output should stand on its own, requiring as little context as possible to parse and interpret. For example, you should use absolute paths to represent files, and fully qualified hostnames to name internet hosts


Omit needless diagnostics


Avoid making interactive programs ...

A common use of interactive programs is to ask the user to confirm some dangerous action. This is easily avoided by asking the user instead to supply a flag on the command line to the appropriate tool.
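
several of the hints listed above (be a filter, keep stdout free of decoration, put diagnostics on stderr, signal failure with an exit status) fit in a few lines of Python. `run_filter` is a hypothetical example, and treating an empty record as a failure is an arbitrary rule chosen only to have something to report.

```python
import sys


def run_filter(lines, out=sys.stdout, err=sys.stderr):
    """Upper-case each record; return a process exit status."""
    status = 0
    for number, line in enumerate(lines, start=1):
        record = line.rstrip("\n")
        if not record:
            # Diagnostics go to stderr so stdout stays parseable.
            print(f"line {number}: empty record skipped", file=err)
            status = 1
            continue
        print(record.upper(), file=out)     # data only, no headers
    return status
```

A script built on this would end with `sys.exit(run_filter(sys.stdin))`, so that callers can test the exit status while piping stdout onward.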


" Output should be simple to parse and compose

This usually means representing each record as a single, plain-text formatted line of output whose columns are separated by whitespace. (No JSON, please.) Most venerable Unix tools—grep, sort, and sed among them—assume this. As a simple example, consider the following output from a benchmark suite. It is formatted by starting each record with the benchmark name, followed by a set of key-value pairs associated with the named benchmark. This is a flexible structure to work with as it allows you to add or remove keys at will without violating the output format.

    $ ./runbenchmarks
    Benchmark: fizzbuzz
    Time: 10 ns/op
    Alloc: 32 bytes/op
    Benchmark: fibonnacci
    Time: 13 ns/op
    Alloc: 40 bytes/op
    ...
    $

While convenient, it is quite clumsy to work with in Unix. Consider a very common thing we might want to do: look up the timing results for a single benchmark. Here’s how you do it.

    $ ./runbenchmarks | awk '/^Benchmark:/ { bench = $2} bench=="fizzbuzz"'
    Benchmark: fizzbuzz
    Time: 10 ns/op
    Alloc: 32 bytes/op
    $

If instead each line presents exactly one record, where columns are separated by whitespace, this becomes a much simpler task.

    $ ./runbenchmarks
    fizzbuzz 10 32
    fibonnaci 13 40
    ...
    $ ./runbenchmarks | grep '^fizzbuzz'
    fizzbuzz 10 32
    $

The advantage becomes even more evident when reordering or aggregating the input. For example, when the output is record-per-line, sorting the results by time spent is a simple matter of invoking sort:

    $ ./runbenchmarks | sort -n -r -k2,2
    fibonnaci 13 40
    fizzbuzz 10 32
    ...
    $
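
the same point holds for a consuming program: with one record per line, parsing is a single `split()`. A Python sketch, where `parse_records` and the column meanings (name, ns/op, bytes/op) are assumptions based on the example output above:

```python
def parse_records(text):
    """Parse 'name time alloc' records, one per line, whitespace-separated."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, time_ns, alloc_bytes = line.split()
        records.append((name, int(time_ns), int(alloc_bytes)))
    return records


output = "fizzbuzz 10 32\nfibonnaci 13 40\n"
records = parse_records(output)

# Equivalent of `grep '^fizzbuzz'` and `sort -n -r -k2,2`.
fizzbuzz = [r for r in records if r[0].startswith("fizzbuzz")]
by_time_desc = sorted(records, key=lambda r: r[1], reverse=True)
```

The multi-line `Benchmark:`/`Time:`/`Alloc:` format would instead need a small state machine to regroup the lines into records first.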


" Make a tool’s output portable. Put another way, a tool’s output should stand on its own, requiring as little context as possible to parse and interpret. For example, you should use absolute paths to represent files, and fully qualified hostnames to name internet hosts. Portable output is directly usable by other tools without further context. A frequent violator of this is build tools. For example, both the GCC and Clang compilers try to be clever by reporting paths that are relative to your working directory. In this example, the source file paths are presented relative to the current working directory when the compiler was invoked.

    $ cc tmp/bad/x.c
    tmp/bad/x.c:1:1: error: unknown type name 'INVALID_C'
    INVALID_C
    ^
    tmp/bad/x.c:1:10: error: expected identifier or '('
    INVALID_C
             ^
    2 errors generated.
    $

This cleverness breaks down quickly. For example if I use make(1) with the -C flag.

    $ cat tmp/bad/Makefile
    all:
        cc x.c
    $ make -C tmp/bad
    cc x.c
    x.c:1:1: error: unknown type name 'INVALID_C'
    INVALID_C
    ^
    x.c:1:10: error: expected identifier or '('
    INVALID_C
             ^
    2 errors generated.
    make: *** [all] Error 1
    $

Now the output is less useful: to which file does “x.c” refer? Other tools that build on this need additional context, the -C argument, in order to interpret the compiler’s output—the output does not stand on its own. "

" Omit needless diagnostics. Resist the temptation to inform the user of everything that is being done. (But if you must, do it on stderr.) A good tool is quiet when all is well, but produces useful diagnostics output when things go wrong. "


from the above notes on (from here up to the occurrence of "") we can get some hints on what we might need or want for an augmented Unix-pipe-like thingee within oot:

there is some debate on whether programs should infer from their context and format their output appropriately:

i'm thinking, no, except there should be framework-provided routines in the context which can choose views, add pre- and post-processing (eg pagination), etc


"This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface." -- Doug McIlroy?'s (one of the Unix founders, inventor of the Unix pipe)


voltagex_ 3 days ago


I'm not sure I agree with the "no JSON, please" remark. If I'm parsing normal *nix output I'm going to have to use sed, grep, awk, cut or whatever and the invocation is probably going to be different for each tool.

If it's JSON and I know what object I want, I just have to pipe to something like jq [1].

PowerShell takes this further and uses the concept of passing objects around - so I can do things like ls | $_.Name and extract a list of file names (or paths, or extensions etc)



ygra 2 days ago


I was also constantly thinking of PowerShell while reading that. A PowerShell-specific list of such advice would actually be rather short, given that most of the pitfalls are already avoided. I still firmly believe that PowerShell is actually a much more consistent Unix shell in that several concepts that ought to be separate are actually orthogonal. Let's see:

Input from stdin, output to stdout: Nicely side-stepped in that most cmdlets allow binding pipeline input to a parameter (either byval or byname, if needed). Filters are trivial to write, though.

Output should be free from headers: Side-stepped as well, in that decoration comes from the Format-* cmdlets that should only ever be at the end of a pipeline that's shown to the user.

Simple to parse and to compose: Well, objects. Can't beat parsing that you don't need to do.

Output as API: Well, since output is either a collection of objects or nothing (e.g. if an exception happened) there isn't the problem that you're getting back something unexpected.

Diagnostics on stderr: Automatic with exceptions and Write-Error. As an added bonus, warnings are on stream 2, debug output on stream 3 and verbose output on stream 4. All nicely separable if needed.

Signal failures with an exit status. Automatic if needed ($?), but usually exception handling is easier.

Portable output: That's about the only advice that would still hold and be valuable. E.g. Select-String returns objects with a Filename property which is not a FileInfo, but only a string; subject to the same restrictions that are mentioned in the article.

Omit needless diagnostics: Since those would be either on the debug or verbose stream they can be silenced easily, don't interfere with other things you care about and cmdlets have a switch for either of that, which means you only get that stuff if you actually care about it.

Avoid interactivity: Can happen when using the shell interactively, e.g.

    Home:> Remove-Item
    cmdlet Remove-Item at command pipeline position 1
    Supply values for the following parameters:
    Path[0]: _

However, this only ever happens if you do not bind anything to a parameter, which shouldn't happen in scripts. If you bind $null to a parameter, e.g. because pipeline input is empty or a subexpression returned no result, then an error is thrown instead, avoiding this problem.

Nitpick: You'd need ls | % Name or ls | % { $_.Name } there. Otherwise you'd have an expression as a pipeline element, which isn't allowed.



seanp2k2 3 days ago


+1 for jq. A lot of my work these days involves using web APIs in addition to "local" ones from CLI tools. xpath was good for dealing with XML stuff in a similar fashion, and HTML-XML-utils is an awesome suite of CLI things for slicing and dicing, if you're into that sort of thing:


grosskur 2 days ago


xmlstarlet is also handy for command-line XML parsing:


reirob 2 days ago


Agree, had to use it extensively several years ago, it saved me so much time. But even then the development seemed to have stopped on this tool.



lstamour 3 days ago


On Mac (OS NeXT, perhaps?), the convention seems to be that most commands produce human readable output by default, but you can pass a parameter like -x or -xml to get (usually) XML, machine-readable output, and with some tools, -j or -json will give you that format.

But then you've oddities like plutil behaving like gzip by modifying the file you specify rather than printing to stdout. You have to pass -o and a dash to get it to leave the file alone and instead reformat it to stdout. That one gets me every time. And I'm not alone:

But other parts are nice. For instance, "system_profiler -xml > MyReport.spx" generates XML that will open in the System Profiler GUI app. The XML generated is usually a Plist, since that's as native to the platform as the Registry might be to Windows...

Let me know when PowerShell gets tabs though. Maybe there's a port running in Mono somewhere? Seriously, I wish somebody would build a better terminal, maybe get creative with scrollback and chaining commands, and ship it in an OS... with tabs. ;-)


tfigment 2 days ago


Not sure if it's what you had in mind for Windows and tabs but I've found ConsoleZ [1] quite nice and allows powershell, cmd and others to have tabs.




maybe should survey unix commands to see what sorts of options could be 'standardized'

some ideas:

oh wait someone did, see next one:


acabal 3 days ago


Great article. The other thing I've always wished for command-line tools is some kind of consistency for flags and arguments. Kind of like a HIG for the command line. I know some distros have something like this, and that it's not practical to do as many common commands evolved decades ago and changing the interface would break pretty much everything. But things like `grep -E,--extended-regexp` vs `sed -r,--regexp-extended` and `dd if=/a/b/c` (no dashes) drive me nuts.

In a magical dream world I'd start a distro where every command has its interface rewritten to conform to a command line HIG. Single-letter flags would always mean only one thing, common long flags would be consistent, and no new tools would be added to the distro until they conformed. But at this point everyone's used to (and more importantly, the entire system relies on) the weird mismatches and historical leftovers from older commands. Too bad!


daxelrod 3 days ago


One sort-of attempt at this is GNU's Coding Standards.

Long and Short Options:

General Interfaces:

Command Line Interfaces:

Program Argument Syntax:


ramses0 3 days ago


How to be Unix-y in Eleventy-Billion Steps.

""" The two surprising finds in the above documents are the standard list of long options and short options from -a to -z.

Forever and a day I am trying to figure out what to name my program options and these two guides definitely help. It allows me to definitively say you should use -c … for “command” instead of -r … for “run” because -r means recurse or reverse. """



voltagex_ 3 days ago

link lists alternatives for each short option, so which do you choose?


userbinator 3 days ago


Actually most of them are quite consistent since POSIX published guidelines for it - and the only inconsistencies are historical exceptions:

(I'm not so convinced that long options are a good thing, as evidenced by the --extended-regexp/--regexp-extended and other little "was it spelt this way or that?" type of confusions. It's not hard to remember single letters, especially if they're mnemonic.)


pimlottc 2 days ago


Long options are very nice to use in scripts as they are somewhat self-documenting. Compare:

    curl -kLIiso


    curl --insecure --location --head --include --silent --output www.example.rog

And of course as a practical matter, with short opts you'll run out of characters eventually, and meaningful mnemonics before that.


dTal 2 days ago


I've often thought this - that is funny is a terrible embarrassment. I would also like to add that manpage syntax help should be standardized and machine-parseable. I had an idea recently to auto-generate GUIs for command line tools from the manpage syntax line, but it turned out that while such lines look precise but cryptic, they are often in fact highly ambiguous, nonstandard, and still cryptic. This seems broken to me.


james2vegas 2 days ago


Blame man(7), have a look at mdoc(7): Semantic markup for command line utilities:

  Nm : start a SYNOPSIS block with the name of a utility
  Fl : command line options (flags) (>=0 arguments)
  Cm : command modifier (>0 arguments)
  Ar : command arguments (>=0 arguments)
  Op, Oo, Oc : optional syntax elements (enclosure)
  Ic : internal or interactive command (>0 arguments)
  Ev : environmental variable (>0 arguments)
  Pa : file system path (>=0 arguments)


JetSpiegel 2 days ago


Linux kernel tools solve this, but the help looks cryptic and is difficult for a human (such as myself) to parse.

I can't find documentation on what I mean, but try ip --help


jzwinck 3 days ago


You're right, myriad popular tools are not totally consistent (ls -h and du -h are similar but grep -h is very different). There is a bit of hope however--the GNU folks have documented lots of the options currently in use so you can try to find one that fits when you build new tools:



Additional tip: if writing a tool that prints a list of file names, provide a -0 option that prints them separated by '\x0' rather than white space. Then the output can be piped through xargs -0 and it won't go wrong if there are files with spaces in their paths.

I suggest -0 for symmetry with xargs. find calls it -print0, I think.

(In my view, this is poor design on xargs's part; it should be reading a newline-separated list of unescaped file names, as produced by many versions of ls (when stdout isn't a tty) and find -print, and doing the escaping itself (or making up its own argv for the child process, or whatever it does). But it's too late to fix now I suppose.)
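
A minimal sketch of the -0 tip in Python (my illustration, not from the thread). The tool lists a directory and, when given `-0`, terminates each name with a NUL byte instead of a newline. The names `format_names`, `parse_args`, and `main`, and the long spelling `--null`, are assumptions made for the example.

```python
import argparse
import os
import sys


def format_names(names, nul_terminated=False):
    """Terminate each name with NUL (for xargs -0) or with a newline."""
    terminator = "\0" if nul_terminated else "\n"
    return "".join(name + terminator for name in names)


def parse_args(argv):
    parser = argparse.ArgumentParser(description="list the entries of a directory")
    parser.add_argument("directory", nargs="?", default=".")
    # "-0" chosen for symmetry with xargs -0; "--null" is an assumed long form.
    parser.add_argument("-0", "--null", dest="null", action="store_true",
                        help="terminate each name with NUL instead of a newline")
    return parser.parse_args(argv)


def main(argv):
    args = parse_args(argv)
    names = sorted(os.listdir(args.directory))
    sys.stdout.write(format_names(names, args.null))
```

Calling `main(sys.argv[1:])` from a script and piping its `-0` output through `xargs -0` keeps a name like `a b` as one argument, where the newline or whitespace form would split it.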


Animats 3 days ago


1978 called. It wants its pipes back.

That approach dates from the days when you got multi-column directory listings with

  ls | mc

Putting multi-column output code in "ls" wasn't consistent with the UNIX philosophy.

There's a property of UNIX program interconnection that almost nobody thinks about. You can feed named environment variables into a program, but you can't get them back out when the program exits. This is a lack. "exit()" should have taken an optional list of name/value pairs as an argument, and the calling program (probably a shell) should have been able to use them. With that, calling programs would be more like calling subroutines.

PowerShell does something like that.


grosskur 2 days ago


You can simulate this with so-called "Bernstein chaining". Basically, each program takes another program as an argument, and finishes by calling exec() on it rather than exit(), which preserves the environment. See:

Or write environment variables to stdout in Bourne shell syntax so the caller can run "eval" on it. Like ssh-agent, for example.
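
Both workarounds can be sketched in a few lines of Python (my illustration, not from the thread). `chain` prepares the environment for the next program, which `run_chained` then replaces the current process with via `os.execvpe`; `shell_exports` formats variables as Bourne shell assignments for the caller to `eval`. All three function names and the `RESULT` variable are hypothetical.

```python
import os
import shlex


def chain(updates, next_argv, environ=None):
    """Return (argv, env) that the next program in the chain should be exec'd with."""
    env = dict(os.environ if environ is None else environ)
    env.update(updates)
    return list(next_argv), env


def run_chained(updates, next_argv):
    """Finish by exec'ing the next program instead of exiting, so it sees our variables."""
    argv, env = chain(updates, next_argv)
    if argv:
        # Replaces the current process; does not return on success.
        os.execvpe(argv[0], argv, env)


def shell_exports(variables):
    """Format variables as Bourne shell text meant to be passed to eval by the caller."""
    return "".join(
        f"{name}={shlex.quote(value)}; export {name};\n"
        for name, value in variables.items()
    )
```

With the first style a wrapper script would end in something like `run_chained({"RESULT": "42"}, sys.argv[1:])`; with the second it would print `shell_exports({"RESULT": "42"})` and the calling shell would eval that output.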


agumonkey 2 days ago


Oh wow, unix continuation passing style. Never heard of that o_o;


gohrt 2 days ago


Continuation Passing Style!


oneeyedpigeon 2 days ago


I agree that the column formatting code shouldn't be in ls. However, if it were removed (which it won't ever be, of course: theoretical) I would want every system I ever access via a terminal to somehow alias ls to "ls | mc". To support full working of ls, though, that can't just be a straight alias, so I need a shell script to handle things like parameters to ls, which itself is then aliased to ls ... is that really better?


4ad 2 days ago


In Plan 9 programs return strings instead of numeric codes.


dap 3 days ago


Lots of great points here, but as always, these can be taken too far. Header lines are really useful for human-readable output, and can be easily skipped with an optional flag. (-H is common for this).

The "portable output" thing is especially subjective. I buy that it probably makes sense for compilers to print full paths. But it's nice that tools like ls(1) and find(1) use paths in the same form you gave them on the command-line (i.e., absolute pathnames in output if given absolute paths, but relative pathnames if given relative paths). For one, it means that when you provide instructions to someone (e.g., a command to run on a cloned git repo), and you want to include sample output, the output matches exactly what they'd see. Similarly, it makes it easier to write test suites that check for expected stdout contents. And if you want absolute paths in the output, you can specify the input that way.


zaptheimpaler 2 days ago


I also think headers should be included. It's really annoying to pore through a man page just to see what the columns mean. You could use flags, or maybe send headers to STDERR.


arh68 2 days ago


I think it's insane to restrict programs to just STDOUT & STDERR. Why 2? Why not use another file descriptor, maybe STDFMT, to capture all the formatting markup? This would avoid -0 options (newlines are markup sent to stdfmt, all strings on stdout are 0-terminated), it would avoid -H options (headers go straight to STDFMT), it would allow for less -R to still work, etc.

It's possible other descriptors would be useful, like stdlog for insecure local logs, stddebug for sending gobs of information to a debugger. It's certainly not in POSIX, so too bad, but honestly stdout is hard to keep readable and pipe-able. Adding just one more file descriptor separates the model from the view.
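
A minimal sketch of the STDFMT idea in Python (my illustration, not from the thread). It assumes the caller has opened file descriptor 3 as the formatting channel; that number, the name `STDFMT_FD`, and the function `emit` are all made up, since nothing like this is in POSIX. Data goes to stdout as NUL-terminated strings, and the header goes to the extra descriptor.

```python
import os

# Assumption: fd 3 plays the role of "stdfmt"; this is not a POSIX convention.
STDFMT_FD = 3


def emit(names, header, out_fd=1, fmt_fd=STDFMT_FD):
    """Write NUL-terminated names to out_fd and the human-oriented header to fmt_fd."""
    try:
        os.write(fmt_fd, (header + "\n").encode())
    except OSError:
        pass  # fmt_fd is not open for writing: drop the formatting, keep the data
    for name in names:
        os.write(out_fd, name.encode() + b"\0")
```

From a shell, a redirection such as `3>headers.txt` on the command would supply the extra descriptor; without it the header is silently dropped and stdout stays clean for `xargs -0`.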