proj-plbook-plChNockLang

Table of Contents for Programming Languages: a survey

Nock

Although it is not as popular as LLVM, RPython, the JVM, or the CLR, I placed Nock in its own chapter because of it's novelty (it's functional) and minimalism.

Goals: a functional-style virtual machine, with a very simple specification, yet also practically executable.

Pros:

Cons:

People: C. Guy Yarvin

Primitive data types: non-negative integers, binary trees with non-negative integer leaves

Retrospectives:

Introduction

The Nock programming language may be seen as elaboration of SK combinator calculus in the same way that assembly language may be seen as an elaboration of Turing machines. It has two instructions that correspond directly to the S and K of SK combinator calculus:

It has two primitive data types, non-negative integers, and binary trees (whose leaf nodes are integers).

The other primitive instructions are not necessary for universal computation, but make programming more convenient. The other primitive instructions are:

For further convenience, Nock also provides 5 more instructions which can be defined in terms of the preceding primitives:

(note: i contributed an abbreviated variant of the above text to Wikipedia)

A somewhat short (but somewhat challenging) introduction to Nock may be found at [1].

Data literal constructors and primitive operations

(NonNegativeIntegers?, BinaryTrees?, successor, isLeaf, isStructurallyEqual, select)

where successor(a number) = that number + 1, isLeaf(a number) = 1, isLeaf(interior node of a tree) = 0, and select takes two arguments, a list and a number, and the number it treated as an index into the (binary) tree in the manner explained in http://www.urbit.org/2013/08/22/Chapter-2-nock.html ; the subtree (or leaf node, which would be a natural number) at that index is returned.

Data literal idiosyncracies

There is no empty list. The results of isTree use the convention 1 is False, 0 is True.

Primitive instruction set

todo this section isn't clear if you don't already know Nock... as notes to myself it is useful though

select, identity, nock, isLeaf, successor, isStructurallyEqual

The 'nock' primitive instruction requires further explanation. See 'The nock primitive instruction', below.

Note that 'select' and 'identity' take their input directly from their argument, whereas isLeaf, successor, isStructurallyEqual take input by first evaluating their argument as a Nock subexpression in the context of the current state (and 'nock' treats its argument as a pair of Nock subexpressions, each of which are first evaluated in the context of the current state), where by 'argument' i mean the third position in the Nock instruction format, and by 'current state' i mean the first position in the Nock instruction format.

Each primitive instruction is represented by a numerical opcode, from 0 to 5.

Implicit primitive instruction cons

There is no opcode for a 'cons' operator or for any other way to construct a binary tree. However, Nock programs can indeed cons; the cons operator, instead of being represented explicitly by an opcode, is represented implicitly in an idiosyncratic/clever way in the representation of a Nock program.

A Nock program is represented by a binary tree with non-negative integer leaves. Most Nock instructions are of the form [a OPCODE c], where 'a' and 'c' are arbitary binary trees which are interpreted as arguments to the instruction, and OPCODE is a number that specifies the instruction.

To write a program instruction that will execute 'cons(a, b)', you put a tree in the middle OPCODE position, instead of a number. An instruction of the form [a [b c] [d e]] produces the following binary tree as a result: [eval([a b c]) eval([a d e])].

Macro instructions

if, compose, push-and-compose, call-on-result-of, hint

'if' takes three arguments (in addition to a state argument), the first of which must evaluate to 0 or 1; the result is either the second or the third argument depending on what the first argument evaluated to.

'compose' takes two arguments (in addition to a state argument), and applies the first argument to the state, and then applies the second argument to the result.

'push-and-compose' takes two arguments (in addition to a state argument), and applies the first argument to the state, then pushes the result onto the state to produce a new state, and then applies the second argument to the new state.

'call-on-result-of' takes two arguments (in addition to a state argument), and applies the first argument to the state to produce a temporary data structure called a 'core', then selects into this 'core' using the second argument as an index, and evaluates the result.

'hint' takes two arguments (in addition to a state argument), and simply throws the first argument away and evaluates the second argument applied to the state. If the first argument is not a number but a tree, then some computation is supposed to be done involving the first argument, after which the result of this computation is ignored: the head of the tree is discarded and the tail of the tree is applied to the state and then discarded. The idea of 'hint' is to provide hints to the compiler/interpreter but without affecting the actual result of the computation. When the first argument is not a number but a tree, we have a 'computed hint' which is not hardcoded but which is itself the result of some computation.

Each macro instruction is represented by a numerical opcode, from 6 to 10.

Syntax

Nock programs are expression encoded as abstract syntax trees.

A syntactically valid Nock node is a tree of the form [subject verb object], where [verb object] satisfies the 'formula syntactic validity condition' below. Each of subject, verb, object may be numbers or trees. Note that, because we are considering these trees to be binary, [subject verb object] is a shorthand for [subject [verb object]]. A pair [verb object] is called a 'formula'.

formula syntactic validity condition: One of the following (a) or (b) is true: (a) verb is a valid Nock opcode, and object is a binary tree which, considered as a linked list, is at least long enough to provide enough arguments to the instruction represented by the opcode in verb; or (b) 'verb' is of the form '[verb2 object2] [verb3 object3]', and this syntactic validity condition (recursively) holds for each of those two formulas.

Presumably in order to achieve extreme conciseness, the Nock specification doesn't actually contain the concept of syntactically valid/invalid Nock nodes; any node is taken as legitimate source code but 'invalid' trees merely cause the execution of a trivial infinite loop.

Note also that since Nock nodes may be dynamically constructed and evaluated during execution, it is not in general possible to determine at compile time whether a given Nock expression will give rise to and try to execute a syntactically invalid Nock expression as it is running, although i imagine that most programmers will constrain themselves to only use a restricted set of Nock idioms which can be proven not to have this problem.

Evaluation strategy

Nock is a functional language based on expression evaluation.

To evaluate a syntactically valid Nock node of the form [subject verb object]:

If verb is an opcode, then one follows the reduction rule specified by its opcode. Note that many of the reduction rules are recursive, that is, in order to compute the reduction you must first compute the reduction of another Nock expression.

If our node is of the form [subject [verb2 object2] [verb3 object3]], then the result is [eval([subject verb2 object2]) eval([subject verb3 object3])].

According to the specification, the result of evaluating a syntactically invalid Nock node is to get the same expression back; so one might say that the Nock specification does not distinguish between invalid syntax and programs which trivially loop; however in practise a compiler or interpreter would easily be able to distinguish what i am here calling syntactically valid Nock nodes.

As mentioned above, because the reduction rules are recursive, the reduction of a Nock node proceeds by expanding a Nock node into new Nock nodes that must be evaluated in order to compute the final result. In other words, an interpreter would be computing new Nock expressions on the fly and then evaluating them. It is possibly for these new Nock expressions to be syntactically invalid.

The subject

The 'subject' in Nock represents state. As a functional language, there is no state in Nock except what is passed explicitly, e.g. in the value of the subject. All Nock primitive instructions that require recursive evaluation (that is, all of them except for 'select' and 'identity') pass in the same state that they were given to the recursive evaluation.

However, it is possible for a Nock expression to cause the evaluation of another Nock evaluation in the context of a modified state. A Nock expression explicitly represents the state in the subject, so to form one expression that invokes a subexpression with modified state, all you have to do is to place something other than the subject that you were given into the subject position in the subexpression. It is convenient to speak of a formula being applied to the subject, or equivalently to the state. This facility is used in the macro instructions. For example, we noted above that 'compose' takes two arguments (in addition to a state argument), and applies the first argument to the state, and then applies the second argument to the result. So, in the subexpression that calls for the evaluation of the second argument, the subject position is filled by the result of evaluating the application of the first argument to the state.

Nock does not define any facilities for I/O; it is a pure language.

The nock primitive instruction

Definition

The 'nock primitive instruction', whose opcode is 2, is found in nodes of the form:

[subject 2 [verb1 object1] [verb2 object2]]

It is evaluated to:

eval([eval(subject verb1 object1) eval(subject verb2 object2)])

The subexpression eval(subject verb2 object2) should evaluate to a syntactically valid formula [verb3 object3], so that the result of the inner evaluations is a syntactically valid Nock formula, that is, recalling that [subject3 [verb3 object3]] can be written in the shorthand [subject3 verb3 object3],

eval(subject verb1 object1) = subject3 eval(subject verb2 object2) = [verb2 object3] [eval(subject verb1 object1) eval(subject verb2 object2)] = [subject3 verb3 object3]

Contrast with other the other instructions

Primitive instructions 'select' and 'identity' do not require recursive evaluation. Primitive instructions 'isTree' and 'successor' and 'isStructurallyEqual' cause recursive evaluation of their single argument (applied to the subject but untransformed) before they apply their characteristic operation. The implicit primitive operation cons causes recursive evaluation of both of its arguments (applied to the subject but untransformed).

All of those operations share the following form: possibly, they first fully reduce (recursively evaluate) each of evaluate their arguments separately, and then they do something to the result that can be computed without further recursive Nock evaluation, something that can be accomplished in a number of steps that scales at most linearly with the size of the fully reduced arguments. Clearly, no such set of such instructions could be Turing-equivalent, because a reduction of an expression that only involved them could always be completed in time linear in the size of the expression.

All of those instructions also share the property that one can could check their arguments at 'compile time' to make sure they are syntactically valid formulas.

The nock primitive instruction is different. Reduction of a Nock instruction produces a top-level Nock node which still has to be evaluated (the outer 'eval' in 'eval([eval(subject verb1 object1) eval(subject verb2 object2)])'), even after the evaluation of its arguments (the inner 'eval's in that same expression). This is in contrast to every other primitive instruction.

Each of the Nock macro instructions contains the nock primitive instruction, directly or indirectly, and so they may also cause recursive evaluation.

Use in loops

Note first that for any subject, the Nock expression '[subject select 1]' evaluates to 'subject'; we write eval([subject select 1]) = subject.

This implies that any Nock expression of the form:

[subject 2 [0 1] x] or, written with names instead of opcodes: [subject nock [select 1] x]

will reduce as follows:

eval(eval([subject select 1]) eval([subject x]))

eval([subject eval([subject x])])

That is, it causes the evaluation of a Nock expression with the same subject as originally, and with a formula that is given by the result of evaluating an operation on that subject. If we place program code (Nock expressions) within the subject, we can write x so that it dynamically selects this code; so the innermost eval will select some code out of the subject, and then the outermost eval will execute it. However, the code from the subject still has access to the subject, and so if it chooses, it can select its own code from the subject again, and execute that. For this sort of thing to work, we must not only evaluate an argument which has already been evaluated (like eval), but we must have a way to pass in the same subject both times.

For instance, the following Nock expression causes an infinite loop when evaluated: [[[a 0 1] [0 1]] 2 [a 0 1] [0 1]]. A nicer way of writing that, using names instead of opcodes, might be:

[[nock [select 1] [select 1]] nock [select 1] [select 1]]

This is of the form:

subject: [nock [select 1] [select 1]] verb: nock argument1: [select 1] argument2: [select 1]

Note that the subject is identical to the formula (recall that in Nock, 'formula' means [verb object] within the Nock expression form [subject [verb object]], and that [subject verb argument1 argument2] is shorthand for [subject [verb [argument1 argument2]]], so [verb argument1 argument2] = [verb [argument1 argument2]] is of the form [verb object]).

What happens when we evaluate this expression? The verb is not an opcode but a tree, so the top level operation is actually an implicit cons. Reducing this, we get:

[subject nock [select 1] [select 1]]

[eval([subject select 1]) eval([subject select 1])]

[subject subject]

[[nock [select 1] [select 1]] [nock [select 1] [select 1]]]

[[nock [select 1] [select 1]] nock [select 1] [select 1]]

which is identical to what we started with.

Contrast with 'eval' primitive in other languages

The first thing i thought of when i say the nock primitive instruction was other languages' 'eval', but whereas 'eval' takes one argument, nock takes two arguments, both of which are recursively evaluated, and also a 'subject', which is duplicated and used in the evaluation of both arguments.

This is a critical difference, because it allows the nock instruction (in conjuction with 'select') to provide the power needed for looping. The duplication of the subject is needed so that program code from the subject can be copied and consumed in each loop iteration yet still available for the next iteration.

Why does nock need to take two arguments and evaluate both of them, rather than just taking one argument and resulting in eval([subject eval(argument1)]? If we wanted a loop that progressed, rather than just infinitely looping as in the example above, we would need to update some sort of state that persists between loop iterations. But we'd still want to execute the same loop code in each iteration. The subject serves as this persistent state, and formula selects the code to be executed in the next iteration. So the evaluation of the first argument is what updates the state, and the evaluation of the second argument is like a continuation which tells us what to do in the next iteration.

Contrast with 'apply' primitive in other languages

In lambda calculus, for example, we have an infinite loop of the form:

(lambda x . x x) (lambda x . x x)

(lambda x . x x) (lambda x . x x)

The implicit primitive operation here is 'apply'. We could write this:

apply((lambda x . x x), (lambda x . x x))

Salient differences between the nock primitive and apply are:

Comparison with 'S' primitive in SKI combinatory calculus

The nock primitive operation corresponds exactly to the S combinator in combinatory calculus. The S combinator is defined:

Sxyz = xz(yz)

The 'z' corresponds to the subject in the nock primitive; the 'z' is copied and both arguments are applied to it, and then the first result is applied to the second result. The inner eval in the nock primitive corresponds to the xz and yz applications, and the outer eval corresponds to the application of xz to yz.

Interestingly, the K combinator (which takes an argument and then ignores a second argument, yielding the first argument: Kxy = x), also corresponds to the 'identity' primitive operation in Nock, which takes the argument and then ignores the subject. The S and K combinators form a Turing-universal basis. This provides another way to look at the Nock primitive operations; the nock and identity primitives already form a Turing-universal basis, and then the select, successor, isLeaf, and isStructurallyEqual primitives, and the cons implicit primitive, add the minimal operations necessary for operating with non-negative integers and binary trees.

So Nock can be thought of as the SKI combinatory calculus endowed with more useful primitive data types (so that you don't have to use unnatural encodings, e.g. Church numerals, for data), as well as with a notation that, by explicitly and prominently placing the 'subject' in every expression, is a little clearer.

Alternate Nock notation

We can rewrite Nock expressions into more a familiar notation by:

')
, and indenting

Now the reduction of the infinite loop above becomes:

eval[NOK [SEL 1] [SEL 1]

NOK [SEL 1] [SEL 1]]

eval[eval[[SEL 1]

NOK [SEL 1] [SEL 1]]
eval[[[SEL 1]
NOK [SEL 1] [SEL 1]]]]

eval[NOK [SEL 1] [SEL 1]

NOK [SEL 1] [SEL 1]]

or, if we define 'subject' to be NOK [SEL 1] [SEL 1] and don't always have newlines with all '

's:

eval[NOK [SEL 1] [SEL 1]

subject]

eval[eval[[SEL 1]

subject]
eval[[[SEL 1]subject]]]

eval[NOK [SEL 1] [SEL 1]

subject]

A functional assembly language

The point of Nock is to be a 'functional assembly language', that is, the analog of assembly language for a computational paradigm in which mutation doesn't exist, but instead there is expression reduction (or one might say, a combinatorial assembly language, for a computational paradigm based on the combinatory calculus instead of Turing machines or lambda calculus).

While the execution of a Nock program doesn't make use of mutation (which is expensive to emulate, from the perspective of a machine whose hardware was combinatory-calculus-like), instead it requires one to keep track of many intermediate results each of which might be as large as the inital state input; on a real-world machine (which has Turing-machine-ish hardware) a naive implementation would involve making a copy of the entire state input upon almost every operation (of course a more sophisticated implementation would use pointers into the initial state object, and record the result of functions applied to it, or parts of it, as diffs against (parts of the) original). My point is that these are kind of dual to each other; Turing machines want you to mutate things, whereas combinatory-calculus-machines want you to be making zillions of copies of large expressions; the naive way to implement either of these within the other is computationally expensive.

On space and time complexity

A single Nock operation might have large space and time requirements if naively emulated on a Turing machine model. Nock provides a structural equality operation, which may be applied to arbitrarily large binary trees.

In addition, the Nock instruction 2 (the "Nock" instruction, or S combinator) 'copies' the state, which can be an arbitrarily large binary trees, into two subexpressions.

In real implementations, perhaps pointers could be use to reduce the need to physically copy a large binary tree when a 'copy' is made, and even if one or the other trees were modified, common subexpressions could be shared ("copy on write"). If unchanged common subexpressions of these 'divergent copies' were later compared with structural equality, the implementation could use pointer equality to recognize the commonality and avoid doing a full recursive compare of each node in the subexpression. The possibility of this optimization could be one motivation for including structural equality as a primitive instruction in Nock.

Variants

Sam Atman created a variant of Nock called Ax. I don't understand it, but as far as i can tell, the differences between Ax and Nock are:

I don't think this is the whole story, as the specification [2] of the first 6 primitives seems a bit longer than would be expected if this was it, but i haven't had time to go thru it yet, in particular i don't yet fully understand the lines:

Ξ [0]  →  Ξ [0]  
Ξ [0 0]  →  Ξ [0 0]
Ξ [0 0 0]        →   0
Ξ [0 1 0 0]      →   1    
Ξ [0 1 0 1]      →   2
Ξ [2 1 2 1]      →   3
Ξ [a 0 n]        →   n
Ξ [a 1 0 n]      →   n + 1 
Ξ [a 1 b]        →   Ξ [a b]   →   n   →   n + 1 
Ξ [3 3 [[2 1] [1 2 1]] [0 2 1]]   →   [3 4]

Links:

Links

Unfortunately the folks in charge of Urbit have a habit of periodically posting new Nock docs, then removing them from the public website and replacing them with a 'coming soon!' notice. Luckily you can still read the old docs from various public Github repos.