Revision 7 not available (showing current revision instead)

notes-computer-programming-hoon-hoonTutorialNotes

note: this is based on the OLD (deprecated) tutorial, see http://urbit.org/docs/dev/hoon/tutorial for the new one. The links below are archived versions.

TOC of the tutorial:

Chapter I - Intro To Hoon:

Chapter II - Moar Hoon Types:

Chapter III - Hoon Computes:

Chapter IV - Gates:

Chapter V - Tiles:

Chapter VI - Type Inference:

Chapter VII - Odds, Ends, Quirks:

---

notes on the tutorial

Chapter 0 - concepts from the Nock tutorial and elsewhere that are relevant here

http://urbit.org/doc/nock/tut/1/

" A noun is an atom or a cell. An atom is a natural number. A cell is an ordered pair of nouns. "

We write cells with square brackets, e.g. [a b]. As a notational shortcut, [2 6 14 15] is short for [2 [6 [14 15]]], i.e. brackets group to the right.

http://urbit.org/doc/nock/tut/3/

core battery, payload gate formula, sample, context arm (? not actually defined anywhere in the Nock tutorial, though this word is used. But http://urbit.org/doc/hoon/tut/3/ defines it as a symbol and a twig; presumably its part of a core) axis/axes

todo: where to put this: i think a.b is what, in other languages, would be written b.a, that is, start with a and then look at the b field within it.

Chapter I - Intro To Hoon

"Like all men in Babylon, I have been proconsul; like all, a slave." - The Lottery in Babylon

"Hoon is actually not much more than a fancy wrapper around Nock. People who know C can think of Hoon as the C to Urbit's Nock - just a sprinkling of syntax, wrapped around machine code and memory."

Principles of type

subjects are typed

Hoon is statically typed

"...instead of calculating tree axes by hand, we could actually assign names to different parts of the tree - and those names would stay the same as we pushed more data on the subject. ... The way we're going to do this is by associating something called a type with the subject"

"Hoon is a statically typed language, which just means that the type isn't a part of your program: it's just a piece of data the compiler keeps around as it turns your Hoon into Nock."

AST = 'twig'

the Hoon compiler takes an AST and a subject type, and computes a Nock formula and also the type of the product of executing that formula on a subject of that type

"there is no direct syntax for defining or declaring a type. ...Types are always produced by inference."

Printing types

Typing ":type; SOMETHING" prints the value of SOMETHING and then, on the line below it, its type, rendered intelligently as a string. E.g.

> :type; 42

42 @ud

Typing "-:!>(SOMETHING)" prints the actual internal type object. Don't do this, because "for a core, the type actually contains the entire codebase". "Black magic is evil magic - it must exact its evil price."

Atom examples

Use the syntax: `t`x

to cast value 'x' to type 't'.

The logic of smell

Hoon has two type systems, the 'odor' system, and the type system proper. The type system proper is the 'real' type system. In terms of the 'real' type system, many things that would be separate types in some programming languages are the same type in Hoon. For example, integers and strings are both just Hoon 'atoms'. The idea is that both of these things can be internally encoded as unsigned integers. (aside: if you know math, the idea that you can encode a string as a single integer is like the Godel encoding of sequences). But there are reasons of convenience to distinguish between integers and strings; for example, when the Hoon interpreter prints out the value of a variable (say 'x') containing an atom, it would be nice for it to know whether that atom is an integer atom or a string atom, and to print it differently in each case, so that the programmer can just type 'x' at the prompt to see the value of x, rather than always having to type stuff like 'printint x' or 'printstr x'. So, we have the 'odor' system, which is a second kind of type annotation that is attached to atoms to indicate what kind of atom they are, even though all atoms are really just unsigned integers.

Atoms of an odor can be coerced to a different odor, but coercion cannot happen in the type system proper (i think). Even without coercion, a subodor can be conversed to any of its superodors, and a superodor can be coerced to any of its subodors.

The names of odors start with '@'. The rest of the name must be ASCII. '@' by itself is the name for the 'odorless' odor.

An odor is a subodor of another odor if the superodor's name is a prefix of the subodor's name.

The smell of size

In addition to the above, there is an (optional) orthogonal dimension to odors that indicates the (maximium) size of the value, in bits. This is specified by appending a single capital letter to the rest of the odor name. 'A' means 1 bit, 'B' means 2 bits, 'C' means 4 bits, 'D' means 1 byte, etc, up to 'Z', which means 4MB.

You can build atoms bigger than 4MB, but in this case, just don't have a capital letter in the odor name.

You can coerce odors without size to any size. You can coerce odors of smaller size to odors of larger size, but you not the other way around. E.g. you can coerce odors of 1 byte ('D') to odors of 1K ('N'), but you can't coerce 1K to 1 byte.

Known and unknown odors

'@' followed by any ASCII text (without whitespace, i assume; in fact, must it be alphanumeric?) is a valid name for an odor. If the odor isn't defined by the language, that's okay, it's just a custom odor. Custom odors are called 'unknown odors'.

This allows you to say, e.g. 'udm' to indicate an unsigned decimal to be interpreted as miles, and 'udk' to indicate an unsigned decimal to be interpreted as kilometers.

Known odors

" Hoon knows about the following odors, with defined meanings:

@c              UTF-32 codepoint
@d              date
  @da           absolute date
  @dr           relative date (ie, timespan)
@f              yes or no (inverse boolean)
@n              nil
@p              phonemic base
@r              IEEE floating-point
  @rd           double precision  (64 bits)
  @rh           half precision (16 bits)
  @rq           quad precision (128 bits)
  @rs           single precision (32 bits)
@s              signed integer, sign bit low
  @sb           signed binary
  @sd           signed decimal
  @sv           signed base32
  @sw           signed base64
  @sx           signed hexadecimal
@t              UTF-8 text (cord)
  @ta           ASCII text (span)
    @tas        ASCII symbol (term)
@u              unsigned integer
  @ub           unsigned binary
  @ud           unsigned decimal
  @uv           unsigned base32
  @uw           unsigned base64
  @ux           unsigned hexadecimal

Each of these forms has a URL-safe syntax, which we'll get to. Each parses as an atomic constant in Hoon, and each is printed by the Hoon prettyprinter. "

The humble type

" a type in Hoon has two roles.

One, it defines a set of nouns. Any finite noun is either in this set, or not in it.

Two, it ascribes semantics to all nouns in this set. For example, a Hoon type exports a semantic namespace. "

Now the tutorial begins to introduce 'tile syntax'. Despite saying earlier that "there is no direct syntax for defining or declaring a type", actually the tile syntax is pretty much a syntax for specifying a type (although, to humans, the tile syntax is a language for defining types, the Hoon compiler doesn't (directly) interpret them that way; to Hoon, all tiles are converted to normal non-tile twigs through one of the tile reductions that we'll talk about later; the Hoon type checker only operates on these normal twigs, after the tiles have been converted). Tiles are covered in a later chapter, and also in http://doc.urbit.org/doc/hoon/lan/tile/ . An example is given:

++  type  $|  ?(%noun %void)
          $%  [%atom p=term]
              [%cell p=type q=type]
          ==

'++' means that what follows is a 'gate', which is like a function. If a gate is specified by a tile, that means that the function will be a 'normalizing' function that attempts to coerce values to the type indicated by the tile. '++ type' means that this gate is named 'type'. We are looking at an abbreviated version of a function called 'type' inside the interpreter.

'$

' is the 'reed' tile constructor. 'Reed' takes two tiles, and describes a type in terms of these two tiles; the first tile describes which atoms are in the type, and the second tile describes which cells are in the tile. In this case, the atom tile is '?(%noun %void)', and the cell type is everything from '$%' to '=='.

'?()' is the 'fern' tile constructor. It takes a list of tiles, and says that a value is in the type iff it is any of these tiles.

'$%' is the kelp tile constructor, and '==' closes the kelp constructor. A kelp is a discriminated union, and is defined as a list of labeled cases. Here, one case is labeled '%atom', and the other one is labeled '%cell'.

So, in sum, this is saying that if the value is an atom, then it is of type 'type' iff it is either %noun or %void. %noun and %void are names of tiles that indicate the type of all nouns, or the type %void respectively. And if the value is a cell, then it is of type 'type' iff either (a) it is of form 'term', in which case we'll call it an 'atom', or (b) it is of form [type type], in which case we'll call it a 'cell'.

You might ask, since both atoms and cells are nouns, why do we have a case for %noun but also a case for %atom and a case for %cell? I'm not sure, but i think because this expression isn't really defining types as much as it is defining a tile 'type' in terms of other tiles. Some tiles are of the form %noun (or the irregular form of %noun, '*'), and some tiles are of the form [%atom term], and some tiles are of the form [%cell type type] (todo is this right, or is that just the internal representation of these tiles?). Now, an atomic VALUE, for example '3', would match both the tiles %noun and also the tile [%atom 3].

A tile for the type 'any noun' is '*'.

The noble cube

Canonical atom syntaxes

"If some of these syntaxes seem contrived or odd, bear in mind: none of them collides with any of the others, and they are all URL-safe and more. The canonical atom forms use only lowercase characters, numbers, ., -, and ~. A cell form adds _."

..

Transparent cell syntax

Opaque noun syntax

Noncanonical syntaxes

Hoon symbol, @tas

%$ is the eempty symbol (empty string) as cubical @tas

Loobeans, @f

Cords, @t

Strings

Chapter II - Moar Hoon Types

"What good is a phone call if you're unable to speak?" - The Matrix

More simple types

%fork

%face

Name resolution

Wing resolution

Axis syntax

Resolving forks

Chapter III - Hoon Computes

"I've only been in love with a beer bottle and a mirror." - Sid Vicious

Syntax

Glyphs

Runes

Regular forms

Tall forms

Wide forms

Irregular forms

Semantics

A new testbed

Twigs

Natural and synthetic hoons

Cores

Preparing to decrement

Actually decrementing

Making it pretty

Exercises

Chapter IV - Gates

"Anyone who thinks he's original is merely ignorant." - Nicolás Gómez Dávila

Gates and "lambda"

We make a gate

We understand a gate

walk thru

< (deq a) :: call '(deq a)', after replacing the subject with the following:

++ deq :: define a gate named 'deq' =+ x=0 :: assign 0 to x
% :: define a core (e.g. add this core to the subject)
% :: define a core
  ++  $              ::   the core has a gate named '$'
    =+  y=0          ::   assign 0 to y
    =<  $            ::   call '$' (defined below), after replacing the subject with the following:
    |%               ::     define a core
    ++  $            ::     the core has a gate named '$', whose body is below
      ?:  =(x +(y))  ::       if x == (y+1)
        y            ::         then return y
      $(y +(y))      ::         otherwise, call $ (3 lines above), after replacing y with (y+1)
    --
  ----

now, with synthetic runes:

< (dec a) :: call '(deq a)', after replacing the subject with the following:

++ deq :: define a gate named 'deq' =+ x=0 :: assign 0 to x
% :: define a core (e.g. add this core to the subject)
. :: define a core, with one gate, named '$' (whose body is below)
  =+  y=0         ::   assign 0 to y
  |-              ::   define a core, with one gate, named '$' (whose body is below), and call it
  ?:  =(x +(y))   ::   if x == (y+1)
    y             ::     then return y
  $(y +(y))       ::     otherwise, call $ (3 lines above), after replacing y with (y+1)--

todo: in the second code example, why do we use '=+' instead of '^=', and what is the difference between '=+' and'^='?

We call a gate

(name arg) is an irregular form for (%- name args)

so '=< (deq a)' is short for:

< %- deq

    a

walk thru

++ twig :: 'twig' is a gate from hoon.hoon

% [%cnhp p=twig q=tusk] :: one kind of twig is something of the form [%- p q], where p is a twig and q is a tusk (todo why the %, is this a core?)
  ==

++ tusk (list twig) :: a tusk is a list of twigs

++ open :: 'open' is a gate from hoon.hoon ^- twig :: todo ?- gen :: switch/case on 'gen' (presumably 'gen' is a parameter being passed into 'open') [%cnhp *] :: the case where 'gen' is of the form [%- noun] ?~(q.gen [%tsgr p.gen [%cnzy %$]] [%cncl p.gen [%cltr q.gen]]) :: if q is nil, then return (=> p [%cnzy %$]) (cnzy is an internal twig that pulls the specified limb from the subject [1]). Else, return (%: p (:* q)) (that is, fetch gate '$' from core p, set its sample to the arguments given in q, converted to a tuple, and then evaluate that gate). :: [%cncl *] :: the case where 'gen' is of the form [%: noun] [%cnsg [%$ ~] p.gen q.gen] :: (%~ [%$ ~] p q). fetch gate '$' from core p with its sample set to q :: [%cnsg *] :: the case where 'gen' is of the form [%~ noun] [%cntr p.gen q.gen [[[[%& 6] ~] r.gen] ~]] :: (%* p q [[[[%& 6] ~] r] ~]) . todo what is %&? :: [%cntr *] :: the case where 'gen' is of the form [%* noun] :+ %tsls :: :+ means produce a cell, so (:+ %tsls q.gen (:+ %cnts x y)) yields [=< q [%= x y] (in this case x is an expression whose head is 'weld' and y is an expression whose head is 'turn') q.gen :+ %cnts (weld p.gen `wing`[[~ 2] ~]) :: 'weld' is list concatenate. Backquotes are a form of typecasting, `p`q is short for ^-(p q) (typecast q to the icon of tile p). So we are concetenating the list p with the wing defined by the single-element list whose element is [~ 2]. (turn r.gen

([p=wing q=twig] [p [%tsgr [~ 3] q]])) :: turn is 'transform'. todo
  ==

walk thru

++ twig :: 'twig' is a gate from hoon.hoon

% [%cnhp p=twig q=tusk] :: one kind of twig is something of the form [%- p q], where p is a twig and q is a tusk (todo why the %, is this a core?)
  ==

++ open :: 'open' is a gate from hoon.hoon ^- twig :: todo ?- gen :: switch/case on 'gen' (presumably 'gen' is a parameter being passed into 'open') [%cnhp *] :: the case where 'gen' is of the form [%- noun] :+ %tsls q.gen :: :+ means produce a cell, so (:+ =< q (:+ => [%cnzy %$] x)) yields [=< q [=> [%cnzy %$] x]], that is, =<(q =>(%cnzy($) x)), that is, evaluate q with the subject obtained by evaluating x with the subject obtained by (%cnzy $), that is, evaluate q with the subject obtained by evaluating x with the subject obtained by pulling $ from the current subject (where x is the line below starting with %cnhp). :+ %tsgl [%cnzy %$] [%cnhp [[%& 2] ~] [[[%& 6] ~] [%tsgl [~ 3] p.gen]] ~] :: %- [[%& 2] ~] [[[%& 6] ~] [%tsgl [~ 3] p.gen]]. I think this means, call the head of the current subject (the formula, assuming the subject is a gate) after replacing the head of its tail (the sample, assuming the subject is a gate) with =>([~ 3] p), which i don't understand. todo ==

walk thru (note don't read this alone, this only makes sense in context of the part of the tutorial it is talking about)

    $.-(+< =>(+ a))       ::  replace the sample and invoke

this is a Wing with two Limbs, separated by a '.'. We read Wings from right to left, so first we evaluate "-(+< =>(+ a))", then we evaluate '$' on that.

"-(+< =>(+ a))" is of the form "W(W2 Z)", where W is the wing to be operated on (interpreted relative to the current subject), W2 is a wing specifying a location within W, and Z is a value. W(W2 Z) means "Start at the subject. Traverse wing W. Call the subtree found here 'w'. Now traverse wing W2. Now replace whatever you find there with Z to form a new w, call it w'. Return w'.".

So, we start at the subject, and we traverse W, which is "-", which means to take the head. The head of the subject is currently 'deq' (which is a gate). We're going to make a replacement within 'deq', generating a new, changed version of deq (call it deq2). Now we traverse W2, which is "+<", which is read from left to right, and which says to "first take the tail, then take the head". This means "head(tail)" or in other words "the head of the tail". We are currently starting at a gate, and the head of the tail of a gate is that gate's sample.

Now we are going to swap out the W2 of w with Z in order to make a w', which in our case means that we are going to swap out the sample of deq with "=>(+ a)" in order to make a deq2.

"=>(+ a)" means "a, with respect to the subject +" (where + is itself taken with respect to our current subject). + is the +-<> axis limb syntax for 'tail', so "+ with respect to the current subject" means the tail of the current subject. The current subject is whatever it was before we added 'deq', but with 'deq' prepended to that. So, the tail of the current subject means the old subject, e.g. the subject of our caller before we started messing with things with "=< =+ deq". We have to do that, because as the tutorial says, "the programmer who wrote a meant to write a twig against the original subject, not against the cell of gate and subject"; in other words, 'a' is an expression defined with respect to a subject, and it was meant to be interpreted with respect to the subject as it was just before we were called, that is, just before the line "=< =+ deq".

So, to summarize "-(+< =>(+ a))", "=>(+ a)" evaluates 'a' (with respect to the subject that the expression 'a' was meant to be evaluated against); then we take deq and find its sample and replace that sample with the result of evaluating 'a', to form deq2. deq2 is just deq but with its sample replaced by (the evaluation of) a.

So now the whole expression looks like "$.deq2", where deq2 is as defined above. So now that we've computed deq2, all that remains is to follow the limb "$" within deq2. Since deq2 is a gate, this just means to invoke the gate.

So, to summarize, we evaluated a, created a copy of deq, deq2, with its sample replaced by the evaluation of a, then we invoked the gate deq2.

Sequel hook

Chapter V - Tiles

"Not to get knowledge, but to save yourself from having ignorance foisted upon you." - Carlyle

Tiles

Tiles are patterns that specify a type.

Reductions

Bunting

~(bunt al sec) appears to be the irregular form of this rune:

%~: %~(p q r) "pulls p from the tray q with its sample set to r". Irregular syntax ~(p q r). [2]

So then what is 'al' here? todo.

walk thru

++ twig $% [%brts p=tile q=twig] :: one type of twig is of the form

(p q)
          ==

walk thru

++ open :: with gate 'open' in hoon.hoon ^- twig :: todo ?- gen :: switch/case on variable 'gen' [%bctr *] [%ktsg ~(bunt al p.gen)] :: $*(p) reduces as: ^~(%~(bunt al p)), that is, at compile time, pull 'bunt' from tray 'al' with its sample set to 'p' [%brcb *] [%tsls [%bctr p.gen] [%brcn q.gen]] ::

_(p q) reduces at: =<($*(p)%(q)), that is, use the product of q as the subject of $*(p)
    [%brts *]  [%brcb p.gen (~(put by *(map term foot)) %$ [%ash q.gen])]
       :: |=(p q) reduces as: |_(p (~(put by *(map term foot)) %$ [%ash q.gen])). todo (what is put? map? foot?; *() is an irregular form of $*)
  ==

Clamming

Fishing

Whipping

Tile syntax and abuse

The tiles

[%axil p=base]

[%leaf p=term q=@]

[p=tile q=tile]

[%bark p=term q=tile]

[%reed p=tile q=tile]

[%kelp p=[i=line t=(list line)]]

[%bush p=tile q=tile]

[%fern p=[i=tile t=(list tile)]]

[%herb p=twig]

[%weed p=twig]

Chapter VI - Type Inference

"Ever, as before, does Madness remain a mysterious-terrific, altogether infernal boiling-up of the Nether Chaotic Deep, through this fair-painted Vision of Creation, which swims thereon, which we name the Real." - Carlyle

On type inference algorithms

Low-powered type inference

Hoon's type inference system has three disadvantages:

Upon first reading about the infinite loop, i immediately thought, 'that's terrible, this type inference system is fatally flawed and unfit for use'. The author says "this never happens once you know what you're doing" but i was skeptical. But, after reading more, i've come to agree that the possible situations where the inference algorithm may hang sound like they are easily avoided, so i now think it may be a good system. The parts that talk about where the inference algorithm may hang are the sections 'Pseudolazy evaluation: %hold' and 'How we use %hold'. In short, in Hoon you can build recursive types, which is how you construct the type for a lazy list (an infinite list). When descending into a recursive type, the type inference algorithm will check if the node it's about to expand is just a recurrent instance (an exact repetition) of any ancestral node, and if so, it won't descend into that node (so it will avoid getting caught in an infinite loop). However, it is possible in theory to construct a type that is infinite but doesn't repeat exactly, in which case the type inference algorithm would hang. But this wouldn't happen accidentally because you can't do this without using the "%hold" construct for recursive type construction. You can avoid using %hold directly by instead only using prebuild lazy constructs such as lazy lists. So this is indeed avoidable; just pretend %hold is an implementation detail rather than part of the language, don't use it, and then the type inference algorithm terminates.

Facts to keep in mind

If you want to read the source code for the semantics of Hoon, it's in the file hoon.hoon, gate ++ut, gate ++mint. Note that "gol - which looks for all the world like a mechanism for backward inference, is not. It is semantically irrelevant and only exists to get better localization on error reports."

What mint does is map [subject-type hoon-twig] to [product-type nock-formula].

walkthru

++  open     :: this is a gate named 'open'
  ^-  twig   :: the output type of this gate is 'twig'
  ?-  gen    :: switch on variable 'gen'
    [%tsls *]  [%tsgr [p.gen [~ 1]] q.gen]  :: if 'gen' is of the form [%tsls *] e.g. =+ [p q],
                    :: then return [%tsgr [p.gen [~ 1]] q.gen]
                    :: e.g. => [[p [~ 1]] q]
  ==

todo: i don't understand what the [~ 1] is for there for, but presumably it's concatenating p onto the current subject.

"=+(a b) is =>([a .] b)"

that is, =+ (the hoon that adds a to the current subject and then executes b) is a macro for =>([a .] b), which means "replace the current subject with [a .] and then execute b", where the . in [a .] refers to the current subject.

walkthru

++  mint                             :: this is a gate named 'mint'
  |=  [sut=type gen=twig]            :: it takes two input arguments, 'sut' which is of type 'type', and 'gen' which is of type 'twig'
  ^-  [p=type q=nock]                :: it yields two outputs, 'p' which is of type 'type', and 'q' which is of type 'nock'
  ?-    gen                          :: switch on 'gen'
      [%tsgr *]                      :: case gen is of the form =>(p q)
    =+  fid=$(gen p.gen)               :: let fid=mint(gen=gen.p) 
                                       ::   that is, $ means a recursive call to this function ('mint'). 
                                       ::   In the call, we are leaving parameter 'sut'
                                       ::   as-is and reassigning 'gen' to 'gen'. Since 'mint''s output is of the form
                                       ::   [p=type q=nock], that means that the contents of 'fid' will now be of that form.
                                       ::   So, here we are (generating Nock code to)
                                       ::   evaluate p in the context of
                                       ::   a subject of the type of the current subject; p.fid is the 
                                       ::   type of the product, and q.fid is the product.
    =+  dov=$(sut p.fid, gen q.gen)    :: let dov=mint(gen=gen.q, sut=fid.p) 
                                       ::   So, here we are (generating Nock code to) 
                                       ::   evaluate q in the context of a subject
                                       ::   of the same type as the product of p.
    [p.dov (comb q.fid q.dov)]         :: return [dov.p (comb fid.q dov.q)]
  ==                                 :: end case

todo: i don't understand the syntax of '[dov.p (comb fid.q dov.q)]'. Does p.dov evaluate to a function, and then we call that function with arguments '(comb fid.q dov.q)'? if so, does '(comb fid.q dov.q)' mean "call function 'comb' with 2 positional arguments, fid.q and dov.q"? If so, why doesn't the call use the same syntax as $(gen p.gen), where we have a list of pairs of (argument name, argument value)? Or, is this just a cell with two elements, the first element is p.dov, and then second element is the result of calling 'comb' with two arguments, q.fid and q.dov?

'comb' takes as input two pieces of Nock code (named 'mal' and 'buz'), and returns Nock code for their composition, by which is meant, execute mal against the current state, then execute buz against the result of that. Nock code for their composition is just [7 mal buz], which would be really easy, but 'comb' also includes a check for a bunch of special cases; if the code happens to fall in one of these, 'comb' perform some minor simplifications on its output, perhaps to make the result more efficient, or more readable/idiomatic.

++  comb                                :: this is a gate named 'comb'
  |=  [mal=nock buz=nock]               :: it takes two input arguments, 'mal' and 'buz', both of type nock
  ^-  nock                              :: it produces one output, of type nock
  ?:  &(?=([0 *] mal) !=(0 p.mal))      :: if mal is of the form [0 *], AND p.mal is not zero
    ?:  &(?=([0 *] buz) !=(0 p.buz))    ::   if buz is of the form [0 *], AND p.buz is not zero
                                        ::     (if we've reached here, then both mal and buz are of the form [0 *], e.g. selections
      [%0 (peg p.mal p.buz)]            ::     return [0 (peg p.mal p.buz)]
                                        ::       (presumably 'peg' is composing the selections in mal and buz)
    ?:  ?=([2 [0 *] [0 *]] buz)         ::   if buz is of the form [2 [0 *] [0 *]]
                                        ::     (if we've reached here, then both mal is of the form [0 *], e.g. a selection,
                                        ::     and buz is of the form [2 [0 *] [0 *]], e.g. do some selections and then
                                        ::     eval the result)
      [%2 [%0 (peg p.mal p.p.buz)] [%0 (peg p.mal p.q.buz)]]
                                        ::     return [2 [0 (peg p.mal p.p.buz)] [0 (peg p.mal p.q.buz)]]
                                        ::       i.e., pre-compute the composition of the selections 
                                        ::             (recall that Nock instruction 0 is selection)
                                        ::       then do [2 a b] where a and b are the results of those composed selections.
    [%7 mal buz]                        ::   else (buz is neither of the form [0 *], nor of the form [2 [0 *] [0 *]]
                                        ::     return [7 mal buz] (recall that Nock instruction 7 is composition)
  ?:  ?=([^ [0 1]] mal)                 :: if mal is of the form [^ [0 1]], that is, if mal is a cell whose first element 
                                        ::   is a cell and whose second element is is [0 1],
    [%8 p.mal buz]                      ::     then return [8 p.mal buz].
                                        ::     If mal is a cell, then its Nock evaluation will be
                                        ::     an autocons of the result of evaluating each of the two elements in the cell.
                                        ::     The first element of the result cell will the result of evaluating p.mal.
                                        ::     The second element of the result cell will the result of evaluating [0 1],
                                        ::     which is just /[a 1] (where 'a' is the current state), which is just
                                        ::     a. In other words, the result of the Nock evaluation of mal will be the concatenation
                                        ::     of the evaluation of p.mal under the current state, with the current state. If we then
                                        ::     replace the current state with that result, and then execute buz, the overall effect will
                                        ::     be identical to Nock operator 8 applied to p.mal and buz.
  ?:  =([0 1] buz)                      :: if buz is of the form [0 1],
    mal                                 :: then return mal
                                        ::   we are composing mal and buz, and buz says 'just return the state that i was given', and
                                        ::   the state that we gave the buz was the result of executing mal. So the code to accomplish this is
                                        ::   just to execute mal; buz was a no-op.
  [%7 mal buz]                          :: or, if none of the other 'if's were hit, just return [7 mal buz]

todo: here i interpreted the != as 'not equal to', because i think '&' is logical AND, so both input arguments to the & should be booleans (loobeans), but this is at odds with http://doc.urbit.org/doc/hoon/lan/rune/#zaptis , which says that != takes a twig and returns a noun of that twig's formula. Which is it?

todo: if ?=([2 [0 *] [0 *]] buz) is true, then why doesn't p.buz refer to '2'?

note: if you are looking at the Nock spec in http://doc.urbit.org/doc/nock/tut/1/ or http://doc.urbit.org/doc/nock/tut/2/, you might be wondering, why am i saying that [0 1] is the Nock code for /[1 a], when the spec says that the Nock code for that should be [a 0 1]? The answer is that there are really two forms of Nock code, one in which the state is explicitly given as the first element of each instruction cell, and one in which the state is implicit. The spec is written in the former format, but we are working in the latter format. The latter format is, i think, called a "Nock formula" but i'm not sure. Todo: bring this up and ask if some text about this distinction could be added to to http://doc.urbit.org/doc/nock/tut/1 to clarify this.

Pseudolazy evaluation: %hold

++  type  $|  ?(%noun %void)                     :: this is a gate named 'type'
                                                 :: $| is the reed tile constructor. Reeds are atom/cell tiles.
                                                 :: $| takes two arguments, first, the type of the thing if it's an atom, second, the
                                                 :: type of the thing it it's a cell.
                                                 :: ?() is the fern tile constructor (?() is an irregular form of $?).
                                                 :: ferns are plain selection tiles, e.g. their icons are forks.
                                                 :: ?(%noun %void) means something whose type is either %noun or %void.
                                                 :: So, $| ?(%noun %void) B means something whose type, if it's an atom,
                                                 :: is either %noun or %void, and whose type, if it's a cell, is B.
          $%  [%atom p=term]                     :: $% is the kelp tile constructor. Kelps are tag selection tiles (discriminated union). 
                                                 :: They take a list of cases, one on each line, each case with a label.
                                                 :: Here, the first case in the list is [%atom p=term]. The label is %atom
                                                 :: (here, because this kelp is listing types, the label, %atom, is the type stem).
                                                 :: After the label, we have an element 'term' whose axis is labeled 'p'.
                                                 :: Semantically, recall that %atom types are parameterized by a term, which is the
                                                 :: odor of the atom.
              [%cell p=type q=type]              :: The next case in the kelp is labeled %cell. After the label, there are two elements,
                                                 :: type and type, with axes 'p' and 'q' respectively.
                                                 :: Semantically, recall that %cell types are parameterized by two types, which are the
                                                 :: type of each element in the cell.
              [%core p=type q=coil]              :: Recall that %core types have a type and a coil. Todo, see http://doc.urbit.org/doc/hoon/tut/3/
              [%cube p=* q=type]                 :: %cube types ('constant' types with only a single value in them) are characterized by
                                                 :: a noun that says what the constant is ('p=*', recall '*' is the (axil) tile for any noun),
                                                 :: and also a type, which is the type of that constant.
              [%face p=term q=type]              :: %face types are labeled values. 
                                                 :: the term p is the label, and the type q is the type of the vale.
              [%fork p=type q=type]              :: %fork types are things which could be of type p, or of type q (union types)
              [%hold p=(list ,[p=type q=twig])]  :: todo
          ==                                     :: this == terminates the $%

How we use %hold

Where we use %hold

%hold is used in lots of places. The most common use is when a wing resolves to an arm. Rather than immediately evaluate the core in the arm; rather, we make a %hold.

You'll see things that look like type signatures (with the ^- rune). But these aren't type signatures; only typecasts. When the compiler wants to tell what type a gate produces, it doesn't look at any type signature, instead of involkes ++play, which is like ++mint, except that it only computes the output type for the code it is passed, instead of the both the output type and the Nock formula for the code it is passed, which is what ++mint does (also, i think that ++play doesn't bother to verify correctness).

potential mistake or at least a misunderstanding on my part: "Logically this iteration is always the same as q:(mint p q), but in actual practice ++mint is making a formula we don't need. Also, we only need to actually verify correctness when we generate a formula - not when we traverse pseudolazy types. Hence, we have the lighter ++play gate for traversal." -- shouldn't that be p:(mint p q)? because isn't the first output of mint the inferred type of the result?

Head and tail recursion

"p:(mint sub ~(bunt al p.gen))": 'gen' is the ^- rune and its input arguments; p.gen is the tile argument and q.gen is the twig argument. i dunno what 'al' is, todo. ~() is the irregular form of %~; ~(p q r) means "pulls p from the tray q with its sample set to r". Irregular syntax [3]