notes-computer-programming-hoon-hoonTutorialNotes

note: this is based on the OLD (deprecated) tutorial, see http://urbit.org/docs/dev/hoon/tutorial for the new one. The links below are archived versions.

TOC of the tutorial:

Chapter I - Intro To Hoon:

Chapter II - Moar Hoon Types:

Chapter III - Hoon Computes:

Chapter IV - Gates:

Chapter V - Tiles:

Chapter VI - Type Inference:

Chapter VII - Odds, Ends, Quirks:

---

notes on the tutorial

Chapter 0 - concepts from the Nock tutorial and elsewhere that are relevant here

http://urbit.org/doc/nock/tut/1/

" A noun is an atom or a cell. An atom is a natural number. A cell is an ordered pair of nouns. "

We write cells with square brackets, e.g. [a b]. As a notational shortcut, [2 6 14 15] is short for [2 [6 [14 15]]], i.e. brackets group to the right.

http://urbit.org/doc/nock/tut/3/

core battery, payload gate formula, sample, context arm (? not actually defined anywhere in the Nock tutorial, though this word is used. But http://urbit.org/doc/hoon/tut/3/ defines it as a symbol and a twig; presumably its part of a core) axis/axes

todo: where to put this: i think a.b is what, in other languages, would be written b.a, that is, start with a and then look at the b field within it.

Chapter I - Intro To Hoon

"Like all men in Babylon, I have been proconsul; like all, a slave." - The Lottery in Babylon

"Hoon is actually not much more than a fancy wrapper around Nock. People who know C can think of Hoon as the C to Urbit's Nock - just a sprinkling of syntax, wrapped around machine code and memory."

Principles of type

subjects are typed

Hoon is statically typed

"...instead of calculating tree axes by hand, we could actually assign names to different parts of the tree - and those names would stay the same as we pushed more data on the subject. ... The way we're going to do this is by associating something called a type with the subject"

"Hoon is a statically typed language, which just means that the type isn't a part of your program: it's just a piece of data the compiler keeps around as it turns your Hoon into Nock."

AST = 'twig'

the Hoon compiler takes an AST and a subject type, and computes a Nock formula and also the type of the product of executing that formula on a subject of that type

"there is no direct syntax for defining or declaring a type. ...Types are always produced by inference."

Printing types

Typing ":type; SOMETHING" prints the value of SOMETHING and then, on the line below it, its type, rendered intelligently as a string. E.g.

> :type; 42

42 @ud

Typing "-:!>(SOMETHING)" prints the actual internal type object. Don't do this, because "for a core, the type actually contains the entire codebase". "Black magic is evil magic - it must exact its evil price."

Atom examples

Use the syntax: `t`x

to cast value 'x' to type 't'.

The logic of smell

Hoon has two type systems, the 'odor' system, and the type system proper. The type system proper is the 'real' type system. In terms of the 'real' type system, many things that would be separate types in some programming languages are the same type in Hoon. For example, integers and strings are both just Hoon 'atoms'. The idea is that both of these things can be internally encoded as unsigned integers. (aside: if you know math, the idea that you can encode a string as a single integer is like the Godel encoding of sequences). But there are reasons of convenience to distinguish between integers and strings; for example, when the Hoon interpreter prints out the value of a variable (say 'x') containing an atom, it would be nice for it to know whether that atom is an integer atom or a string atom, and to print it differently in each case, so that the programmer can just type 'x' at the prompt to see the value of x, rather than always having to type stuff like 'printint x' or 'printstr x'. So, we have the 'odor' system, which is a second kind of type annotation that is attached to atoms to indicate what kind of atom they are, even though all atoms are really just unsigned integers.

Atoms of an odor can be coerced to a different odor, but coercion cannot happen in the type system proper (i think). Even without coercion, a subodor can be conversed to any of its superodors, and a superodor can be coerced to any of its subodors.

The names of odors start with '@'. The rest of the name must be ASCII. '@' by itself is the name for the 'odorless' odor.

An odor is a subodor of another odor if the superodor's name is a prefix of the subodor's name.

The smell of size

In addition to the above, there is an (optional) orthogonal dimension to odors that indicates the (maximium) size of the value, in bits. This is specified by appending a single capital letter to the rest of the odor name. 'A' means 1 bit, 'B' means 2 bits, 'C' means 4 bits, 'D' means 1 byte, etc, up to 'Z', which means 4MB.

You can build atoms bigger than 4MB, but in this case, just don't have a capital letter in the odor name.

You can coerce odors without size to any size. You can coerce odors of smaller size to odors of larger size, but you not the other way around. E.g. you can coerce odors of 1 byte ('D') to odors of 1K ('N'), but you can't coerce 1K to 1 byte.

Known and unknown odors

'@' followed by any ASCII text (without whitespace, i assume; in fact, must it be alphanumeric?) is a valid name for an odor. If the odor isn't defined by the language, that's okay, it's just a custom odor. Custom odors are called 'unknown odors'.

This allows you to say, e.g. 'udm' to indicate an unsigned decimal to be interpreted as miles, and 'udk' to indicate an unsigned decimal to be interpreted as kilometers.

Known odors

" Hoon knows about the following odors, with defined meanings:

@c              UTF-32 codepoint
@d              date
  @da           absolute date
  @dr           relative date (ie, timespan)
@f              yes or no (inverse boolean)
@n              nil
@p              phonemic base
@r              IEEE floating-point
  @rd           double precision  (64 bits)
  @rh           half precision (16 bits)
  @rq           quad precision (128 bits)
  @rs           single precision (32 bits)
@s              signed integer, sign bit low
  @sb           signed binary
  @sd           signed decimal
  @sv           signed base32
  @sw           signed base64
  @sx           signed hexadecimal
@t              UTF-8 text (cord)
  @ta           ASCII text (span)
    @tas        ASCII symbol (term)
@u              unsigned integer
  @ub           unsigned binary
  @ud           unsigned decimal
  @uv           unsigned base32
  @uw           unsigned base64
  @ux           unsigned hexadecimal

Each of these forms has a URL-safe syntax, which we'll get to. Each parses as an atomic constant in Hoon, and each is printed by the Hoon prettyprinter. "

The humble type

" a type in Hoon has two roles.

One, it defines a set of nouns. Any finite noun is either in this set, or not in it.

Two, it ascribes semantics to all nouns in this set. For example, a Hoon type exports a semantic namespace. "

Now the tutorial begins to introduce 'tile syntax'. Despite saying earlier that "there is no direct syntax for defining or declaring a type", actually the tile syntax is pretty much a syntax for specifying a type (although, to humans, the tile syntax is a language for defining types, the Hoon compiler doesn't (directly) interpret them that way; to Hoon, all tiles are converted to normal non-tile twigs through one of the tile reductions that we'll talk about later; the Hoon type checker only operates on these normal twigs, after the tiles have been converted). Tiles are covered in a later chapter, and also in http://doc.urbit.org/doc/hoon/lan/tile/ . An example is given:

++  type  $|  ?(%noun %void)
          $%  [%atom p=term]
              [%cell p=type q=type]
          ==

'++' means that what follows is a 'gate', which is like a function. If a gate is specified by a tile, that means that the function will be a 'normalizing' function that attempts to coerce values to the type indicated by the tile. '++ type' means that this gate is named 'type'. We are looking at an abbreviated version of a function called 'type' inside the interpreter.

'$

' is the 'reed' tile constructor. 'Reed' takes two tiles, and describes a type in terms of these two tiles; the first tile describes which atoms are in the type, and the second tile describes which cells are in the tile. In this case, the atom tile is '?(%noun %void)', and the cell type is everything from '$%' to '=='.

'?()' is the 'fern' tile constructor. It takes a list of tiles, and says that a value is in the type iff it is any of these tiles.

'$%' is the kelp tile constructor, and '==' closes the kelp constructor. A kelp is a discriminated union, and is defined as a list of labeled cases. Here, one case is labeled '%atom', and the other one is labeled '%cell'.

So, in sum, this is saying that if the value is an atom, then it is of type 'type' iff it is either %noun or %void. %noun and %void are names of tiles that indicate the type of all nouns, or the type %void respectively. And if the value is a cell, then it is of type 'type' iff either (a) it is of form 'term', in which case we'll call it an 'atom', or (b) it is of form [type type], in which case we'll call it a 'cell'.

You might ask, since both atoms and cells are nouns, why do we have a case for %noun but also a case for %atom and a case for %cell? I'm not sure, but i think because this expression isn't really defining types as much as it is defining a tile 'type' in terms of other tiles. Some tiles are of the form %noun (or the irregular form of %noun, '*'), and some tiles are of the form [%atom term], and some tiles are of the form [%cell type type] (todo is this right, or is that just the internal representation of these tiles?). Now, an atomic VALUE, for example '3', would match both the tiles %noun and also the tile [%atom 3].

A tile for the type 'any noun' is '*'.

The noble cube

Canonical atom syntaxes

"If some of these syntaxes seem contrived or odd, bear in mind: none of them collides with any of the others, and they are all URL-safe and more. The canonical atom forms use only lowercase characters, numbers, ., -, and ~. A cell form adds _."

..

Transparent cell syntax

Opaque noun syntax

Noncanonical syntaxes

Hoon symbol, @tas

%$ is the eempty symbol (empty string) as cubical @tas

Loobeans, @f

Cords, @t

Strings

Chapter II - Moar Hoon Types

"What good is a phone call if you're unable to speak?" - The Matrix

More simple types

%fork

%face

Name resolution

Wing resolution

Axis syntax

Resolving forks

Chapter III - Hoon Computes

"I've only been in love with a beer bottle and a mirror." - Sid Vicious

Syntax

Glyphs

Runes

Regular forms

Tall forms

Wide forms

Irregular forms

Semantics

A new testbed

Twigs

Natural and synthetic hoons

Cores

Preparing to decrement

Actually decrementing

Making it pretty

Exercises

Chapter IV - Gates

"Anyone who thinks he's original is merely ignorant." - Nicolás Gómez Dávila

Gates and "lambda"

We make a gate

We understand a gate

walk thru

< (deq a) :: call '(deq a)', after replacing the subject with the following:

++ deq :: define a gate named 'deq' =+ x=0 :: assign 0 to x
% :: define a core (e.g. add this core to the subject)
% :: define a core
  ++  $              ::   the core has a gate named '$'
    =+  y=0          ::   assign 0 to y
    =<  $            ::   call '$' (defined below), after replacing the subject with the following:
    |%               ::     define a core
    ++  $            ::     the core has a gate named '$', whose body is below
      ?:  =(x +(y))  ::       if x == (y+1)
        y            ::         then return y
      $(y +(y))      ::         otherwise, call $ (3 lines above), after replacing y with (y+1)
    --
  ----

now, with synthetic runes:

< (dec a) :: call '(deq a)', after replacing the subject with the following:

++ deq :: define a gate named 'deq' =+ x=0 :: assign 0 to x
% :: define a core (e.g. add this core to the subject)
. :: define a core, with one gate, named '$' (whose body is below)
  =+  y=0         ::   assign 0 to y
  |-              ::   define a core, with one gate, named '$' (whose body is below), and call it
  ?:  =(x +(y))   ::   if x == (y+1)
    y             ::     then return y
  $(y +(y))       ::     otherwise, call $ (3 lines above), after replacing y with (y+1)--

todo: in the second code example, why do we use '=+' instead of '^=', and what is the difference between '=+' and'^='?

We call a gate

(name arg) is an irregular form for (%- name args)

so '=< (deq a)' is short for:

< %- deq

    a

walk thru

++ twig :: 'twig' is a gate from hoon.hoon

% [%cnhp p=twig q=tusk] :: one kind of twig is something of the form [%- p q], where p is a twig and q is a tusk (todo why the %, is this a core?)
  ==

++ tusk (list twig) :: a tusk is a list of twigs

++ open :: 'open' is a gate from hoon.hoon ^- twig :: todo ?- gen :: switch/case on 'gen' (presumably 'gen' is a parameter being passed into 'open') [%cnhp *] :: the case where 'gen' is of the form [%- noun] ?~(q.gen [%tsgr p.gen [%cnzy %$]] [%cncl p.gen [%cltr q.gen]]) :: if q is nil, then return (=> p [%cnzy %$]) (cnzy is an internal twig that pulls the specified limb from the subject [1]). Else, return (%: p (:* q)) (that is, fetch gate '$' from core p, set its sample to the arguments given in q, converted to a tuple, and then evaluate that gate). :: [%cncl *] :: the case where 'gen' is of the form [%: noun] [%cnsg [%$ ~] p.gen q.gen] :: (%~ [%$ ~] p q). fetch gate '$' from core p with its sample set to q :: [%cnsg *] :: the case where 'gen' is of the form [%~ noun] [%cntr p.gen q.gen [[[[%& 6] ~] r.gen] ~]] :: (%* p q [[[[%& 6] ~] r] ~]) . todo what is %&? :: [%cntr *] :: the case where 'gen' is of the form [%* noun] :+ %tsls :: :+ means produce a cell, so (:+ %tsls q.gen (:+ %cnts x y)) yields [=< q [%= x y] (in this case x is an expression whose head is 'weld' and y is an expression whose head is 'turn') q.gen :+ %cnts (weld p.gen `wing`[[~ 2] ~]) :: 'weld' is list concatenate. Backquotes are a form of typecasting, `p`q is short for ^-(p q) (typecast q to the icon of tile p). So we are concetenating the list p with the wing defined by the single-element list whose element is [~ 2]. (turn r.gen

([p=wing q=twig] [p [%tsgr [~ 3] q]])) :: turn is 'transform'. todo
  ==

walk thru

++ twig :: 'twig' is a gate from hoon.hoon

% [%cnhp p=twig q=tusk] :: one kind of twig is something of the form [%- p q], where p is a twig and q is a tusk (todo why the %, is this a core?)
  ==

++ open :: 'open' is a gate from hoon.hoon ^- twig :: todo ?- gen :: switch/case on 'gen' (presumably 'gen' is a parameter being passed into 'open') [%cnhp *] :: the case where 'gen' is of the form [%- noun] :+ %tsls q.gen :: :+ means produce a cell, so (:+ =< q (:+ => [%cnzy %$] x)) yields [=< q [=> [%cnzy %$] x]], that is, =<(q =>(%cnzy($) x)), that is, evaluate q with the subject obtained by evaluating x with the subject obtained by (%cnzy $), that is, evaluate q with the subject obtained by evaluating x with the subject obtained by pulling $ from the current subject (where x is the line below starting with %cnhp). :+ %tsgl [%cnzy %$] [%cnhp [[%& 2] ~] [[[%& 6] ~] [%tsgl [~ 3] p.gen]] ~] :: %- [[%& 2] ~] [[[%& 6] ~] [%tsgl [~ 3] p.gen]]. I think this means, call the head of the current subject (the formula, assuming the subject is a gate) after replacing the head of its tail (the sample, assuming the subject is a gate) with =>([~ 3] p), which i don't understand. todo ==

walk thru (note don't read this alone, this only makes sense in context of the part of the tutorial it is talking about)

    $.-(+< =>(+ a))       ::  replace the sample and invoke

this is a Wing with two Limbs, separated by a '.'. We read Wings from right to left, so first we evaluate "-(+< =>(+ a))", then we evaluate '$' on that.

"-(+< =>(+ a))" is of the form "W(W2 Z)", where W is the wing to be operated on (interpreted relative to the current subject), W2 is a wing specifying a location within W, and Z is a value. W(W2 Z) means "Start at the subject. Traverse wing W. Call the subtree found here 'w'. Now traverse wing W2. Now replace whatever you find there with Z to form a new w, call it w'. Return w'.".

So, we start at the subject, and we traverse W, which is "-", which means to take the head. The head of the subject is currently 'deq' (which is a gate). We're going to make a replacement within 'deq', generating a new, changed version of deq (call it deq2). Now we traverse W2, which is "+<", which is read from left to right, and which says to "first take the tail, then take the head". This means "head(tail)" or in other words "the head of the tail". We are currently starting at a gate, and the head of the tail of a gate is that gate's sample.

Now we are going to swap out the W2 of w with Z in order to make a w', which in our case means that we are going to swap out the sample of deq with "=>(+ a)" in order to make a deq2.

"=>(+ a)" means "a, with respect to the subject +" (where + is itself taken with respect to our current subject). + is the +-<> axis limb syntax for 'tail', so "+ with respect to the current subject" means the tail of the current subject. The current subject is whatever it was before we added 'deq', but with 'deq' prepended to that. So, the tail of the current subject means the old subject, e.g. the subject of our caller before we started messing with things with "=< =+ deq". We have to do that, because as the tutorial says, "the programmer who wrote a meant to write a twig against the original subject, not against the cell of gate and subject"; in other words, 'a' is an expression defined with respect to a subject, and it was meant to be interpreted with respect to the subject as it was just before we were called, that is, just before the line "=< =+ deq".

So, to summarize "-(+< =>(+ a))", "=>(+ a)" evaluates 'a' (with respect to the subject that the expression 'a' was meant to be evaluated against); then we take deq and find its sample and replace that sample with the result of evaluating 'a', to form deq2. deq2 is just deq but with its sample replaced by (the evaluation of) a.

So now the whole expression looks like "$.deq2", where deq2 is as defined above. So now that we've computed deq2, all that remains is to follow the limb "$" within deq2. Since deq2 is a gate, this just means to invoke the gate.

So, to summarize, we evaluated a, created a copy of deq, deq2, with its sample replaced by the evaluation of a, then we invoked the gate deq2.

Sequel hook

Chapter V - Tiles

"Not to get knowledge, but to save yourself from having ignorance foisted upon you." - Carlyle

Tiles

Tiles are patterns that specify a type.

Reductions

Bunting

~(bunt al sec) appears to be the irregular form of this rune:

%~: %~(p q r) "pulls p from the tray q with its sample set to r". Irregular syntax ~(p q r). [2]

So then what is 'al' here? todo.

walk thru

++ twig $% [%brts p=tile q=twig] :: one type of twig is of the form

(p q)
          ==

walk thru

++ open :: with gate 'open' in hoon.hoon ^- twig :: todo ?- gen :: switch/case on variable 'gen' [%bctr *] [%ktsg ~(bunt al p.gen)] :: $*(p) reduces as: ^~(%~(bunt al p)), that is, at compile time, pull 'bunt' from tray 'al' with its sample set to 'p' [%brcb *] [%tsls [%bctr p.gen] [%brcn q.gen]] ::

_(p q) reduces at: =<($*(p)%(q)), that is, use the product of q as the subject of $*(p)
    [%brts *]  [%brcb p.gen (~(put by *(map term foot)) %$ [%ash q.gen])]
       :: |=(p q) reduces as: |_(p (~(put by *(map term foot)) %$ [%ash q.gen])). todo (what is put? map? foot?; *() is an irregular form of $*)
  ==

Clamming

Fishing

Whipping

Tile syntax and abuse

The tiles

[%axil p=base]

[%leaf p=term q=@]

[p=tile q=tile]

[%bark p=term q=tile]

[%reed p=tile q=tile]

[%kelp p=[i=line t=(list line)]]

[%bush p=tile q=tile]

[%fern p=[i=tile t=(list tile)]]

[%herb p=twig]

[%weed p=twig]

Chapter VI - Type Inference

"Ever, as before, does Madness remain a mysterious-terrific, altogether infernal boiling-up of the Nether Chaotic Deep, through this fair-painted Vision of Creation, which swims thereon, which we name the Real." - Carlyle

On type inference algorithms

Low-powered type inference

Hoon's type inference system has three disadvantages:

Upon first reading about the infinite loop, i immediately thought, 'that's terrible, this type inference system is fatally flawed and unfit for use'. The author says "this never happens once you know what you're doing" but i was skeptical. But, after reading more, i've come to agree that the possible situations where the inference algorithm may hang sound like they are easily avoided, so i now think it may be a good system. The parts that talk about where the inference algorithm may hang are the sections 'Pseudolazy evaluation: %hold' and 'How we use %hold'. In short, in Hoon you can build recursive types, which is how you construct the type for a lazy list (an infinite list). When descending into a recursive type, the type inference algorithm will check if the node it's about to expand is just a recurrent instance (an exact repetition) of any ancestral node, and if so, it won't descend into that node (so it will avoid getting caught in an infinite loop). However, it is possible in theory to construct a type that is infinite but doesn't repeat exactly, in which case the type inference algorithm would hang. But this wouldn't happen accidentally because you can't do this without using the "%hold" construct for recursive type construction. You can avoid using %hold directly by instead only using prebuild lazy constructs such as lazy lists. So this is indeed avoidable; just pretend %hold is an implementation detail rather than part of the language, don't use it, and then the type inference algorithm terminates.

Facts to keep in mind

If you want to read the source code for the semantics of Hoon, it's in the file hoon.hoon, gate ++ut, gate ++mint. Note that "gol - which looks for all the world like a mechanism for backward inference, is not. It is semantically irrelevant and only exists to get better localization on error reports."

What mint does is map [subject-type hoon-twig] to [product-type nock-formula].

walkthru

++  open     :: this is a gate named 'open'
  ^-  twig   :: the output type of this gate is 'twig'
  ?-  gen    :: switch on variable 'gen'
    [%tsls *]  [%tsgr [p.gen [~ 1]] q.gen]  :: if 'gen' is of the form [%tsls *] e.g. =+ [p q],
                    :: then return [%tsgr [p.gen [~ 1]] q.gen]
                    :: e.g. => [[p [~ 1]] q]
  ==

todo: i don't understand what the [~ 1] is for there for, but presumably it's concatenating p onto the current subject.

"=+(a b) is =>([a .] b)"

that is, =+ (the hoon that adds a to the current subject and then executes b) is a macro for =>([a .] b), which means "replace the current subject with [a .] and then execute b", where the . in [a .] refers to the current subject.

walkthru

++  mint                             :: this is a gate named 'mint'
  |=  [sut=type gen=twig]            :: it takes two input arguments, 'sut' which is of type 'type', and 'gen' which is of type 'twig'
  ^-  [p=type q=nock]                :: it yields two outputs, 'p' which is of type 'type', and 'q' which is of type 'nock'
  ?-    gen                          :: switch on 'gen'
      [%tsgr *]                      :: case gen is of the form =>(p q)
    =+  fid=$(gen p.gen)               :: let fid=mint(gen=gen.p) 
                                       ::   that is, $ means a recursive call to this function ('mint'). 
                                       ::   In the call, we are leaving parameter 'sut'
                                       ::   as-is and reassigning 'gen' to 'gen'. Since 'mint''s output is of the form
                                       ::   [p=type q=nock], that means that the contents of 'fid' will now be of that form.
                                       ::   So, here we are (generating Nock code to)
                                       ::   evaluate p in the context of
                                       ::   a subject of the type of the current subject; p.fid is the 
                                       ::   type of the product, and q.fid is the product.
    =+  dov=$(sut p.fid, gen q.gen)    :: let dov=mint(gen=gen.q, sut=fid.p) 
                                       ::   So, here we are (generating Nock code to) 
                                       ::   evaluate q in the context of a subject
                                       ::   of the same type as the product of p.
    [p.dov (comb q.fid q.dov)]         :: return [dov.p (comb fid.q dov.q)]
  ==                                 :: end case

todo: i don't understand the syntax of '[dov.p (comb fid.q dov.q)]'. Does p.dov evaluate to a function, and then we call that function with arguments '(comb fid.q dov.q)'? if so, does '(comb fid.q dov.q)' mean "call function 'comb' with 2 positional arguments, fid.q and dov.q"? If so, why doesn't the call use the same syntax as $(gen p.gen), where we have a list of pairs of (argument name, argument value)? Or, is this just a cell with two elements, the first element is p.dov, and then second element is the result of calling 'comb' with two arguments, q.fid and q.dov?

'comb' takes as input two pieces of Nock code (named 'mal' and 'buz'), and returns Nock code for their composition, by which is meant, execute mal against the current state, then execute buz against the result of that. Nock code for their composition is just [7 mal buz], which would be really easy, but 'comb' also includes a check for a bunch of special cases; if the code happens to fall in one of these, 'comb' perform some minor simplifications on its output, perhaps to make the result more efficient, or more readable/idiomatic.

++  comb                                :: this is a gate named 'comb'
  |=  [mal=nock buz=nock]               :: it takes two input arguments, 'mal' and 'buz', both of type nock
  ^-  nock                              :: it produces one output, of type nock
  ?:  &(?=([0 *] mal) !=(0 p.mal))      :: if mal is of the form [0 *], AND p.mal is not zero
    ?:  &(?=([0 *] buz) !=(0 p.buz))    ::   if buz is of the form [0 *], AND p.buz is not zero
                                        ::     (if we've reached here, then both mal and buz are of the form [0 *], e.g. selections
      [%0 (peg p.mal p.buz)]            ::     return [0 (peg p.mal p.buz)]
                                        ::       (presumably 'peg' is composing the selections in mal and buz)
    ?:  ?=([2 [0 *] [0 *]] buz)         ::   if buz is of the form [2 [0 *] [0 *]]
                                        ::     (if we've reached here, then both mal is of the form [0 *], e.g. a selection,
                                        ::     and buz is of the form [2 [0 *] [0 *]], e.g. do some selections and then
                                        ::     eval the result)
      [%2 [%0 (peg p.mal p.p.buz)] [%0 (peg p.mal p.q.buz)]]
                                        ::     return [2 [0 (peg p.mal p.p.buz)] [0 (peg p.mal p.q.buz)]]
                                        ::       i.e., pre-compute the composition of the selections 
                                        ::             (recall that Nock instruction 0 is selection)
                                        ::       then do [2 a b] where a and b are the results of those composed selections.
    [%7 mal buz]                        ::   else (buz is neither of the form [0 *], nor of the form [2 [0 *] [0 *]]
                                        ::     return [7 mal buz] (recall that Nock instruction 7 is composition)
  ?:  ?=([^ [0 1]] mal)                 :: if mal is of the form [^ [0 1]], that is, if mal is a cell whose first element 
                                        ::   is a cell and whose second element is is [0 1],
    [%8 p.mal buz]                      ::     then return [8 p.mal buz].
                                        ::     If mal is a cell, then its Nock evaluation will be
                                        ::     an autocons of the result of evaluating each of the two elements in the cell.
                                        ::     The first element of the result cell will the result of evaluating p.mal.
                                        ::     The second element of the result cell will the result of evaluating [0 1],
                                        ::     which is just /[a 1] (where 'a' is the current state), which is just
                                        ::     a. In other words, the result of the Nock evaluation of mal will be the concatenation
                                        ::     of the evaluation of p.mal under the current state, with the current state. If we then
                                        ::     replace the current state with that result, and then execute buz, the overall effect will
                                        ::     be identical to Nock operator 8 applied to p.mal and buz.
  ?:  =([0 1] buz)                      :: if buz is of the form [0 1],
    mal                                 :: then return mal
                                        ::   we are composing mal and buz, and buz says 'just return the state that i was given', and
                                        ::   the state that we gave the buz was the result of executing mal. So the code to accomplish this is
                                        ::   just to execute mal; buz was a no-op.
  [%7 mal buz]                          :: or, if none of the other 'if's were hit, just return [7 mal buz]

todo: here i interpreted the != as 'not equal to', because i think '&' is logical AND, so both input arguments to the & should be booleans (loobeans), but this is at odds with http://doc.urbit.org/doc/hoon/lan/rune/#zaptis , which says that != takes a twig and returns a noun of that twig's formula. Which is it?

todo: if ?=([2 [0 *] [0 *]] buz) is true, then why doesn't p.buz refer to '2'?

note: if you are looking at the Nock spec in http://doc.urbit.org/doc/nock/tut/1/ or http://doc.urbit.org/doc/nock/tut/2/, you might be wondering, why am i saying that [0 1] is the Nock code for /[1 a], when the spec says that the Nock code for that should be [a 0 1]? The answer is that there are really two forms of Nock code, one in which the state is explicitly given as the first element of each instruction cell, and one in which the state is implicit. The spec is written in the former format, but we are working in the latter format. The latter format is, i think, called a "Nock formula" but i'm not sure. Todo: bring this up and ask if some text about this distinction could be added to to http://doc.urbit.org/doc/nock/tut/1 to clarify this.

Pseudolazy evaluation: %hold

++  type  $|  ?(%noun %void)                     :: this is a gate named 'type'
                                                 :: $| is the reed tile constructor. Reeds are atom/cell tiles.
                                                 :: $| takes two arguments, first, the type of the thing if it's an atom, second, the
                                                 :: type of the thing it it's a cell.
                                                 :: ?() is the fern tile constructor (?() is an irregular form of $?).
                                                 :: ferns are plain selection tiles, e.g. their icons are forks.
                                                 :: ?(%noun %void) means something whose type is either %noun or %void.
                                                 :: So, $| ?(%noun %void) B means something whose type, if it's an atom,
                                                 :: is either %noun or %void, and whose type, if it's a cell, is B.
          $%  [%atom p=term]                     :: $% is the kelp tile constructor. Kelps are tag selection tiles (discriminated union). 
                                                 :: They take a list of cases, one on each line, each case with a label.
                                                 :: Here, the first case in the list is [%atom p=term]. The label is %atom
                                                 :: (here, because this kelp is listing types, the label, %atom, is the type stem).
                                                 :: After the label, we have an element 'term' whose axis is labeled 'p'.
                                                 :: Semantically, recall that %atom types are parameterized by a term, which is the
                                                 :: odor of the atom.
              [%cell p=type q=type]              :: The next case in the kelp is labeled %cell. After the label, there are two elements,
                                                 :: type and type, with axes 'p' and 'q' respectively.
                                                 :: Semantically, recall that %cell types are parameterized by two types, which are the
                                                 :: type of each element in the cell.
              [%core p=type q=coil]              :: Recall that %core types have a type and a coil. Todo, see http://doc.urbit.org/doc/hoon/tut/3/
              [%cube p=* q=type]                 :: %cube types ('constant' types with only a single value in them) are characterized by
                                                 :: a noun that says what the constant is ('p=*', recall '*' is the (axil) tile for any noun),
                                                 :: and also a type, which is the type of that constant.
              [%face p=term q=type]              :: %face types are labeled values. 
                                                 :: the term p is the label, and the type q is the type of the vale.
              [%fork p=type q=type]              :: %fork types are things which could be of type p, or of type q (union types)
              [%hold p=(list ,[p=type q=twig])]  :: todo
          ==                                     :: this == terminates the $%

How we use %hold

Where we use %hold

%hold is used in lots of places. The most common use is when a wing resolves to an arm. Rather than immediately evaluate the core in the arm; rather, we make a %hold.

You'll see things that look like type signatures (with the ^- rune). But these aren't type signatures; only typecasts. When the compiler wants to tell what type a gate produces, it doesn't look at any type signature, instead of involkes ++play, which is like ++mint, except that it only computes the output type for the code it is passed, instead of the both the output type and the Nock formula for the code it is passed, which is what ++mint does (also, i think that ++play doesn't bother to verify correctness).

potential mistake or at least a misunderstanding on my part: "Logically this iteration is always the same as q:(mint p q), but in actual practice ++mint is making a formula we don't need. Also, we only need to actually verify correctness when we generate a formula - not when we traverse pseudolazy types. Hence, we have the lighter ++play gate for traversal." -- shouldn't that be p:(mint p q)? because isn't the first output of mint the inferred type of the result?

Head and tail recursion

"p:(mint sub ~(bunt al p.gen))": 'gen' is the ^- rune and its input arguments; p.gen is the tile argument and q.gen is the twig argument. i dunno what 'al' is, todo. ~() is the irregular form of %~; ~(p q r) means "pulls p from the tray q with its sample set to r". Irregular syntax [3]. So ~(bunt al p.gen) means "pull 'bunt' from tray 'al' with its sample set to 'p.gen'". This is apparently how you take the bunt of p.gen. So we're bunting the input tile argument. 'sub' is our subject type. So, we're bunting the tile we're being given, and then computing the type of that results if you execute that against a subject whose type is sub.

q:(mint sub q.gen) takes the twig we're given, and produces the Nock formula for that twig assuming the subject is of type 'sub'.

^- returns that type, and that formula.

"checking that p:(mint sub q.gen) nests in p:(mint sub ~(bunt al p.gen))"; ie checking that the type produced by the formula that we're returning is a subtype of the type we're returning.

todo: i don't understand what is meant by 'head recursion'. Is this a special kind of recursion, or just any recursion that is not tail recursive?

=+  b=0         :: let b = 0
|-              :: this is the beginning of our subloop
^-  @           :: this subloop returns something of type 'atom' 
                :: (ie cast the return of this subloop to type 'atom', 
                :: ie to type '@')
?:  =(a +(b))   :: if a == b+1
  b             :: then return b
$(b +(b))       :: else, recurse back to the |-, but replace b with b+1, and return whatever the |- returns

Branch analysis

The example in this section deals with what in Scala or Swift would be called Option types, and in Haskell called Maybe types. Namely, an alternative to nullable types in which a variable that can be null is considered to have a union type (sum type) which could be either the normal type of the variable, or the 'null type'; it is then a statically-knowable type error to use the variable in a way that assumes that it is not null; instead, to get to the variable's value, you must pattern match the variable against the null type, and branch on the result (one branch if it is null, one branch if it is not). In this way, the language helps to prevent careless errors arising from forgetting about the possibility of a null.

The section is called "branch analysis" because this mechanism is more general and can be used for things other than Option types; in general, it can be used to pattern match and branch on any sum type (union type), and the compiler will infer that the type in each branch is narrower than the original type before the pattern match; within each branch, the type is constrained by the pattern matching condition that would lead to that branch.

~waclux-tomwyc/try=> =foo `(list ,@)`[1 2 3 4 ~] :: set 'foo' equal to [1 2 3 4 ~], coerced to the type '(list ,@)',
                                                 :: which is a list whose elements are of type @
                                                 :: , is the irregular form of $, which produces a normalizing gate (clam)
                                                 :: so, ',@' is a gate that takes any input and produces atomic output,
                                                 :: such that is the input is atomic it just returns the input, and
                                                 :: otherwise it returns the default (bunt) of @, which is 0. (todo doublecheck)
                                                 :: 
                                                 :: in Hoon, a list of atoms is something that is either null, 
                                                 :: or of the form [i=@ t=(list ,@)], that is to say, it's either null, 
                                                 :: or it has a head ('i') and a tail ('t'), in which case 
                                                 :: the head is of type atom, and the tail is a list of atoms.
~waclux-tomwyc/try=> foo
~[1 2 3 4]
~waclux-tomwyc/try=> :type; foo
~[1 2 3 4]
it(@)                                            :: list of atoms? todo, doublecheck
~waclux-tomwyc/try=> i.foo                       :: dereference the field 'i' of 'foo'
! -type.it(@)                                    :: error: within -type.it(@)
! -find-limb.i                                   :: error: within -find-limb.i (error trying to find field 'i')
! find-fork                                      :: error: within find-fork (the error is that we tried to find
                                                 ::        something on a value of type fork?? todo)
! exit
~waclux-tomwyc/try=> ?~(foo 42 i.foo)            :: explained in the next paragraph; in short, 
                                                 :: ?~(a b c) returns b if a is nil, or otherwise returns c.
~waclux-tomwyc/try=> ?:(?=(~ foo) 42 i.foo)      :: ?~(p q r) is short for ?:(?=(~ p) q r)
                                                 :: ?:(p q r) is 'if p then q else r'
                                                 :: ?=(p q) is the predicate 'q is of type p' (todo: doublecheck)
                                                 :: so this says "if foo is of type ~, then 42, else i.foo"
1
~waclux-tomwyc/try=> :type; ?~(foo !! foo)       :: ?~(foo !! foo) means 'if foo is nil, then !!, else foo'
[i=1 t=~[2 3 4]]
[i=@ t=it(@)]

Returning something of type %void (as is done by rune !!) tells the type system to neglect this branch. This can be used to e.g. generate the type of a non-empty list. E.g. here, the type of "?~(foo !! foo)" is "[i=@ t=it(@)]"; this type does not admit the possibility of an empty list. If !! is ever evaluated at runtime, it causes a crash.

~waclux-tomwyc/try=> :type; ?~(foo %foobar foo)  :: if foo is nil, then %foobar, else foo
[i=1 t=~[2 3 4]]
{ %foobar [i=@ t=it(@)] }

Here, we've replaced the !! with %foobar. Unlike the special !!, %foobar is an ordinary value, so the type system has to content with the possibility that the result is either (if foo is nil) %foobar, or (if foo is not nil) foo. So the resulting type is a fork type (also known as union type, or sum type) (todo really? i don't see a %fork), which says that the result might be of type %foobar, or it might be of type [i=@ t=it(@)]. The reason that we have a union type here, compared to only [i=@ t=it(@)] for the type of '?~(foo !! foo)', is that !! is special.

braces in hoon type prettyprints are disjuncts, e.g. the type of "?~(foo %foobar foo)" is "{ %foobar [i=@ t=it(@)] }", meaning that it could be something of type %foobar, or it could be a non-empty list.

" the only hoons recognized in branch analysis are ?= (wuttis, %wtts, fish), ?& (wutpam, %wtpm, logical and), and ?

(wutbar, %wtbr, logical or). (Boolean logic is fully understood within the test itself, so the second twig in a ?& can depend on the result of the first.)

Of course, synthetic hoons that reduce to these work as well. However, we don't learn from any other test, even if we could "

~waclux-tomwyc/try=> ?:(?=(~ foo) 42 i.foo)
1
~waclux-tomwyc/try=> ?:(=(~ foo) 42 i.foo)
! -type.it(@)
! -find-limb.i
! find-fork
! exit

This example demonstrates that branch analysis learns from ?= but not from =. In the first example, '?:(?=(~ foo) 42 i.foo)' said "if foo is of type nil, then 42, else i.foo", and the branch analysis realized that if i.foo is evaluated, it is within a branch in which foo is guaranteed to not be nil. In the second example, '?:(=(~ foo) 42 i.foo)' said, 'if foo is equal to nil, then 42, else i.foo'. In the second example, even though we can look at it and prove that i.foo will only be evaluated in the case that foo is not equal to nil, the typechecker can't figure that out, and so gives a type error because it is worried that i.foo might be trying to reference the field 'i' of a nil. The typechecker can only understand branch conditions made of ?=, ?& and ?

(expressions containing 'and', 'or', and pattern match).
~zod/try=> ?:(?=(%foobar foo) 42 i.foo) :: if foo is of type %foobar, then 42, else i.foo
! mint-vain  :: ?= was testing for a condition that is statically known to be impossible
! exit

In this example, the compiler rejects a pattern match where the first branch is impossible (todo: would it reject a match where the second branch is impossible, too?). We are testing if 'foo' is of type '%foobar', but the compiler knows that 'foo' cannot be of type %foobar.

" Type inference in Hoon is anything but magic. It's a relatively simple algorithm, which you nonetheless need to understand if you're going to make it as a Hoon programmer. "It just works" and "do what I mean" are not in the building. "

Geometric and generic polymorphism

(bartis, %brts): build a gate with geometric polymorphism
* (bartar, %brtr), build a gate with generic polymorphism

"Which should you choose? There's a simple rule for this, which is that unless you are a wizard doing wizardly things, you should use

. Generic polymorphism is a way to build and use tools like containers (lists, etc)"

The difference is that

loses type information, e.g.

Consider the function f([a b]) = [b a]. Say the argument type ("sample tile") is [a=* b=*] (that is, both arguments are nouns). Assume the caller knows something more about a and b, e.g. that a is a %foo and b is a %bar. If the gate is built with

, the result type will just be [* *], but if it is guilt with *, the result type witll be [%bar %foo]. Example:

" ~zod/try=> :type; (

[7.496.034 7.303.014] [* *]
([a=* b=*] [b a]) %foo %bar)

~zod/try=> :type; (

[%bar %foo] [%bar %foo]
*([a=* b=*] [b a]) %foo %bar)

"

" The polymorphism model is not an attribute of the core, but of the arm - %ash means geometric, %elm means generic. "

todo: i don't understand what the advantage of

(geometric polymorphism) is; why wouldn't you just always use *?

Geometric polymorphism

++nest = isSuperTypeOf

(nest sut ref) = "can I take any noun in type ref, and use it as if it was within sut?"

"Polymorphism in Hoon supports four kinds of variance: %gold (invariant), %lead (bivariant), %zinc (covariant), and %iron (contravariant)."

todo i don't understand this:

" The %core frond is actually:

++ type $% [%core p=type q=coil] == ++ coil $: p=?(%gold %iron %lead %zinc) q=type r=[p=?(~ ^) q=(map term foot)] == ++ foot $% [%ash p=twig] [%elm p=twig] ==

..

First, q in the coil is the original payload type for use in a %hold. Hence geometric polymorphism. The question we have to answer whenever we use an arm is: is this core corrupt? As in: is the payload that's in it now geometrically compatible with the payload that its formulas were compiled with? Can we use the present payload as if it was the original, default payload?

It is actually not a type error of any kind to produce a modified core with the payload set to any kind of garbage at all. It's just a type error to use it - unless the payload is compatible. And when we use a geometric arm, we test this compatibility and then treat the present sample as if it was the original.

q.r in the coil is the map of arms. The polymorphism model is not an attribute of the core, but of the arm - %ash means geometric, %elm means generic.

p.r, if not null, is the actual generated battery of Nock formulas. Why do we need this at compile time? Remember %ktsg, which folds constants.

p.r is null while we are compiling the core's own arms, for obvious reasons (though we could get a little smarter about circularities), but once we complete this task we can put the battery in the type. The result is that, if we are building a reef of cores, we can fold arms of this core in the next one down.

For example, one of the main uses of %ktsg is simply to make default samples constant, so that we don't have to perform some simple but unnecessary computation every time we use a gate. Because we can only fold an arm in a completed core, a good general practice in building applications is to use a reef of at least two cores - inner for the tiles, outer for the functions. " and p is variance:

todo i don't understand this:

" For core A to nest within core B (any A can be used as a B), it seems clear that A should have the same set of arms as B (with formulas at exactly the same axes in the battery), and that any product of an A arm should nest within the product of the corresponding B arm.

But what about the payloads? Worse - what about the contexts? A simple rule might be that the payload of A must also nest within the payload of B. Unfortunately, this doesn't work.

Suppose, for instance, that I write a sort gate, one of whose arguments is a comparison gate producing a loobean. (This is generally a problem calling for generic polymorphism, but let's assume we're sorting lists of a fixed type.) Okay, great. We know the sample type of the comparator - but what about the context? The sort library cannot possibly have the same context as the application which is using the sort library. So, the cores will be incompatible and the invocation will fail.

This rule, which doesn't work in this case, is the rule for %gold (invariant) cores. Every core is created %gold, and remains %gold from the perspective of its own arms.

But the type of some arbitrary comparator, which is an argument to our sort function, cannot possibly be %gold. Rather, we need an %iron (contravariant) core. You can turn any %gold core into an %iron one with ^

(ketbar, %ktbr), but the transformation is not reversible.

The rules for using an %iron core are that (a) the context is opaque (can neither be read nor written), and (b) the sample is write-only. Why? Because it's absolutely okay to use as your comparator a gate which accepts a more general sample than you'll actually call it with. You can write a more specialized noun into this sample - but if you read the default value and treat it as more specialized, you have a type loophole.

A %zinc (covariant) core is the opposite - the context remains opaque, the sample is read-only. We don't use any %zinc at present, but this is only because we haven't yet gotten into inheritance and other fancy OO patterns. (Hoon 191 had inheritance, but it was removed as incompletely baked.) You make a %gold core %zinc with ^& (ketpam, %ktpm).

Finally, the entire payload of a %lead (bivariant) core is immune to reading or writing. So all that matters is the product of the arms. You make a lead core with ^? (ketwut, %ktwt). "

question: " It is actually not a type error of any kind to produce a modified core with the payload set to any kind of garbage at all. It's just a type error to use it - unless the payload is compatible. And when we use a geometric arm, we test this compatibility and then treat the present sample as if it was the original. "

is this saying that for cores, we don't do static tests of type, only dynamic? so you'll have runtime type errors all over the place whenever the type of a 'function' (gate) doesn't match the type of the arguments passed to it?

todo: afaict, the hoon variance is all messed up. variance isnt about readonly or writeonly. i'm probably missing something, ask mailing list for help.

Generic polymorphism

Generic polymorphism action sequences

walk thru

++  list  |*(a=_,* $|(~ [i=a t=(list a)]))

in pieces:
++  list         :: a gate called 'list'
|*               :: generic polymorphism; |*(p q) where p is a tile and q is a twig generates a gate whose 
                 ::                       arm is q and whose sample is the compile-time bunt of p
_                :: tile constructor 'bunt'
,                :: tile constructor 'clam'
*                :: tile constructor 'noun'
a=_,*            :: tile for the bunt of the clam of a noun, with fieldname 'a'?
$|               :: the reed tile construtor (cell/atom tiles). $|(p q) is a tile whose icon contains
                        two kinds of nouns: atoms of tile p and cells of tile q.
~                :: nil literal (here, used as the list terminator
[i=a t=(list a)] :: a cell (cell tile?) whose head has field name 'i' and whose tail has field name 't' and whose
                 ::   head is of type a, and whose tail is a list of elements of type a


todo: why do they generate a clam only to bunt it? why do they bunt it if

* is going to bunt its sample anyways? isn't the sample of this gate the list that is being tested, but if so, why is 'a' used inside the cell tile to refer to the element type? below, we see that '((list ,@) 1 2 ~)' yields ~[1 2], but why?

todo: is 'list' not transforming the representation, just labeling the type? but if so, then why is it printed differently? is this the case or not?

~zod/try=> =lust |*(a=$+(* *) |=(b=* ?@(b ~ [i=(a -.b) t=$(b +.b)])))  :: see below
~zod/try=> ((lust ,@) 1 2 ~)                                           :: @ is the tile for an atom
                                                                       :: , is a clam, so ,@ is the clam of an atom
                                                                       ::   (a normalizing gate producing an atom)
                                                                       :: (lust ,@) is a function that producing a list of atoms
~[1 2]                                                                 :: (lust ,@) applied to [1 2 ~] produces the list ~[1 2]
~zod/try=> ((list ,@) 1 2 ~)
~[1 2]
~zod/try=> `(list ,@)``(lust ,@)`[1 2 ~]                               :: 'list' applied to something that is already a list
~[1 2]                                                                 ::    is just the identity

todo:

|*  a=$+(* *)            :: generic polymorphism; |*(p q) where p is a tile and q is a twig generates a gate whose 
                         ::                       arm is q and whose sample is the compile-time bunt of p
                         ::   todo: so i guess this is saying that the sample is the compile-time bunt of 
                         ::         a tile for a gate which accepts p and produces q?
|=  b=*                  :: build a gate from the tile b=*, ie this is a subroutine which accepts a single argument, b,
                         ::   whose value must be a noun
?@  b  ~                 :: isAtom; returns ~ if b is an atom, otherwise returns [i=(a -.b) t=$(b +.b)]
[i=(a -.b) t=$(b +.b)]   :: a cell whose head is the product of applying a to the head of b, 
                         ::   and whose tail is the product of recursing back up to the |=, while setting 
                         ::   the new b to the tail of the current b

" $+(p q) is a tile for a gate which accepts p and produces q. The spectre of function signatures once again rears its ugly head - but $+(p q) is no different from $_(

"
+(p _q)).

when (list ,@) is called, this is equivalent to the previous, while setting 'a' to ',@'. In other words:

|=  b=*                  :: build a gate from the tile b=*, ie this is a subroutine which accepts a single argument, b,
                         ::   whose value must be a noun
?@  b  ~                 :: isAtom; returns ~ if b is an atom, otherwise returns [i=(,@ -.b) t=$(b +.b)]
[i=(,@ -.b) t=$(b +.b)]  :: a cell whose head is the product of applying ,@ to the head of b
                         ::   (,@ is a function which coerces/normalizes b to an atom),
                         ::   and whose tail is the product of recursing back up to the |=, while setting 
                         ::   the new b to the tail of the current b

"This function is easily recognized as a gate accepting a noun and producing a list of atoms - in short, perfectly formed for an herbaceous tile." an "herb" tile is a custom tile defined by a function.

++  weld                             :: a gate called 'weld'
  ~/  %weld                          :: replace this with a jet called 'weld'
  |*  [a=(list) b=(list)]            :: generic polymorphic gate with two arguments, a and b, each of type 'list'
  =>  .(a ^.(homo a), b ^.(homo b))  :: todo
  |-  ^+  b                          :: |- means "loop here ('label' $ is here)"; ^+ is typecast. todo
  ?~  a  b                           :: if a is nil, then return b
  [i.a $(a t.a)]                     ::   else, return i.a (the head of a) consed with $(a t.a) (call $ after replacing a with the tail of a)
++  homo           :: a gate called 'homo'
  |*  a=(list)     :: generic polymorphic gate with one argument, a, of type 'list'
  ^+  =<  $        :: todo
    |%  +-  $  ?:(_? ~ [i=(snag 0 a) t=$]) :: |% defines a core.
                                           :: +- $ defines a generic polymorphic gate named '$'
                                           :: ?: _? if todo
                                           ::   then return nil
                                           :: else return cons [head tail] where
                                           :: head = snag 0 a
                                           :: tail is $
    --
  a
++  snag                   :: a gate named 'snag'
  ~/  %snag                :: replace this with a jet called 'snag'
  |*  [a=@ b=(list)]       :: generic polymorphic gate with two arguments, a of type 'noun' and b of type 'list'
  |-                       :: |- means "loop here ('label' $ is here)"
  ?~  b                    :: if b is nil:
    ~|('snag-fail' !!)     ::   then run !!, and if it crashes, add 'snag-fail' to the stack trace. (and !!, 'crash', always crashes)
  ?:  =(0 a)               :: if a == 0:
    i.b                    ::   return the head of b
  $(b t.b, a (dec a))      ::   else return the result of calling $ (going back to |-) after replacing b with the tail of b, and replacing a with (a-1)
  ++  open                 :: a gate named 'open'
    ^-  twig               :: typecast to the icon of tile 'twig' todo is this correct?
    ?-  gen                :: case/switch on variable 'gen'
      [%ktdt *]  [%ktls [%cnhp p.gen q.gen ~] q.gen]  :: if 'gen' is %ktdt (^.), in fact (^. p q) then return [%ktls [%cnhp p.gen q.gen ~] q.gen], that is, ^+ [%- p q ~] q
    ==

todo : walk thru ^.(a b) is ^+((a b) b) is ^-(_(a b) b)

Chapter VII - Odds, Ends, Quirks

A bestiary of cores

Book

Reef

Trap

Gate

Tray

Categories

Using cores

On jet propulsion

Gonads

Gonads are pretty much just another name for a popular programming language construct called 'monads'. This section is a good intro to monads.

On debugging your urbits

Parse error

Type failure

Generate your trace

Printf, %sgpm

Trace

Stackf, %sgbr

The urban bull

Too much information

Mitigating TMI

TMI and other bulls

Another convenient convenience

Vulcanization

Naming conventions

misc notes/questions

tiles are patterns, as in pattern matching icons are the types implies by the pattern

hoon kelp -- whats a line? mb that will explain the diff bt kelp and fern bush __ whats $%?

diff between clam and whip?

---

more examples from elsewhere

=>  [load rest]=.
=+  [main step]=.*(rest load)
|-  ?~  main  step
    $(main +.main, step (step -.main))

Or in pseudocode:

from pair $load and $rest
let pair $main and $step be:
  the result of Nock $load run on $rest
loop
  if $main is empty
    return $step
  else
    continue with
      $main set to tail of $main,
      $step set to:
        call function $step on the head of $main
=>  [fab=- src=+< arv=+>- seq=+>+]
=+  ken=(fab [[%atom %ud] 164] src)
?>  =(+>.fab +.ken)
[seq (fab ken arv)]

In pseudocode:

with tuple as [$fab, $src, $arv, $seq], // these are I2 I3 I4 and I5...
  let $ken be:
    call $fab on type-and-value "uint 164" and Hoon $src
  assert environment from $fab is equal to the value from $ken
  return pair of the remaining $seq and:
    call $fab on environment $ken and Hoon $arv

-- [4], [5]

  ++  add
    |=  [a=@ b=@]
    ^-  @
    ?:  =(0 a)  b
    $(a (dec a), b +(b))


   attribute add {
     function(left-operand: atom, right-operand: atom)
     produce atom
     if equals(0, left-operand) {
      right-operand
     } else {
        recurse(left-operand (decrement left-operand)), 
                right-operand (increment right-operand))
     }
   }

-- https://news.ycombinator.com/item?id=8585178

; http://an-animal-imagined-by-poe.tumblr.com/post/97790330928/i-have-been-busy-distracted-absent-for-the-past

(defn find-in
  ([needle haystack] (find-in needle haystack 'x))
  ([needle haystack varname]
   ; varname remminds us of how we got the haystack we're looking at from x
   (or
     ; base case
     (when (= needle haystack)
       varname)
     ; if the haystack is a symbol we're screwed
     (when-not (symbol? haystack)
       (or
         ; try looking at the first element of the haystack
         (when (seq haystack)
           (find-in
             needle
             (first haystack)
             (list 'first varname)))
         ; or in the rest of it?
         (when (seq haystack)
           (find-in
             needle
             (rest haystack)
             (list 'rest varname)))
         ; maybe we can cons together the parts of the needle
         ; from different parts of the haystack
         (when-not (symbol? needle)
           (when-let
             [car (find-in
                    (first needle)
                    haystack
                    varname)]
             (when-let
               [cdr (find-in
                      (rest needle)
                      haystack
                      varname)]
               (list 'cons car cdr)))))))))

(defn exercise-find-in [needle haystack]
  (eval
    `(let [~'x (quote ~haystack)]
       ~(find-in needle haystack))))
@ohAitch
ohAitch commented on Sep 17, 2014

In rosetta-code interests, slightly unconventional to be more verbatim:

++  conv  $&  [p=conv q=conv]
              [%0 p=@b]
++  find-in
  |=  [needle=noun hay=noun]
  ^-  (unit conv)
  =|  axe=(list ?(%0 %1))                               ::  axis acummulator
  |-
  ?:  =(needle hay)
    [~ u=(rap 0 [1 (flop axe)])]                        ::  (some {axis})
  ?@  hay
    ~                                                   ::  fail if hay is atom
  =+  top=$(hay -.hay, axe [%0 axe])
  ?^  top
    u.top                                               ::  cell is success
  =+  bot=$(hay +.hay, axe [%1 axe])
  ?^  bot
    u.bot
  ?@  needle                                            ::  descend unless atom
    ~
  =+  car=$(needle -.needle)
  ?~  car
    ~                                                   ::  lost head is failure
  =+  cdr=$(needle -.needle)
  ?~  cdr
    ~
  [~ u=[-.car -.cdr]]


-- https://gist.github.com/MichaelBlume/17a227cc839f52f68c97

    ++  lax                                           ::    lax:as:go
      |_  [her=ship dur=dore]                         ::  per client
      ++  cluy                                        ::    cluy:lax:as:go
        ^-  [p=life q=gens r=acru]                    ::  client crypto
        ?~  lew.wod.dur  !!
        ?.  =(fak.ton r.i.lew.wod.dur)  ~|([%client-wrong-fake her] !!)
        :+  p.p.q.i.lew.wod.dur
          q.q.i.lew.wod.dur
        (haul r.q.i.lew.wod.dur)
      ::



	
	
urbit 446 days ago

Suppose we wrote this with keywords. I can't stomach a naked keyword so I'll put a colon on the front. We might write:

  :function lax {
    :object :tuple{her: ship dur: dore}  {
      :function cluy {
        :produce :tuple{p: life q: gens r: acru}
        :ifno lew.wod.dur  :error
        :unless (equals fak.ton r.i.lew.wod.dur) {
          :record :tuple{%client-wrong-fake her}
          :error
        } :then {
          :tuple {
             p.p.q.i.lew.wod.dur
             q.q.i.lew.wod.dur
             (haul r.q.i.lew.wod.dur)
          }
        }
      }
    }
  }

-- https://news.ycombinator.com/item?id=8580780

centis, %=, %cnts

Evaluate with changes

%= is a natural rune that evaluates p with the changes specified in q. %= is used to change a batch of wings inside a ++wing all at once, ensuring that the product is type checked.


Produces

Twig: [%cnts p=wing q=tram]
Sample

p is a ++wing. q is a ++tram.

Tall form

%=  p
  p.i.q    q.i.q
  p.i.t.q  q.i.t.q
==

Wide form

%=(p p.i.q q.i.q, p.i.t.q q.i.t.q)

Irregular form

p(p.i.q q.i.q, p.i.t.q q.i.t.q)


Examples

/~zod/try=> =+  a=[p=5 q=6]
            a(p 2)
[p=2 q=6]

In this example we are using the irregular form of %= to replace p in a.

/~zod/try=> =+  a=[p=1 q=2 r=3 s=4]
    a(p 5, q 6, r 7, s 8)
[p=5 q=6 r=7 s=8]

Here we show how you can replace multiple faces at once. We start with a new a and replace all of its values with the irregular form of %=.

/~zod/try=> =+  step=0
            =+  leng=10
            =+  valu=0
            |-
                ?:  =(step leng)
                   valu
            $(valu (mul 2 valu), step +(step))
1.024

In this case we create a simple loop, using |-. To recurse, we use %= with $, the empty name — our |- — replacing our valu with (mul 2 valu) and step with +(step).
"
-- https://github.com/urbit/docs/blob/master/docs/dev/hoon/runes/cn/cnts.md


---

from http://urbit.org/docs/dev/hoon/runes/bc and sibling pages:

==== buc ====
{{{
buc $ %buc
Tile construction

The $ runes construct ++tiles. ++tiles are one of our primary building blocks in hoon, they define how we reduce our ASTs into well typed nouns that nock can compute. You can think of a ++tile sort of like a typeclass in Haskell.
Forks
++tiles that can be one of multiple cases.

$? Non-empty list of cases.

$| Fork between atom and cell.

$& Fork between cells whose head is a cell and cells whose head is an atom.
Tuples

$: Unlabelled arrays.

$= Tuple with ++faces.
Reductions
Important convenience methods for working with tiles, and are very broadly used.

$* bunt

$, clam

$@ whip

bar

bar | %bar
Core construction

The | runes construct cores. In the broadest case you can think of a core as similar to an object with named properties that can contain either functions or data.

The | runes accept an associative array of names (++term) to (++foot), each pair of which are called an ++arm, producing one of the three basic categories of core.
The natural, generic core:

|% Generic core, with [++arms] generally containing ++twigs
Doors, generic core with a sample:
Gates:

Cores with one arm [$], the empty name and which takes a sample p. The closest thing in Hoon to a function.

|= dry gate, where the sample is typechecked at compile time. |* wet gate, where the sample is typechecked at runtime against its product type.
Traps:

Traps reduce to a |% with one arm $, the empty name. A trap is just some computation that has been put inside of a wrapper so that it may be passed around.

|. Generic trap. |- Trap automatically [kick]ed (called) after construction.

col

col : %col
Tuples

The : runes construct tuples.

There is no natural : rune. Instead, all of them derive from the autocons property of ++twig, as show below.

++  twig  $&  [p=twig q=twig]

Namely, a cell of two twigs is a twig producing a cell of the results of the two original sub-twigs.

The : runes produce one of three categories of tuples
Tuples of determinate size:

:- Tuples of two elements, p and q. :_ Tuples of two reversed elements, q and p. Used to ensure vertical code flow.\

:+ Tuples of three elements p, q, and r. :^ Tuples of four elements p, q, r, and s.
Tuples of indeterminate size:

:* Tuples of n elements. :~ Null-terminated tuples of n elements.
Tuples used for interpolation:

:/ Tuple designed to be interpolated into an XML structure.

cen

cen % %cen
Pulling

The % runes pull ++arms and ++wings from the subject, often modifying their values. The % runes are similar to function calls, or execution tools. We use them to either compute some code with changes in the subject, or produce the output from a core.

There are two categories of % runes.
Pull with changes:

%= The natural % rune. Evaluates p with the changes specified in q.

%_ Evaluates p with the changes specified in q, then casts the product back to the type of p.

%* Evaluates p from a door q with changes r.Terminated by ==.
Pull the empty arm $, similar to calling a function:

%- slams a core with one argument.

%+ slams a core with two arguments.

%^ slams a core with three arguments.

dot

dot . %dot
Nock operators

The . runes access Nock operators to maniuplate nouns. They are all natural runes.

There are two basic categories of . runes.
Nock operators

.+ Nock 4. Increments an atom.

.= Nock 5. Tests two atoms for equality.

.^ Nock 11. Loads a file from the global namespace in virtual userspace nock.
Nock evaluators

.* Executes a Nock expression in userspace.

hax

hax # %hax
Pretty printing

The # runes are used for pretty printing. They are both synthetic.

#< pretty prints one or more values to a ++tape.

#> pretty prints one or more values to a ++tank.

ket

ket ^ %ket
Type operations

The ^ runes perform type operations. With the exception of ^- and ^., they are all natural runes.

There are three basic categories of ^ runes.
Cast values to specific types

^- Casts q to the type of the bunt of p.

^+ Casts q to the type of p.
Change the variance of cores

^| Converts core from %gold to %iron.

^& Converts core from %gold to %zinc.

^? Converts core from %gold to %lead.
Those that add a variable name

^= Changes the type of a value by adding a ++face.

sig

sig ~ %sig
Hints

There are two categories of ~ runes, which are used to provide information at runtime. You can think of them as hints to the interpreter.
For Printing:

~& Prints to the console.

~! Prints the type of a twigif compilation fails.
For jetting:

~> The natural ~ rune. Implements Nock 10, applying an arbitrary [++arm]() hintptoq`.

~/ Attaches a formula-level jet hint. ~%\ Attaches a core-level jet hint.

sem

sig ~ %sig
sem ; %sem
tis = %tis
wut ? %wut
zap ! %zap
sem ; %sem
Composition

The ; runes are used for composition. They are all synthetic.

There are three categories of ; runes.
Gate composers

;: Applies a binary gate to multiple arguments.

;~ Uses a monad to compose multiple gates, producing a gate of a similar type.

;; Composes a clam with a value, slamming the value through the clam and forcefully verifying that it is unchanged. See also: ++hard.
Interpolators

;" Composes ++tapes and twigs that produce ++tapes. Generally used in its irregular form, of (text surrounded by double quotes).
[++sail] runes:

;-, ;+, ;*, ;%, and %smzz all compose a tree structure. See also: ++sail.

tis

sem ; %sem
tis = %tis
wut ? %wut
zap ! %zap
tis = %tis
Subject modification

The = runes modify the subject. They are used whenever you want to push something on the subject, modify the subject, or use the results of one formula in another.

There are three basic categories of = runes.
Replace the subject

=> The natural = rune. Uses the product of p as the subject for the formula q.

=< Inverse of =>. Uses the product of q as the subject for the formula a. Used to make code more readable by making it flow vertically.
Expand the subject

=+ Nock 8. Pushes data and code p on the subject for the formula q.

=- Inverse of =+. Pushes data and code q on the subject for the formula p.

=| Pushes the bunt of a tile p on the subject for the formula q.

=* Pushes on an alias onto the subject.
Alter the subject

=. Modifies the subject with a single change.

=: Modifies the subject with a multiple changes.

wut

wut ? %wut
Conditionals

The ? runes, with the exception of ?=, produce conditional statements. The ? runes are the workhorse of control-flow.

There are three basic categories.
If-then-else

?: A natural ? rune. The traditional if-than-else statement.

?. Inversion of ?:. An "unless" statement. Used to keep the code flowing vertically.
Switches

?- Tradional switch.

?+ Switch with a default return value.
Type Tests

?= Tests if the icon of ++tile p contains q.

zap

zap ! %zap
Special operations

The ! runes perform special operations. With the exception of !> and !?, they are all natural.

There are two ! runes used the most frequently.

!! Automatically causes a crash.

!> Turns a value into a cell of its type and itself as compile time constant.

---

things with irregular forms (produced by grep irregular -B5 -A5 docs/docs/dev/hoon/runes/*/* ):

(alphanumeric name + description (todo for many of them); rune; irregular form)

bccb

Default value. Produces a tile-by-example, called a `%weed`. A `%weed` contains the twig used to make the example.
$_
_p

bccl

Tile autocons. Tile rune that constructs the tile of a tuple from a tuple of tiles.
$:
[p q]

bccm

Normalizing gate, `%clam`.  Synthetic rune that produces a `%leaf`, a normalizing gate or [clam]() for `p`. `$,` is used to ensure an input value fits a certain type: if it does match, the value is produced. If it doesn't, the default value for the desired type is produced.
$,
,p

bcpt

Wing to tile. $@ is a natural rune that whips (normalizes) wing p into tile q.
$@
p@q

bctr

Default value.  Synthetic rune that produces the [bunt]() (default value) of a tile as a compile-time constant if possible. If it is not possible, then it is produced dynamically.
$*(p)
*P

bcwt

Fork. $? produces a fork, called a %fern. A %fern is a non-empty list of cases.
$?
?(p q)

clhp

Cell.  Synthetic rune that produces the cell `[p q]`.
:-
[p q] ; also p^q; also p/q; also prefix `; also postfix ~

clsg

Null-terminated tuple. Synthetic rune that produces a null-terminated tuple.
:~
~[i.p i.t.p i.t.t.p]

cltr

Tuple. Synthetic rune that produces a tuple.
:*
[i.p i.t.p i.t.t.p]

cnhp

Slam, n-arguments. Synthetic rune that that pulls $ from the door p with its sample set to [%cltr q]. %- in both its tall and wide forms is like a function call with one argument. However, %-is most commonly used in its irregular form, where p and q are enclosed within () where q is a list of n arguments.
%-
(p q)

cnsg

Pull with sample. Synthetic rune that [pull]()s `p` from the [door]() `q` with its sample set to `r`. `%~` is used to produce an arm `p` inside of a door `q` with the door's sample set to `r`.
%~
~(p q r)

cnts

Evaluate with changes. Natural rune that evaluates `p` with the changes specified in `q`. `%=` is used to change a batch of [wing]()s inside a [`++wing`]() all at once, ensuring that the product is type checked.
%=
p(p.i.q q.i.q, p.i.t.q q.i.t.q)

dsls

Increment.  Natural rune that generates nock operator `4`, which increments an atomic operand.
.+
+(p)

dtts

Equality. Natural rune that applies Nock 5 (equality) to determine if the products of p and q are equivalent, and produces a loobean.
.=
=(p q)

hxgl

Prettyprint `++tape`. Synthetic rune that slams the gate [`++noah`]() with `p`. `#<` is used to produce a [`++tape`]() of the computation passed to it, and is only used in the irregular form.
#<
<i.p i.t.p i.t.t.p>

[6]

Prettyprint `++tank`. Synthetic rune that slams the assumed gate `cain` on `[%zpgr %cntr p]`. Slams the gate [`++cain`]() with `p`. `#>` is used to produce a [`++tank`]() of the computation passed to it, and is only used in the irregular form.
#>
>i.p i.t.p i.t.t.p<

kthp

Cast to bunt. `^-` is a synthetic rune that casts `q` to `~(bunt al p)`, i.e. the icon of `p`.
^-
`p`q

ktts

Wrap. Natural rune that wraps `q` in the [`++toga`]() `p`. `^=` is most commonly used for assignment, adding one or more names to values.
^=
p=q

smcl

Apply fold. Synthetic rune that applies `p`, a binary gate, to the n-ary tuple `q`. `;:` is similar to folding over the tuple `q`, producing the final result.
;:
:(p i.q i.t.q i.t.t.q)

smdq

String. Synthetic rune used to make strings, interpreted or not...Composes [`++tapes`]() and twigs that produce `++tape`s.
;"
"foo"

smhp

`++sail` tape. `;-` is a virtual rune used within [`++sail`]() to interpolate a tape.
;-
-{a}
note: mistake? https://raw.githubusercontent.com/urbit/docs/blob/master/docs/dev/hoon/runes/sm/smhp.md says Irregular form: none but then speaks of an 'irregular form' at the bottom; and -{} looks irregular to me

smtr

`++sail` list `++manx`. `;*` is a virtual rune used within [`++sail`]() to interpolate a list of marl.
;*
*{a}
note: mistake? https://raw.githubusercontent.com/urbit/docs/blob/master/docs/dev/hoon/runes/sm/smhp.md says Irregular form: none but then speaks of an 'irregular form' at the bottom; and -{} looks irregular to me

smzz

`++sail` tag. `semzaz`, is a virtual rune frequently referred to as [`++sail`]() in reference to `++sail`:vast which parses it. It is the most basic part of the templating structure frequently used to produce [`++manx`](). In its simplest and most common form we see `semzaz` as `;tagname(property1 "value1", property2 "value2")`.
none
;p;, ;p:(i.r i.t.r i.t.t.r), ;p(p.i.q q.i.q, p.i.t.q q.i.t.q):"{...}", ;p(p.i.q q.i.q, p.i.t.q q.i.t.q):(i.r i.t.r i.t.t.r), ;p(p.i.q q.i.q, p.i.t.q q.i.t.q):"{...}", `;{p}`, `{p i.r i.t.r}`, `;{p(p.i.q q.i.q, p.i.t.q q.i.t.q) i.r i.t.r}`
this is a 'virtual rune' (!?? what does that mean?) and all of its forms seem irregular to me (!??)

tsgl

Product as subject, inverse. synthetic rune that uses the product of `q` as the subject of `p`. `=<` is the inverse of `=>`. Use `=<` when your computation, `q` is lengthier and `=<` makes for more readable code.
=<
p:q

wtbr

OR. synthetic rune that computes the "or" of the loobeans in `p`.
?|
|(i.p i.t.p i.t.t.p)

wtpm

AND. synthetic rune that computes the "and" of the loobeans in `p`.
?&(i.p i.t.p i.t.t.p)
&(i.p i.t.p i.t.t.p)

wtzp, zpcb (mistake, is zpcb supposed to be the same as wtzp?)

NOT. synthetic rune that produces the logical "not" of `p`.
?!
!p

---

jetted functions

There are a few lists of jetted functions in https://github.com/urbit/urbit/tree/master/include/jets , as of feb 2, 2016. In https://github.com/urbit/urbit/blob/master/include/all.h they are described as k: jets (transfer, args), q: jets (retain, args), w: jets (retain, core). The implementations appear to be in 7 subdirectories of https://github.com/urbit/urbit/tree/master/jets , organized by 'tier', where a is Tier 1, b is Tier 2, ... g is Tier 7. https://github.com/urbit/urbit/blob/master/include/jets/w.h seems to contain a long list of jets, organized into tiers. https://github.com/urbit/urbit/blob/master/include/jets/q.h seems to contain another long list, disjoint with w (eg q has orth but w has down); q also appears to give input type signatures (with arities, but with the only types of each input being 'noun' or 'atom'). https://github.com/urbit/urbit/blob/master/include/jets/k.h seems to give a much shorter list, with some comments describing what each one does. Perhaps the Tiers are ordered from lowest-level or most fundamental operations to higher-level or less fundamental? But looking at the jets in them below, it doesn't seem like that.

the list in k is:

The lists in q and w, with descriptions from the library docs at http://urbit.org/docs/dev/hoon/library/ , with repetition removed, are:

Tier 2 (these are all from http://urbit.org/docs/dev/hoon/library/2b "basic containers":

Tier 3 (these are all from http://urbit.org/docs/dev/hoon/library/2c , "simple noun surgery" or http://urbit.org/docs/dev/hoon/library/2a "basic unsigned math"):

Tier 4:

Tier 5 (in http://urbit.org/docs/dev/hoon/library/2ea "packing", http://urbit.org/docs/dev/hoon/library/2ep "diff", http://urbit.org/docs/dev/hoon/library/2em "regular-expressions", http://urbit.org/docs/dev/hoon/library/2ej "formatting (basic text)"):

Tier 6:

i looked for a bunch of these in the library docs and couldn't find them. However they are Hoon vocab words. I bet these are part of the Hoon implementation. For example, 'mint' is here, and older docs [7] indicated that within the Hoon interpreter, the function (gate) ++mint contains the semantics of Hoon. That is, ++mint maps [type twig] to [type nock]. ++mint is found within ++ut. They are in the urbit code, sometimes with a short comment.

this block seems to be OS stuff, plus some 'big' Hoon implementation methods like mint, nest, play:

todo: word frequency histogram of urbit source

Links

Unfortunately the folks in charge of Urbit have a habit of periodically posting chapters of a Hoon tutorial onto the public website, getting a few chapters in, then before they finish, throwing everything out and replacing it with a 'Coming soon!' page, and then sometime later starting a new tutorial again from scratch. Sometimes they even post complete (but incomprehensible, without reading a tutorial first) reference documentation (and then later replace those pages also with 'Coming soon!'). Luckily you can still read the old docs from various public Github repos.

todo: the [p=twig q=tile] un-type signature format