
proj-oot-ootDataNotes4

See also [[proj-oot-plChDataLangs?]].

---

'pattern matching' is a key operation here; it is a SWITCH statement on the form of the data ('form' meaning which constructor is outermost; this is then extended to a nested top-down pattern reaching from the outermost constructor some ways in), combined with a destructuring bind in each case.

In fact, we should generalize pattern matching to more general 'patterns'. First, we should replace the design where a 'pattern' consists of nested constructor applications with a design where it consists of nested applications of (classes of constructors); this is like generalizing types to typeclasses. Note that there's no need for the constructors in a class to be from the same (Haskell-style) 'type'; this could come in handy for sum types in which you want to do the same thing when two different constructors, within two different types within the sum type, are encountered. Also, this should be recursive, in the sense that a 'class of constructors' may itself be another pattern. Second, we should make it more like graph regexes. These two steps can be thought of in analogy to regexes (and in precise analogy to graph regexes): the first step, the movement from constructors, to enumerated classes of constructors, to recursively defined patterns, is like moving from regex matching on single characters (explicitly given), to regex matching on character classes, to parenthesized expressions within a regex; and the second step, the movement to graph regexes, is like the addition of ? and * to the language of regular expressions.

---

Now there are languages like Haskell (and Coq, i think?) whose core languages, like F-lite, include pattern matching as a primitive. Conditionals are done via 'switches' (or 'case's as they call them, or implicit switches via guarded/partial pattern-matching function definitions), and data decomposition is done by pattern matching. Haskell is described as a 'graph reduction' (graph redex) language, and f-lite is compilable to Reduceron 'template code'; from the name Reduceron, and its parallel-ness, you can tell that they think of it in terms of graph reduction. If you think about it, 'pattern matching' can be envisioned in terms of graph reduction, which can be seen by thinking about the Reduceron operational semantics:

An expression in the program is a node in the graph. When an expression refers to another expression using a variable, this is an edge in the graph. As Haskell begins to execute the program, at first these nodes are just lazy thunks containing expressions to possibly be evaluated, but then as execution continues, the runtime determines that it has to actually evaluate some of these expressions. So, we visit the node corresponding to the expression, and we 'reduce' that expression, which involves traversing the local edges to other nodes (and possibly creating new intermediate nodes as we partially apply stuff, i guess). Then, because Haskell wants to cache/memoize already-reduced expressions, we write the reduced version of the expression back to the node of the graph that corresponds to that expression. So if you had a graph visualization of this, you would see the CPU focus its attention on various graph nodes, then move its attention to their children, possibly creating new nodes for partially applied functions, and eventually reach primitive 'leaf nodes'. It would substitute those into the expression in their parents that it was trying to evaluate, and simplify that subset of the graph by removing the temporary partially-applied nodes that are no longer needed: once a fully applied version has been computed, the requesting parent node only needs that one, so there are no more edges to the partially applied node.
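A toy model of the write-back step (all names here are invented, not Reduceron's actual machinery): a node starts as an unevaluated thunk; forcing it reduces the expression once and overwrites the node with the result, so later visits reuse the reduced form instead of re-reducing.

```python
class Node:
    """A graph node holding either an unreduced expression or its value."""
    def __init__(self, compute):
        self.compute = compute   # the expression, as a zero-argument callable
        self.value = None
        self.reduced = False

    def force(self):
        if not self.reduced:          # first visit: actually reduce
            self.value = self.compute()
            self.reduced = True       # write the result back to the node
            self.compute = None       # the expression is no longer needed
        return self.value             # later visits just reuse the value

counter = {"reductions": 0}

def expensive():
    counter["reductions"] += 1
    return 21

leaf = Node(expensive)
parent = Node(lambda: leaf.force() + leaf.force())  # two edges to one shared node

print(parent.force())         # 42
print(counter["reductions"])  # 1: the shared node was reduced only once
```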

Pattern matching refers to checking the pattern of the descendant nodes at the other side of an edge. Each node is labeled with its type and, if it is a constructor, with which constructor it is; the 'pattern' just refers to these labels, except it can also go more than one hop and include the labels of further nested descendants.

So this seems great, and i bet it is great for pure expressions, but what about side effects? Now ordering matters. You can use monads to guarantee an order of evaluation, i guess. But it seems to me that you may also want to just print debugging and logging output, store profiling information, and cache stuff as nodes are evaluated, without otherwise messing up the program. I can't decide if monads are a dirty hack, and you should confine this graph redex stuff to pure expressions and just do sequential imperative programming for interactivity, or if monads are a beautiful way to do interactivity in a graph redex system (leaning towards the former: graph redex for expression evaluation, imperative sequencing for interactivity). I do think that you should have something like Oot Statemasks to allow some 'inconsequential' side-effects even within the graph redex ('inconsequential' literally: the constraint is that they don't change the result of the expression, except for weird special cases like disk-full exceptions during logging).

---

as a control primitive, pattern matching on ADTs treats the nested query on the form of the constructor as a shortcut for a bunch of nested if-thens.

in this way, since ADTs and pattern matching are generally seen together, ADTs are closely related to a control structure, too.
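The 'shortcut for nested if-thens' can be made concrete by hand-desugaring one pattern. A Python sketch, with hypothetical Lit/Add constructors: the nested constructor query `Add(Lit(a), r)` becomes an outer constructor test, then a nested constructor test one hop down, with destructuring binds along the way.

```python
from dataclasses import dataclass

@dataclass
class Lit:
    value: int

@dataclass
class Add:
    left: object
    right: object

def match_add_of_lit(e):
    # desugared form of:  case Add(Lit(a), r): return (a, r)
    if isinstance(e, Add):              # outer constructor test
        if isinstance(e.left, Lit):     # nested constructor test, one hop in
            a = e.left.value            # destructuring bind
            r = e.right
            return (a, r)
    return None                         # no arm matched

print(match_add_of_lit(Add(Lit(7), Lit(8))))  # (7, Lit(value=8))
```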

---

as a data primitive, contrast ADTs to dicts (hashtables; associative arrays) and multidimensional arrays. ADTs can do structs pretty easily (the different fields of the struct are the different arguments to a constructor), but associative arrays seem a bit more expressive than that, and probably multidimensional arrays too. Of course you can build these structures out of ADTs but i bet it's cumbersome to operate on them that way.

---

does that 'uniform' requirement on pattern matching in f-lite prohibit having more than one 'case' that might match any given thing? Because this is still order-independent as long as you have a rule that 'the most specific one matches'. But you can't just look at them one-by-one, so maybe that's what they want to rule out with the 'uniform' requirement.

---

" The operations of an abstract type are classified as follows: · Constructors create new objects of the type. A constructor may take an object as an argument, but not an object of the type being constructed. · Producers create new objects from old objects; the terms are synonymous. The concat method of String, for example, is a producer: it takes two strings and produces a new one representing their concatenation. · Mutators change objects. The addElement method of Vector , for example, mutates a vector by adding an element to its high end. · Observers take objects of the abstract type and return objects of a different type. The size method of Vector , for example, returns an integer. " [1]

---

in Haskell etc, pattern matching against constructors is against (the set of all) fixed length tuples

how to extend that to matching patterns on (non-fixed-length) graphs? maybe graph regex?

---

stuff in Python that is like just a struct, a class with no methods:

1)

class C:
    def __init__(self, a):
        self.a = a

x = C(1)

2)

x = type('C', (object,), dict(a=1))()

[2]

3) import types; x = types.SimpleNamespace(a=1)

4) import collections; C = collections.namedtuple('C', ['a']); x = C(a=1)

5)

import attr

@attr.s
class C(object):
    a = attr.ib()

x = C(1)

https://glyph.twistedmatrix.com/2016/08/attrs.html makes a case that the 'attr' library is the best out of these. In short, (1) and (2) have no __repr__ or __eq__, and (4) (a) has fields accessible as numbered indices, too, (b) compares equal to raw tuples of the same values, and (c) is (mostly?) immutable. He doesn't treat SimpleNamespace, which seems to cover the basics. attrs does have various more advanced features than SimpleNamespace, though, including __lt__(), asdict(), validators, and optional closedness via __slots__ (no additional attributes can be added later).
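The namedtuple quirks above are easy to check directly; a quick demonstration of (a) and (b) for option (4), and of the missing __eq__ for a plain class like option (1):

```python
import collections

C = collections.namedtuple('C', ['a'])
x = C(a=1)

print(x[0])        # 1: fields are also accessible by numbered index
print(x == (1,))   # True: compares equal to a raw tuple of the same values

class Plain:
    def __init__(self, a):
        self.a = a

# no __eq__ defined, so comparison falls back to object identity
print(Plain(1) == Plain(1))  # False
```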

---

_asummers (Hacker News):

Const does not mean immutability, only immutable references to the outermost pointer. It is equivalent to final in Java. While that solves the issue with numbers changing state, it does not help objects e.g. For that you need something like immutable.js from Facebook.


---

"Canonical" S-expressions (csexps)

example:

(4:this22:Canonical S-expression3:has1:55:atoms)

"a binary encoding form of a subset of general S-expression (or sexp)...The particular subset of general S-expressions applicable here is composed of atoms, which are byte strings, and parentheses used to delimit lists or sub-lists. These S-expressions are fully recursive. ... While S-expressions are typically encoded as text, with spaces delimiting atoms and quotation marks used to surround atoms that contain spaces, when using the canonical encoding each atom is encoded as a length-prefixed byte string. No whitespace separating adjacent elements in a list is permitted. The length of an atom is expressed as an ASCII decimal number followed by a ":". ... A csexp includes a non-S-expression construct for indicating the encoding of a string, when that encoding is not obvious. Any atom in csexp can be prefixed by a single atom in square brackets – such as "[4:JPEG]" or "[24:text/plain;charset=utf-8]". " [3]

pros:

Links:

---

these XML features are cool:

"The first atom in a csexp list, by convention roughly corresponds to an XML element type name in identifying the "type" of the list. "

also http://json-ld.org/ looks cool

---

i skimmed:

my summary:

CapnProto is sort-of a successor to protobuf (the guy who made it was a principal designer of protobuf). Unlike Protobuf, it is big on 'zero-copy', meaning that instead of parsing an incoming message, it just keeps the bytes around and provides accessor functions to use it.

Protobuf is supported by Google and is cross-platform. CapnProto is made by a few guys at a startup and doesn't support Windows very well yet. One commenter found that Flatbuffers' Java implementation was 'more mature' than CapnProto's. The CapnProto guy thinks Thrift is ~categorically worse than Protobuf.

Other 'zero-copy' guys include Flatbuffers and SBE. I have heard of Flatbuffers.
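A sketch of the zero-copy idea in Python (this is not Cap'n Proto's actual wire format; the two-int32 layout here is invented): hold on to the received bytes as-is and decode fields on demand through accessors, rather than parsing the whole message up front.

```python
import struct

class PointView:
    """Read-only view over a buffer with a hypothetical layout:
    two little-endian int32 fields at offsets 0 and 4."""
    def __init__(self, buf, offset=0):
        self.buf = buf          # no copy: just keep a reference to the bytes
        self.offset = offset

    def x(self):
        return struct.unpack_from("<i", self.buf, self.offset)[0]

    def y(self):
        return struct.unpack_from("<i", self.buf, self.offset + 4)[0]

wire = struct.pack("<ii", 3, 4)   # bytes as they arrived off the network
p = PointView(wire)
print(p.x(), p.y())  # 3 4
```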

---

I think for Oot serialization and interop, if possible, we should choose one of these sorts of guys and use it rather than coming up with our own JSON-like format and/or our own pickling format.

a list of contenders is in plChData. These are not all the same type of thing, but whatever, i'm putting them in the same list. Here they are again with a short summary:

Not in the running:

hmm.. a lot of these aren't great.. i guess what i want is a 'JSON+'. Plus dates, plus references.

EDN/Transit seem to provide this, except for references. Transit has a built-in efficient MessagePack encoding and even a JSON encoding, so that sounds good. TOML looks cool. YAML is popular but the spec is too complex, and i don't like significant indentation. StrictYAML still has significant indentation. Should check out HJSON, JSON5, SDLang. CSEXPs are a great lower layer, but they don't even have a set of types, so they don't solve the problem.

So far i guess Transit seems like the best. Having both a JSON and a MessagePack encoding is a pretty big advantage. And, being a Clojure thing designed for language interoperability instead of for RPC protocols, i bet it'll fit my use case better than CapnProto.