proj-oot-ootDataNotes6

" Rust already has things like indexing, iteration, and hashing baked in to the standard library, but extending that to cover things like: a richer form of key-based access (e.g. the Entry pattern seen in HashMap?/BTreeMap?), the general class of structure (map/graph/tree/etc.); intrusiveness; whether a data structure is purely functional or not (i.e. changes to the structure create a new structure, rather than mutate the old). This would allow libraries to implement functionality on top of any data structure that implements a requisite set of traits, rather than needing to implement/provide their own. " -- [1]

---

https://github.com/maxhumber/redframes "A pythonic dplyr"

" jcmkk3 4 hours ago

link flag

I was excited to see this pop up on github a few days ago. I think that it has made a lot of smart decisions and practiced restraint in its capabilities. There have been quite a few attempts at a grammar of data manipulation for python, but most seem to be obsessed with creating an infix pipe operator. This one feels the most pythonic of the ones that I’ve seen.

Python’s data ecosystem has a lot of very strong libraries. I feel like array libraries (numpy, pytorch, etc) have solid APIs, scikit-learn has a great API, and Altair + seaborn’s new object interface are good grammar of graphics implementations. Pandas has been the biggest sore spot when comparing usability between R (tidyverse) and python. Don’t get me wrong, pandas is a great library, but there is only so far that it can adapt to ideas and new standards in API design without breaking a bunch of code or creating more confusion in its users.

I don’t necessarily think that redframes is the answer, but I think that it could be some good inspiration for future API designs or a base to build upon. Ibis and polars are currently the best alternative options. Ibis has been undergoing heavy development as of recently and I think that with the duckdb and arrow/datafusion backends, it will start to become more appealing for local data analysis, while also having the flexibility to operate on remote analytic engines/databases. "

---

https://www.jot.fm/issues/issue_2022_02/article2.pdf Implementation Strategies for Mutable Value Semantics Mutable value semantics is a programming discipline that upholds the independence of values to support local reasoning. In the discipline’s strictest form, references become second-class citizens: they are only created implicitly, at function boundaries, and cannot be stored in variables or object fields. Hence, variables can never share mutable state. Unlike pure functional programming, however, mutable value semantics allows part-wise in-place mutation, thereby eliminating the memory traffic usually associated with functional updates of immutable data.

https://www.val-lang.dev/ Safe by default: Val’s foundation of mutable value semantics...

---

" C# has value types and java is working on them. Julia's combination of value types and parametric struct types provides even more control - allowing writing eg btrees which store data inline. Newer languages also make it easier to provide zero-cost abstractions over manual byte-packing (eg blobs.jl). " -- [2]

---

" I’ve spent the last few years doing back end work, and, yes, this is infuriating. Compare the experience of writing something in Cocoa with Core Data to schlepping behavior from SQL to an HTTP API to the browser. There’s no intrinsic reason for it. There’s no reason I could not have something like Core Data, and generate the whole back end and front end client for it, and use it in my event handlers. "

---

https://en.wikipedia.org/wiki/State_diagram#Harel_statechart https://en.wikipedia.org/wiki/Class_diagram https://en.wikipedia.org/wiki/Activity_diagram https://en.wikipedia.org/wiki/Sequence_diagram https://en.wikipedia.org/wiki/Component_diagram

https://en.wikipedia.org/wiki/Unified_Modeling_Language#Diagrams

---

https://blog.saeloun.com/2022/11/22/data-immutable-object

Ruby's new Data class for simple immutable value objects (like Struct, but immutable) ---

"You need, at a minimum, integers, pointers, arrays, and records in a language to be able to build everything else in the library. "

-- [3]

---

"

    1
    xigoi 7 days ago (unread) | link | 

I’d say that in statically typed languages, enums replace the main use cases of symbols.

    1
    ignaloidas 7 days ago (unread) | link | 

In my view, symbols are just a special case of a global enum where you can add extra values extremely easily.

    2
    xigoi 6 days ago (unread) | link | flag | 

At the cost of not having type safety and exhaustiveness checking.

"

---

To give two examples of where Gleam has made (imo) interesting design decisions to keep the language simple:

    There is no Option[T], only Result[T, E]. If you don’t want an error, just use Result[T, Nil]. Is it more typing? Sure, but now you only have one type instead of two.
    A type (think struct) can have multiple constructors, each with their own fields. This means they effectively double as enums. Have a look at the docs, they explain it better than I can.https://gleam.run/book/tour/custom-types.html#multiple-constructors

-- https://lobste.rs/s/yxwubn/gleam_v0_25_introducing_use_expressions#c_ogjqqz

---

immutable value structs

eg Ruby's https://blog.saeloun.com/2022/11/22/data-immutable-object.html

" The backwards compatibility reasoning for not adding an immutable flag to Struct (my first instinct) makes sense. Struct has always been kind of wonky anyway. Getting both positional and kwargs for free is great. I also appreciate that they kept the implementation minimal for now. The pull request shows a very elegant way of implementing default arguments using only language features, which is a good sign:

      Measure = Data.define(:amount, :unit) do
        def initialize(amount:, unit: '-none-') = super
      end
      "
      -- https://news.ycombinator.com/item?id=33796557

bigtunacan 52 days ago

root parent next [–]

The libraries that provide immutability often provide stronger enforcement than this new Data class. I like the idea of something like this as part of Ruby core, but this was a chance to do it right and I don't think they took it far enough.

djur 52 days ago

root parent next [–]

You're thinking deep immutability? I'm not sure that's ever going to make sense to bundle with the language. Too many caveats.

I'd rather see incremental additions to this minimal Data API than trying to do too much at once. Lots of love for the Ruby dev team but they don't always have the clearest idea of what problems their users are trying to solve.

bigtunacan 52 days ago

root parent next [–]

Yes, that's exactly what I was thinking. The problem is that without deep immutability this is still open to abuse/miss-use. When I'm creating an immutable struct I'm trying to prevent future developers from accidental mistakes.

kayodelycaon 52 days ago

prev next [–]
 So… I was trying to figure out why freezing a Struct wasn’t a suitable solution here and the article pretty much answered it: Data is not enumerable.

prescriptivist 52 days ago

prev next [–]

If you want to use something like this right now (with a threadsafe bonus) concurrent-ruby has some useful specialized structs that offer similar behavior:

https://github.com/ruby-concurrency/concurrent-ruby

---

https://datapythonista.me/blog/pandas-20-and-the-arrow-revolution-part-i https://wesmckinney.com/blog/apache-arrow-pandas-internals/

---

snej 28 days ago

link
            Go has this standard Context type, which is by convention passed down the call stack to convey bits of state, as an alternative to things like thread-local variables. Like many features of Go it’s annoyingly verbose, but it has the useful feature that as a caller you can set a timeout or expiration time on it, and as a caller there are various affordances for detecting or receiving a channel message when it expires. I think that’s quite a good idea.

---

https://www.tedinski.com/2018/01/23/data-objects-and-being-railroaded-into-misdesign.html

sounds like they are asking for something close to: '& prefix sigil for objects, no prefix sigil for values', which is close to what i had already been thinking

---

(hyper)graphs could have roles (port types) which are themselves parameterized with an arbitrarily large number of arguments, or the roles could even be subgraphs themselves

wait do i actually mean roles? or do i mean RDF predicates? i guess those are related; RDF triples can be used to enumerate typed edges in a graph

also, mb we want a mechanism to reify subgraphs into single nodes (that is, to create nodes whose semantics is a reference to another graph and/or a subgraph of this graph)

another thought: nodes could be typed, and the node type determines which ports (typed 'slots' for edges) that the node has

see also examples from https://chat.openai.com/c/d00ebe3c-0ddc-407c-98fb-5270cfb3f5f1 :

"

  1. ## 1. Directional N-Ary Relations

A directional relation (DirRel?) indicates a spatial relationship between entities.

DirRel?(entity1, entity2, direction)

Example: `DirRel?(river, village, north-of)`

...

  1. ## 3. Path and Motion Relations

A path relation (PathRel?) signifies the trajectory or movement of entities.

PathRel?(entity1, entity2, path)

Example: `PathRel?(bird, tree, from-sky-to-branch)` " -- https://chat.openai.com/c/d00ebe3c-0ddc-407c-98fb-5270cfb3f5f1

ChatGPT? is thinking of the third argument in the triple here (which is like an RDF predicate, or an edge type/role/port) as a node parameterized by an arbitrary number of arguments (eg from _ to _ has two arguments). But i think you could also stick a reified subgraph in there (so "from sky to branch" would just be a subgraph, and a node reifying (referring to) that subgraph would go in the triple in the example above).

Also note the DirRel? and PathRel? prefixes. These look like node types to me.

---

should reread my older notes about the generalized graphs data structure idea. I feel like maybe i watered it down over time. By ootDataNotes2.txt i'm already getting distracted by less important concerns. I think that the generalized graphs data structure idea seems more important to me now than it did at the time because i am becoming an adherent of the "data structures are central" design philosophy. I should re-read the following:

""" ... reiGets(n, k), content(n), topnodes, 'anonymous'

content(n) is the set of nodes n2 such that n \in get(n2, 'contentOf') topnodes is the set of nodes n such that get(n, 'contentOf') = {} reiGet(n, k) is the set of pairs (n2, es) of the form (nodes, {nodes}), such that for each e in es, get(e, 'source') = n, get(e, 'target') = n2 and (either get(e, 'key') = k or k = 'anonymous')

from this we can formulate gets, and from that plus the naturals and a terminator, we can formulate enumeration, from which which can formulate an enumerated gets and a matching enumerated rei, which are more intuitive to work with """ -- meta.txt via ootDataNotes1.txt

and parts of the following, in descending order of importance:

maybe i better copy the most important excerpts here, and then just reread those excerpts from here? Yeah, i'll do that, delimited by '--' separators:

--

""" ... reiGets(n, k), content(n), topnodes, 'anonymous'

content(n) is the set of nodes n2 such that n \in get(n2, 'contentOf') topnodes is the set of nodes n such that get(n, 'contentOf') = {} reiGet(n, k) is the set of pairs (n2, es) of the form (nodes, {nodes}), such that for each e in es, get(e, 'source') = n, get(e, 'target') = n2 and (either get(e, 'key') = k or k = 'anonymous')

from this we can formulate gets, and from that plus the naturals and a terminator, we can formulate enumeration, from which which can formulate an enumerated gets and a matching enumerated rei, which are more intuitive to work with """ -- meta.txt via ootDataNotes1.txt

--

"""

One data structure

Oot graphs

In pursuit of as much generality as possible, Oot's graphs are directed, labeled, reified, multi, hyper graphs with ports. Directed, meaning that there may be an arcs from one node to another one, without an arc from the second to the first. Labeled, meaning that nodes and arcs may have data attached to them. Reified, meaning that the target of an arc may be a node, or another arc, or it might be the graph as a whole. Multi, meaning that there may be multiple arcs going from one node to another node. Hyper, meaning that we generalize binary arcs, which have a source and a target, to n-ary arcs, which involve a number of nodes in a number of roles. With ports, meaning that we can assign labels to the 'connection' between a particular node and a particular edge, in such a way so that within any given node, each edge has a unique label; and we can group these ports into 'port types'.

List is a homoiconic language based on lists; function calls in the source code are written as lists of arguments appended to the function identifier. Oot is 'almost homoiconic' but is based on keyword arguments rather than positional arguments; therefore, an associative array is the fundamental structure, rather than a list like in Lisp.

So why do we say graphs, instead of associative arrays? Because we often want to reason about the nodes underneath the first level of the array (eg the node 'female' in the associative array: {Alice: {id: 1, gender: female}, Bob: {id: 2, gender, male}}).

Also, note that (a) graphs can embed almost all of the popular computer science data structures, and (b) Oot is good at manipulating ASTs, which assists metaprogramming.

Views

A value may have different representations in memory. For example, the integer '-1' might be represented as an 8-bit integer with a sign bit, as an 8-bit integer with twos complement, as a 16-bit integer with a sign bit, etc.

Often we find that one piece of data must be represented in multiple forms because the different functions that need to operate on it expect different data types. For example, a persistence framework may wish to attach metadata to persisted arrays, but this metadata must be hidden when the arrays are to be passed to library that expects plain arrays, yet still present after the mutated arrays are returned.

In Oot, one can easily pass between (a) a 'non-meta' graph, and (b) that same graph, but with additional annotation nodes, in the form of specially-typed metanotes which have "annotates" arcs which point to nodes and arcs in the 'non-meta' part of the graph. In Oot, one can use views to 'hide' the annotations, pass the non-meta graph to some function that traverses all of the nodes and mutates them (or rather, since we like immutable data, creates a 'mutated' copy of the graph as a new value), and then to recover the annotation nodes from the new value, because the runtime secretly copied the hidden annotation nodes when the relevant underlying nodes were copied into the new value.

Another example of when views are useful is when a tree data structure must appear as a linear array of nodes for the purpose of passing to a filtering function, but in actuality must retain its tree structure after the undesirable nodes have been filtered out.

Popular constructs for this purpose include type classes and OOP ad-hoc polymorphism; both of which work via the creation of an interface in the form of a set of functions supported by a type (or type class). To borrow the language of RESTful web architecture, this solution amounts to the creation of a new set of verbs for each interface.

Oot discourages types defined by the unnecessary creation of verbs in cases where they can instead be defined more simply as graphs conforming to a certain shape. This allows the pervasive graph operations to be applied to more types. For example, trees and arrays are both graphs with different shape constraints. If data in the shape of a tree needs to be filtered by a function that expects a linear list, instead of creating an adaptor that exposes a set of functions such as next(), the Oot way is to specify the shape transformation that says, for each node in the tree, which item in the list it corresponds to.

To assist in this, Oot provides the concept of 'views'. Oot views allow the same piece of data to be viewed in different ways. They are defined by a specification of the shape transformations (mathematically, homomorphisms) which define the various representations of the same piece of data. A mutation of the data in any of its views affects it in all views. When this sort of thing is implemented in other languages it tends to involve things such as pointers linking together data fields within distinct variables, but by contrast, Oot presents the abstraction of a single variable that holds a single value which can be seen from different views. Views are a type of ad-hoc polymorphism, a way for a piece of data to expose a different 'interface' to various consumers, but these interfaces differ via shape rather than via a distinct function signature.

Binding

todo

like spreadsheets

Domains

binding is tightly restricted between different state domains (or just 'domains')

todo

Boundaries

Nodes in a graph might correspond to values of a given data type; for example, you might have a graph whose nodes have integer values. But what happens when the type of the nodes itself has a graph structure? For example, if you have a data type A, where A is a list whose nodes are type B, where B is the type of arrays of integers? You might conceptualize an instance of A in two ways; first, as a simple list-shaped graph with opaque nodes of type A; or second, as one big graph whose nodes are integers. The question at hand is: when do we stop recursing into node values and decide to just consider something an opaque 'node'? The difference is not merely academic; for example, when a value of type A is operated upon by a generic library function that visits each node in a graph and applies some transformation, different results will obtain depending whether the graph is composed of arrays of integers, or integers.

Oot allows the programmer to fluidly move between these conceptions via the construct of 'boundaries'. A boundary can be visualized as a wall surrouding nodes in a graph; any remaining structure inside the wall is considered opaque (until such time as the boundary is moved or deactivated). A single graph can be marked with multiple boundaries of different 'color', only one of which is considered the 'active boundary color' at any one time.

Boundaries seem even more important when one contemplates the binding together of different variables in a Oot state domain. In essence, each Oot state domain could be considered as one big graph. The question of where one variable stops and another begins becomes merely a question of boundaries. Taking this viewpoint even further, the different state domains within a program can themselves be thought of components of a larger graph, a single graph containing the contents of all of the variables of the program as subgraphs, separated by various boundaries.

In other words, Oot data values (what Oot variables hold) are graphs; where one graph references another, one can consider these two graphs as two separate values, or one can consider them as one graph value with two entry points. Oot allows the programmer to flexibly shift between these conceptions.

...

Uniform treatment of metadata

Oot has a system to allow various frameworks to attach their metadata to values.

Powerful graph abstractions

The Oot nets are labeled, directed, multi, hyper, reified graphs. Oot networks are:

Nodes in Oot nets can be arbitrary functions.

Each node in a net can used as a simple array or lookup table. However, Oot supports more powerful idioms that regard the network in its totality, rather than each node individually. For example:

Nets can be used to represent regular multidimensional labeled arrays, similar to R dataframes.

Boundaries

A net boundary is a border that cuts through a set of edges in the graph.

Boundaries in code are used to represent atomic operations, exception catching domains, etc.

Patterns

Net patterns are a matching language that generalize regular expressions and database queries to nets. Net patterns are first-class objects.

Views

A view is a different set of labels and edges associated with the same nodes. Views allow the same data to be presented in different ways to different functions operating upon it. For example, a dictionary could be presented as a list of key, value pairs, and mutations on this list could alter the dictionary.

A mutation to data in one view can affect the same data in other views. When one view is a projection of another, syntax is provided to specify the 'lineage' of data so that metadata attached to a node N1 in another view remains attached to the correct node(s) if N1 is mutated by way of replacing it with a new node.

""" -- whyOot.txt

""" Oot is a readable, massively concurrent, programmable programming language.

Oot is a general-purpose language that particularly targets the following application domains:

""" Lisp is a homeoiconic language built upon lists. Oot is a homeoiconic language build upon labeled (hyper multi reified) graphs. ... -- built around a powerful data structure: labeled graphs. ...

G-constructor syntax

G-constructors are literals used to construct directed graphs (for technical notes on what we mean by directed graph, see [4]).

A graph in Oot is something that you might draw by drawing a bunch of circles and then drawing some arrows between some of the circles, and then attaching notes to some of the arrows. To non-CS people, a graph might be called a "network". We call the circles "nodes", the arrows "arcs", and the notes on the arrows "edge labels".

G-constructors are constructs surrouded by [] delimiters (can be implicitly closed by indentation).

The outermost node in the constructor is called the "root node". The constructor can only construct one connected component, although other unconnected nodes can be added later by merging graphs.

Example: a graph with only the root node:

  []

Nodes can contain a list of objects, separated by spaces. These objects are considered to be other nodes such that directed edges go from this node to those. These edges are labeled by their position in the list (zero-indexed).

Example: a node that contains a list of the first two even numbers:

  [2 4]

This represents the following directed graph:

   root
    /\
 0 /  \ 1
  /    \
 2      4

If a variable X holds a node, then that node's edges can be traversed using the "." operator.

Example: after putting the above node into "x", x.1 == 4:

  x = [2 4]
  x.1 == 4    # evaluates to t (true)

Edges can have multiple labels. To assign an additional label to an edge, use "label = destination".

Example: same as above, but the edge to 2 is labeled "yellow" as well as 0:

  ["yellow"=2 4]

This represents the following directed graph (note: should we use '/' instead of '=' to make the directedness clearer?):

            root
             /\
 0,"yellow" /  \ 1
           /    \
          2      4

Any of the labels may be used in traversal:

  x = ["yellow"=2 4]
  x."yellow" == 2

An object pointed to by a node may itself be a node that points to other objects. However, primitive values, such as "2", cannot point to anything.

Example: A node that points to two nodes, one that points to strings describing some fruits, and one that points to strings describing some vegetables.

  ["fruits"=["apple" "pear"] "vegetables"=["carrot" "brocoli"]]

                 root
                  /\
      0,"fruits" /  \ 1,"vegetables"
                /    \
               /      \
              /        \
             /          \
            /            \
           /              \
          /\              /\
         /  \            /  \
        /    \          /    \
  "apple"  "pear"  "carrot"  "brocoli"

todo num labels

"." is left-associative and may be used to traverse: x = ["fruits"=["apple" "pear"] "vegetables"=["carrot" "brocoli"]] x."fruits".1 == "pear"

 <code>[name | value1, value2, etc]</code> is notation for a node with a node id (global label) name. It is an error (compile-time if possible, but runtime in general) to assign the same name to two different nodes. 

Example:

 ["x" | fruits=["apple" "pear"] vegetables=["y" | "carrot" "brocoli"]]


                 root, "x"
                  /\
      0,"fruits" /  \ 1,"vegetables"
                /    \
               /      \
              /        \
             /          \
            /            \
           /              \
          *               "y"
          /\              /\
         /  \            /  \
        /    \          /    \
  "apple"  "pear"  "carrot"  "brocoli"

(the * represents the unnamed node in the example)

Labeled nodes may be pointed to using the notation x..name, where x is a variable that refers to the G-constructor.

  x = ["x" | fruits=["apple" "pear"] vegetables=["y" | "carrot" "brocoli"]]
  x.."y".0 == "carrot"

Within the constructor, ".." may be used with no variable to its left. Using node labels, cycles may be created:

 Example: ["s" | .."s"]
 "s"
 / \
 \_/
 

By referencing parts of other G-constructors, one can create nodes with multiple parents without cycles

Example:

  x = ["a" | ["b" | 2], 3]
  y = ["c" | x.."b"]

Now y looks like:

 root,"c"  "a"
        \  / \
        "b"   3
         |
         2

Note: the graph in "x" is unchanged! the values of x and y have not been "linked" together in any way, it is as if a copy was made of some of the contents of x and put into y. If you don't want time and memory to be spent making a copy, and you don't have any more use for the graph in x, then don't use the variable "x" afterwards and the compiler will probably notice and optimize under the hood by just using the part of memory that it used to call "x" for y and adding node "c" in-place. If you want to make sure, use the variable "x" throughout, "x = ["a" | ["b" | 2], 3]; x = ["c" | x.."b"]", which guarantees that the compiler will mutate x in-place rather than copying. In any case, the compiler may choose to defer the copy unless and until it wants to do it. Unless x is a mutuable variable, the compiler will probably simply internally refer to x when "b" is called for.

Note: only the part of the graph that is reachable by a directed path from x.."b" is included in y.

Note: the node labels are only available from variables bound to the G-constructor in which they were created:

Example:

  x = ["a" | ["b" | 2], 3]
  y = ["c" | x.."b"]

  x.."b"    # succeeds
  y.."c"    # succeeds 
  x.."c"    # error
  y.."b"    # error

The reason it works this way is that a G-constructor actually returns not the root node, but rather an "index node object", which is a node that contains an assoc table from node labels to nodes. one of the nodes is labeled "root". for convenience, "." is actually a shortcut for "..root." ".." is used to directly access the index node.

To create a variable with all the labels, use the midx ("merge index") command, which takes a list and returns a list of the same length. Each item in the return list contains the same node as the corresponding item in the input list; but all items in the return list can access any of the node names anywhere in the input list:

  [x y] = ([x y] midx)
  x.."c"    # succeeds

Again, however, it should be noted that the values of the nodes are not "linked".

It is an error if there is a name clash between the labels of any of the nodes being merged (except for the label "root"). This can be avoided when names are hardcoded by using symbols for names (todo). When labels are not symbols, use of midx generates a compiler warning. When labels are dynamically computed at runtime, errors can be avoided by using the midxrn function which renames clashing names and returns a table listing the new names (todo), or by using midxic ("merge index ignore clash"), which resolves name clashes differently for each variable in the list by preferring that index's names over incoming names, and then preferring name bindings from the beginning of the list over names from the end, or by using midxo ("merge index overwrite"), which resolves name clashes uniformally for all variables in the list (i.e. all variables end up with copies of the same index) by preferring name bindings from the beginning of the list.

As you might guess from the compiler warning, my recommendation is that you avoid midx except for when you are using hardcoded labels (preferably symbols). Usually midxrn is the right choice otherwise.

Within a graph, the reserved word "this" always refers to the current node.

Example: a node with a self-loop may also be created as follows: [this]

Within a G-constructor, the reserved word "root" refers to the root node. Example:

[1, [2, root]]


    root _
     /\   |
    /  \  |
   1   /\_|
      2  

Using the "unshadow" token operator, "^", on "this", the (lexical) parent node may be referred to. Repeating the unshadow operator allows reference to the (lexical) grandparent.

Example:

 [1, [2, ^this]]
    root _
     /\   |
    /  \  |
   1   /\_|
      2  
 [1, [2, ^this.0]]  
    root 
     /\   
    /  \  
   1   /\
      2  1
  1. note; the ^this.0 is evaluated at the time of graph construction; if one of the 1s is changed later, the other wont be affected
 [1, [2, [3, ^^this]]]
    root ___
     /\     |
    /  \    |
   1   /\   |
      2 /\  |
       3  \_|

Not all edges of a node have to have integer labels (in fact, none of them do). To separate those edges with integer labels from those without, use the "--" symbol. Edges with integer labels go on the left. Edges on the right must have an explicit label. The integer labels are no different from other labels; the -- syntax just tells the constructor not to automatically add them.

  [1 -- "yellow"=2]
            root
             /\
          0 /  \ "yellow"
           /    \
          1      2
  [-- "yellow"=2]
            root
             /
   "yellow" /  
           /  
          2   

many different edges from the same node may all point to the same destination object:

  [1, ["red"=root, "yellow"=root]]
        <-------------|
    root<--           |
     /\   |           |
    /  \  |2, "yellow"|
   1   /\_|           |
      |               |
      |_______________|  
           1, "red"

(btw the only reason i'm not drawing arrows on most edges here is that it's hard to do -- all edges are directed)

An edge may have multiple edges with the same label, but in this case . will only follow the "first" one (first is the one with the lowest integer label; if all the edges with the given label are non-integer, todo)

To add an edge programmatically, use "ins":

x = [1 2]
x = 3 x ins
x == [1 2 3]
x = x "red"="apple" ins
x == [1 2 3 "red"="apple"]
# alternative
x = [1 2 3]
x = "apple" x lb="red" ins
x == [1 2 3 "red"="apple"]
# use i=n keyword arg to ins to not assign the next integer to new label
x = "banana" x lb="yellow" i=n ins
x == [1 2 3 "red"="apple" -- "yellow"="banana"]
# or, alternatively, if x.__ordered = f, ins will not assign an integer:
x = [1 2 3]
x.__ordered = f
x = x "yellow"="banana" ins
x == [1 2 3 -- "yellow"="banana"]

To get or set a node's label use its "lb" edge:

 x = [1 [2]]
 x.lb = "apple"
 x == ["apple" | 1 [2]]

To operate on nodes other than root, use assignment syntax:

 x = [1 [2]]
 x.1.___lb = "apple"
 x == [1 ["apple" | 2]]

To access a node which represents an edge (this is called "reifying" the edge), use the '^.' operator in place of '.'. The reified edge has labels "src", "dst", "lb". Note that the contents of lb is a list:

 x = [1 "yellow"=2]
 x^.1.dst == 2
 x^."yellow".lb == [1, "yellow"]

You can change an edge's labels or even its source or destination:

 x = [1 "yellow"=2]
 x^."yellow".lb = [3 "red"]
 x == [1 -- [3,"red"]=2]

To insert the contents of one list into another, you can use the @ operator:

  x = [2 3]
  y = [1 @x 4]
  y == [1 2 3 4]

example: a single node with value "10": ex = [10] ex.n == 10

example: a single node that points to itself:

ex = ['s | s]
ex = [this]

todotodo

example: a list

ex = ["apple" "banana" "cherry"]
ex = ["apple" "banana" "cherry"]
ex.0 == "apple"
ex.1:2 = ["banana" "cherry"]
ex = $[apple banana cherry]
ex = "apple", "banana", "cherry"
ex2 = ["grapefruit", @ex]
ex2 = ["grapefruit","apple","banana","cherry"]
fruit2 = "banana"; ex = $[apple `fruit2 cherry]
fruit23 = $[banana cherry]; ex = $[apple `@fruit23]

example: an association table
ex = [
         apple = red
	 banana = yellow
	 cherry = red
ex = assoc $[[apple red] [banana yellow] [cherry red]]
ex = assoc $[
	apple red
	banana yellow
	cherry red
ex = assoc [
	"apple" "red"
	"banana" "yellow"
	"cherry" "red"

todotodo

idea: when multiple edges have the same name, allow to get the set, but also "any" and "all" operators. "any" picks an arbitrary edge out of the set (like, the first one), and "all" does something to all of them (parallel programming). actually, better: the std get is short for getSet composed with any (or should it be all? these are the same in the degenerate case of unique names). use the same mechanism to deal with multiple nodes with the same name. so, when a node is linked to a named node, it can be linked to "any" node with that name (creating one edge), or to "all" of them (creating possibly more than one edge). now you don't have the ugly problem of namespace conflicts when you merge graphs.

...

Code inside G-constructors

Children of G-constructors are whitespace-delimited. That means that, in order to put code inside of a G-constructor, it must contain no spaces. So

 [y x f]

isn't interpreted as y applied to the result of x applied to f, but rather as three separate children, y, then x, then f. To get y applied to the result of x applied to f, you'd do:

 [(y)(x)(f)]

or

 [y,x,f]

Another example:

 [y,x,f b,a,g] has two children, y applied to the result of x applied to f, and b applied to the result of a applied to g.

...

Boundarys

In a language where data values of all types are represented as graphs, and containment of one object within another is represented as a link from some node of one object to some node of the other object, it can be hard to tell where one object ends and another begins.

For example, if one had a binary tree, in whose leaves were stored lists, some of which may have nested lists, how would one distinguish a list of two lists from an interior node of the tree? For instance, the graph might look like:

 [ [[5 3]] [[[3 1] 2]] [[3 2]] ]

is [[3 1] 2] an interior node or a leaf?

Note that this problem doesn't occur when you know beforehand the shape of the graph -- it only appears when you want to have general algorithms that apply to graphs of different depths and shapes.

One solution to this problem is to put leaf nodes inside a wrapper that identify them as such. In Oot, a wrapper type is provided, called __boundarys__. The wrapped values are called __leaf__ values, and the graph as a whole is said to be __bounded__. There are a number of functions in the standard library that recurse through a graph, do something with the leaf values, and then terminate the recursion, i.e. even if the leaf values are themselves nodes, the algorithm doesn't recurse into them and follow their arcs. A concise syntax is provided to make an expression terminal within a G-constructor:

  {{expr}}

Now we can disambiguate the previous example:

  [ [{{[5 3]}}] [[{{[3 1]}} 2]] [{{[3 2]}}] ]

is the G-constructor used to express the case where [[3 1] 2] is an interior node, and

  [ [{{[5 3]}}] [{{[[3 1] 2]}}] [{{[3 2]}}] ]

is the case where [[3 1] 2] is a leaf.

The boundary syntax is only available within G-constructors. The terminal value may be of any type. You cannot have a terminal value outside of a graph; if you fetch a terminal value, you just get the value, i.e. it is transparently unwrapped:

  [{{3}}].0 == 3

Flavored boundarys

In some cases we may want to have a single graph which contains various different object boundaries. For example, perhaps we have TODO.

Oot provides a way to attach an arbitary value to a boundary, called the boundary's __flavor__, with the idea being that the flavor tells you what kind of object is being bounded (or, to put it another way, what kind of boundary it is). We might have called this "labeled boundarys", except that a terminal object may have a node label and may be the destination of a labeled edge, so we want to avoid confusion.

Functions in the core library which make use of boundarys can be told to ignore all boundarys except those of a certain flavor, or to ignore all boundarys of a certain flavor. Functions are also provided to find all boundarys of a given set of flavors and replace them with a different flavor (flavor map).

The flavor of a boundary is specified by a value, followed by a colon, in the boundary:

 {{flavor : terminal value}}

Terminatators within a flavor are of flavor nil.

Graph transformations

A convenient way to structurally transform a graph is to pattern match on it and then to construct a new graph based on the values bound to the pattern variables. This can be composed with two graph comprehensions (allowing only a map and a conditional) (one for the forward direction (for gets), and a simple map function for the reverse (for "__set"s); unfortunately, Oot cannot calculate the inverse of a function) to yield a restricted way to transform both the graph's structure and the values within it. This is called a Oot __graph transform__.

There is a convenient syntax for specifying a new data interface as a graph transform of an existing data interface: TODO """ -- ootOldMain.txt

"""

Technical notes on Oot "graphs"

What we call "graphs" in Oot is a type of value which are what is called in some parts of computer science a "labeled multidigraph with unique node labels and a distinguished node", and where the node and edge labels are arbitrary Oot values, and some nodes without outgoing edges are arbitrary Oot values.

To non-CS people, a graph (of either sort) might be called a "network". A graph in Oot is something that you might draw by drawing a bunch of circles and then drawing some arrows between some of the circles, and then attaching names to some of the circles, and attaching notes to some of the arrows (some arrows might have more than one note). We call the circles "nodes", the arrows "edges", the circle names "node labels", and the notes on the arrows "edge labels".

Formally, (following http://en.wikipedia.org/wiki/Pseudograph) a Oot __graph__ is an 9-tuple G=(node labels, edge labels, interior nodes, value nodes, edges, s, t, L_n, L_e) where

In other words, a Oot graph consists a set of interior nodes, which may have names, a set of value nodes, and a set of labeled edges. Value nodes, node names and edge labels may be arbitrary Oot values. Interior nodes may or may not have a node name; a node name is an arbitary Oot value that corresponds to exactly one node. A node is either an interior node, or "value node", which is just an arbitrary Oot value. are any value besides interior nodes. __Edges__ are things that each have a __source__, which is an interior node, and a __destination__, which is an arbitrary node, and one or more __edge labels__, which are arbitary values. One of the nodes is named "root".

A node is reachable from another node if you can start from the one node and get to the other by following intervening edges from source to destination.

A node is connected to another node if you can start from one node and find a sequence of edges such that the first edge's source or target is the one node, and the last edge's source or target is the other node, and each edge's source is either the source or the target of the previous edge. A graph is connected if any pair of nodes is connected. Basically, a graph is connected if you can reach any node from any other node by following the edges, if you're allowed to follow them "backwards". If you draw a graph on paper, it's connected if you can't find any proper subset of nodes such that you can draw a circle around those nodes without intersecting any of the edges.

A Oot G-constructor can't create any graph value, only some of them, namely (i think) those in which each node is reachable from the "root" node. The values that a single G-constructor can create are kind of like S-expressions in Lisp, except that in addition there is a facility to create cycles.

(I think that) a series of G-constructors, where later constructors make use of values created in earlier G-constructors, can create any connected graph value. By also using a function that merges graphs, (i believe that) any graph value can be created. """ -- graphTechnicalNotes.txt

"""

oot data design todos

...

to generalize (hyper)nodes further (?!):

n-ary Chu-ish spaces

(note that ordinary Chu spaces can already represent n-ary relations, see http://chu.stanford.edu/ ; i say Chu-ish b/c i'm not really thinking of Chu spaces here, just an n-ary matrix that is like the characteristic matrix of a relation, except with an arbitrary alphabet)

...

is boundary mode per-perspective?

can you layer perspectives, e.g. layer a boundary perspective beneath another perspective?

...

ok, two things:

one might give 'perspectives' a different name to make the names easier for people to remember. one might identify views with structural types and perspectives to nominal types, b/c the ad-hoc (typeclass instance) resolution should still happen at compile time when possible. however i still like the idea of being able to give a name to a structural type pattern and to have the type inference algorithm not create union types by itself, e.g. rely on user naming to control complexity.

so note that the choice of ad-hoc instance is made by the person supplying the data and is carried with the data, it is not made at the use site where the ad-hoc operator is called.

for functions that take multiple arguments one has to determine which argument controls the choice of perspective. One idea is to just have a simple rule like 'the leftmost one', and another idea is to allow the programmer to determine it somehow.

...

four fundamental forms, realizations in graphs:

relations functions tables (non-graphical???) orders

these can each be realized in hypergraphs:

relations: a hyperedge with nodes a, b, and c represents the tuple (a,b,c) in the relation functions: a --c--> b (an edge from a to b labeled c) represents a(c) = b functions, multiple arguments, curried: a--c-->b b--e-->d (an edge from a to b labeled c, and an edge from b to d labeled e), represents a(c) = b and b(e) = d, which is the curried form of a(c,e) = d (and, b is the partially instantiated version of uncurried a, with first argument c) tables, which we identify with: functions, multiple arguments, uncurried (is this identity a good idea?): a--(c,e)-->d (an edge from a to d labeled (c,e)) represents a(c,e) = d orders: 0 -> 1 -> 2 -> 3 represents a total ordering on {0,1,2,3}

note:

these can also each be represented in each other (except for orders; embedding the others within orders seems unnatural to me):

representations as relations:

representations as functions:

representations as tables: same as for functions

note: perhaps preorders (drop the antisymmetric condition/allow cycles) should be used here instead of partial orders

...

--

need stuff like boundaries and annotations.

use case: need to be able to reduce (compile) the form of some expression in the AST, but annotate each reduced bit with the corresponding part of the original expression

--

one simple, inefficient strategy for keeping perspectives consistent is to convert data implemntation each time the data is changed from a perpective. e.g. if you do x = 3 + 7 and then y = S(x), actually behind the scenes run the conversion from integer representation to successor representation, and then if z = y + 2, convert back

--

there is an algebraic structure on the operations that transition between representations.

sometimes you can/must apply a chain of operations to get to a new representation. e.g. you can reify a hypergraph, but then you can reify the reification, and the result is a different representation.

even for different representations that are isomorphic (such as reifications), they form a non-free category, that is to say, that different chains of operations can get you always the same representation, for example, if you reify twice then anti-reify twice, you are back where you started

...

motivation for views:

1) we want one object to have multiple interfaces; a Dict is also a List of tuples 2) but we also believe, Clojure-style, that it's better to have one or a few fundamental data types with common operations on them, e.g. better to have only a Sequence interface than Lists and Stacks and Queues, and have methods for each of .push, .pop, .enqueue, .dequeue, .insert, .delete, .getAtIndex. 3) and the same object may implement an interface in multiple ways, e.g. a Dict is a List of pairs, but it's also a List of values (e.g. sometimes you want to map over the pairs in a dict, and sometimes you want to map over the values; and sometimes you want to treat the keys as a list, too) 4) the std prescription is to provide methods on a dict that return the list of keys, list of values, list of pairs, etc. But we want to modify the dict itself through at least some of these representations (e.g. map over the list of pairs to product a new dict). 5) the std prescription is to create another object with side-effects on the first one. but we want make such linkages more explicit.

...

--

from meta.txt:

reiGets(n, k), content(n), topnodes, 'anonymous'

""" ... reiGets(n, k), content(n), topnodes, 'anonymous'

content(n) is the set of nodes n2 such that n \in get(n2, 'contentOf') topnodes is the set of nodes n such that get(n, 'contentOf') = {} reiGet(n, k) is the set of pairs (n2, es) of the form (nodes, {nodes}), such that for each e in es, get(e, 'source') = n, get(e, 'target') = n2 and (either get(e, 'key') = k or k = 'anonymous')

from this we can formulate gets, and from that plus the naturals and a terminator, we can formulate enumeration, from which which can formulate an enumerated gets and a matching enumerated rei, which are more intuitive to work with """ -- meta.txt

an older version (which i no longer like) but which i'm including because it has an interesting note on some library functions that should be in Oot:

"""

getNodeWhichIsTargetAlongKey(n, k, le), keys(n), rei(n,k), 'zero', successor, 'terminator', topnodes

so for A--e_01---e_1---e_11-->B,

getNodeWhichIsTargetAlongKey(A, '0', 'target') = B keys(A) = {0} (equivalent to a node with one edge, to 0) rei(A,0) = e_1 keys(e_1) = {'source', 'target', 'label'} getNodeWhichIsTargetAlongKey(e_1, 'source', 'target') = A getNodeWhichIsTargetAlongKey(e_1, 'target', 'target') = B getNodeWhichIsTargetAlongKey(e_1, 'label', 'target') = '0' rei(e_1,'source') = e_01 rei(e_1,'target') = e_11

much better, yes? now if you have 'nonstandard edges', e.g., Christine is Alice's child, represented as subject=Christine, verb=childOf, object=Alice, which is at key '0' at Christine, you can do:

getNodeWhichIsTargetAlongKey(Christine, '0', 'verb') = childOf getNodeWhichIsTargetAlongKey(Christine, '0', 'object') = Alice keys(Christine) = {0} (equivalent to a node with one edge, to 0) rei(Christine,0) = e_1 keys(e_1) = {'subject', 'verb', 'object', 'label'}

and the later on we can introduce operations to transform the graph into one in which only edges with verb 'childOf' are selected, e.g. a 'parent' graph, etc, or one in which all edges with verb 'childOf' are selected are grouped into one set, which is then the target of an edge with label 'childOf'. """

...

oot variables could have a 'metaness' field that tells you whether to interpret the payload as data, pointers, reified edges, etc, according to my meta notation scheme

e.g. this could implement aliases

of course if the client accesses such a node from a sufficiently high meta-level vantage point, then they get the pointer itself instead of auto-resolving the alias

...

TODO

""" -- ootDataNotes1.txt

""" """ -- ootNotes1.txt

""" """ -- ootNotes2.txt

""" """ -- ootViewNotes1.txt

""" """ -- ootNotes3.txt

""" """ -- ootDataThoughts.txt

""" """ -- ootMainSyntax.txt

-- TODO: find good excerpts about Oot graphs from:

---

masklinn 10 hours ago

parent prev next [–]

> Python ended up 'specializing' in data contexts, thanks to Numpy / Pandas

Yeah it was a success 20 years in the making, had a fair amount of input from science men and that being GvR’s? background he responded positively e.g. the extended slices (striding) and ability to index with an unparenthesised tuple (multidimensional indexing) were very much added for the convenience of numpy’s ancestors.

I would not say it had no competition, Perl used to be quite popular in at least some sci-comp fields (I want to say bioinformatics but I’m probably wrong).

reply

---

"So we have the Architect type, who loves lots of classes which delegate work to each other, and we have the Enlightened type, who wants to write and read less. And the Enlightened type can rant and rave all day how Python, or Ruby, or Lisp make it oh-so-easy to define data structure literals, or to factor out stuff using meta-programming, or some other thing an Architect just never gets. And I'm all with it." -- https://yosefk.com/blog/i-cant-believe-im-praising-tcl.html

---

    ~
    dkl 24 hours ago | link | flag | 

Can you point me to Turtle? It’s not something I’ve heard of before.

    ~
    hufman 24 hours ago | link | flag | 

It’s a human-readable format, simplifying from N3/Notation3, to encode RDF tuples. I heart RDF and Linked Data and wish I had more excuses to use it!

https://en.wikipedia.org/wiki/Turtle_(syntax) https://en.wikipedia.org/wiki/TriG_(syntax) https://en.wikipedia.org/wiki/Notation3#Comparison_of_Notation3,_Turtle,_and_N-Triples

---

we want multidim arrays

---