proj-oot-ootDataNotes3

"

lkrubner 727 days ago

I love this:

"Key := Val Updates an existing Key The key must be present in the old map. If Key is not in the old map Erlang will complain loudly. Using two operations instead of one has several advantages: A spelling mistake cannot accidentally introduce a new key"

Lord knows how many times I accidentally created a new key with a spelling typo, and then later when I wanted to retrieve the data, I was like "What the hell is wrong? I know I put data in that key, so why is nothing coming back? I guess I have discovered a rare bug in the compiler that no one has ever discovered before." "

---

 roncohen 823 days ago | parent

They should do the world a favor and include a datetime type.

samatman 823 days ago

Indeed, this is a known terrible mistake, easily avoided.

The lack of the string "UUID" in the RFC is also cause for concern.


MichaelGG 823 days ago

They have a tag for date strings, or you can use seconds from epoch, as an integer _or floating-point_. So if you actually want to represent time with proper fractional seconds, you're stuck representing them as strings. Hardly concise.


Someone 823 days ago

Whats terribly wrong with http://tools.ietf.org/html/rfc7049#section-2.4.1?


drdaeman 823 days ago

The minor issues are missing timezone and precision information.

But, most importantly, use of integers for datetime values hides type-level semantics. It's just integers and you, the end user, and not the deserializer, is responsible for handling the types.

I think it's quite inconvenient to do tons of `data["since"] = parse_datetime(data["since"])` all the time, for every model out there.


---

samatman 840 days ago

parent on: ECMA-404: The JSON Data Interchange Format [pdf]

To summarize the top HN complaints: No date format, no comments, needlessly strict commas.

Is it too late for edn? https://github.com/edn-format/edn

Symbols that aren't "strings" are kinda neat too, and you get downright attached to arbitrary key-value mappings once you have them.

---


can we unify truthiness (bools; True/False) and Maybe?

---

grex: graph regex

can generalize matching on a character to matching on an opaque 'node' by letting the grex contain arbitrary code (a predicate) at each location. The arbitrary code is stateless. Its input is a node, and its output is 'accept' or 'reject'. If the graph is in the shape of a string (linear) then the running time of the grex IN TERMS OF THE RUNNING TIME OF THE PROVIDED CODE (eg if the runtime of the provided code is taken to be a single step, or at least O(1)) is still O(m*n).
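
A minimal Python sketch of that linear special case (my own toy encoding, not an Oot design: a pattern is a list of (predicate, starred) pairs and the 'graph' is just a Python list of nodes). It keeps the usual multi-state NFA simulation, so the cost is on the order of m*n predicate calls (ignoring the epsilon-closure bookkeeping):

    # pattern: list of (pred, starred) pairs; nodes: list of arbitrary objects.
    # State i means "the first i pattern elements have been matched".
    def grex_match(pattern, nodes):
        m = len(pattern)

        def close(states):
            # epsilon closure: a starred element may be skipped without consuming a node
            out = set(states)
            frontier = list(states)
            while frontier:
                i = frontier.pop()
                if i < m and pattern[i][1] and i + 1 not in out:
                    out.add(i + 1)
                    frontier.append(i + 1)
            return out

        states = close({0})
        for node in nodes:
            nxt = set()
            for i in states:
                if i == m:
                    continue
                pred, starred = pattern[i]
                if pred(node):
                    # a starred element consumes the node and stays; otherwise advance
                    nxt.add(i if starred else i + 1)
            states = close(nxt)
            if not states:
                return False
        return m in states

    # "zero or more even nodes, then one node greater than 10"
    pat = [(lambda n: n % 2 == 0, True), (lambda n: n > 10, False)]
    print(grex_match(pat, [2, 4, 6, 42]))   # True
    print(grex_match(pat, [2, 3, 42]))      # False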

if the graph is not linear, how to generalize? One thing to do is to say that the grex follows each edge (if directed, only those edges in the correct direction) from the current node. Edge direction is just one kind of 'edge type', so to generalize this, it follows all edges of a given type. Generalizing further, we follow all edges that meet some predicate. Generalizing further, this predicate may change at each location, so we embed edge predicates like the node predicates above. So one way to do grexs is to provide an 'edge matching' construct for each 'space' in between the node matches. These are implicit in string regexes because there is only one type of 'space' (edge) to match. Like the node, the most general thing is to pass the edge to a provided predicate subroutine. But here we come to additional complexity. What do we do if an edge is rejected? I guess we should backtrack and try another edge at the same node. So there is additional nondeterminism here. I guess this is okay because it seems appropriate that the special case of a linear graph should lead to simpler behavior. Is the runtime still O(m*n)? I bet it is, but i'm not sure. The reason i bet it is: the first example that comes to mind for something that might be exponential in runtime is the regex given by [1] which has an exponential blowup in space when you convert it from an NFA to a DFA: (a|b)*a(a|b)(a|b)(a|b) (exponential, 2^k, in the number of (a|b)s at the end; here we have k=3). You can see that this sort of thing may backtrack a lot if each node it checks has two children, b/c if one descendant tree isn't suitable it will want to backtrack up and try the other child of each ancestor. But for each node to have two children, the size of the graph itself must be exponential, so it's still O(m*n) in this case. To get exponential behavior in a non-exponential graph, we'd have to backtrack on something more than the choices of which edge comes out of each node.

Maybe if the graph has cycles then that could happen? Imagine a graph with two pairs of nodes where each node in each pair has an outgoing edge to each node in the other pair, a, b, c, d: a->c, a->d, b->c, b->d, c->a, c->b, d->a, d->b. Label all edges going to the first node in each pair (a or c) with '0' and label the other edges, the ones going to b or d, with '1'. Now suppose the node predicate just accepts any node and you write the grex over the edge labels only as: (0|1)*0(0|1)(0|1)(0|1). What will happen? Well, in the best case, the edge ordering is such that all 0s are chosen first, and the NFA accepts without ever considering more than 1 edge. But even if the 1s are ordered first, the 0 bottleneck (after the *) will choose the 0 edge, and there will be no backtracking. The thing is, to force backtracking, you need a situation where the searcher is thinking like, 'This sequence doesn't match; but if only i had chosen a different child at the grandfather of the current node, maybe i would have found a different chain of descendants that would match'. But in this example graph, all sequences of 0,1 are available from any position.

So what about the graph where one of the pairs forces a '1', eg a, b, d: a-1>d, b-1>d, d-0>a, d-1>b. We left out the a->c and b->c edges, which were the only edges labeled '0' out of the a,b pair of nodes; since there are now no incoming edges to 'c', it is unreachable (assuming the initial state is a or b or some node with epsilon edges that goes to a and b), so we removed c. In this graph, the initial (0|1)* just 'populates' all of the nodes with walkers; then the 'bottleneck' 0 just 'prunes' those walkers on a or b, leaving the walker on d to survive and successfully transition to a with remaining grex (0|1)(0|1)(0|1). There is only one path out of a (a->d), so the walker follows it, with remaining grex (0|1)(0|1). But now the walker bifurcates, going both to a and b. I suppose if the sequence at the end was longer than 3 then at every other (0|1) there would be a bifurcation. If the walker then hit an impossible condition at the end (eg if the grex was (0|1)*0(0|1)(0|1)(0|1)2; there are no '2' labels in this graph), it would then 'backtrack', the possibility of which in my visualization here is already represented 'nondeterministically without backtracking' by the 2^(k/2) walkers present by the end (is there some more complicated construction that eliminates some of those walkers based on recognizing the finite, cyclic nature of the graph? i conjecture that this is NP-complete; in other words, that computing edge-label grexs on possibly-cyclic graphs is NP-complete).

so i guess my initial hunch may be wrong, and there can be an exponential complexity blowup if the graph is cyclic. So i reduce my original conjecture to: grexs have runtime O(m*n) on DAGs. Again, even if grexs have NP-complete performance on cyclic graphs, this seems appropriate as we transition from a linear graph to a cyclic one.

otoh my analysis here may be wrong. In implementations of ordinary regexs by backtracking, (a|aa)*b causes exponential blowup [2]; but my reasoning here would also say that (a|aa)*b would create exponential 'walkers' on ordinary strings. https://swtch.com/~rsc/regexp/regexp1.html explains how NFAs avoid exponential blowup in this sort of situation. At each execution step, one character in the string to be matched is consumed, and the NFA puts one walker on every state that could be reached by this character. So let's reconsider the cyclic graph this way. We still may have bifurcation because instead of 'consuming' a character in the string, we 'consume' a node, but this means bifurcating and visiting all of that node's children. So this may not help, and we may still face exponential blowup with grexs on cyclic graphs.

Now we come to my issue with this construction: it seems like rather than allowing cyclic graphs to blow us up in memory we'd prefer to take exponential time, which means visiting nodes one at a time and choosing a search strategy (depth-first, breadth-first, iterative deepening, or, if we allow other computation, A*, etc.). But choosing this strategy doesn't seem like the sort of thing that should be hardwired, nor does it seem like it fits within the language of regular expressions.

btw, on the related question of whether we should allow backreferences and just use a backtracking search: i come down on the 'no' side; it would be better to have an O(m*n) regex without advanced capabilities like backreferences.

maybe we could add notation to the regular expression notation to allow the user to recognize and cut off cycles though? So that a given grex, if it properly uses that notation, could be applied to all graphs, even cyclic ones, yet still have O(m*n) time complexity. Of course, we may want notation to REQUIRE cycles too. I imagine that something like an 'equals' (like the equals in first-order logic) would do; then you could require that each node be not equal to any previous node. But isn't 'equals' backreferences? Well... if the 'equals' is applied to the VALUE of the node or edge label, then it is backreferences; but if it's the IDENTITY of the node or edge label, it's a little different, because in the special case of a string (a linear directed graph), all characters would be 'unequal' since we are really asking about the equality of the locations that the characters are in.

---

ok to see how we could represent node identity equations and their effect on complexity, let's look back on NFA regular expression recognizers, why they are efficient, and what their limitations are.

---

first off, there are three basic ways that regular expression engines are implemented:

An example given by https://en.wikipedia.org/wiki/Regular_expression for a regexp that can have exponential running time when matched via backtracking is (a|aa)*b. You can see that the darn thing will, if it hits a long string of 'a's with no b, try to backtrack by considering every combination of partitions of the string into groups of length 1 and 2.

An example given by http://regular-expressions.mobi/catastrophic.html is (x+x+)+y. Similarly to the previous example, when faced with a long string of 'x's, it will consider every possible number of groupings (x+x+), and for each number of grouping, every partition of the 'x's within that grouping into the first inner x+ and the second inner x+.
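
This is easy to watch in Python's backtracking `re` engine (a small, hedged demo; exact timings will vary by machine, but each extra 'x' should roughly double the cost of the failing match):

    import re
    import time

    # the catastrophic pattern from the article above
    pattern = re.compile(r'(x+x+)+y')

    for n in range(16, 23):
        start = time.perf_counter()
        pattern.match('x' * n)            # always fails: there is no 'y'
        elapsed = time.perf_counter() - start
        print(f"n={n:2d}  {elapsed:.3f}s")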

How does the NFA regular expression algorithm get around these problems and deliver O(m*n) performance? The speed of the NFA is due to two things: it tracks the whole set of states it could possibly be in at once, consuming each input character only once; and it is ignorant of history. Whereas a backtracker will, when it fails partway through something like (a|aa)*b, think 'well maybe if i had chosen differently before i would have found a match', the NFA implementation doesn't remember what it chose before, and doesn't have to, because in any case it already knows all the possible states that it 'could be in' at this point. If the NFA tried to remember all possible paths TO each state that it 'could be in', then it would face the problem of having to remember an exponential number of paths in cases like (a|aa)*b. When there are starred subexpressions, a backtracker will have to remember stuff like, "the first time i hit (a|aa) i chose aa, and the second time, i chose a", thereby letting each single state within the subexpression take up multiple memory locations during the search; by contrast, the NFA not only doesn't separately remember which substrings it used to satisfy the separate 'instances' of (a|aa), it goes further and doesn't remember how it satisfied any part of the regex at all.

Note that the second property, ignorance of history, really only needs the NFA to not care about ALL POSSIBLE PATHS to the current potential states. It is okay for each marker to be associated with ONE potential history. In fact, this is a way of implementing capture groups (submatch extraction), eg http://laurikari.net/ville/spire2000-tnfa.ps

On backreferences, https://en.wikipedia.org/wiki/Regular_expression says "..backreferences). This means that, among other things, a pattern can match strings of repeated words like "papa" or "WikiWiki", called squares in formal language theory. The pattern for these strings is "(.+)\1". The language of squares is not regular, nor is it context-free, due to the pumping lemma. However, pattern matching with an unbounded number of back references, as supported by numerous modern tools, is still context sensitive.[28]"

The proofs of this stuff tend to use the 'pumping lemma', "there must exist a positive integer N such that every string in a regular language of length at least N must be of the form R S T such that R S^k T are also in the language for all natural k. Here R, S, T are strings and S may not be empty." [4]. Intuitively, the pumping lemma says that the regular expression has limited memory, and if you make a regex, which has fixed, finite memory, recognize a very long string, the only way it can do that is by forgetting part of the string, and repeating its internal state somewhere along the way while processing the long string.

The StackOverflow question "How do backreferences in regexes make backtracking required?" (linked below) goes further and states that the language specified by the extended regex "^(b*a)\1\1$" is not only not regular but is not even context-free, and says that these proofs amount to proofs that there are "languages that can't be recognized with any machine having finite memory, nor any with only an infinitely large LIFO memory"; it seems that the finite-memory part of that sentence is meant to correspond to regular languages (after all, regular languages are those languages that can be recognized by finite automatons [5]), and the infinite LIFO is meant to correspond to context-free languages.

But clearly it would be easy for a backtracking implementation to efficiently recognize even something of the form ^(b*a)\1\1$, without backtracking even; start at the beginning of the string, eat all the initial 'b's, eat an a, look at what you've eaten so far, see if that is repeated two more times. This is assuming that you have enough memory to store the input string, but that's true in many real-world programs (in fact, this assumption is pretty much the definition of 'context-sensitive language', the next step up on the Chomsky hierarchy from context-free languages). So there is a disconnect here between the Chomsky hierarchy and the time complexity of individual extended regular expressions; not every regular expression with backreferences is hard to compute.
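
A minimal sketch of that recipe in Python (the function name is mine, purely illustrative): linear time, no backtracking, just enough memory to remember the captured prefix:

    def match_b_star_a_cubed(s):
        """Recognize ^(b*a)\\1\\1$ directly: eat b's, eat one a, then check repetition."""
        i = 0
        while i < len(s) and s[i] == 'b':
            i += 1
        if i >= len(s) or s[i] != 'a':
            return False
        prefix = s[:i + 1]              # this is the capture group (b*a)
        return s == prefix * 3          # the whole string must be that group three times

    print(match_b_star_a_cubed('bbabbabba'))   # True
    print(match_b_star_a_cubed('bbabbaba'))    # False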

In fact if each marker in the NFA remembered one history, and used that to try to match backreferences, that would already be sufficient for the particular case of ^(b*a)\1\1$. This does not conflict with the idea that ^(b*a)\1\1$ does not define a regular language; the length of the history that might need to be remembered would scale with O(m*n), which clearly means that no finite automaton can recognize this language, and regular languages are those languages that can be recognized by finite automatons. Note that the reason that the complexity does not explode is that each marker is only remembering one history, which means that potential histories must be lost every time two or more markers both transition to the same state (because on the next iteration there will only be one marker on that state, so only one of the histories can 'live'; the same concept is why programming languages have rules like '* is greedy' to define WHICH possible match will be the one returned by a capture group, when there are multiple possible matches).

Other cases with backreferences might be more tricky, however. For example, what about if the capture group is something like (a|aa)* and then there is something else like (a|aa)* in between the capture group and a backreference to it? Perhaps the first time that the backreference is hit, it doesn't match. But this doesn't imply that it couldn't be matched if different choices had been made earlier; perhaps one of the (exponentially many) forgotten histories would have led to a different capture, and to a match.

See also http://stackoverflow.com/questions/11101405/how-do-backreferences-in-regexes-make-backtracking-required

random note: Remember that the ? and + operators can be written in terms of * (x? = (epsilon|x), x+ = xx*). So only | and * are the fundamental, interesting operators here (assuming you have epsilon, the empty string, already).
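
To make that desugaring concrete, a tiny sketch using ad-hoc tuple-shaped pattern nodes (nothing Oot-specific):

    EPSILON = ('epsilon',)

    def alt(x, y):  return ('|', x, y)
    def star(x):    return ('*', x)
    def seq(x, y):  return ('seq', x, y)

    def optional(x):                 # x?  ==  (epsilon | x)
        return alt(EPSILON, x)

    def plus(x):                     # x+  ==  x x*
        return seq(x, star(x))

    print(optional('a'))             # ('|', ('epsilon',), 'a')
    print(plus('a'))                 # ('seq', 'a', ('*', 'a'))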

---

the disconnect between actual difficulty and the Chomsky hierarchy shakes my faith in my impression that grexs should eschew backreferences. After all, the Oot philosophy is to pick very general generalizations of fundamental operators. Imo a proper implementation of augmented regular expressions would run the NFA algorithm when backreferences aren't present, and backtracking otherwise;

and in fact, http://stackoverflow.com/questions/11101405/how-do-backreferences-in-regexes-make-backtracking-required/11767511#11767511 links to a paper, "Extending Finite Automata to Efficiently Match Perl-Compatible Regular Expressions", that extends regular expression NFAs to be able to handle (limited?) backreferences efficiently.

---

so, preliminary thoughts on 'node equality' grex operators:

---

incidentally is the notation

a-1>d b-1>d d-0>a d-1>b

good for a graph literal with edge labels? or can we just do something like

a/1/d b/1/d d/0/a d/1/b

and what if we want multiple labels for some edges (rather than multiple edges with distinct labels between the same two nodes)? then i guess we'd be giving edges themselves rather than one attribute of edges:

n1/e1/n3 n2/e2/n3 n3/e3/n1 n3/e4/n2

e1.label1 = 1; e2.label1 = 1; e3.label1 = 0; e4.label1 = 1;

and what about hypergraphs? now we have more than two nodes per edge, maybe (assuming that in this graph, there are 3-part edges, and n4 is in the third place of each edge):

n1/e1/n3/n4 n2/e2/n3/n4 n3/e3/n1/n4 n3/e4/n2/n4

but what if we want to specify the roles of the nodes in each edge by name rather than by position? Or even use variables giving the roles? Then i guess we'd need a more complicated notation, eg a reification (note i guess in the previous, N1 and E1 etc should really be uppercase, b/c they are symbol names):

E1/[ROLE1/N1 ROLE2/N3 ROLE3/N4 LABEL1/1] E2/[ROLE1/N2 ROLE2/N3 ROLE3/N4 LABEL1/1] E3/[ROLE1/N3 ROLE2/N1 ROLE3/N4 LABEL1/0] E4/[ROLE1/N3 ROLE2/N2 ROLE3/N4 LABEL1/1]

(if you wanted to give a variable name to eg node 1 instead of the name 'N1' as a symbol, then i guess you'd say 'nodename1' in place of 'N1'? or are the N1s here the value of node 1, not just its 'name'? Well if two nodes can have the same value then you couldn't do the latter; but i think two nodes could have the same LABEL, but not the same 'value', because if they have the same value then it's the same node; then what is a node 'value'? i'm thinking it would be an object such that if you have a reference to that object, you can ask what the edges incident to that node are; but you can't give that immutably and finitely and explicitly in a graph literal (graph constructor), because in order to specify an edge you would have to specify its destination, and that destination would be another such node struct, and that node struct has an edge to this one; so it seems like, at least in a graph literal, you just HAVE to use names; what we could do though is have those names only have meaning within this constructor, rather than being permanent node labels. hmmm...)
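
As a concreteness check, a hedged Python sketch of what the reified notation above might denote, using plain dicts (the field names 'roles' and 'labels' are mine, and the node names only have meaning inside this one constructor, as suggested above):

    def graph_literal(edges):
        """edges: {edge_name: {'roles': {role: node_name}, 'labels': {...}}}.
        Returns (nodes, edges), where nodes maps each node name to the set of
        edge names incident to it."""
        nodes = {}
        for ename, e in edges.items():
            for node_name in e['roles'].values():
                nodes.setdefault(node_name, set()).add(ename)
        return nodes, edges

    nodes, edges = graph_literal({
        'E1': {'roles': {'ROLE1': 'N1', 'ROLE2': 'N3', 'ROLE3': 'N4'}, 'labels': {'LABEL1': 1}},
        'E2': {'roles': {'ROLE1': 'N2', 'ROLE2': 'N3', 'ROLE3': 'N4'}, 'labels': {'LABEL1': 1}},
        'E3': {'roles': {'ROLE1': 'N3', 'ROLE2': 'N1', 'ROLE3': 'N4'}, 'labels': {'LABEL1': 0}},
        'E4': {'roles': {'ROLE1': 'N3', 'ROLE2': 'N2', 'ROLE3': 'N4'}, 'labels': {'LABEL1': 1}},
    })
    print(sorted(nodes['N3']))   # ['E1', 'E2', 'E3', 'E4'] -- every edge is incident to N3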

---

i think i said this before, but i think it's key that OOP class methods should ONLY be for providing a small set of primitive operations (to encapsulate the data representation); other stuff built on top of this should just be ordinary functions or procedures. So most modules will have some interface (type) definitions, some small OOP classes implementing these types, and then a large number of functions and procedures that operate on them. So the OOP classes should not include much 'business logic' (aside from very abstract generic things like a very abstract double-entry accounting ledger).
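
A small Python sketch of that layering (illustrative only, reusing the abstract ledger example): the class exposes only primitives that encapsulate the representation, and everything else is ordinary functions on top of them:

    class Ledger:
        """Very abstract double-entry-ish ledger: primitive operations only."""
        def __init__(self):
            self._entries = []                      # representation detail, hidden

        def post(self, from_acct, to_acct, amount):
            self._entries.append((from_acct, to_acct, amount))

        def entries(self):
            return list(self._entries)              # read-only view

    # "business logic" stays outside the class, as plain functions:
    def transfer(ledger, src, dst, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        ledger.post(src, dst, amount)

    def balance(ledger, acct):
        bal = 0
        for src, dst, amount in ledger.entries():
            if src == acct:
                bal -= amount
            if dst == acct:
                bal += amount
        return bal

    book = Ledger()
    transfer(book, 'cash', 'rent', 500)
    print(balance(book, 'rent'))                    # 500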

One potential issue with this is that if you want to enforce certain constraints/invariants on the state, you might want to do this by providing methods for every macro-operation that, if done wrongly, could leave the object in an inconsistent state. If you can get this by creating a slightly larger set of primitives built on top of the original primitives, great, do that. But if you would have to mix in a bunch of business logic and end up with a huge variety of methods, i think it's too much. Perhaps we need a separate transaction-like language facility for making assertions of that form.

also, note that this would appear to go against some other people's design principles: see https://en.wikipedia.org/wiki/Anemic_domain_model ; "an anemic domain model is the typical result of not applying the information expert principle, i.e. you can avoid an anemic domain model by trying to assign responsibilities to the same classes that contain the data"

What about Erlang-style actors? Perhaps these are different from OOP classes.

how does this jibe with my desire for everything to be overridable by getters/setters? well that's fine b/c getters/setters are by definition interacting with state

---

rust has cool 'enums' (discriminated unions, i guess this is like ADTs? todo what is the difference between discriminated unions and ADTs?):

" enum IpAddress? { V4(IPv4Address), V6(Ipv6Address), }

fn connect(addr: IpAddress?) { Check which version it was, and choose the right impl match addr { V4(ip) => connect_v4(ip), V6(ip) => connect_v6(ip), } } "

-- http://cglab.ca/~abeinges/blah/rust-reuse-and-recycle/#enums

---

Rust interfaces ("traits") are cool:

" struct MyType? { data: u32, }

Defining an interface trait MyTrait? { fn foo(&self) -> u32; }

Implementing an interface impl MyTrait? for MyType? { fn foo(&self) -> u32 { self.data } }

fn main() { let mine = MyType? { data: 0 }; println!("{}", mine.foo()); }

For the most part, you can just think of traits as interfaces in Java or C#, but there's some slight differences. In particular, traits are designed to be more flexible. In C# and Java, as far as I know, the only one who can implement MyTrait for MyType is the declarer of MyType. But in Rust, the declarer of MyTrait can also implement it for MyType. This lets a downstream library or application define interfaces and have them implemented by types declared in e.g. the standard library.

Of course, letting this go completely unchecked would be chaos. People could inject functions onto arbitrary types! To keep the chaos under control, trait implementations are only visible to code that has the relevant trait in scope. This is why doing I/O without importing the Read and Write traits often falls apart. "

-- http://cglab.ca/~abeinges/blah/rust-reuse-and-recycle/#traits

Can a third party (neither the declarer of MyTrait, nor the declarer of MyType) implement the trait? Can the same trait be implemented multiple ways, and namespaced? This issue is apparently called 'coherence':

" Aside: Coherence

Those familiar with Haskell may recognize traits to be quite similar to Haskell's type classes. Those same people may then raise the (incredibly reasonable) question: what happens if there are multiple implementations of the same trait for the same type? This is the coherence problem. In a coherent world, everything only has one implementation. I don't want to get into coherence, but the long and the short of it is that Rust has more restrictions in place to avoid the problems Haskell has with coherence.

The bulk of these restrictions are: you need to either be declaring the trait or declaring the type to impl Trait for Type, and crates can't circularly depend on each-other (dependencies must form a DAG). The messy case is that this is actually a lie, and you can do things like impl Trait for Box<MyType>, even though Trait and Box are declared elsewhere. Most of the complexity in coherence, to my knowledge, is dealing with these special cases. The rules that govern this are the "orphan rules", which basically ensure that, for any web of dependencies, there's a single crate which can declare a particular impl Trait for ....

The result is that it's impossible for two separate libraries to compile but introduce a conflict when imported at the same time. That said, the restrictions imposed by coherence can be really annoying, and sometimes I curse Niko Matsakis' name.

The standard library (which is secretly several disjoint libraries stitched together) is constantly on the cusp of breaking in half because of coherence. There's several implementations that are conspicuously missing, and several types and traits that are defined in weird places, precisely because of coherence. Also that wasn't even sufficient and a special hack had to be added to the compiler called #[fundamental] which declares that certain things have special coherence rules.

Coherence is really important.

I really hate coherence.

Specialization might make it better.

I should probably explain the orphan rules properly.

I'm not going to. " -- http://cglab.ca/~abeinges/blah/rust-reuse-and-recycle/#aside-coherence

---

See also the graph data model in [[oot-ootAssemblyNotes7]].

todo: how does that compare to the data model i was exploring before?


oot graph regexs and/or grammars could be useful for working in AI where the nodes represent concepts, eg if A is parent-of B and A is male then A is father-of B; if there is a set of nodes S that match pattern 'ancestors-of B' and B is parent-of C then (S union B) is a subset of ancestors-of C, etc

---

example of 'Accessors' from Elixir:

iex> user = %{name: "john",
...>          languages: [%{name: "elixir", type: :functional},
...>                      %{name: "c", type: :procedural}]}
iex> update_in user, [:languages, Access.all(), :name], &String.upcase/1
%{name: "john",
  languages: [%{name: "ELIXIR", type: :functional},
              %{name: "C", type: :procedural}]}

-- [6]

---

so my summary of the technique used in https://fsharpforfunandprofit.com/posts/designing-for-correctness/ :

the task is that we have a shopping cart app with the following constraints:

    You can only pay for a cart once.
    Once a cart is paid for, you cannot change the items in it.
    Empty carts cannot be paid for.

We want F# to guarantee these at compile time. The way we do it is by explicitly modeling the state of the cart using ADTs. We have a type 'Cart' which is a discriminated union of three types: EmptyState, ActiveState, PaidForState. EmptyState and ActiveState each have an 'Add' method to add items to the cart. ActiveState also has Remove and Pay. PaidForState doesn't have any of these methods. So now if you have a variable of type 'Cart' and try to call 'cart.Remove' on it, you'll get a compile-time error, because the compiler knows that PaidForState doesn't support Remove. In order to get it to compile, you have to do an explicit 'match' on the cart, which in this case is a 'switch' statement over the discriminated union (ie the branches of 'match' are EmptyState, ActiveState, and PaidForState); only in the ActiveState branch are you permitted to call Remove.
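
For comparison, a hedged sketch of the same 'make illegal states unrepresentable' idea in Python type-hint terms (the class and function names are mine; Python 3.10+ for `match`). A static checker such as mypy or pyright plays the role the F# compiler plays in the article: Remove and Pay simply don't exist on the other states, and a `match` over the union forces you to say which state you're handling:

    from dataclasses import dataclass
    from typing import Union

    @dataclass
    class EmptyState:
        def add(self, item) -> "ActiveState":
            return ActiveState([item])

    @dataclass
    class ActiveState:
        items: list
        def add(self, item) -> "ActiveState":
            return ActiveState(self.items + [item])
        def remove(self, item) -> Union["EmptyState", "ActiveState"]:
            rest = [i for i in self.items if i != item]
            return ActiveState(rest) if rest else EmptyState()
        def pay(self) -> "PaidForState":
            return PaidForState(self.items)

    @dataclass
    class PaidForState:
        items: list            # no add/remove/pay: a paid-for cart cannot change

    Cart = Union[EmptyState, ActiveState, PaidForState]

    def describe(cart: Cart) -> str:
        match cart:
            case EmptyState():
                return "empty"
            case ActiveState(items):
                return f"active with {len(items)} item(s)"
            case PaidForState(items):
                return f"paid for {len(items)} item(s)"

    cart = EmptyState().add("book").add("pen").pay()
    print(describe(cart))      # paid for 2 item(s)
    # cart.add("hat") would be flagged by the type checker: PaidForState has no .add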

---

so i guess it should be a compile-time error to try and 'switch' over a discriminated union if your cases are non-exhaustive, unless you explicitly include a 'default' branch?

---

notes on "What ORMs have taught me: just learn SQL"

cites http://blogs.tedneward.com/post/the-vietnam-of-computer-science/ approvingly; "Neward, in his well known essay, lays out many cogent reasons why ORMs turn into quagmires. In my experience, I've had to deal directly with a fair number of them: entity identity issues, dual-schema problem, data retrieval mechanism concern, and the partial-object problem. I want to talk briefly about my experiences with these issues and add one of my own. "

Partial objects, attribute creep, and foreign keys:

You have a table with many attributes, but typical ORM mechanisms map this to one object class with many attributes and then load all of these attributes each time the object is loaded; this is like 'SELECT *' and is inefficient b/c you usually only need a few attributes in any given context. In addition, ORMs represent links between classes as foreign keys and typically do joins to resolve these when loading an object, which again is usually inefficient because you usually don't need all those joins every time.

Data retrieval:

"unless you have a really simple data model (that is, you never do joins), you will be bending over backwards to figure out how to get an ORM to generate SQL that runs efficiently. Most of the time, it's more obfuscated than actual SQL.

And if you elect to keep the query simple, you end up doing a lot of work in the code that could be done in the database faster. ((eg)) Window functions...In these cases, I've elected to write queries using a templating system and describe the tables using the ORM. I get the convenience of an application level description of the table with direct use of SQL. "

Dual schema dangers:

" The problem is that you end up having a data definition in two places: the database and your application....This one seems to be one of those unavoidable redundancies. If you try to get rid of it, you only make more problems or add excessive complexity....I much prefer to keep the data definition in the database and read it into the application. It doesn't solve the problem, but it makes it more manageable. I've found that reflection techniques to get the data definition are not worth it and I succumb to managing the redundancy of data definitons in two places.

...

I work on the principle that the database's data definitions aren't things you should manipulate in the application. Instead, manipulate the results of queries. That is, the queries are your API to the database. So instead of thinking about objects, I think about functions with return types. "

Identities:

" When you have foreign keys, you refer to related identities with an identifier. In your application, "identifier" takes on various meanings, but usually it's the memory location (a pointer). In the database, it's the state of the object itself. These two things don't really get along because you can really only use database identifiers in the database (the ultimate destination of the data you're working with).

What this results in is having to manipulate the ORM to get a database identifier by manually flushing the cache or doing a partial commit to get the actual database identifier. "

Transactions:

transactions are dynamically scoped, which isn't supported in most languages

"This leads to a lot of boilerplate code with exception handlers and a careful consideration of where transaction boundaries should occur. It also makes you pass session objects around to any function/method that might have to communicate with the database....can make modularity tricky ("Here's a useful function that will only work in certain contexts")."

stored procedures:

"At this point, I'm starting to question the wisdom behind the outright rejection of stored procedures. "

comments from HN:

"If you're not using an ORM, then you ultimately end up writing one....I'm currently at a company that is not using an ORM - what has happened is that developers have written endpoints with inconsistent data formatting for similar data types, which prevents the possibility of creating abstractions cleanly on the client. It would have been nice to have a lightweight ORM, if only for object consistency....relational databases and the normalized storage of data is completely different from the way OO languages deal with rich nested objects"

to the contrary:

"A lot of things people use ORMs for are rather easily solved with stored procedures, especially in Postgres where you can write stored procedures in Perl, Ruby, etc. Validations, “fat models”, etc are all managed with SQL easily (and this means you get that functionality from _anywhere you access the database_, not just from your framework with an ORM). For convenient access you can roll 30 lines of Perl to wrap DBI or whatever (I use Perl for most web backends these days) and call your stored procedures in normal syntax with a little metaprogramming."

"If your ORM is just providing a mapping between select/map, where/filter, join/zip, etc., you have a fairly list-of-records-ish functional application and your objects are only nominally objects....a lot of ((the benefits of ORMs)) can be realized with a simple wrapper around SQL (i.e. LINQ). "

" Marshalling support is useful. SQL result -> struct and struct -> SQL insert code is repetitive to write. Getting fancier than that may be overkill.

Marshalling in general needs more compile-time support. Kludges such as Google protocol buffer preprocessors are a fast but clunky way to do it. It would be useful if languages could be given a reference to an SQL CREATE TABLE and could use that information usefully. Field names, type information, and enum values should come from the CREATE TABLE info. ... This is the sweet spot of ORMs. "

" An ORM provides type checking at your application layer"

"doing queries over relational databases in a type-safe way requires either 1) some form of structural typing (for projections), or 2) ignoring projections altogether, which brings a slew of perf problems to the table. Ironically, this means that OCaml is probably the best [reasonably popular] language to use an ORM in. "

"If you're not using an ORM, then you ultimately end up writing one.

There are other ways, like when using "event sourcing", or servers like postgrest [0] that give you a REST-api on top of your database. "

" ORMs are useful for solving certain types of problems and less suitable for other types. If you work on web stuff or applications that are report-oriented where most of the time you're just fetching data (possibly with complex queries) and rendering it to display, then maybe ORMs aren't a good fit.

On the other hand if you work on client-side apps where your objects are backed by a database but are otherwise long-lived, then sometimes the other features (beyond SQL generation) that ORMs provide come in handy (tracking units of work over the object graph, maintaining an identity map for consistent access to objects, and providing change notifications when an object or collection of objects is manipulated). "

" At Standard Chartered we even went so far as to add relations as a datatype to our Haskell-like language. It's a charm; and comparable for me to my experience first going from C-style arrays only to eg Python's dicts.

https://www.reddit.com/r/haskell/comments/2u0380/slides_from_don_stewarts_google_tech_talk/ "

" https://www.reddit.com/r/haskell/comments/2u0380/slides_from_don_stewarts_google_tech_talk/ :

https://hackage.haskell.org/package/opaleye

...If you mean in memory relational algebra library, then it should be feasible to port the Opaleye API to work with in memory data using the fastest indexing methods for any given data structure (key lookup for Data.Map, sequential scan for lists etc.).

... There is literally zero magic or sophistication in the relational algebra library we have.

All it is, is a straightforward, efficient implementation, in C++, of in-memory relational algebra with a mostly untyped interface, with primitives like

relation :: (IsTuple row) => [ColumnName] -> [ColumnName] -> [row] -> Relation
relation columns key rows = ...

and combinators like

join :: Relation -> Relation -> Relation
extend :: (a -> b) -> [ColumnName] -> [ColumnName] -> Relation -> Relation

As you can see it's all untyped; in extend, for example, there's no static check that the function takes a tuple of the right arity and returns a tuple of the right arity (matching the number of input/output column names). ...

We have a typed layer as well, but it needs some more work to be smooth. .... Even the untyped relational algebra stuff was often better than mucking around with something like Data.Map by hand, even if the latter is typed.

See http://elegantcode.com/2014/05/09/out-of-the-tar-pit/ for some more on functional relational programming.

"

---

My experience with hibernate, entity framework, nhibernate, and lastly dapper has basically left me thinking Dapper is all you'll ever want. I can't imagine a use case where I'd rather opt for NHibernate or EF at the moment.

(I might add, Dapper in combination with C#6 even gives me enough type saftey to be happy. String interpolation and the nameof operator complements dappers DTO approach nicely)

...

Same here. And I don't know anybody who has worked on large enterprise EF systems that hasn't come to roughly the same conclusion. The type safety EF offers is extremely nice to have, no doubt, but the problem is it makes it so easy to create performance problems that _every_ system ends up with them. Probably half of my billable hours in the last few years has been addressing EF performance problems.

Dapper, with a few extensions, can give you type safety for 80% of your typical queries, and the rest can easily be done in stored procedures or with (my preference) Dapper's SqlBuilder. And using SSDT instead of EF code-first, you can easily and in a version-controlled way manage your schema, views, stored procedures, indices, etc., simplifying performance management and getting static analysis in your sql while still not losing straightforward and configurable migrations.

---

" I think ORMs are a great tool to get something off the ground quickly....Like with most tools you will hit a point where they make things more difficult and then it's probably time to switch to SQL only or mix SQL with ORM especially for performance critical queries. ...

I definitely find

  users = User.where(has_foo: true).limit(10)

to be a lot more readable than

  rows = connection.exec_query("SELECT * FROM users WHERE has_foo=true LIMIT 10")
  users = rows.map { |row| User.build(row) }
 (And that's an example with no user-provided input)

...

((someone else replies:))

For me thats the opposite. I prefer "rows = connection.exec_query("SELECT * FROM users WHERE has_foo=true LIMIT 10")", "

" joostdevries 13 hours ago

I prefer to use functional frameworks. They 1) express the sql with map, flatmap, filter, groupBy etc functions 2) use classes that are equivalent to tuples.

Since I develop in Scala both are very intuïtive to write.

Apparently Linq for F# is really good as well. As someone else mentioned in a comment.

I get a programming interface that doesn't attempt to hide sql and still my code is fully typed. And it's really easy to make your query logic modular: make the sql fragment a function that you can call. Like any other code you reuse.

I can't think of an advantage of ORM over this functional approach. Apart from ORM being more well known.

reply "

(in other previous comments, ppl basically said: ok, but i'd call that sort of thing an ORM)

---

linq:

'language-integrated query'. Uniform query syntax over different data sources (eg drivers for SQL, XML, ordinary arrays, etc), and some sort of uniformity over different languages too (C# and VB; can LINQ values be passed between these?). The 'query' is typed as a (potentially lazy) sequence; to execute it, iterate through the sequence. i'm not sure if these are actually first-class queries; i'm not sure if you can reflect on the query's structure after it is created, or if you can create the query once and then apply it to different data sources. todo: i've heard this is a monad but i don't understand that 100% yet. LINQ seems to facilitate greater typechecker reasoning; there seems to be an interface derived from IEnumerable called 'IQueryable' that helps with this.. not sure if 'IQueryable' is more first-class/rebindable as well

example in C# (adapted from [7]):

        int[] numbers = new int[7] { 0, 1, 2, 3, 4, 5, 6 };
        // numQuery is an IEnumerable<int>
        var numQuery =
            from num in numbers
            where (num % 2) == 0
            select num;
        foreach (int num in numQuery) {
            Console.Write("{0,1} ", num);
        }

---

J has a concept called a 'locale' which is sort of a special case of our idea of 'semiglobals' (which is i guess another name for dynamically scoped variables) with first-class 'contexts' (dynamic variable namespaces reified as an object). However in J, the 'locale' (namespace) in which a 'verb' (function) executes cannot (afaict) be given dynamically (from a variable), but instead is given statically (eg to run function 'f' in locale 'l1', you write 'f_l1_' in the source code instead of 'f').

There are a few special 'locales' in J. The 'base' locale is the default. The 'z' or 'root' locale is looked in when a binding is not found in the current locale. The 'j' locale apparently has some initializationy/implementy/workspacy/environmenty stuff.

This system could (afaict) be strictly generalized in Oot by having dynamic variable 'context' objects, and then also having one of these dynamic variable 'context' objects itself be a dict of other named dynamic variable contexts, with a 'base', a 'z', and a 'j' (or 'base', 'root', and 'o' in our case).

One other feature that J makes easy is for a package of related functions to have their 'own' namespace. Eg if you have a module named 'port' you could have all the functions in it refer to the 'port' locale and then they have a private place to share variables with the other functions in that package. J doesn't tie this to modules and doesn't ensure privacy (you just use the 'port' locale and hope no one else picked the same name; and you could have other fns in the same 'module' that use their own, different locale), but in Oot i imagine we could just tie it to modules by having a __CURRENTMODULE namespace.

[8]
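
A hedged Python sketch of that generalization (the names 'Context', 'base', and 'root' are mine, not settled Oot vocabulary): contexts are first-class objects with a parent fallback, and the 'current' context is dynamically scoped, so callees see whatever their caller installed:

    import contextvars

    class Context:
        def __init__(self, name, parent=None):
            self.name = name
            self.parent = parent        # e.g. 'root', consulted on lookup misses
            self.vars = {}

        def lookup(self, key):
            ctx = self
            while ctx is not None:
                if key in ctx.vars:
                    return ctx.vars[key]
                ctx = ctx.parent
            raise KeyError(key)

    root = Context('root')
    root.vars['greeting'] = 'hello'
    base = Context('base', parent=root)

    current = contextvars.ContextVar('current_context', default=base)

    def report():
        ctx = current.get()
        return f"{ctx.name}: {ctx.lookup('greeting')}"

    print(report())                     # base: hello   (falls back to root)

    port = Context('port', parent=root)
    port.vars['greeting'] = 'ahoy'
    token = current.set(port)           # like running a verb "in locale port"
    print(report())                     # port: ahoy
    current.reset(token)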

---

https://github.com/google/protobuf/releases/tag/v3.0.0

" The main intent of introducing proto3 is to clean up protobuf before pushing the language as the foundation of Google's new API platform. In proto3, the language is simplified, both for ease of use and to make it available in a wider range of programming languages. At the same time a few features are added to better support common idioms found in APIs.

    The following are the main new features in language version 3:
        Removal of field presence logic for primitive value fields, removal of required fields, and removal of default values. This makes proto3 significantly easier to implement with open struct representations, as in languages like Android Java, Objective C, or Go.
        Removal of unknown fields.
        Removal of extensions, which are instead replaced by a new standard type called Any.
        Fix semantics for unknown enum values.
        Addition of maps (back-ported to proto2)
        Addition of a small set of standard types for representation of time, dynamic data, etc (back-ported to proto2)
        A well-defined encoding in JSON as an alternative to binary proto encoding.
    A new notion "syntax" is introduced to specify whether a .proto file
    uses proto2 or proto3:
    // foo.proto
    syntax = "proto3";
    message Bar {...}
    If omitted, the protocol buffer compiler generates a warning and "proto2" is
    used as the default. This warning will be turned into an error in a future
    release.
    We recommend that new Protocol Buffers users use proto3. However, we do not
    generally recommend that existing users migrate from proto2 from proto3 due
    to API incompatibility, and we will continue to support proto2 for a long
    time.
    Other significant changes in proto3.
    Explicit "optional" keyword are disallowed in proto3 syntax, as fields are optional by default; required fields are no longer supported.
    Removed non-zero default values and field presence logic for non-message fields. e.g. has_xxx() methods are removed; primitive fields set to default values (0 for numeric fields, empty for string/bytes fields) will be skipped during serialization.
    Group fields are no longer supported in proto3 syntax.
    Changed repeated primitive fields to use packed serialization by default in proto3 (implemented for C++, Java, Python in this release). The user can still disable packed serialization by setting packed to false for now.
    Added well-known type protos (any.proto, empty.proto, timestamp.proto, duration.proto, etc.). Users can import and use these protos just like regular proto files. Additional runtime support are available for each language.
    Proto3 JSON is supported in several languages (fully supported in C++, Java,
    Python and C# partially supported in Ruby). The JSON spec is defined in the
    proto3 language guide:
    https://developers.google.com/protocol-buffers/docs/proto3#json
    We will publish a more detailed spec to define the exact behavior of
    proto3-conformant JSON serializers and parsers. Until then, do not rely
    on specific behaviors of the implementation if it’s not documented in
    the above spec.
    Proto3 enforces strict UTF-8 checking. Parsing will fail if a string field contains non UTF-8 data.

General

    Introduced new language implementations (C#, JavaScript, Ruby, Objective-C) to proto3.
    Added support for map fields (implemented in both proto2 and proto3).
    Map fields can be declared using the following syntax:
    message Foo {
      map<string, string> values = 1;
    }
    The data of a map field is stored in memory as an unordered map and
    can be accessed through generated accessors.
    Added a "reserved" keyword in both proto2 and proto3 syntax. Users can use
    this keyword to declare reserved field numbers and names to prevent them
    from being reused by other fields in the same message.
    To reserve field numbers, add a reserved declaration in your message:
    message TestMessage {
      reserved 2, 15, 9 to 11, 3;
    }
    This reserves field numbers 2, 3, 9, 10, 11 and 15. If a user uses any of
    these as field numbers, the protocol buffer compiler will report an error.
    Field names can also be reserved:
    message TestMessage {
      reserved "foo", "bar";
    }
    Added a deterministic serialization API (currently available in C++). The deterministic serialization guarantees that given a binary, equal messages will be serialized to the same bytes. This allows applications like MapReduce to group equal messages based on the serialized bytes. The deterministic serialization is, however, NOT canonical across languages; it is also unstable across different builds with schema changes due to unknown fields. Users who need canonical serialization, e.g. persistent storage in a canonical form, fingerprinting, etc, should define their own canonicalization specification and implement the serializer using reflection APIs rather than relying on this API.
    Added a new field option "json_name". By default proto field names are converted to "lowerCamelCase" in proto3 JSON format. This option can be used to override this behavior and specify a different JSON name for the field.
    Added conformance tests to ensure implementations are following proto3 JSON specification.

"

---

a 1D list and a 2D table with 1 row are similar

a 1D list of length 1 and a scalar are similar

---

Noms is a version-controlled, typed database

https://github.com/attic-labs/noms/blob/master/doc/cli-tour.md

---

some J examples of autoarray stuff:

   m =. i. 2 2
   m
0 1
2 3

You can add arrays together that have the same rank and shape.

   m + 2 2 $ 10 11 12 13
10 12
14 16

You can add a single number to an array.

  10 + m
10 11
12 13

What if you wanted to add one number to the first row and a different number to the second row?

   10 20 + m
10 11
22 23

But what if you wanted to add those numbers to the columns instead? You have to indicate that you want to add to the columns not the rows.

   10 20 +"1 m
10 21
12 23

[9]

" Frame and cell So far nouns have been considered in their entirety. However, it is useful to think of an array as consisting of cells, parts of the array (subarrays) that when placed in a frame, make up the entire array.

   a =. 2 3 $ i. 6
   a
0 1 2
3 4 5

The array a can be thought of as having 6 cells, where each cell is an atom. The frame would be the shape 2 3 that structures the 6 individual cells into the array a. Visually (cell is atom and frame of 2 3):

	0       cell 0
	1       cell 1
	...
	5       cell 5

The array a can also be thought of as having 2 cells, where each cell is a list. The frame would be the shape 2 that structures the cells into the array a. Visually (cell is list and frame of 2):

	0 1 2   cell 0
	3 4 5   cell 1

Finally, the array a can be thought of as having 1 cell, where the cell is a table. The frame would be the shape empty that structures the cells into the array a. Visually (cell is table and frame is empty) :

	0 1 2   cell 0
	3 4 5

A table with shape 2 3 can be thought of as:

    a 2 3 frame of cells that are atoms
    a 2 frame of cells that are lists of shape 3
    an empty frame of a cell that is a table of shape 2 3

Similarly, an array with shape 4 3 2 can be thought of as:

    a 4 3 2 frame of cells that are atoms
    a 4 3 frame of cells that are lists of shape 2
    a 4 frame of cells that are tables of shape 3 2
    an empty frame of a cell that is a rank 3 array of shape 4 3 2

The frame is a prefix of the shape of the array. It can be the entire shape (a prefix of all), in which case the cells are atoms. It can be empty (a prefix of none) in which case there is a single cell which is the array. Or anything in between.

The cell shape is the array shape with the frame prefix removed. The length of the cell shape is the cell rank.

The cells of an array are the subarrays that, when assembled into the corresponding frame, create the entire array. "

[10]

" Item Arrays are frequently treated as having a frame of length 1. With this frame, the array has cells of rank 1 less than the rank of the array. These cells are the items of the array.

The items of a list are the atoms in the list. The items of a table are the rows in the table. The items of a rank 3 array are the tables in the array. An array is the list of its items.

An atom has one item, itself.

The # (tally) of a noun is the number of items in the noun.

   # 23
1
   # 1 $ 5
1
   # i. 5
5
   # i.2 3
2
"

[11]

" k-cell A cell of rank k is also called a rank-k cell or k-cell. A 0-cell is an atom, a 1-cell is a list, a 2-cell is a table, and so on. If the rank of the cells of a noun is given, then the frame is whatever is left over of the shape of the noun.

Negative numbers are also used, as in _2-cell and _1-cell; the frames of such cells have length indicated by the magnitude of the numbers. You have seen _1-cells before: they are items. " [12]

" Verb rank A verb has a rank that determines how it applies to its arguments. A monad of rank k applies to the k-cells of its argument. A dyad of left rank kl and right rank kr applies to the kl-cells of its left argument and the kr-cells of its right argument. Verb rank is a powerful tool that controls the way a verb applies to arrays.

" [13]

" Agreement For a dyad the left rank of the verb and the rank of the left argument determine the frame of the left argument. Similarly the right rank of the verb and the rank of the right argument determine the frame of the right argument. If the left and right frames are the same, then there are the same number of cells in each argument, and it is simply a matter of taking each cell in turn from the left and right arguments, applying the verb, and putting the result into the frame of the result.

   a =. i. 2 3
   b =. 2 3 $ 7
   a + b
 7  8  9
10 11 12

Visually you can see how each atom from the left is used with the corresponding atom from the right.

0 1 2   +   7 7 7   gives    7  8  9
3 4 5       7 7 7           10 11 12

You have also seen that the following works.

   a + 7
 7  8  9
10 11 12

Visually you can see how each atom from the left is used with the corresponding atom from the right.

0 1 2   +   7 ...   gives    7  8  9
3 4 5       ...             10 11 12

The ... indicates that the cell is repeated to provide the required arguments. The ... to the right and below the 7 indicates it is repeated in 2 axes.

But what about the following?

   a + 3 4
3 4 5
7 8 9

Again you can see how the cells of the right argument repeat to provide the required verb arguments.

0 1 2   +   3 ...   gives   3 4 5
3 4 5       4 ...           7 8 9

But there must be some agreement between the cells in the arguments.

   a + 3 4 5
|length error
|   a    +3 4 5

Visually what is happening:

0 1 2   +   3 ...   gives   3 4 5
3 4 5       4 ...           7 8 9
            5 ...           error - ran out of lefts

The above cases are simple enough, but consider the following with a rank 3 noun.

   b =. i. 2 3 4
   b + a
 0  1  2  3
 5  6  7  8
10 11 12 13

15 16 17 18
20 21 22 23
25 26 27 28

This is more complicated to visualize.

 0  1  2  3   +   0 ...   gives   0  1  2  3
 4  5  6  7       1 ...           5  6  7  8
 8  9 10 11       2 ...          10 11 12 13

12 13 14 15       3 ...          15 16 17 18
16 17 18 19       4 ...          20 21 22 23
20 21 22 23       5 ...          25 26 27 28

Similarly:

   b + 2 3
 2  3  4  5
 6  7  8  9
10 11 12 13

15 16 17 18
19 20 21 22
23 24 25 26

Visually:

 0  1  2  3   +   2 ...   gives  2  3  4  5
 4  5  6  7       ...            6  7  8  9
 8  9 10 11                     10 11 12 13

12 13 14 15       3 ...         15 16 17 18
16 17 18 19       ...           19 20 21 22
20 21 22 23                     23 24 25 26

The agreement rule is quite simple. If the left and right frames are the same then there is no problem. Otherwise, one frame must be a prefix of the other, and its cells are repeated into its trailing axes to provide the required arguments. " [14]

" Rank conjunction " The primitive " (double-quote, not two quotes) is the rank conjunction. ... The rank conjunction produces a new verb from its left argument with the rank information from its right argument.

   plus000 =. + " 0 0 0

The right argument for " is the rank information for the primitive + that is given in the J Dictionary (look up + in the vocabulary, turn to the definition page, and note the rank information in the heading). The first 0 is the rank of the monad argument. The second and third 0's are respectively the rank of the dyad left and right arguments. ... with same ranks as the primitive + it should behave just as does + or plus . You can verify this with a few experiments borrowed from the previous section on agreement.

   a =. i. 2 3
   a plus000 a
0 2 4
6 8 10
   a plus000 1 2 3
|length error
|   a   plus000 1 2 3

The length error occurs because the arguments do not agree as per the previous section. The left frame is 2 3 and the right frame 3, and 3 is not a prefix of 2 3; there are extra cells from the left argument without corresponding cells from the right argument.

However, it seems reasonable to want to add the list 1 2 3 to each list in the left argument. You know what you want it to do. Visually:

0 1 2   +   1 2 3   gives   1 3 5
3 4 5       ...             4 6 8

You want a variation of + that adds lists from its left argument to lists from its right. You can do that by changing the arguments to the " conjunction to indicate that the dyad left and right ranks are lists.

   plus011 =. + " 0 1 1
   a plus011 1 2 3
1 3 5
4 6 8
   1 2 3 plus011 a
1 3 5
4 6 8
 ...
 Since + is applied dyadically and both ranks are 1, you can use the shorter form of +"1 which uses 1 for the rank of all arguments.
   1 2 3 +"1 a1 3 5 4 6 8

In this case, the left frame is empty with a cell shape of 3 and the right frame is 2 with a cell shape of 3. Empty is a prefix of 2, and so the frames agree.

" [15]

"

" In the previous sections the question of the shape of the result was glossed over. For a monad the frame of the result is the same as the frame of the argument. For a dyad the frame of the result is the frame of the longer of the frames of the arguments (or either frame if they are the same).

With a verb like + that has an atom result for each atom argument this is straightforward. Things get more interesting with verbs that have more complicated behavior.

Consider the verb $ . Look it up in the J Dictionary and you'll see it has rank of _ 1 _ . The _ indicates an infinite (unbounded) rank and means that the verb applies to the entire argument. The monad has unbounded rank and so applies to the entire right argument. If you think about the monad $ with a result that is the shape of its entire right argument this makes sense. The dyad left rank is 1 and this means that it applies to lists from the left argument. The dyad right rank is unbounded and so applies to the entire right argument.

   2 4 $ i.3
0 1 2 0
1 2 0 1
   2 4 $"1 0 i.3
0 0 0 0
0 0 0 0

1 1 1 1
1 1 1 1

2 2 2 2
2 2 2 2

The first example is what you have seen before, but what is going on in the second? The $"1 0 means that $ will get cell arguments as a list (1-cells) on the left and as an atom (0-cell) on the right. The left frame is empty (nothing is left of the shape of the left argument after a 1-cell is taken) and the right frame is 3 (there are 3 0-cells in the right argument). So the result frame is 3.

2 4 $ 0   gives   0 0 0 0    left 1-cell $ right 0-cell
                  0 0 0 0

...   $ 1   gives   1 1 1 1    repeat 1-cell $ next 0-cell
                    1 1 1 1

...   $ 2   gives   2 2 2 2    repeat 1-cell $ next 0-cell
                    2 2 2 2

The frame of the result is 3 and the things in that frame are 2 by 4 tables, so the shape of the final result is 3 2 4.

   $ 2 4 $"1 0 i.33 2 4 " -- [16]

---

Chu spaces without the continuity condition are relations represented as 2d arrays whose values are generalized beyond true/false.

This suggests that yes, if we can get a language that handles both graphs and n-d multidim arrays as primitives and deals with the conversions between boolean 2d arrays, unlabeled graphs, relations, and logic, we'll get something pretty good.

---

the diagram in J Primer's 'frame and cell':

" The array a can be thought of as having 6 cells, where each cell is an atom. The frame would be the shape 2 3 that structures the 6 individual cells into the array a. Visually (cell is atom and frame of 2 3):

	0       cell 0
	1       cell 1
	...
	5       cell 5

The array a can also be thought of as having 2 cells, where each cell is a list. The frame would be the shape 2 that structures the cells into the array a. Visually (cell is list and frame of 2):

	0 1 2   cell 0
	3 4 5   cell 1"

, as well as the category-theory idea of 'diagram' (as just a labeling of a graph),

suggest that i was on the right track with my idea of 'schemas' for data structures (such as list, dict, 2d array, etc) as repeated/tiled labelings of a generic graph data structure

---

the J concept of 'rank' and 'cell' are a little alien to me, but i should probably incorporate them into my thinking.

When i think of, eg "a + b" where a and b contain arrays, i think:

the values in "a" and "b" should be passed to '+'. If a and b are 2d arrays, and '+' looks into lists, then it might see each of 'a' and 'b' as a list of rows (or a list of columns). This sort of thing is what the J primer calls 'items'; an item is whatever is stored if you consider the array as a list, where its 'major' axis (the highest/most recently added dimension) goes along the list. This is thinking in terms of what J calls 'frames', that is, 'top down' breaking apart the entire object into its parts.

in a way, this way is more 'Pythonic' because passing something to an operator has a simple, non-vectorized semantics: the operator performs its usual action on the values of its arguments.

but in J, instead of looking at values passed in from the 'top down', eg breaking an array into lists, the operator can, via its rank, look at it 'bottom up', eg looking at the 'minor' axis (the lowest/least recently added dimension). For example, a 2d table is made of 'cells' of scalars.

this is alien to me because what if each 'cell' actually contains a list?

i bet you can't do that in J; i bet in J, the language always has the ability to go 'all the way to the bottom' of data structures because i bet you can't encapsulate a composite data structure as the elements of a list or matrix.

in Oot, however, we could have the best of both worlds; 'boundaries' could be used to define the encapsulation, and rank could be relative to boundaries.

the usefulness of cells/rank appears to be to give concise ways to specify different sorts of 'vectorization' of an operator; eg with rank 0, the cells are just the actual elements of the matrix, so the operation is repeated over each element of the matrix; with rank 1 however, the cells are lists of elements, the most minor axis of the matrix, and the operation is only repeated over each of these lists.
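
A hedged sketch of the rank idea in plain Python, with nested lists standing in for J arrays (the function names are mine): rank 0 applies the verb to each scalar, rank 1 applies it to each row (1-cell), and so on:

    def rank_of(x):
        # depth of nesting: 0 for a scalar, 1 for a flat list, 2 for a table, ...
        return 1 + rank_of(x[0]) if isinstance(x, list) else 0

    def apply_monad_at_rank(f, k, x):
        """Apply f to the k-cells of x and collect the results in x's frame."""
        if rank_of(x) <= k:
            return f(x)
        return [apply_monad_at_rank(f, k, cell) for cell in x]

    table = [[0, 1, 2], [3, 4, 5]]
    print(apply_monad_at_rank(sum, 1, table))               # [3, 12]  -- f sees each row
    print(apply_monad_at_rank(lambda v: v * 10, 0, table))  # [[0, 10, 20], [30, 40, 50]]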

---

so perhaps we could have simple 'pythonic' behavior by default, and then have a syntactic metaoperator to allow J-style vectorization; rank would always be '0' by default but could be changed by an operator similar to J's '"' (the 'rank' conjunction)

---

Often in low-level or minimalist systems, i see constraints on fundamental composite data structures.

the common ones:

---

recall that we want to unify function application and table lookup and object field access, because

obj.x

is like

table[x]

but we want to be able to transparently replace fields on objects with getters/setters later, so obj.x is also like

obj.get(x)

does this imply that f(x) is like f.x and table[x] is like table x? Or is it just that f.x is like f.__get__(x) and table[x] is like table.__get__(x)?
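
A hedged Python sketch of the unification being discussed (the class name is mine, purely illustrative): one underlying 'get' primitive, with attribute access, indexing, and call all routed through it, so a plain field can later be swapped for a computed getter without callers noticing:

    class Gettable:
        def __init__(self, get):
            self._get = get              # the single primitive: key -> value

        def __getattr__(self, name):     # obj.x
            return self._get(name)

        def __getitem__(self, key):      # obj[x]
            return self._get(key)

        def __call__(self, key):         # obj(x)
            return self._get(key)

    table = Gettable({'x': 42, 'y': 7}.__getitem__)
    print(table.x, table['x'], table('x'))     # 42 42 42

    # swapping the stored field for a computed getter is invisible to callers:
    computed = Gettable(lambda key: 42 if key == 'x' else None)
    print(computed.x)                          # 42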

---

You can't add pointers. But you CAN concatenate paths, eg ".x.3" can be concatenated with ".y.2" to get ".x.3.y.2"
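
A tiny sketch of that point (a hypothetical Path wrapper, nothing Oot-specific): paths compose by concatenation and denote relative lookups, which is why they behave better than pointer arithmetic:

    class Path:
        def __init__(self, *keys):
            self.keys = keys

        def __add__(self, other):          # ".x.3" + ".y.2"  ->  ".x.3.y.2"
            return Path(*self.keys, *other.keys)

        def get(self, obj):
            for k in self.keys:
                obj = obj[k]
            return obj

        def __repr__(self):
            return ''.join('.' + str(k) for k in self.keys)

    p = Path('x', 3) + Path('y', 2)
    print(p)                               # .x.3.y.2
    data = {'x': [0, 1, 2, {'y': [10, 11, 12]}]}
    print(p.get(data))                     # 12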

---