Bayle Shanks's website: proj-oot-ootDataNotes1

oot data design todos

representations may be related by a chain of transformations, not just single transformations; the transformations form a category
transformations between representations may not be injective, in which case some representations have more information than others. Should be require that there is always at least one representation (perhaps the 'meta' representation) that has all the information, e.g. from which there is an injective map to any other representation?
transformations between representations may not be surjective, in which case operations will be available on the transformed representation that don't correspond to any operation in the original representation. If such operations are applied, should we (a) throw an exception, (b) simply have a non-injective mapping back to the original representation, and allow the transformed representation to hold onto the extra information (in which case we have to specify what happens if then the data is altered again using the original representation; e.g. does this cause the extra information in the other representation to be lost, as if the last representation to be used is always the 'real' form of the object, which is converted after the fact? what if an object is passed to a subroutine that only takes a superclass of the object; do we lose all of the content in the extra fields in the object? it seems that the answer should be no, but then how long do we hold the extra info? what if the object is copied within the superclass-using subroutine, instead of mutated?) (c) immediately take a round-trip to the smaller representation and then back to the current one, immediately erasing the extra info

---

see also OotView?

---

" Self provides structural re ection via mirrors [25]. It can actually be argued that mirrors are a rediscovery of up and down from 2-Lisp, but put in an object- oriented setting. However, mirrors provide new and interesting motivations for a strict separation into internal and external representations. Especially, mirrors allow for multiple di erent internal representations of the same external object. For example, this can be interesting in distributed systems, where one internal representation may yield details of the remote reference to a remote object, while another one may yield details about the remote object itself. AmbientTalk?/2 is based on mirrors as well, but extends them with mirages that provide a form of (object-oriented) procedural re ection [26]."

huh i wonder if these are like my 'perspectives' or 'views'?

---

to generalize (hyper)nodes further (?!):

n-ary Chu-ish spaces

(note that ordinary Chu spaces can already represent n-ary relations, see http://chu.stanford.edu/ ; i say Chu-ish b/c i'm not really thinking of Chu spaces here, just an n-ary matrix that is like the characteristic matrix of a relation, except with an arbitrary alphabet)

lattice ops (lub and glb)

" onal Programming Language A relational programming language (RPL) is a DeclarativeLanguage? built around the RelationalModel? of data. StructuredQueryLanguage? (SQL) is an example.

See also Jay Earley's VERS 2, a ProceduralProgrammingLanguage?? built around the RelationalModel? of data. For example, this paper:

Jay Earley "Relational Level Data Structures for Programming Languages". Acta Informatica 2: 293-309 (1973)

Also LIBRA - A general-purpose programming language based on the algebra of binary relations

http://www.cs.adelaide.edu.au/~dwyer/TR95-10_TOC.html

    "Ordinary programming languages calculate functions. Sometimes a function is inappropriate. For example, 4 has two square roots, +2 and -2, but an ordinary programming language provides a sqrt function that returns only one of the roots."

The PrologLanguage? could be considered a relational programming language.

Not really. PrologLanguage? has no inherent support for labeled tuples. All it has is untyped lists of "atoms". In this respect it is quite similar to LispLanguage?. I suppose you could add support for labeled (typed) tuples, though. OTOH one of the fundaments of PrologLanguage? is an in-memory database of "facts" (forgot the actual term - clauses maybe? - but "facts" describes them better than original).

PrologLanguage? has tuples, but you examine them using pattern matching instead of using labels. Prolog is based on predicate logic rather than relational calculus. Relational calculus is I think a restricted form of predicate logic.

The truth statements of prologs are similar to the rows of a relational table. (are there your tuples above?)

 employee ('Joe Doe', 1979, 'Department of Defense').
 employee_managed_by ('Joe Doe', 'Mister X').

The thing that RelationalCalculus? is missing are the rules of Prolog. It is straightforward to have a "directReports" relationship in a relational table--associating each employee with his/her boss--but deriving the "indirectReports" relationship (the transitive closure of the directReports relationship) is trickier. You can do it with RelationalJoin??s, but that's ugly. A rule in PrologLanguage? expresses this relationship much more succinctly.

This can be made into a PrologForMassiveData?. "

-- http://cs.adelaide.edu.au/~dwyer/TR95-10.html

" let transition->{'Cold','Warm';'Warm','Hot';'Hot','Warm';'Warm','Cold'}.

The braces enclose a set of 4 elements, separated by semicolons. Each element is an ordered pair of terms, separated by a comma. The relation is given the name `transition'. The two means used in this example--set formation and pair formation--are the only ways of building data structures in Libra. The members of a set are unordered, but pairs are ordered.

There are three ways this relation could be used:

    It could generate the four pairs of values.
    It could be used to test whether a pair of values satisfy the relation.
    Given the first term of a pair or pairs, it could give the corresponding second terms as a result. For example, given 'Warm', it could yield both 'Cold' and 'Hot'.

The first two ways of looking at relations are shared by sets in general. We may enumerate their members, or test elements for membership of them. The third view puts the relation in a more active role, and is described as `applying' the relation to an argument to yield one or more values as a result. This is analogous to the view we take of applying functions in a functional programming language.

LIBRA's relations are directional and cannot be reversed and are binary, but Oot will have reversible, n-ary relations

also generalize relation from 0,1 to arbitrary alphabet

okay if 'f x' is our syntax for 'apply f to x', then we want to get the same answer whether 'f' is a function, or a relation with the functional property. So, although the natural form of returning a relation application would be as a set, by default we return just a random member of that set.

if you want the whole set you can use application modes (the .apply attr)

kant's 3 quantities (was it "quantities"?) jive with/suggest 3 application modes: implicit map (forall, universal), pick one random thing (exists, particular), treat the set as one object (singular)

e.g. if X and Y are sets, then maybe X*Y in the forall mode means elementwise multiplication, in the exists mode means to select one element of X and one element of Y and multiply them, in the singular mode means matrix multiplication

when one thing can implement one interface multiple ways, the perspective chooses which way in this way the same operator really does mean different things in different contexts

note: to Chu-ish-ify the relations, the 'apply mode' also possibly can be changed to give a condition on the K (that is the Chu-ish lookup table valuation of each tuple potentially 'in' the relation). for instance, if we have false/maybe/true as 0,0.5,1, the mode might say "accept anything >= 0.5" or "accept anything >= 1".

i'm becoming partial to the 'root convention' by which a variable holding a net is essentially holding the root or 'scope lookup table' of that net, that is, a node which is a dict that maps to each other node by label, if applicable, or without IDs (e.g. as a set), if not.

the current lexical environment also specifies the assignment of operations (which are just methods in interfaces) to punctuation symbols, e.g. '*' is literally a variable whose value is the 'multiply' interface, and when you say 'a * b', it looks up the value of '*'.

imagine that we have the platform int representation of an int, and also the successor representations.

now imagine that we have an object that supports the int interface, but that also has some metadata attached (it supports the FooFrameworkMetadata? interface).

now imagine that we have an FFI. The FFI will think, "oh i know how to represent platform ints" and will just pass that. but that will be wrong, because it leaves out the metadata. in fact a representation is a generalization of 'serialization', so this is the same sort of problem as if the object has a __repr__ method that actually only prints out the int.

what we need to do is to represent that the platformInt representation and the successor representation are just two ways of seeing the same data (two perspectives?), but that the FooFrameworkMetadata? interface is providing different data.

I suppose one might say that an object's total informational content is given by the sum of all of the perspective equivalence classes supported by that object. E.g. in this case, the 'int' equivalence class, and FooFrameworkMetadata?.

interestingly, a perspective equivalence class is the same kind as a class of typeclasses, although it is different: if you constructively have a perspective equivalence class, you can convert any perspective in the class into any other, and furthermore any composition of such conversions that gets back to the same perspective it started at is an identity.

could you sometimes want to support adding multiple perspectives that are in the same equivalence class to the same object, but to keep their information separate?

what do you do when there are multiple isomorphisms between two interfaces?

how does implicit conversion work, if it works at all? when there are multiple typeclass instances that apply to an object for the same typeclass, and/or when multiple implicit conversions are possible, is the choice between them part of the data carried with the object, or part of the static lexical context?

so is FooFrameworkMetadata? interface vs. the int interfaces two different 'parts' of the objects, one of which has one perspective and one of which has two, or do we just have three perspectives?

maybe the latter; one can imagine objects whose perspectives all share some information with at least one other perspective forming a connected graph, but in which no two perspectives are isomorphic.

so we have the actual object, the thing-in-itself, the implementation, and then we have various perspectives on it, each of which are just projections of some subset of its information, then transformed in some way. e.g. the function from the object-in-itself to a perspective on it can be decomposed into a projection and then an injective transformation.

but we dont want to actually make the implementation special at this point; we just want to define homomorphisms between parts of the perspectives. the programmer need not know whether the platform int or the successor representation is 'more real'

however it would be useful for the language to have the concept of an isomorphism class of perspectives; and also of disjoint perspectives; even if in general perspectives can related in ways more complex than this.

one way to relate the perspectives is an event-driven/listenery/dataflowy way.

one can see the perspectives as just an ordinary layer of indirection on top of each graph. the purpose of this is to hide information so that e.g. a list can be considered to be jsut the enumeration over all arc coming out of its root node, when in reality there may be other metainformation, such as the type of the list, attached to this data also.

---

is boundary mode per-perspective?

can you layer perspectives, e.g. layer a boundary perspective beneath another perspective?

if a cross-perspective connection has been specified, then when one of the nodes changes, the corresponding nodes in the other perspective must change. they should by default opt to change lazily, however, e.g. only if they are queried. so oot must perform 'dirty checking' to maintain a dirty bit for each node in each perspective that is linked to other perspectives. Note that changes may be transitive; if perspective A is linked to B, and B to C, then if a user change in perspective A causes parts of B to be marked dirty, that may cause parts of C to be marked dirty; and a query to C later may cause B to be updated from A and then C to be updated from B.

need a way to give a 'path' that includes both a filepath and the name of a variable in a serialized savefile; e.g. in Python my thesis code is cluttered with junk like:

def create_topscoring_gene_lists_from_geneclusters_pipeline(startfile_mean_path, clustersfile_path, scores_norm_hv_outputfile_base_path, n=8): D = loadmat(startfile_mean_path)['D'] D_valid = loadmat(startfile_mean_path)['D_valid']

    clustersfile = cPickle.load(open(clustersfile_path))
    clusters = clustersfile['clusters']
    protos = clustersfile['colorProtos']

    ...only now does the real computation begin...

i guess this is a stdlib issue because i guess i could make a function myself which does this in Python.. mb i should

---

https://github.com/martinblech/xmltodict

---

implicit conversion:

single step implicit conversion implicitly (e.g. if Int has a Bool interface (do you need to mark that interface 'implicit'?), then 'if 0' and 'if 1' work like 'if False' and 'if True')

multi step only by request (with the syntactic 'implicit' operator), and you must specify the path between types unless all possible paths are marked 'commutative' with each other (a commutative clique of paths) or unless all possible preferred paths form a commutative clique of paths. e.g. there is one level of priority for transforms, 'preferred' or 'non-preferred'. mb 'preferred' should be the same as 'implicit', in which case we don't have to consider the non-preferred case.

btw here's how scala does it: http://www.scala-lang.org/old/node/130 http://stackoverflow.com/questions/5598085/where-does-scala-look-for-implicits (presents priority rules) http://pietrowski.info/2009/07/scala-implicit-conversion/ (claims that there is no chaining, and that implicits must be unambiguous?!?) http://tomjefferys.blogspot.com/2011/11/implicit-conversions-in-scala.html http://stackoverflow.com/questions/5332801/how-can-i-chain-implicits-in-scala (no chaining except via implicit params) http://stackoverflow.com/questions/9557649/scala-transitive-implicit-conversion http://suereth.blogspot.com/2011/02/slides-for-todays-nescala-talk.html http://stackoverflow.com/questions/6906742/scala-implicit-conversion-scope-issues

i would mb: disallow chaining (if that's good enough for scala it should be good enough for us), yes only have 'implicit' not 'preferred', probably not have 'implicit parameters' add another level of chaining (i dont understand that yet tho), and allow one implicit to take priority over another only if one is defined in the current module and the other isnt.

but then here's my problem: shouldn't casting an Int to a Bool be the same as using a Boolable interface? And surely interfaces should be chainable, to promote making small decomposed interfaces (e.g. Int -> Num, Num -> Ord, Ord -> Sortable)

also, what's the difference between defining an Int -> Num cast, and making an Int a subclass of Num? Is it whether or not we have mutable state? or is a subclass when we have a number of functions that define an implementation data representation (e.g. a Lock implemented by a file in /var/run)? do we even need implementation data representations at all?

so, it seems we need to unify 3 things: dependent interfaces, implicit casting, subclassing. with a particular eye to transitivity.

also clearly interfaces should be first-class :)

hmm, i guess it's very important for interfaces to be chainable/transitive. so we gotta have that.

regarding the differences between OOP 'classes' with 'implementations' and typeclasses/interfaces: i think the difference is that in an OOP class, the various methods in the class are all making assumptions about the format of the internal data that is actually storing the stuff.

For example, let's say you have an object representing 8 bytes of data. It exposes an interface like this: read(byte): value, write(byte, value), where 'byte' is an int between 0 and 7 inclusive, and 'value' is an int between 0 and 255 inclusive. Now, you want this to expose an Int64 interface. But there's actually many two ways to do it, for example: big endian and little endian.

if that's true, then one might say that the difference is EXACTLY when there is more than one way to implement a given interface on a piece of data that already satisfies.

However, i still think there might be times when you don't want an interface automatically applied even if it is unique; that is, when it isn't a 'natural' part of the semantics of the original data, when it feels more like a transformation. E.g. each function can be associated with its Godel number integer encoding, but if you have a function and you add 1 to it, you probably should get a compiler error rather than it changing to whatever function has the next Godel number.

What about subclassing? I can imagine that you might want to define arithmetic on the Int64 defined above, calculated in an optimized manner that makes use of knowledge about the internal storage format; even though in theory you could define it all in recursive definitions using only zero and successor pattern matching, which can be implemented based on just the standard Int64 interface. Again, these optimized arithmetic ops (not the successor-based ones) will need to know about whether the internal storage is big endian or little endian.

So you have a superclass like 'little endian Int64' and a subclass 'little endian Int64 with arithmetic'. First note that these are what i've been calling 'implementations'. Second, one might just consider the subclass's OOP relation to its super as one defined just by inheritance; it presents a new interface (arithmetic) but that can be done without OOP. What i mean by "defined just by inheritance" is that one could consider the subclass to be a totally separate class than the super, but the inheritance is a synctactic shortcut to copy a bunch of code. However, this is missing something; if you have a 'little endian Int64 with arithmetic', and you have a function that wants a 'little endian Int64' (e.g. this function must be another implementation-dependent optimization), then you can pass the 'little endian Int64 with arithmetic' right in; you don't have to convert it, or to make the function go thru a public interface.

So in addition, subclassing adds some extensibleness/modularity to the process of writing optimized code for interacting with an object's internals.

Note that not allowing a function in another module to demand a 'little endian Int64' means that only functions in the original defining module can contribute efficient implementations of data operations (only composition is allowed, not inheritance). But allowing this goes against my doctrine of 'everything is an interface' and clears the way for ByteString?-demanding parser libraries to spring up. How to fight this?

Note that this business of having subclasses in addition to interfaces because subclasses are allowed to know about the internal data representation is like protected methods and fields.

this ties into the philosophy in that if the class is the aquinas 'essence', that's like saying the internal data format is like kant's 'thing-in-itself'.

and as usual, the reason that multiple inheritance may be confusing is that the two parent classes may use the same name for some protected field or method, which may or may not be desirable.

hmm.. to solve the multiple inheritance problem, the compiler could simply check for overlapping names, and require the inheriting programmer to specify, for each one separately, whether it is to be overlapping (accept this into the namespace even though others are also using the name) or disjoint (rename all occurences of the name in the inheriting superclass; like a hygenic macro)

also, to prevent a culture of overly diverging from everything-is-an-interface, could do: (a) every subclass method is either protected or an implementation of a public interface method (b) if you don't at least mention at least one protected method (remember fields are properties so are the same sort of thing as methods in oot) in your subclass, then it can't be a subclass method, it must be an interface method, (c) only methods attached to an object can be part of the subclass.

hey, i guess that if the perspectives are just one level of indirection, one step down from the root, this can easily be generalized to saying that a perspective is a selection of the root node! now we can have perspectives within other perspectives! and the real root is the meta node! (it's not necessarily a tree, but every node can be reached traveling from the meta node)

---

ok, two things:

views, where the network structure of a data element is changed, yet the idea is that there is a 'thing-in-itself' which has a homomorphism to each view. However, this 'thing-in-itself' may be a product of multiple implementations, as when a framework adds meta-data to an underlying data object
perspectives, which are decisions about which typeclass instance will be used to implement a particular ad-hoc operation when that operation is applied to this object. this is needed to allow you to have a data object, x, and pass it to Subroutine1, which will use the addition ('+') operator on it, where you want addition to invoke AdHocAdditionInstance?1; and then later to pass the same data object x to Subroutine1 again, but now when Subroutine1 uses addition you want that to invoke AdHocAdditionInstance?2.

one might give 'perspectives' a different name to make the names easier for people to remember. one might identify views with structural types and perspectives to nominal types, b/c the ad-hoc (typeclass instance) resolution should still happen at compile time when possible. however i still like the idea of being able to give a name to a structural type pattern and to have the type inference algorithm not create union types by itself, e.g. rely on user naming to control complexity.

so note that the choice of ad-hoc instance is made by the person supplying the data and is carried with the data, it is not made at the use site where the ad-hoc operator is called.

for functions that take multiple arguments one has to determine which argument controls the choice of perspective. One idea is to just have a simple rule like 'the leftmost one', and another idea is to allow the programmer to determine it somehow.

---

" I'm not sure the "Relational Model" is even a great conceptual model for common business or analytical processing. I think SQL is ugly because the model is a poor fit; if the model was solid, a clean syntax would follow naturally. As for alternatives, MDX has superseded it for core "pivot table" analytics. For application processing, it seems modern document/hierarchical databases are good for many transactional needs. "

upvote

clarkevans 1 day ago

link

I'm not sure the "Relational Model" is even a great conceptual model for common business or analytical processing. I think SQL is ugly because the model is a poor fit; if the model was solid, a clean syntax would follow naturally. As for alternatives, MDX has superseded it for core "pivot table" analytics. For application processing, it seems modern document/hierarchical databases are good for many transactional needs.

What modern relational databases have... is a very intuitive abstract storage model. That's awesome. We've been working on an alternative "navigational model" (inspired from CODASYL and ORMs) based on this storage model. Our experimental query language is at http://htsql.org

Our critique of SQL is at http://htsql.org/doc/overview.html#why-not-sql

upvote

koolkao 1 day ago

link

I have found SQL to be cumbersome for expressing temporal relationships, eg find all the event As that happen within one week of event B. There's not necessarily a data schema link between the table for event A and the table for event B.

how does htsql do with this?

upvote

clarkevans 1 day ago

link

Let's say you wish to list all students in a university, and then, show all other students with the same birth year and month. So, you'd start with ``/student`` and then you could select all columns using the asterisk. Next, you could use the ``fork()`` operation to join to the same table based on a set of columns that are correlated.

/student{, /fork(year(dob),month(dob))}

http://demo.htsql.org/student%7B*,%20/fork%28year%28dob%29,m...

You could wrap this up in a definition to make it a bit easier to understand:

/student .define(students_same_month := fork(year(dob),month(dob))) .select(, /students_same_month )

http://demo.htsql.org/student%0A.define%28students_same_mont...

With the more general case, let's say you're interested in listing per semester, which students started during that semester. In this case, you start with ``/semester`` and then define the correlated set, ``starting_students`` which uses the ``@`` attachment operator to link the two sets. Then, you'd select columns from semester and the list of correlated starting students.

/semester .define(starting_students:= (student.start_date>=begin_date& student.start_date<=end_date)@student) {, /starting_students }

http://demo.htsql.org/semester.define%28starting_students:=%...

Typically, you'd include the new link definition, ``starting_students`` in your configuration file so that you don't have to define it again... then the query is quite intuitive:

/semester{, /starting_students }

While this all may seem complex, it's not an easy problem. More importantly, HTSQL has the notion of "navigation". You're not filtering a cross product of semesters and students, instead, you're defining a named linkage from each semester to a set of students. The rows returned in the top level of query are semesters and the rows returned in the nested level are students.

upvote

koolkao 1 day ago

link

Thank you for the examples.

I made some queries that are analogous to my temporal query needs

Here I'm looking for every student, other students with DOB within 2 weeks of the given student: http://demo.htsql.org/student.define(similar_students:=%20(s...

Here for every semester with at least one student, I find the oldest and the youngest students enrolled: http://demo.htsql.org/semester%20.define(starting_students:=...

No being a computer scientist I have to admit I do not appreciate the intricacies of the 'problems with SQL' blog entries. But working with htsql I gotta say it seems a lot more intuitive than SQL. It feels like the logic correspond much better to my mental model. And that there is much less of the jumping up and down the code to nest my SQL code logic that I find myself doing all the time.

Is there a way to install this on a PostgresQL? instance on Win8?

upvote

clarkevans 7 hours ago

link

HTSQL requires Python 2.7 and psycopg2 binary. As long as those install, I don't see any problems. We have tested it with older versions of Windows without any issues. If you run into a problem, you could report an issue: https://bitbucket.org/prometheus/htsql/issues

upvote

dragonwriter 1 day ago

link

The two main meanings of NULL are "Does not exist" and "Exists but is unknown", though there are more.

upvote

epo 1 day ago

link

NULL is a marker that means this data value does not exist in the database.

Your second, and any other 'meanings' are a misuse of NULL, which is your fault, not SQLs

upvote

dragonwriter 1 day ago

link

> NULL is a marker that means this data value does not exist in the database.

Both of the uses I stated are data values that do not exist in the database.

> Your second, and any other 'meanings' are a misuse of NULL

No, they aren't. They are different, and substantively different, reasons why data does not exist in a database. One is the value does not exist in the domain of which the DB is model, and one is that the value exists in that domain but it is not known.

(Others have pointed out the additional case of "we do not know if it exists", which is ambiguity between those two cases.)

These two substantially different meanings of NULL were recognized fairly early in the history of the relational model by the creator of the model (E.F. Codd), who proposed refining the model by having separate markers for each of those two cases.

upvote

ams6110 22 hours ago

link

It doesn't exactly mean that the value does not exist, it's that it's unknown whether it exists. NULL means nothing at all, not the absence of a value (though applications will often abuse it to mean that, and pragmatically that's usually OK).

upvote

einhverfr 1 day ago

link

NULL can mean a bunch of things:

1. Value is unknown

2. Value does not exist (for example, outer join results).

These have ery different implications. For example:

somestring

does_not_exist you would think would equal somestring, but with unknown we dont know.

reply "

upvote

rhizome31 1 day ago

link

Another thing to understand is how NULLs are handled in different contexts, using either the familiar two-valued logic or the more exotic three-valued logic. It's kind of messy but really worth knowing if you're working with SQL. The wikipedia page actually gives a pretty good account of the issue: https://en.wikipedia.org/wiki/Null_%28SQL%29

upvote

zamalek 1 day ago

link

The way I see SQL in my mind is programming Venn Diagrams ( http://en.wikipedia.org/wiki/Venn_diagram ) followed by creating a projection that uses one or more of the areas within that diagram (contrasting strongly with imperative languages, i.e. logic programming). Unfortunately it's not straight-forward to do that because SQL is a superset of Venn Diagrams, but that line of thinking is where you need to be.

upvote

ams6110 1 day ago

link

Yes, exactly. To use SQL well you need to think in terms of sets and set operations. Venn Diagrams are a good idea to help visualize this. If you find yourself writing a lot loops over cursors there's a good chance you're doing it wrong.

upvote

lukaseder 1 day ago

link

Check out http://www.jooq.org, by the same author as the article (me)

http://htsql.org/

---

TODO:

four fundamental forms, realizations in graphs:

relations functions tables (non-graphical???) orders

these can each be realized in hypergraphs:

relations: a hyperedge with nodes a, b, and c represents the tuple (a,b,c) in the relation functions: a --c--> b (an edge from a to b labeled c) represents a(c) = b functions, multiple arguments, curried: a--c-->b b--e-->d (an edge from a to b labeled c, and an edge from b to d labeled e), represents a(c) = b and b(e) = d, which is the curried form of a(c,e) = d (and, b is the partially instantiated version of uncurried a, with first argument c) tables, which we identify with: functions, multiple arguments, uncurried (is this identity a good idea?): a--(c,e)-->d (an edge from a to d labeled (c,e)) represents a(c,e) = d orders: 0 -> 1 -> 2 -> 3 represents a total ordering on {0,1,2,3}

note:

relations can be extended to Chu spaces, without the crucial continuity constraint, by labeling the edge with its Chu lookup table entry; or Chu spaces can be represented as tables
the table representation above may be good for sparse tables but a traditional non-graphical table representation is better for dense ones
orders can be extended to partial orders using Hasse diagrams

these can also each be represented in each other (except for orders; embedding the others within orders seems unnatural to me):

representations as relations:

a function is a relation with the functional property. this also covers tables, which can be represented as functions
partial orders can be represented by the <= relation

representations as functions:

a relation can be represented by a function from tuples to 1,0 (and a Chu space by a similar function with a larger codomain)
a table can be represented by a function with multiple arguments
a partial order can be represented by the functional version of the <= relation

representations as tables: same as for functions

note: perhaps preorders (drop the antisymmetric condition/allow cycles) should be used here instead of partial orders

---

avoid this:

up down 4 adam at caucho dot com 7 years ago Note: according to the spec, PHP's comparison operators are not transitive. For example, the following are all true in PHP5:

"11" < "a" < 2 < "11"

As a result, the outcome of sorting an array depends on the order the elements appear in the pre-sort array. The following code will dump out two arrays with *different* orderings:

<?php $a = array(2, "a", "11", 2); $b = array(2, "11", "a", 2); sort($a); var_dump($a); sort($b); var_dump($b); ?>

This is not a bug report -- given the spec on this documentation page, what PHP does is "correct". But that may not be what was intended...

http://us.php.net/language.operators.comparison#65863

need stuff like boundaries and annotations.

use case: need to be able to reduce (compile) the form of some expression in the AST, but annotate each reduced bit with the corresponding part of the original expression

one simple, inefficient strategy for keeping perspectives consistent is to convert data implemntation each time the data is changed from a perpective. e.g. if you do x = 3 + 7 and then y = S(x), actually behind the scenes run the conversion from integer representation to successor representation, and then if z = y + 2, convert back

there is an algebraic structure on the operations that transition between representations.

sometimes you can/must apply a chain of operations to get to a new representation. e.g. you can reify a hypergraph, but then you can reify the reification, and the result is a different representation.

even for different representations that are isomorphic (such as reifications), they form a non-free category, that is to say, that different chains of operations can get you always the same representation, for example, if you reify twice then anti-reify twice, you are back where you started

if you represent a transitive antisymmetric reflexive relation as a poset, without all of the transitively implied edges (e.g. <= on Nat is just 0 -> 1 -> 2, we omit 0 -> 2 b/c it is transitively implied) (a Hasse diagram), and you treat that poset as an arbitrary graph, then there are graph operations which make it no longer a poset, e.g. if you add a cycle of edges, then it can only be a preorder, and in addition, by transitivity maybe you should add edges of opposite direction to the cycle. in general we should consider whether a representation-changing operation (a morphism) is an isomorphism or merely a homomorphism w/r/t various codomains. e.g. a transitive antisymmetric reflexive relation, represented as a graph, with the codomain as 'graph', is only homo b/c you can add edges that make it cyclic, but if you think of the codomain as Hasse diagrams then it is iso. Similarly, the reification of a non-reifiable, non-hypergraph graph, with the codomain seen as a non-reifiable, non-hypergraph graph, is homo but not iso (this homo is an injection but not a surjection, e.g. its an embedding).

the point here is that operations (homomorphisms) that transition between on representations may not be injective (e.g. when one representation is a projection of another) but even when they are injective, they also may not be surjective; and in the latter case, if you allow the second representation to be arbitrarily manipulated, respecting only the constraints of the codomain, you might alter the data object in a way that cannot be represented within the first representation, so you have a problem.

besides the poset example, consider an data object which is 'really' a natural number, but that has a float representation (a non-surjective passage between representations). if, using the float representation, the program code changes it to a fraction or a negative, what should the program's behavior be? should an exception be thrown? should the float representation keep this extra information, while setting the integer representation to zero? should the float representation and the intger representation be set to zero?

so, we need to think about at least the following three issues: (1) representations may be related by a chain of transformations, not just single transformations; the transformations form a category 2) transformations between representations may not be injective, in which case some representations have more information than others. Should be require that there is always at least one representation (perhaps the 'meta' representation) that has all the information, e.g. from which there is an injective map to any other representation? (3) transformations between representations may not be surjective, in which case operations will be available on the transformed representation that don't correspond to any operation in the original representation. If such operations are applied, should we (a) throw an exception, (b) simply have a non-injective mapping back to the original representation, and allow the transformed representation to hold onto the extra information (in which case we have to specify what happens if then the data is altered again using the original representation; e.g. does this cause the extra information in the other representation to be lost, as if the last representation to be used is always the 'real' form of the object, which is converted after the fact? what if an object is passed to a subroutine that only takes a superclass of the object; do we lose all of the content in the extra fields in the object? it seems that the answer should be no, but then how long do we hold the extra info? what if the object is copied within the superclass-using subroutine, instead of mutated?) (c) immediately take a round-trip to the smaller representation and then back to the current one, immediately erasing the extra info

-- note that in order to determine if something is surjective or not, you must specify the codomain. this is also the information that allows you to consider transformations as morphisms in a category (because you know what the targets are)

if a mapping between views is injective but not surjective, it could be like the mapping of the integers into the reals, or it could be like the mapping of ints into pairs of ints (and are there other possibilities too?).

the strongest desirable condition might be for each mapping between views define a subset of each view such that, within that subset, the mapping is an isomorphism.

but then we can't do the mapping of ints into pairs of ints that maps the int to the first (or second) element of the pair, and the other element stays the same as the first changes.

we could let each view apply each change made to the value in order. but then you might have one view which is path dependent, that is, which remembers the history of changes, which seems more expressive than we would like (because now the views aren't really just different ways of looking at the same value).

so how about this:

for each pair of views A and B, there must be a function f that tells Oot how A changes when the value is changed in B; this function takes (1) a previous value in A, and (2) the new value in B, and returns the new value in A, or throws an exception if the result does not correspond to any value in A (e.g. if A is Int and B is Real and B changed from 2 to 2.5). f must be pure. I say "(1) A previous value in A" instead of "(1) THE previous value in A" because the previous value in A may be any value since the last time that an operation was applied in view A, e.g. if a series of operations were applied in other views, any of those resulting A values may be given to f.

in addition, to speed things up in certain cases, optionally there could be provided: a pure function (g) which takes (1) a previous value in A, (2) the previous value in B, (3) the new value in B, and returns either True if the change in B is permitted, or False otherwise.

f CAN try to remember change history, but only by explicitly encoding it into A. However, there is no guarantee that f (or even g) will be called upon each change; by default it is (in order that the check that the new value in B corresponds to some new value in A), but the programmer can suppress this in order to save time (in fact maybe this should be the default, and the calling every time behavior should be optional).

note that in order to have views that are, say, only the second element of a tuple, you really want to have parametric views, e.g. to be able to say, "let view 'bzc' be a view which maps the third element of this list to a scalar", where "maps the n-th element of this list to a scalar" is a library function. But to save time typing you also want default views, e.g. to be able to say "x@REAL" without first saying "let view 'REAL' be a view which maps this value into a real", assuming that REAL is a data type and that there are conversion constructors to real from the current data type, and vice versa.

actually, with views/perspectives, i see no reason why oot CAN'T also provide low-level C-style interaction. The actual in-memory representation of data can be just another view on it.

note that different views of the same object are of different types

motivation for views:

1) we want one object to have multiple interfaces; a Dict is also a List of tuples 2) but we also believe, Clojure-style, that it's better to have one or a few fundamental data types with common operations on them, e.g. better to have only a Sequence interface than Lists and Stacks and Queues, and have methods for each of .push, .pop, .enqueue, .dequeue, .insert, .delete, .getAtIndex. 3) and the same object may implement an interface in multiple ways, e.g. a Dict is a List of pairs, but it's also a List of values (e.g. sometimes you want to map over the pairs in a dict, and sometimes you want to map over the values; and sometimes you want to treat the keys as a list, too) 4) the std prescription is to provide methods on a dict that return the list of keys, list of values, list of pairs, etc. But we want to modify the dict itself through at least some of these representations (e.g. map over the list of pairs to product a new dict). 5) the std prescription is to create another object with side-effects on the first one. but we want make such linkages more explicit.

how to represent stuff like git in oot data?

just as a pattern match annotates data with the part of the pattern that matched it, different versions of data annotate each other, matching up the corresponding parts of the data.

this sort of correspondence mapping is also what lets us see which part of source code corresponds to which part of object code

so, this sort of 'annotation' or 'mapping' of data should be quite fundamental to oot.

the way to do this is to have different views; e.g. one view (or set of parameterized views) of the data shows how it corresponds to other versions

parameterized views suggests that views themselves are first-class (they are nodes themselves) and that view lookup of each type of view is done thru a factory, which is a smart node (stateful OOP object) which acts as a function that can take multiple parameters, keyword parameters, etc, and that view lookup in general is a factory/array which somehow indexes over the types of views

or something like that

i mention this only because it shows that the terminology "derived view" is easily understood:

"So long as Urbit is always the primary state and Unix is only a derived view, this works great."

Adding a string type seems like a very simple improvement to nouns. In fact it would be a disaster. For instance, what character set are the strings in? Unicode? Great. You've just taken a data model defined in three sentences, and expanded it to a shelf-sized manual set. If you have a shelf that strong, why not just start with VMS? Or, of course, you could include some simple, broken, built-in string type, in which case you're making the same mistake that VM/370 made with its record formats: inserting a low-level facility which higher layers will (or should) work around, not with. " -- http://moronlab.blogspot.com/2010/01/urbit-functional-programming-from.html

see my meta notes with reifiable graph basis set

getNodeWhichIsTargetOfEdgeOf_withLabel, getNodesWhichReifyEdgesWhichHaveSource_Target (homset), keys, zero, successor, 'terminator', topnodes

ah, my views seem to be a generalization of Wadler's views

http://homepages.inf.ed.ac.uk/wadler/papers/view/view.ps 1987

(Wadler's views are isomorphisms)

his paper says that another thing called views which are different from his are the homomorphisms between modules in OBJ2

from meta.txt:

reiGets(n, k), content(n), topnodes, 'anonymous'

""" ... reiGets(n, k), content(n), topnodes, 'anonymous'

content(n) is the set of nodes n2 such that n \in get(n2, 'contentOf') topnodes is the set of nodes n such that get(n, 'contentOf') = {} reiGet(n, k) is the set of pairs (n2, es) of the form (nodes, {nodes}), such that for each e in es, get(e, 'source') = n, get(e, 'target') = n2 and (either get(e, 'key') = k or k = 'anonymous')

from this we can formulate gets, and from that plus the naturals and a terminator, we can formulate enumeration, from which which can formulate an enumerated gets and a matching enumerated rei, which are more intuitive to work with """ -- meta.txt

an older version (which i no longer like) but which i'm including because it has an interesting note on some library functions that should be in Oot:

"""

getNodeWhichIsTargetAlongKey(n, k, le), keys(n), rei(n,k), 'zero', successor, 'terminator', topnodes

so for A--e_01---e_1---e_11-->B,

getNodeWhichIsTargetAlongKey(A, '0', 'target') = B keys(A) = {0} (equivalent to a node with one edge, to 0) rei(A,0) = e_1 keys(e_1) = {'source', 'target', 'label'} getNodeWhichIsTargetAlongKey(e_1, 'source', 'target') = A getNodeWhichIsTargetAlongKey(e_1, 'target', 'target') = B getNodeWhichIsTargetAlongKey(e_1, 'label', 'target') = '0' rei(e_1,'source') = e_01 rei(e_1,'target') = e_11

much better, yes? now if you have 'nonstandard edges', e.g., Christine is Alice's child, represented as subject=Christine, verb=childOf, object=Alice, which is at key '0' at Christine, you can do:

getNodeWhichIsTargetAlongKey(Christine, '0', 'verb') = childOf getNodeWhichIsTargetAlongKey(Christine, '0', 'object') = Alice keys(Christine) = {0} (equivalent to a node with one edge, to 0) rei(Christine,0) = e_1 keys(e_1) = {'subject', 'verb', 'object', 'label'}

and the later on we can introduce operations to transform the graph into one in which only edges with verb 'childOf' are selected, e.g. a 'parent' graph, etc, or one in which all edges with verb 'childOf' are selected are grouped into one set, which is then the target of an edge with label 'childOf'. """

bayes nets

oot variables could have a 'metaness' field that tells you whether to interpret the payload as data, pointers, reified edges, etc, according to my meta notation scheme

e.g. this could implement aliases

of course if the client accesses such a node from a sufficiently high meta-level vantage point, then they get the pointer itself instead of auto-resolving the alias

one way of handling the question of whether to use platform arithmetic primitives (and if so what to do about overflows and/or overflow checking) or arbitrary arithmetic is: use arbitrary arithmetic unless either the programmer guarantees via the type system that the result will be small, or unless they use a special modular arithmetic library

"The basic objects of mathematical thought ex ist only as mental conceptions, though the source of these conceptions lies in everyday experience in manifold ways, in the processes of counting, ordering, matching, combining, separating, and locating in space and time" -- http://logic.harvard.edu/EFI_Feferman_IsCHdefinite.pdf

http://math.stanford.edu/~feferman/papers/Conceptual_Structuralism.pdf

" The general ideas of order, succession, collection, relation, rule and operation are pre-mathematical; some implicit understanding of them is necessary to the understanding of mathematics. • The general idea of property is pre-logical; some implicit understanding of that and of the logical particles is also a prerequisite to the understanding of mathematics. The reasoning of mathematics is in principle logical, but in practice relies to a considerable extent on various forms of intuition. "

" Conceptions of Sequential Generation • The most primitive mathematical conception is that of the positive integer sequence represented by the tallies: I, II, III, ... • Our primitive conception is of a structure (N + , 1, Sc, <) • Certain facts about this structure are evident (if we formulate them at all): < is a total ordering, 1 is the least element, and m < n implies Sc(m) < Sc(n). "

" Open-ended Schematic Truths and Definite Properties • At a further stage of reflection we may recognize the least number principle: if P(n) is any definite property of members of N + and there is some n such that P(n) then there is a least such n. • The schema is open-ended . What is a definite property? This requires the mathematician’s judgment. • The property, “n is the number of grains of sand in a heap” is not a definite property. • What about the property, “GCH does not hold at n”? "

" Reflective Elaboration of the Structure of Positive Integers • Concatenation of tallies immediately leads us to the operation of addition, m + n, and that leads us to m × n as “n added to itself m times”. • The basic properties of the + and × operations such as commutativity, associativity, distributivity, and cancellation are initially recognized only implicitly. • One goes on to the relations m

number”, m ≡ n (mod p), etc. • Soon have a wealth of expression and interesting problems (primes, perfect numbers, etc., etc.) "

n, “n is a prime

" Truth in Number Theory • The conception of the structure (N + , 1, Sc, <, +, × ) is so clear that there is no question in the minds of mathematicians as to the definite meaning of such statements and the assertion that they are true or false, independently of whether we can establish them one way or the other. • N + is recognized as a definite totality and the logical operation ( ∀ n ∈ N + ) P(n) is recognized as leading from definite properties to definite statements that are true or false. • In other words we accept realism in truth values , and the application of classical logic in reasoning about such statements is automatically legitimized "

" Further Reflection • Further reflection on the structure of positive integers led to the structure of natural numbers (N, 0, Sc, <, +, × ), then the integers Z and the rational numbers Q. Though not basic conceptions we are no less clear in our dealings with them than for the basic conception of N + . • More advanced reflection leads to the general open-ended scheme of proof by induction on N , P(0) ∧ ∀ n[P(n) → P(Sc(n))] → ∀ n P(n). • That is then used to justify definition by recursion on N . "

" The Unfolding of Arithmetic • There is a general notion of unfolding of open- ended schematic systems . (Feferman, 1996) • The unfolding of a basic schematic system for the natural numbers is equivalent in strength to predicative mathematics (Feferman, Strahm 2000). • But beyond that, the scheme of induction ought to be accepted for any definite property P that one will meet in the process of doing mathematics. "

" Multiple Sequential Generation • Finite generation under more than one successor operation Sc a where a is an element of an index collection A. • We may conceive of the objects of the resulting structure as “words on the alphabet A”, with Sc a (w) = wa in the sense of concatenation. • In the case that A = {0, 1} we also conceive of the words on A as the finite paths in the binary branching tree "

" For those not familiar with Twisted, a quick explanation of the difference between transports and protocols is in order. At the highest level, the transport is concerned with how bytes are transmitted, while the protocol determines which bytes to transmit (and to some extent when).

...

The most common type of transport is a bidirectional stream transport. It represents a pair of streams (one in each direction) that each transmit a sequence of bytes. The most common example of a bidirectional stream transport is probably a TCP connection. Another common example is an SSL connection. But there are some other things that can be viewed this way, for example an SSH session or a pair of UNIX pipes ...

There general notion of transport and protocol includes other interfaces, where the transport wraps some other communication abstraction. Examples include interfaces for sending and receiving datagrams (e.g. UDP), or a subprocess manager. The separation of concerns is the same as for bidirectional stream transports and protocols, but the specific interface between transport and protocol is different in each case. "

--- i like this C notation for structs:

struct X { int a; const char * s;}

struct X x = { .s = "the answer is", .a = 42 };

---

by default most C compilers align the items in structs to 4-byte boundaries ("words") because reading and writing words to memory is almost 10x more efficient. However, C does not automatically reorder the items in a struct to minimize the amount of padding (and hence memory size of the struct) needed.

in C, chars are always 1 byte. ints are often 4 bytes but this is not formally defined. on a 64-bit machine, pointers are 8 bytes

---

since the point is that we're like lisp with dicts instead of lists, where function arguments can be keywords as well as positional, if we also want currying syntax (which i think we do) then this becomes more complicated.

because f(keyword1=a, keyword2=b) == f(keyword2=b, keyword1=a)

so ((f keyword1=a) keyword2=b) == ((f keyword2=b) keyword1=a)

that is to say, f[keyword1=a][keyword2=b] == f[keyword2=b][keyword1=a]

so the language probably needs to understand at least enough category theory to express the latter condition

this also solves the 'table problem', e.g. where we have something like a dataframe or table, where you can specify first the column, then the row, or first the row, then the column, and get the same table entry in the end.

however, the named argument thing also fits in with hyperarcs in the hypergraph. e.g. node f's hyperarcs might have are labels source, keyword1, keyword2, target (generalizing source and target), and there might be an arc adjacent to nodes f, a,b,3 respectively (if f(keyword1=a, keyword2=b) == 3).

of course, currying partial function application syntax has an interpretation in this model too (there is a regularity condition on the graph that states that if there is an edge like the one above, there is also one with labels source, keyword2, target, for which if you traverse it from source to target, there will be an arc with keyword1, and then traverse that, then the results will be equal to just traversing the arc with both keyword1 and keyword2).

also, this model does not correspond to the simpler model for unary functions in which the source is the function, the edge LABEL is the argument, and the target is the result. otoh the hypergraph model is perhaps better in that it puts all of the function, its arguments, and the result on the same 'level' of the graph (nodes, not labels); the labels are reserved for things which are meta in some sense (the concepts of source and target, and the keywords).

probably the way to extend the simpler model to handle non-unary function with named keyword arguments is to have the label node itself be a node which is a dict from all of the keyword args to their values for this edge.

so currently i am leaning towards this hypergraph model. we will still provide facilities for working with graphs specifying functions using the simpler model, however, and we will still provide reifiable graphs so that labels can be easily accessed as nodes in themselves.

http://blaze.pydata.org/dev/overview.html

---

great stuff:

http://sahandsaba.com/thirty-python-language-features-and-tricks-you-may-not-know.html

---

so from a node in a graph you can get the meta-node for that entire graph.

but from that meta-node you could not only get the meta-node for THAT graph, but you could also get to the 'class' node for the class 'graph' and to the subnode(?) that defines the meta-node for that class.

---

similarity between list comprehensions and the white wolf Triat: create, transform, filter (note: filter step is performed AFTER transform).

generalize this beyond lists (Python also has set comprehensions, dict comprehensions). is it good enough to have a 'graph comprehension', or do we need a general 'X comprehension' that can be ad-hoc customized for each data type? i'm thinking the latter (or both, actually) but not sure.

ternary CAMs are like associative arrays (where lookups occur in parallel, e.g. O(1) in the # of entries in the CAM, although the time may increase with key length i guess) which allow you to give a mask with a key upon lookup, and which return every entry whose key matches the unmasked part of the key.

http://codingrelic.geekhold.com/2011/06/hash-tables-versus-cams.html

cursors for immutable data structures (e.g. to create an iterator out of an immutable list)

idea for how to do tables/dataframes:

consider a 2-D table. This is like a dict where the label of each item is a pair. In addition, you can use '*' to access rows and columns, e.g. (3,3) is the item at row 3, column 3, and (*, 3) is column 3. (the use of '*' is like array slicing).

since we in fact have dataframes instead of tables, what we have is labels that are dicts, not pairs. so e.g. in the above example, if the two dimension names are 'a' and 'b', the label of item (3,3) is really (a:3, b:3) although (3,3) is an allowed shortcut, just like you can call a function that takes keyword arguments positionally (note: this implies that the definition of argument lists is an important type of thing, see below).

the above implies that the definition of argument lists is an important type of thing, e.g. you need to say that the argument list for a function has keyword names [None, None, a=3,b=4], and also similar things for dataframe dimension names, except for dataframe dimension names there may not be default elements? or maybe there should be... hmmm....

should this be a type of Pattern, or something else, or just an ordinary graph without special syntax?

are array/table/dataframe slices their own type of type? do they have special syntax? are they just patterns?

approx representations of data types, i.e. int32 to represent integers even though it cannot handle arbitrarily large integers? using single-precision floating point to represent reals even though there's roundoff error?

---

writing the bit in Why Oot made me realize: the idea of Views is a kind of shape-based (homomorphism-based), rather than verb-based (function signature/API-based) ad-hoc polymorphism.

http://blog.safaribooksonline.com/2013/08/05/clojure-data-uniformity/

---

https://github.com/milessabin/shapeless

http://typelevel.org/

http://www.chuusai.com/2011/12/19/shapeless-preview/

---

a directed arc in a graph can, in general, be seen only from its source node; you have to make a special data type assertion to get a datastructure that allows backlinks.

---

http://wozniak.ca/what-orms-have-taught-me-just-learn-sql

" I've found myself thinking about the database as just another data type that has an API: the queries. The queries return values of some type, which are represented as some object in the program. By moving away from thinking of the objects in my application as something to be stored in a database (the raison d'être for ORMs) and instead thinking of the database as a (large and complex) data type, I've found working with a database from an application to be much simpler. And wondering why I didn't see it earlier. "

but Oot should excell at 'mapping' stuff. So this should make ORMs possible.

problems with ORMs:

Partial objects, attribute creep, and foreign keys

Perhaps the most subversive issue I've had with ORMs is "attribute creep" or "wide tables", that is, tables that just keep accruing attributes. As much as I'd like to avoid it, sometimes it becomes necessary (although things like Postgres' hstore can help). For example, a client may be providing you with lots of data that they want attached to reports based on various business logic. Furthermore, you don't have much insight into this data; you're just schlepping it around.

This in and of itself isn't a terrible thing in a database. It becomes a real pain point with an ORM. Specifically, the problem starts to show up in any query that uses the entity directly to create the query. You may have a Hibernate query like so early on in the project.

query(Foo.class).add(Restriction.eq("x", value))

This may be fine when Foo has five attributes, but becomes a data fire hose when it has a hundred. This is the equivalent of using SELECT *, which is usually saying more than what is intended. ORMs, however, encourage this use and often make writing precise projections as tedious as they are in SQL. (I have optimized such queries by adding the appropriate projection and reduced the run time from minutes to seconds; all the time was spent translating the database row into a Java object.)

Which leads to another bad experience: the pernicious use of foreign keys. In the ORMs I've used, links between classes are represented in the data model as foreign keys which, if not configured carefully, result in a large number of joins when retrieving the object. (A recent count of one such table in my work resulted in over 600 attributes and 14 joins to access a single object, using the preferred query methodology.)

" Data retrieval

Knowing how to write SQL becomes even more important when you attempt to actually write queries using an ORM. This is especially important when efficiency is a concern. "

Dual schema dangers

This one seems to be one of those unavoidable redundancies. If you try to get rid of it, you only make more problems or add excessive complexity.

The problem is that you end up having a data definition in two places: the database and your application. If you keep the definition entirely in the application, you end up having to write the SQL Data Definition Language (DDL) with the ORM code, which is the same complication as writing advanced queries in the ORM. If you keep it in the database, you will probably want a representation in the application for convenience and to prevent too much "string typing".

I much prefer to keep the data definition in the database and read it into the application. It doesn't solve the problem, but it makes it more manageable. I've found that reflection techniques to get the data definition are not worth it and I succumb to managing the redundancy of data definitons in two places.

But the damn migration issue is a real kick in the teeth: changing the model is no big deal in the application, but a real pain in the database. After all, databases are persistent whereas application data is not. ORMs simply get in the way here because they don't help manage data migration at all. I work on the principle that the database's data definitions aren't things you should manipulate in the application. Instead, manipulate the results of queries. That is, the queries are your API to the database. So instead of thinking about objects, I think about functions with return types. "

" Identities

Dealing with entity identities is one of those things that you have to keep in mind at all times when working with ORMs, forcing you to write for two systems while only have the expressivity of one.

When you have foreign keys, you refer to related identities with an identifier. In your application, "identifier" takes on various meanings, but usually it's the memory location (a pointer). In the database, it's the state of the object itself. These two things don't really get along because you can really only use database identifiers in the database (the ultimate destination of the data you're working with).

What this results in is having to manipulate the ORM to get a database identifier by manually flushing the cache or doing a partial commit to get the actual database identifier.

I can't even call this a leaky abstraction because the work "leak" implies small amounts of the contents escaping relative to the source. Transactions

Something that Neward alludes to is the need for developers to handle transactions. Transactions are dynamically scoped, which is a powerful but mostly neglected concept in programming languages due to the confusion they cause if overused. This leads to a lot of boilerplate code with exception handlers and a careful consideration of where transaction boundaries should occur. It also makes you pass session objects around to any function/method that might have to communicate with the database.

The concept of a transaction translates poorly to applications due to their reliance on context based on time. As mentioned, dynamic scoping is one way to use this in a program, but it is at odds with lexical scoping, the dominant paradigm. Thus, you must take great care to know about the "when" of a transaction when writing code that works with databases and can make modularity tricky ("Here's a useful function that will only work in certain contexts"). "

---