---
see also OotView?
---
" Self provides structural re ection via mirrors [25]. It can actually be argued that mirrors are a rediscovery of up and down from 2-Lisp, but put in an object- oriented setting. However, mirrors provide new and interesting motivations for a strict separation into internal and external representations. Especially, mirrors allow for multiple di erent internal representations of the same external object. For example, this can be interesting in distributed systems, where one internal representation may yield details of the remote reference to a remote object, while another one may yield details about the remote object itself. AmbientTalk?/2 is based on mirrors as well, but extends them with mirages that provide a form of (object-oriented) procedural re ection [26]."
huh i wonder if these are like my 'perspectives' or 'views'?
---
to generalize (hyper)nodes further (?!):
n-ary Chu-ish spaces
(note that ordinary Chu spaces can already represent n-ary relations, see http://chu.stanford.edu/ ; i say Chu-ish b/c i'm not really thinking of Chu spaces here, just an n-ary matrix that is like the characteristic matrix of a relation, except with an arbitrary alphabet)
--
lattice ops (lub and glb)
--
" onal Programming Language A relational programming language (RPL) is a DeclarativeLanguage? built around the RelationalModel? of data. StructuredQueryLanguage? (SQL) is an example.
See also Jay Earley's VERS 2, a ProceduralProgrammingLanguage?? built around the RelationalModel? of data. For example, this paper:
Jay Earley "Relational Level Data Structures for Programming Languages". Acta Informatica 2: 293-309 (1973)
Also LIBRA - A general-purpose programming language based on the algebra of binary relations
http://www.cs.adelaide.edu.au/~dwyer/TR95-10_TOC.html
"Ordinary programming languages calculate functions. Sometimes a function is inappropriate. For example, 4 has two square roots, +2 and -2, but an ordinary programming language provides a sqrt function that returns only one of the roots."
The PrologLanguage? could be considered a relational programming language.
Not really. PrologLanguage? has no inherent support for labeled tuples. All it has is untyped lists of "atoms". In this respect it is quite similar to LispLanguage?. I suppose you could add support for labeled (typed) tuples, though. OTOH one of the fundamentals of PrologLanguage? is an in-memory database of "facts" (forgot the actual term - clauses maybe? - but "facts" describes them better than the original term).
PrologLanguage? has tuples, but you examine them using pattern matching instead of using labels. Prolog is based on predicate logic rather than relational calculus. Relational calculus is I think a restricted form of predicate logic.
The truth statements of Prolog are similar to the rows of a relational table. (are these your tuples, above?)
employee ('Joe Doe', 1979, 'Department of Defense').
employee_managed_by ('Joe Doe', 'Mister X').

The thing that RelationalCalculus? is missing is the rules of Prolog. It is straightforward to have a "directReports" relationship in a relational table--associating each employee with his/her boss--but deriving the "indirectReports" relationship (the transitive closure of the directReports relationship) is trickier. You can do it with RelationalJoin??s, but that's ugly. A rule in PrologLanguage? expresses this relationship much more succinctly.
This can be made into a PrologForMassiveData?. "
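The transitive-closure point above can be sketched outside Prolog too; here is a minimal fixed-point computation over a set of (boss, report) pairs (the names are hypothetical, just for illustration):

```python
def transitive_closure(pairs):
    """Derive 'indirectReports' from 'directReports' by repeatedly joining
    the relation with itself until no new pairs appear."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

direct = {('Mister X', 'Joe Doe'), ('Joe Doe', 'Jane Roe')}
indirect = transitive_closure(direct)
```

In SQL this requires recursive joins (or `WITH RECURSIVE`), while a Prolog rule states it in two lines; the loop above is the same fixed-point idea spelled out.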
-- http://cs.adelaide.edu.au/~dwyer/TR95-10.html
" let transition->{'Cold','Warm';'Warm','Hot';'Hot','Warm';'Warm','Cold'}.
The braces enclose a set of 4 elements, separated by semicolons. Each element is an ordered pair of terms, separated by a comma. The relation is given the name `transition'. The two means used in this example--set formation and pair formation--are the only ways of building data structures in Libra. The members of a set are unordered, but pairs are ordered.
There are three ways this relation could be used:
It could generate the four pairs of values.
It could be used to test whether a pair of values satisfy the relation.
Given the first term of a pair or pairs, it could give the corresponding second terms as a result. For example, given 'Warm', it could yield both 'Cold' and 'Hot'.

The first two ways of looking at relations are shared by sets in general. We may enumerate their members, or test elements for membership of them. The third view puts the relation in a more active role, and is described as `applying' the relation to an argument to yield one or more values as a result. This is analogous to the view we take of applying functions in a functional programming language.
"
LIBRA's relations are binary, directional, and cannot be reversed; Oot will have reversible, n-ary relations
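A minimal sketch of what "reversible, n-ary" could mean: store the relation as a set of tuples and allow a query pattern with unknowns in any position (the `None`-as-wildcard convention is my invention here, not LIBRA's):

```python
# hypothetical sketch: an n-ary relation stored as a set of tuples,
# queryable in any position (None = unknown), hence 'reversible'
transition = {('Cold', 'Warm'), ('Warm', 'Hot'), ('Hot', 'Warm'), ('Warm', 'Cold')}

def query(rel, pattern):
    """Return every tuple matching the pattern; None matches anything."""
    return {t for t in rel
            if all(p is None or p == v for p, v in zip(pattern, t))}
```

So `query(transition, ('Warm', None))` runs the relation forward, and `query(transition, (None, 'Warm'))` runs the same relation backward, which LIBRA's directional application cannot do.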
also generalize relations from a {0,1} characteristic function to an arbitrary alphabet
okay, if 'f x' is our syntax for 'apply f to x', then we want to get the same answer whether 'f' is a function or a relation with the functional property. So, although the natural form of the result of a relation application would be a set, by default we return just a random member of that set.
if you want the whole set you can use application modes (the .apply attr)
Kant's 3 quantities (was it "quantities"?) jibe with/suggest 3 application modes: implicit map (forall, universal), pick one random thing (exists, particular), treat the set as one object (singular)
e.g. if X and Y are sets, then maybe X*Y in the forall mode means elementwise multiplication, in the exists mode means to select one element of X and one element of Y and multiply them, in the singular mode means matrix multiplication
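The three modes above can be sketched as a dispatcher (a toy, using lists so that 'forall' is deterministic; the mode names and the whole-object behavior of 'singular' are assumptions):

```python
import random

def apply_mode(op, xs, ys, mode):
    """Sketch of three application modes for a binary op over collections:
    'forall' maps elementwise (universal), 'exists' applies op to one
    arbitrary pairing (particular), 'singular' hands the whole collections
    to op as single objects (e.g. a matrix-multiplication op)."""
    if mode == 'forall':
        return [op(x, y) for x, y in zip(xs, ys)]
    if mode == 'exists':
        return op(random.choice(xs), random.choice(ys))
    if mode == 'singular':
        return op(xs, ys)
    raise ValueError(mode)
```

With `op` as scalar multiplication, 'forall' gives elementwise products and 'exists' gives one arbitrary product; for 'singular' you would supply a whole-object op (matrix multiply), which is exactly where the perspective chooses the implementation.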
when one thing can implement one interface multiple ways, the perspective chooses which way. In this way the same operator really does mean different things in different contexts.
note: to Chu-ish-ify the relations, the 'apply mode' also possibly can be changed to give a condition on the K (that is the Chu-ish lookup table valuation of each tuple potentially 'in' the relation). for instance, if we have false/maybe/true as 0,0.5,1, the mode might say "accept anything >= 0.5" or "accept anything >= 1".
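A tiny sketch of that threshold idea, with the relation stored as a valuation table (the example data and function names are hypothetical):

```python
# hypothetical sketch: a 'Chu-ish' relation maps each candidate tuple to a
# valuation in an arbitrary alphabet (here 0, 0.5, 1 for false/maybe/true)
likes = {('alice', 'tea'): 1, ('alice', 'coffee'): 0.5, ('bob', 'tea'): 0}

def accepted(rel, threshold):
    """An apply-mode parameter: accept any tuple whose valuation >= threshold."""
    return {t for t, v in rel.items() if v >= threshold}
```

`accepted(likes, 0.5)` gives the "maybe or better" view of the relation, while `accepted(likes, 1)` recovers the ordinary boolean relation.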
i'm becoming partial to the 'root convention' by which a variable holding a net is essentially holding the root or 'scope lookup table' of that net, that is, a node which is a dict that maps to each other node by label, if applicable, or without IDs (e.g. as a set), if not.
the current lexical environment also specifies the assignment of operations (which are just methods in interfaces) to punctuation symbols, e.g. '*' is literally a variable whose value is the 'multiply' interface, and when you say 'a * b', it looks up the value of '*'.
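A minimal sketch of that lookup, with the environment as a plain dict (the evaluator shape is an assumption, just to show '*' being an ordinary binding):

```python
import operator

# '*' is just a variable in the lexical environment whose value is the
# operation to apply; 'a * b' looks it up like any other name
env = {'*': operator.mul, 'a': 6, 'b': 7}

def eval_binop(env, left, op_symbol, right):
    """Evaluate 'left <op> right' by looking the operator symbol up in the
    environment, exactly as the operands are looked up."""
    return env[op_symbol](env[left], env[right])
```

Rebinding `env['*']` to a different interface's multiply then changes what 'a * b' means in that scope, which is the point.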
--
--
imagine that we have the platform int representation of an int, and also the successor representation.
now imagine that we have an object that supports the int interface, but that also has some metadata attached (it supports the FooFrameworkMetadata? interface).
now imagine that we have an FFI. The FFI will think, "oh i know how to represent platform ints" and will just pass that. but that will be wrong, because it leaves out the metadata. in fact a representation is a generalization of 'serialization', so this is the same sort of problem as if the object has a __repr__ method that actually only prints out the int.
what we need to do is to represent that the platformInt representation and the successor representation are just two ways of seeing the same data (two perspectives?), but that the FooFrameworkMetadata? interface is providing different data.
I suppose one might say that an object's total informational content is given by the sum of all of the perspective equivalence classes supported by that object. E.g. in this case, the 'int' equivalence class, and FooFrameworkMetadata?.
interestingly, a perspective equivalence class is the same kind of thing as a class of typeclasses, although it is different: if you constructively have a perspective equivalence class, you can convert any perspective in the class into any other, and furthermore any composition of such conversions that gets back to the same perspective it started at is an identity.
could you sometimes want to support adding multiple perspectives that are in the same equivalence class to the same object, but to keep their information separate?
what do you do when there are multiple isomorphisms between two interfaces?
how does implicit conversion work, if it works at all? when there are multiple typeclass instances that apply to an object for the same typeclass, and/or when multiple implicit conversions are possible, is the choice between them part of the data carried with the object, or part of the static lexical context?
--
so are the FooFrameworkMetadata? interface and the int interfaces two different 'parts' of the object, one of which has one perspective and one of which has two, or do we just have three perspectives?
maybe the latter; one can imagine objects whose perspectives all share some information with at least one other perspective forming a connected graph, but in which no two perspectives are isomorphic.
so we have the actual object, the thing-in-itself, the implementation, and then we have various perspectives on it, each of which is just a projection of some subset of its information, then transformed in some way. e.g. the function from the object-in-itself to a perspective on it can be decomposed into a projection followed by an injective transformation.
but we don't want to actually make the implementation special at this point; we just want to define homomorphisms between parts of the perspectives. the programmer need not know whether the platform int or the successor representation is 'more real'
however it would be useful for the language to have the concept of an isomorphism class of perspectives, and also of disjoint perspectives, even if in general perspectives can be related in ways more complex than this.
one way to relate the perspectives is an event-driven/listenery/dataflowy way.
one can see the perspectives as just an ordinary layer of indirection on top of each graph. the purpose of this is to hide information so that e.g. a list can be considered to be just the enumeration over all arcs coming out of its root node, when in reality there may be other metainformation, such as the type of the list, attached to this data also.
---
is boundary mode per-perspective?
can you layer perspectives, e.g. layer a boundary perspective beneath another perspective?
--
if a cross-perspective connection has been specified, then when one of the nodes changes, the corresponding nodes in the other perspective must change. they should by default opt to change lazily, however, e.g. only if they are queried. so oot must perform 'dirty checking' to maintain a dirty bit for each node in each perspective that is linked to other perspectives. Note that changes may be transitive; if perspective A is linked to B, and B to C, then if a user change in perspective A causes parts of B to be marked dirty, that may cause parts of C to be marked dirty; and a query to C later may cause B to be updated from A and then C to be updated from B.
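A small sketch of that lazy, transitive dirty-checking (class shape and names are my assumptions; each perspective derives its value from one source perspective):

```python
class Perspective:
    """Sketch: a perspective caches a value derived lazily from a source
    perspective. Writes mark downstream perspectives dirty transitively;
    recomputation happens only when a dirty perspective is queried."""
    def __init__(self, compute=None, source=None):
        self.compute, self.source = compute, source
        self.value, self.dirty = None, True
        self.listeners = []
        if source is not None:
            source.listeners.append(self)

    def set(self, value):
        self.value, self.dirty = value, False
        self._mark_downstream()

    def _mark_downstream(self):
        for p in self.listeners:
            if not p.dirty:              # already-dirty subtrees are skipped
                p.dirty = True
                p._mark_downstream()     # transitive: A -> B -> C

    def get(self):
        if self.dirty and self.source is not None:
            self.value = self.compute(self.source.get())  # lazy update
            self.dirty = False
        return self.value
```

So with A linked to B linked to C, `A.set(...)` only flips dirty bits; the chain of updates runs when `C.get()` is finally called, as described above.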
--
need a way to give a 'path' that includes both a filepath and the name of a variable in a serialized savefile; e.g. in Python my thesis code is cluttered with junk like:
def create_topscoring_gene_lists_from_geneclusters_pipeline(startfile_mean_path, clustersfile_path, scores_norm_hv_outputfile_base_path, n=8):
    D = loadmat(startfile_mean_path)['D']
    D_valid = loadmat(startfile_mean_path)['D_valid']
    clustersfile = cPickle.load(open(clustersfile_path))
    clusters = clustersfile['clusters']
    protos = clustersfile['colorProtos']
    # ...only now does the real computation begin...
i guess this is a stdlib issue because i guess i could make a function myself which does this in Python.. mb i should
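A sketch of such a helper (the 'filepath#variable' convention is invented here, not part of any stdlib; shown for pickle files, but the same shape works for loadmat):

```python
import pickle

def load_at(path):
    """Hypothetical helper: split 'somefile.pkl#somekey' into a filepath and
    a variable name, load the file, and index into it by that name."""
    filepath, sep, key = path.partition('#')
    with open(filepath, 'rb') as f:
        data = pickle.load(f)
    return data[key] if sep else data
```

Then the clutter above collapses to calls like `load_at(startfile_mean_path + '#D')`, and the pipeline functions can take these combined paths directly.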
---
https://github.com/martinblech/xmltodict
---
implicit conversion:
single step implicit conversion implicitly (e.g. if Int has a Bool interface (do you need to mark that interface 'implicit'?), then 'if 0' and 'if 1' work like 'if False' and 'if True')
multi step only by request (with the syntactic 'implicit' operator), and you must specify the path between types unless all possible paths are marked 'commutative' with each other (a commutative clique of paths) or unless all possible preferred paths form a commutative clique of paths. e.g. there is one level of priority for transforms, 'preferred' or 'non-preferred'. mb 'preferred' should be the same as 'implicit', in which case we don't have to consider the non-preferred case.
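A toy model of the two rules above, single step implicitly and multi step only with an explicit path (the conversion table is hypothetical):

```python
# sketch: one conversion step may happen implicitly; longer chains only when
# the caller spells out the path of intermediate types
conversions = {('Int', 'Bool'): bool, ('Int', 'Num'): float, ('Num', 'Str'): str}

def convert(value, src, dst, explicit_path=None):
    if (src, dst) in conversions:                 # single step: implicit
        return conversions[(src, dst)](value)
    if explicit_path:                             # multi step: by request only
        for a, b in zip([src] + explicit_path, explicit_path):
            value = conversions[(a, b)](value)
        return value
    raise TypeError("no implicit conversion %s -> %s" % (src, dst))
```

So `convert(0, 'Int', 'Bool')` works implicitly, but Int -> Str must be requested as `convert(3, 'Int', 'Str', ['Num', 'Str'])`; the commutative-clique relaxation would go where the `explicit_path` check is.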
btw here's how scala does it: http://www.scala-lang.org/old/node/130 http://stackoverflow.com/questions/5598085/where-does-scala-look-for-implicits (presents priority rules) http://pietrowski.info/2009/07/scala-implicit-conversion/ (claims that there is no chaining, and that implicits must be unambiguous?!?) http://tomjefferys.blogspot.com/2011/11/implicit-conversions-in-scala.html http://stackoverflow.com/questions/5332801/how-can-i-chain-implicits-in-scala (no chaining except via implicit params) http://stackoverflow.com/questions/9557649/scala-transitive-implicit-conversion http://suereth.blogspot.com/2011/02/slides-for-todays-nescala-talk.html http://stackoverflow.com/questions/6906742/scala-implicit-conversion-scope-issues
i would mb: disallow chaining (if that's good enough for scala it should be good enough for us); yes, only have 'implicit', not 'preferred'; probably not have 'implicit parameters' add another level of chaining (i don't understand that yet tho); and allow one implicit to take priority over another only if one is defined in the current module and the other isn't.
but then here's my problem: shouldn't casting an Int to a Bool be the same as using a Boolable interface? And surely interfaces should be chainable, to promote making small decomposed interfaces (e.g. Int -> Num, Num -> Ord, Ord -> Sortable)
also, what's the difference between defining an Int -> Num cast, and making an Int a subclass of Num? Is it whether or not we have mutable state? or is a subclass when we have a number of functions that define an implementation data representation (e.g. a Lock implemented by a file in /var/run)? do we even need implementation data representations at all?
so, it seems we need to unify 3 things: dependent interfaces, implicit casting, subclassing. with a particular eye to transitivity.
--
also clearly interfaces should be first-class :)
--
hmm, i guess it's very important for interfaces to be chainable/transitive. so we gotta have that.
regarding the differences between OOP 'classes' with 'implementations' and typeclasses/interfaces: i think the difference is that in an OOP class, the various methods in the class are all making assumptions about the format of the internal data that is actually storing the stuff.
For example, let's say you have an object representing 8 bytes of data. It exposes an interface like this: read(byte): value, write(byte, value), where 'byte' is an int between 0 and 7 inclusive, and 'value' is an int between 0 and 255 inclusive. Now, you want this to expose an Int64 interface. But there are actually at least two ways to do it, for example: big endian and little endian.
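The ambiguity is concrete; here is a toy version of the byte interface from the text with both Int64 views over it (class and function names are my own):

```python
class Bytes8:
    """Toy object exposing only the read/write byte interface from the text."""
    def __init__(self):
        self._b = [0] * 8
    def read(self, byte):
        return self._b[byte]
    def write(self, byte, value):
        self._b[byte] = value

# two legitimate Int64 interpretations of the same byte interface
def as_int64_little_endian(obj):
    return sum(obj.read(i) << (8 * i) for i in range(8))

def as_int64_big_endian(obj):
    return sum(obj.read(i) << (8 * (7 - i)) for i in range(8))
```

Writing value 1 at byte 0 reads back as 1 under one view and as 2^56 under the other, so nothing about the data alone can pick the Int64 instance for you.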
if that's true, then one might say that the difference arises EXACTLY when there is more than one way to implement a given interface on a given piece of data.
However, i still think there might be times when you don't want an interface automatically applied even if it is unique; that is, when it isn't a 'natural' part of the semantics of the original data, when it feels more like a transformation. E.g. each function can be associated with its Godel number integer encoding, but if you have a function and you add 1 to it, you probably should get a compiler error rather than it changing to whatever function has the next Godel number.
What about subclassing? I can imagine that you might want to define arithmetic on the Int64 defined above, calculated in an optimized manner that makes use of knowledge about the internal storage format; even though in theory you could define it all in recursive definitions using only zero and successor pattern matching, which can be implemented based on just the standard Int64 interface. Again, these optimized arithmetic ops (not the successor-based ones) will need to know about whether the internal storage is big endian or little endian.
So you have a superclass like 'little endian Int64' and a subclass 'little endian Int64 with arithmetic'. First note that these are what i've been calling 'implementations'. Second, one might just consider the subclass's OOP relation to its super as one defined just by inheritance; it presents a new interface (arithmetic) but that can be done without OOP. What i mean by "defined just by inheritance" is that one could consider the subclass to be a totally separate class from the super, but the inheritance is a syntactic shortcut to copy a bunch of code. However, this is missing something; if you have a 'little endian Int64 with arithmetic', and you have a function that wants a 'little endian Int64' (e.g. this function must be another implementation-dependent optimization), then you can pass the 'little endian Int64 with arithmetic' right in; you don't have to convert it, or to make the function go thru a public interface.
So in addition, subclassing adds some extensibleness/modularity to the process of writing optimized code for interacting with an object's internals.
Note that not allowing a function in another module to demand a 'little endian Int64' means that only functions in the original defining module can contribute efficient implementations of data operations (only composition is allowed, not inheritance). But allowing this goes against my doctrine of 'everything is an interface' and clears the way for ByteString?-demanding parser libraries to spring up. How to fight this?
Note that this business of having subclasses in addition to interfaces because subclasses are allowed to know about the internal data representation is like protected methods and fields.
this ties into the philosophy in that if the class is the Aquinas 'essence', that's like saying the internal data format is like Kant's 'thing-in-itself'.
and as usual, the reason that multiple inheritance may be confusing is that the two parent classes may use the same name for some protected field or method, which may or may not be desirable.
hmm.. to solve the multiple inheritance problem, the compiler could simply check for overlapping names, and require the inheriting programmer to specify, for each one separately, whether it is to be overlapping (accept this into the namespace even though others are also using the name) or disjoint (rename all occurrences of the name in the inheriting superclass; like a hygienic macro)
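That compiler check can be sketched in a few lines, with each parent modeled as a set of member names (the decision labels 'overlap'/'disjoint' follow the text; the rest is assumed):

```python
def check_multiple_inheritance(parents, decisions):
    """Sketch: every member name defined by more than one parent must carry
    an explicit decision, 'overlap' (share the name) or 'disjoint' (rename,
    hygienic-macro style). Returns the undecided collisions."""
    seen = set()
    collisions = set()
    for parent in parents:
        collisions |= seen & set(parent)
        seen |= set(parent)
    return sorted(n for n in collisions
                  if decisions.get(n) not in ('overlap', 'disjoint'))
```

An empty return value means compilation may proceed; anything else is the list of names the inheriting programmer still has to rule on.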
also, to prevent a culture of overly diverging from everything-is-an-interface, could do: (a) every subclass method is either protected or an implementation of a public interface method (b) if you don't at least mention at least one protected method (remember fields are properties so are the same sort of thing as methods in oot) in your subclass, then it can't be a subclass method, it must be an interface method, (c) only methods attached to an object can be part of the subclass.
--
hey, i guess that if the perspectives are just one level of indirection, one step down from the root, this can easily be generalized to saying that a perspective is a selection of the root node! now we can have perspectives within other perspectives! and the real root is the meta node! (it's not necessarily a tree, but every node can be reached traveling from the meta node)
---
ok, two things:
one might give 'perspectives' a different name to make the names easier for people to remember. one might identify views with structural types and perspectives with nominal types, b/c the ad-hoc (typeclass instance) resolution should still happen at compile time when possible. however i still like the idea of being able to give a name to a structural type pattern and to have the type inference algorithm not create union types by itself, e.g. rely on user naming to control complexity.
so note that the choice of ad-hoc instance is made by the person supplying the data and is carried with the data, it is not made at the use site where the ad-hoc operator is called.
for functions that take multiple arguments one has to determine which argument controls the choice of perspective. One idea is to just have a simple rule like 'the leftmost one', and another idea is to allow the programmer to determine it somehow.
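The 'leftmost argument controls it' rule can be sketched as instance-carrying values plus a dispatcher that only consults argument one (all names here are hypothetical):

```python
# each value carries the ad-hoc instance chosen by whoever supplied the data;
# dispatch reads the choice off the leftmost argument only
instances = {
    'ascii':    {'sort_key': lambda s: s},
    'casefold': {'sort_key': lambda s: s.lower()},
}

def dispatch(method, *args):
    """Pick the typeclass instance named by the leftmost tagged argument."""
    perspective = args[0]['perspective']
    return instances[perspective][method](*(a['value'] for a in args))

word = {'value': 'Zebra', 'perspective': 'casefold'}
```

Swapping the tag on `word` changes which instance runs without touching the call site, which is what "carried with the data, not chosen at the use site" amounts to.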
---
" I'm not sure the "Relational Model" is even a great conceptual model for common business or analytical processing. I think SQL is ugly because the model is a poor fit; if the model was solid, a clean syntax would follow naturally. As for alternatives, MDX has superseded it for core "pivot table" analytics. For application processing, it seems modern document/hierarchical databases are good for many transactional needs. "
--
clarkevans 1 day ago:
What modern relational databases have... is a very intuitive abstract storage model. That's awesome. We've been working on an alternative "navigational model" (inspired from CODASYL and ORMs) based on this storage model. Our experimental query language is at http://htsql.org
Our critique of SQL is at http://htsql.org/doc/overview.html#why-not-sql
--
koolkao 1 day ago:
I have found SQL to be cumbersome for expressing temporal relationships, e.g. find all the event As that happen within one week of event B. There's not necessarily a data schema link between the table for event A and the table for event B.
how does htsql do with this?
clarkevans 1 day ago:
Let's say you wish to list all students in a university, and then, show all other students with the same birth year and month. So, you'd start with ``/student`` and then you could select all columns using the asterisk. Next, you could use the ``fork()`` operation to join to the same table based on a set of columns that are correlated.
/student{*, /fork(year(dob),month(dob))}
http://demo.htsql.org/student%7B*,%20/fork%28year%28dob%29,m...
You could wrap this up in a definition to make it a bit easier to understand:
/student .define(students_same_month := fork(year(dob),month(dob))) .select(*, /students_same_month )
http://demo.htsql.org/student%0A.define%28students_same_mont...
With the more general case, let's say you're interested in listing per semester, which students started during that semester. In this case, you start with ``/semester`` and then define the correlated set, ``starting_students`` which uses the ``@`` attachment operator to link the two sets. Then, you'd select columns from semester and the list of correlated starting students.
/semester .define(starting_students:= (student.start_date>=begin_date& student.start_date<=end_date)@student) {*, /starting_students }
http://demo.htsql.org/semester.define%28starting_students:=%...
Typically, you'd include the new link definition, ``starting_students`` in your configuration file so that you don't have to define it again... then the query is quite intuitive:
/semester{*, /starting_students }
While this all may seem complex, it's not an easy problem. More importantly, HTSQL has the notion of "navigation". You're not filtering a cross product of semesters and students, instead, you're defining a named linkage from each semester to a set of students. The rows returned in the top level of query are semesters and the rows returned in the nested level are students.
koolkao 1 day ago:
Thank you for the examples.
I made some queries that are analogous to my temporal query needs
Here I'm looking for every student, other students with DOB within 2 weeks of the given student: http://demo.htsql.org/student.define(similar_students:=%20(s...
Here for every semester with at least one student, I find the oldest and the youngest students enrolled: http://demo.htsql.org/semester%20.define(starting_students:=...
Not being a computer scientist, I have to admit I do not appreciate the intricacies of the 'problems with SQL' blog entries. But working with htsql I gotta say it seems a lot more intuitive than SQL. It feels like the logic corresponds much better to my mental model. And that there is much less of the jumping up and down the code to nest my SQL code logic that I find myself doing all the time.
Is there a way to install this on PostgreSQL?