proj-oot-old-150618-ootDataNotes2

http://www.w3.org/TR/json-ld/#data-model-overview

http://json-ld.org/learn.html

http://www.slideshare.net/gkellogg1/jsonld-json-for-the-social-web

http://www.slideshare.net/lanthaler/building-next-generation-web-ap-is-with-jsonld-and-hydra

http://www.slideshare.net/lanthaler/jsonld-for-restful-services

http://www.slideshare.net/gkellogg1/json-for-linked-data

https://cloudant.com/blog/webizing-your-database-with-linked-data-in-json-ld/#.VAZAY9aNGZw

---

http://blog.codeclimate.com/blog/2014/06/05/choose-protocol-buffers/

required/optional/repeated are annotations of items in a data structure

numbered fields for versioning
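
eg, in protobuf's proto2 schema syntax (a sketch; the message and field names here are made up):

message Person {
  required string name  = 1;  // must be present
  optional int32  age   = 2;  // may be omitted
  repeated string email = 3;  // zero or more values
}

// the numbers are field tags: they, not the names, identify fields on
// the wire, so an old reader can skip fields added by a newer version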


i've been focusing on graphs/trees/ordered dicts as a generalization of Python's lists and dicts

still, in a way, lists and dicts are nicely complementary; an ordered dict can be seen as a faceted dict (with an unordered-dict facet and a list facet). so, if you only had unordered dicts, then what would lists buy you? iterators for comprehensive traversal of all items, and ADT destructuring bind (e.g. f(head:tail) = ...)
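
eg, in Python:

head, *tail = [1, 2, 3]   # ADT-style destructuring bind: head = 1, tail = [2, 3]

for x in [1, 2, 3]:       # comprehensive traversal, in a guaranteed order
    print(x)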

http://c2.com/cgi/wiki?RelationalLispWeenie

http://c2.com/cgi/wiki?MinimalTable

contrast declarative table access with http://c2.com/cgi/wiki?NavigationalDatabase :

" A term sometimes used to describe NetworkDatabase? (a.k.a. "Codasyl database) and HierarchicalDatabase? systems and techniques. This is because one often has to "navigate" from node to node (object-to-object or record-to-record) by "pointer hopping" or "reference hopping". The navigation is characterized by:

    Explicit directional instructions like "next", "previous", "nextChild", "toParent", etc.,
    Paths (like file paths)
    Common use of explicit (imperative) loops to traverse the structure(s) for aggregate calculations or finding/filtering.

This is in contrast to RelationalDatabase techniques which tend to use set notation, LogicProgramming-like or FunctionalProgramming-like techniques, and the concept of a "table" to logically describe what you want rather than how to navigate to get it. "

---

a relational db can be seen as a generalization of a triplestore to an n-tuple store. But somehow, operations like inner and outer join seem to me to be too low-level; sometimes you just want to abstractly consider all available facts about an object; in these cases, the structure in terms of tables should be implicit/lower-level. But i guess other times, you want to compute eg a list of all things that a given person has bought.
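
a rough Python sketch of both access styles over a triplestore (the data here is made up):

# a triplestore as a set of (subject, predicate, object) facts
triples = {
    ('alice', 'bought', 'book'),
    ('alice', 'bought', 'lamp'),
    ('alice', 'age', 34),
}

# 'all available facts about an object' (table structure implicit):
facts_about_alice = {(p, o) for (s, p, o) in triples if s == 'alice'}

# a specific query, like 'all things that a given person has bought':
bought = [o for (s, p, o) in triples if s == 'alice' and p == 'bought']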

---

"Pardon me if I'm just sniping on the word "object" here, but if you think of your data as objects then you will find the relational model restrictive.

In my experience, objects are an application concept, closely coupled to an implementation. If you can conceive of your data in implementation-independent terms, i.e. as entities and relationships, then you can put a RDBMS to effective use." -- https://news.ycombinator.com/item?id=8378176

---

http://c2.com/cgi/wiki?AreTablesGeneralPurposeStructures

---

http://www.haskell.org/haskellwiki/Foldable_and_Traversable

---

y'know, using the pandas library was too hard, but a simple step would be just to add names to arrays to make 'data frames' that you could index into using the names instead of the dimension numbers:

the usual way: 'i'll just remember that dimension 1 (rows) of A is Time and dimension 2 (columns) are things that happen at that time, in order, Heat, Speed' and then:

A[60:1801, 1] (select Speed at times from one minute to half an hour)

instead, the programmer could choose any of:

A[60:1801, 1] A[60:1801, speed] A[time=60:1801, speed] A[time=60:1801, 1]

it's not clear whether it would be better to have the column names as implicit keywords in the source code, like i have here, or quoted:

A[60:1801, 1] A[60:1801, "speed"] A[time=60:1801, "speed"] A[time=60:1801, 1]

see also http://www.r-tutor.com/r-introduction/data-frame
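
a minimal sketch of the same selection in pandas (assuming rows are labeled by time in seconds; note that .loc label slices are inclusive on both ends, hence 1800 rather than 1801):

import numpy as np
import pandas as pd

A = pd.DataFrame(np.random.rand(3600, 2),
                 columns=['heat', 'speed'])
A.index.name = 'time'

A.loc[60:1800, 'speed']   # select speed from one minute to half an hour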

i'd like to emphasize that dataframes are not, as i once thought, a system for naming dimensions in arbitrary n-dimensional arrays. Rather, they are a system for 2-D arrays, where each row is a data point, and the columns are attributes. It's worth noting that this is a similar construct to a relational DB table; in both cases, a data point is a 'row', each row can have many attributes, and each attribute is a 'column'. One difference, if any, is that DataFrames can have row labels (i guess pandas calls the column/series of row labels an 'index'); these serve a similar function to a db table's primary keys, except afaict the dataframe row label is not, by convention, an actual normal column (not sure about that though). Rows and columns are not quite symmetric, afaict.

also, a common conversion: between (a) a representation where you have a series of attribute values and the row identifiers are implicit, implicitly a range of uniformly ascending integers starting from zero (a 'raster' representation), and (b) a representation where the row identifiers are explicit and strictly ascending, but may skip some integers and may not start at zero. this is like matlab 'griddata'. Could the 'views' machinery help map one to the other, and remember, for some series with implicit row identifiers starting at zero, what the offset is into the 'true' row identifiers (which are also integers, but which start at some high number)? i think so.

(i call it convertLabeledTimesToRaster(times, values) in my internal utils lib)
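
(that helper isn't shown here; a minimal sketch of what such a conversion might look like, assuming the row labels are strictly ascending integers:)

import numpy as np

def convert_labeled_times_to_raster(times, values, fill=np.nan):
    # times: explicit row labels, strictly ascending integers,
    # possibly skipping some and not starting at zero.
    # returns: a raster array with implicit row ids 0..N-1, plus
    # the offset into the 'true' row identifiers.
    times = np.asarray(times)
    offset = times[0]
    raster = np.full(times[-1] - offset + 1, fill)
    raster[times - offset] = values
    return raster, offset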


copied from [1]:

the idea from the end of the previous section is important enough that i wanted to emphasize it in its own section; instead of representing only the choice of a field, 'memory offsets' should represent a whole PATH, that is, a choice of a field and then optionally of a subfield, etc. We might interchangeably think of a PATH and of a QUERY. Eg .b.c.d (apply this offset/path/query to variable 'a', and you get a.b.c.d); eg [b][c] (apply this offset/path/query to 2-d array 'a', and you get a[b][c]); eg x y (apply the function f to these two arguments, and you get (f x y)).
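
a minimal sketch in Python of such a first-class path/query (the class and method names are made up):

class Path:
    # a first-class offset: a sequence of selections that can be
    # applied to any value; .b.c.d becomes Path('b', 'c', 'd')
    def __init__(self, *steps):
        self.steps = steps

    def get(self, obj):
        for step in self.steps:
            if isinstance(step, str):
                obj = getattr(obj, step)   # field selection
            else:
                obj = obj[step]            # array/dict indexing
        return obj

# Path('b', 'c', 'd').get(a) == a.b.c.d
# Path(b, c).get(a)          == a[b][c]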

note that with regard to function application, we are reminded of the 0-ary function issue, which we saw elsewhere also corresponds to the distinction between applying a complete slice index to a table and getting back a table with one element (for consistency with partial slices, which return eg columns of 2-d tables) vs. applying a complete slice index to a table and getting back a non-table value. And also of the issue of __set needing a 'path' through an object, as opposed to __get just needing a value.

And also of the connection between adding offsets to pointers, and getattr (doing a single field selection), and having offsets able to represent paths (nested field selection). In C, adding an offset to a pointer would work as a nested field selection iff the fields before the selected field were of known length; but in our case we want to abstract away from the byte length of fields. You might call paths 'queries', which also implies that more complicated, 'dynamic' methods of selection might obtain; basically, any computation that walks the graph and returns a pointer to a destination node. We see a similar distinction here as between effective address and referent in addressing mode. However, 'a pointer to a destination node' is not really a 'path', so we may need to rethink calling these 'paths'; otoh, b/c of metaprogramming like __get and __set, in some cases it will be important which nodes you pass thru on the way to another node, because the node you finally reached may be only a virtual node simulated by running __gets and __sets on the nodes that you went thru to get there.

---

now if paths ARE important, and should be used as 'effective addresses' so that metaprogramming in the middle of a path has a chance to override, then how does that relate to the more typical idea that effective addresses are pointers, single locations? and if we have paths, we have category theory, so what else does this suggest?

i think it means that we should have equations between paths. That is, commuting diagrams; be able to assert that a.b.c = d.e.f, not just that the value (referent) of a.b.c = the value of d.e.f, but rather that the location (reference; effective address; path) a.b.c = (the path) d.e.f.

this also allows us to express things like OOP delegation. Which is another instance of having metaprogramming in the middle of a path.
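
eg a minimal Python sketch of delegation as metaprogramming in the middle of a path (names made up):

class Delegator:
    # lookups that pass through this node are forwarded to another
    # node, so the paths a.b.x and target.x name the same location
    # even though they route through different nodes
    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # only called when normal attribute lookup fails here
        return getattr(self._target, name)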

this also ties in with the idea that we should be able to represent, in the language, the concept of one object having multiple ids (one of the key ideas of a Kantian 'object', eg the idea that there is something out there in the world that is somehow connected to various distinct sensory impressions, although of course the Kantian skepticism about what the object is/if there are more than one of them, which was one of his main points, isn't so useful here). Another example is that an item in your database might correspond to an item in someone else's database, but may have different primary keys. The idea that effective addresses = pointers = single locations is related to thinking of things as having a single canonical id, whereas the idea that an effective address is a path and that there may be equations between paths is more 'relative' and is related to recognition that some things may not have any canonical primary id.

---

in fact, the ideal data type may be distinct from the interface (signature); lists that can be resized, and tuples that cannot, are probably the same ideal type even though they support different operations (would this be true even if neither's signature was a subset of the other? probably; imagine lists which could only be lengthened, vs lists that could only be reduced; otoh in that example there is a common core)

---

note that the diagrams in http://research.swtch.com/godata provide a great example of what i mean by adding (topology? geometry?) to a graph: the boxes representing an array are graph nodes, but they are also contiguous and go from left to right, which is important. this is probably the key to representing arrays in oot. directions (left-right vs up-down) seem like named args / things that you query / arguments for a partially-applied fn :) / dimensions in multidim vectors

---

Taw has a blog post that bears directly on Oot:

http://t-a-w.blogspot.com/2010/07/arrays-are-not-integer-indexed-hashes.html

in Oot we treat both arrays and hashes as special cases of Graphs. Taw explains why this can't work. Namely:

" Consider this - what should be the return value of {0 => "zero", 1 => "one"}.select{

k,vv == "one"}?

If we treat it as a hash - let's say a mapping of numbers to their English names, there is only one correct answer, and everything else is completely wrong - {1=>"one"}.

On the other hand if we treat it as an array - just an ordered list of words - there is also only one correct answer, and everything else is completely wrong - {0=>"one"}.

These two are of course totally incompatible. And an identical problem affects a lot of essential methods. Deleting an element renumbers items for an array, but not for a hash. shift/unshift/drop/insert/slice make no sense for hashes, and methods like group_by and partition have two valid and conflicting interpretations. It is, pretty much, unfixable.

"

i think this can be resolved with Views; a slightly encapsulated version of "a monstrosity like PHP where ... half of array functions accept a boolean flag asking if you'd rather have it behave like an array or like a hash".
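
eg, in Python, the two 'correct answers' correspond to two different views of the same mapping:

d = {0: 'zero', 1: 'one'}

# hash view: keep the original keys
hash_view = {k: v for k, v in d.items() if v == 'one'}             # {1: 'one'}

# array view: renumber from zero
array_view = dict(enumerate(v for v in d.values() if v == 'one'))  # {0: 'one'}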

---

hmm i think Haskell 'lenses' are like Oot 'Queries' or 'Paths'

https://www.fpcomplete.com/school/to-infinity-and-beyond/pick-of-the-week/a-little-lens-starter-tutorial

note that they generalize 'set' to 'over'; 'over' applies a fn to the target: the provided fn takes the current value, and what it returns is what the new value is set to, eg over(f, x) = x.__set__(f(x.__get__()))
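
a minimal sketch of get/set/over as explicit functions in Python (this is not Haskell's actual lens encoding, which goes through functors):

from collections import namedtuple

# a lens as an explicit (get, set) pair; set returns a new whole
Lens = namedtuple('Lens', ['get', 'set'])

def over(lens, f, whole):
    # generalizes set: the new value is f applied to the old value
    return lens.set(whole, f(lens.get(whole)))

fst = Lens(get=lambda p: p[0],
           set=lambda p, v: (v,) + p[1:])

over(fst, lambda x: x + 1, (1, 2))   # (2, 2)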

these are cool too:

https://www.fpcomplete.com/school/to-infinity-and-beyond/pick-of-the-week/a-little-lens-starter-tutorial#the-lens-laws-

---

(first-class) 'queries' (=lenses, paths, slices) are closely related to data-binding

note: a 'path' through a graph is a graph way of looking at a reference to a part of a structure (e.g. parent.name); a 'slice' can be multiple 'paths' (e.g. array[2:4] references multiple elements in the array at once) with ordering (like an array), but slices do not usually have complex paths; one could generalize these by combining the two (eg parent.name[2:3] or parent[2:3].name); one could also generalize them by allowing reference to multiple elements at once in a dict-like (or more generally, graph-like) manner instead of just array-like... remember that we need to handle the results of eg SQL queries in this way... also note that we need dataframes (column names, and dimension names).

---

"variants...are the dual of structs..." -- https://www.quora.com/Do-you-feel-that-golang-is-ugly/answer/Tikhon-Jelvis?srid=hMkC&share=1

--

what are Oot "patterns"? generalizing from regexps, i guess they are languages for boolean expressions with captures? eg 'http://([^ ]+)' is a regexp which is a boolean expression (it 'matches') but it also has a 'capture'.

in oot we want such patterns to also be 'firstclass', so they can be assigned to variables, dynamically built and manipulated, etc
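
Python's re module already gives a flavor of this: compiled patterns are first-class values, and a match is both a boolean and a binding of captures:

import re

pattern = re.compile(r'http://([^ ]+)')    # a first-class pattern value

m = pattern.match('http://example.com/foo')
if m:                       # boolean aspect: did it match?
    rest = m.group(1)       # capture aspect: 'example.com/foo'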

--

in Javascript you can use variables as a way to get a namespace, eg:

var myModule = {
    controller: function() {},
    view: function() {}
}

(example from http://lhorie.github.io/mithril/getting-started.html )

another thing i thought i saw (but did not), that might be interesting, would be using function() {} as a way to get new namespaces. What was actually there was:

//define the view-model
todo.vm = {
    init: function() {
        //a running list of todos
        todo.vm.list = new todo.TodoList();

        //a slot to store the name of a new todo before it is created
        todo.vm.description = m.prop('');

        //adds a todo to the list, and clears the description field for user convenience
        todo.vm.add = function(description) {
            if (description()) {
                todo.vm.list.push(new todo.Todo({description: description()}));
                todo.vm.description("");
            }
        };
    }
};

but i didn't see the 'init: ' on the line 'init: function() {', so for a little while i thought this was just another way to make a new namespace.

i think doing it that way would be kinda confusing, but less confusing would be the opposite, where any namespace could be used as a function. Which i guess is just a convoluted way of saying that functions are first-class values? But i wonder if it could mean anything other than that.

--

http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html

--

"In Python, x[(exp1, exp2, ..., expN)] is equivalent to x[exp1, exp2, ..., expN]; the latter is just syntactic sugar for the former."

" Remember that a slicing tuple can always be constructed as obj and used in the x[obj] notation. Slice objects can be used in the construction in place of the [start:stop:step] notation. For example, x[1:10:5,::-1] can also be implemented as obj = (slice(1,10,5), slice(None,None,-1)); x[obj] . This can be useful for constructing generic code that works on arrays of arbitrary dimension. "

" Basic slicing with more than one non-: entry in the slicing tuple, acts like repeated application of slicing using a single non-: entry, where the non-: entries are successively taken (with all other non-: entries replaced by :). Thus, x[ind1,...,ind2,:] acts like x[ind1][...,ind2,:] under basic slicing. "

" Advanced indexing is triggered when the selection object, obj, is a non-tuple sequence object, an ndarray (of data type integer or bool), or a tuple with at least one sequence object or ndarray (of data type integer or bool). There are two types of advanced indexing: integer and Boolean.

Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view). "

"

Warning

The definition of advanced indexing means that x[(1,2,3),] is fundamentally different than x[(1,2,3)]. The latter is equivalent to x[1,2,3] which will trigger basic selection while the former will trigger advanced indexing. Be sure to understand why this occurs.

Also recognize that x[[1,2,3]] will trigger advanced indexing, whereas x[[1,2,slice(None)]] will trigger basic slicing. "

-- http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#arrays-indexing
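
eg, exercising the quoted rules in numpy:

import numpy as np

x = np.arange(12).reshape(3, 4)

obj = (slice(1, 3), slice(None, None, -1))
x[obj]       # same as x[1:3, ::-1]: basic slicing, returns a view

x[[0, 2]]    # advanced (integer) indexing: a copy of rows 0 and 2
x[x > 5]     # advanced (boolean) indexing: a copy of the elements > 5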

--

'array scalars' are a good idea:

http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html
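
eg, indexing a single element out of an ndarray yields not a plain Python number but an array scalar, which still carries its dtype and a zero-dimensional shape:

import numpy as np

a = np.arange(3, dtype=np.float64)
s = a[0]              # an array scalar, not a plain float
s.dtype, s.shape      # (dtype('float64'), ())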

--

this is great stuff:

http://www.kythe.io/docs/schema/#_builtin_types_2

see [2].

---

"RocksDB? organizes all data in sorted order and the common operations are Get(key), Put(key), Delete(key) and Scan(key)."

RocksDB? also allows writes to be batched into atomic transactions

---