proj-jasper-jasperDataNotes2

http://www.w3.org/TR/json-ld/#data-model-overview

http://json-ld.org/learn.html

http://www.slideshare.net/gkellogg1/jsonld-json-for-the-social-web

http://www.slideshare.net/lanthaler/building-next-generation-web-ap-is-with-jsonld-and-hydra

http://www.slideshare.net/lanthaler/jsonld-for-restful-services

http://www.slideshare.net/gkellogg1/json-for-linked-data

https://cloudant.com/blog/webizing-your-database-with-linked-data-in-json-ld/#.VAZAY9aNGZw

---

http://blog.codeclimate.com/blog/2014/06/05/choose-protocol-buffers/

required/optional/repeated are annotations of items in a data structure

numbered fields for versioning


i've been focusing on graphs/trees/ordered dicts as a generalization of Python's lists and dicts

still, in a way, lists and dicts are nicely complementary: an ordered dict is a faceted dict (an unordered-dict facet plus a list facet). so, if you only had unordered dicts, what would lists buy you? iterators for comprehensive traversal of all items in order, and ADT destructuring bind (e.g. f(head:tail) = ...)
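a minimal Python sketch of those two things the list facet buys you: ordered traversal and head:tail destructuring (Python spells the latter with starred assignment; the function name here is just illustrative):

```python
# what lists buy you beyond an unordered dict:
# (1) ordered traversal of all items, (2) ADT-style destructuring bind.

def total(xs):
    # destructuring bind, analogous to f(head:tail) = ...
    if not xs:
        return 0
    head, *tail = xs
    return head + total(tail)

print(total([1, 2, 3, 4]))  # 10
```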

http://c2.com/cgi/wiki?RelationalLispWeenie

http://c2.com/cgi/wiki?MinimalTable

contrast declarative table access with http://c2.com/cgi/wiki?NavigationalDatabase :

" A term sometimes used to describe NetworkDatabase? (a.k.a. "Codasyl database") and HierarchicalDatabase? systems and techniques. This is because one often has to "navigate" from node to node (object-to-object or record-to-record) by "pointer hopping" or "reference hopping". The navigation is characterized by:

    Explicit directional instructions like "next", "previous", "nextChild", "toParent", etc.,
    Paths (like file paths)
    Common use of explicit (imperative) loops to traverse the structure(s) for aggregate calculations or finding/filtering.

This is in contrast to RelationalDatabase? techniques which tend to use set notation, LogicProgramming?-like or FunctionalProgramming?-like techniques, and the concept of a "table" to logically describe what you want rather than how to navigate to get it. "

---

a relational db can be seen as a generalization of a triplestore to an n-tuple store. But somehow, operations like inner and outer join seem to me to be too low-level; sometimes you just want to abstractly consider all available facts about an object, and in those cases the structure in terms of tables should be implicit/lower-level. But i guess other times you want to compute, e.g., a list of all the things that a given person has bought.
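a toy sketch of the contrast, assuming a hypothetical in-memory triplestore of (subject, predicate, object) facts: one query abstractly gathers all facts about an entity with no table structure exposed, the other is the more relational "who bought what" projection:

```python
# hypothetical in-memory triplestore: facts as (subject, predicate, object)
triples = [
    ("alice", "bought", "book"),
    ("alice", "age", 34),
    ("bob", "bought", "pen"),
]

def facts_about(subject):
    # all available facts about one entity, table structure implicit
    return {p: o for (s, p, o) in triples if s == subject}

def who_bought():
    # the relational-style query: project out one kind of fact
    return [(s, o) for (s, p, o) in triples if p == "bought"]

print(facts_about("alice"))  # {'bought': 'book', 'age': 34}
print(who_bought())          # [('alice', 'book'), ('bob', 'pen')]
```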

---

"Pardon me if I'm just sniping on the word "object" here, but if you think of your data as objects then you will find the relational model restrictive.

In my experience, objects are an application concept, closely coupled to an implementation. If you can conceive of your data in implementation-independent terms, i.e. as entities and relationships, then you can put a RDBMS to effective use." -- https://news.ycombinator.com/item?id=8378176

---

http://c2.com/cgi/wiki?AreTablesGeneralPurposeStructures

---

http://www.haskell.org/haskellwiki/Foldable_and_Traversable

---

y'know, using the pandas library was too hard, but a simple step would be just to add names to arrays to make 'data frames' that you could index into using the names instead of the dimension numbers:

the usual way: 'i'll just remember that dimension 1 (rows) of A is Time, and dimension 2 (columns) is the things that happen at that time, in order: Heat, Speed', and then:

A[60:1801, 1] (select Speed at times from one minute to half an hour)

instead, the programmer could choose any of:

A[60:1801, 1] A[60:1801, speed] A[time=60:1801, speed] A[time=60:1801, 1]

it's not clear whether it would be better to have the column names as implicit keywords in the source code, like i have here, or quoted:

A[60:1801, 1] A[60:1801, "speed"] A[time=60:1801, "speed"] A[time=60:1801, 1]

see also http://www.r-tutor.com/r-introduction/data-frame
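a minimal sketch of the quoted-string variant, using a hypothetical `Frame` wrapper over a list-of-rows 2-D array (keyword-style subscripts like A[time=60:1801, ...] aren't valid Python syntax, so only the quoted form is shown):

```python
# hypothetical minimal "named columns" wrapper over a 2-D list of rows;
# a column name stands in for its integer index along dimension 2.

class Frame:
    def __init__(self, data, columns):
        self.data = data  # list of rows
        self.cols = {name: i for i, name in enumerate(columns)}

    def __getitem__(self, key):
        rows, col = key               # rows is expected to be a slice
        if isinstance(col, str):      # translate a name to its index
            col = self.cols[col]
        return [row[col] for row in self.data[rows]]

A = Frame([[t * 1.0, t * 2.0] for t in range(5)], ["heat", "speed"])
print(A[1:3, "speed"])  # [2.0, 4.0] -- same as A[1:3, 1]
```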

i'd like to emphasize that dataframes are not, as i once thought, a system for naming dimensions in arbitrary n-dimensional arrays. Rather, they are a system for 2-D arrays, where each row is a data point and the columns are attributes. It's worth noting that this is a similar construct to a relational DB table; in both cases, a data point is a 'row', each row can have many attributes, and each attribute is a 'column'. One difference, if any, is that dataframes can have row labels (i guess pandas calls the column/series of row labels an 'index'); these serve a similar function to a db table's primary key, except that afaict the dataframe row label is not, by convention, an actual normal column (not sure about that, though). Rows and columns are not quite symmetric, afaict.

also, a common conversion: between a representation where you have a series of attribute values and the row identifiers are implicit (implicitly a range of uniformly ascending integers starting from zero; a 'raster' representation), and, on the other hand, a representation where the row identifiers are explicit and strictly ascending, but may skip some integers and may not start at zero. this is like matlab's 'griddata'. Could the 'views' machinery help map one to the other, and also remember, for a series with implicit row identifiers starting at zero, the offset into the 'true' row identifiers (which are also integers but start at some high number)? i think so.

(i call it convertLabeledTimesToRaster(times, values) in my internal utils lib)
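a hedged sketch of that conversion (the real convertLabeledTimesToRaster lives in my internal utils lib, so the name and signature here are guesses at the same idea): explicit, strictly ascending integer row labels go to a dense raster with implicit 0-based rows, plus the offset back to the true labels:

```python
def labeled_to_raster(times, values, fill=None):
    # times: strictly ascending integer row labels (may skip integers,
    # may not start at zero). Returns a dense list with implicit 0-based
    # row identifiers, plus the offset back to the true labels.
    offset = times[0]
    raster = [fill] * (times[-1] - offset + 1)
    for t, v in zip(times, values):
        raster[t - offset] = v        # skipped labels stay as `fill`
    return raster, offset

raster, offset = labeled_to_raster([100, 101, 103], [1.0, 2.0, 3.0])
print(raster, offset)  # [1.0, 2.0, None, 3.0] 100
```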