proj-plbook-plChDataLangs

Table of Contents for Programming Languages: a survey

Chapter : Data languages

http://www.catb.org/esr/writings/taoup/html/ch05s02.html

We include 'configuration languages' here too.

DSV-style

http://www.catb.org/esr/writings/taoup/html/ch05s02.html

CSV

http://www.catb.org/esr/writings/taoup/html/ch05s02.html

RFC 822 Format

http://www.catb.org/esr/writings/taoup/html/ch05s02.html

Cookie-Jar Format

http://www.catb.org/esr/writings/taoup/html/ch05s02.html

Record-Jar Format

http://www.catb.org/esr/writings/taoup/html/ch05s02.html

INI

http://www.catb.org/esr/writings/taoup/html/ch05s02.html

XML

Not Turing-complete

Represents trees.

Considered 'heavyweight'.

XML schema languages:

Some distinguishing features:

...

JSON

Not Turing-complete

Considered 'lightweight'.

Schemas:

Opinions:

JSON extensions and variants

JCOF

https://github.com/mortie/jcof

" JCOF tries to be a drop-in replacement for JSON, with most of the same semantics, but with a much more compact representation of objects. The main way it does this is to introduce a string table at the beginning of the object, and then replace all strings with indexes into that string table. It also employs a few extra tricks to make objects as small as possible, without losing the most important benefits of JSON. Most importantly, it remains a text-based, schemaless format.

The following JSON object:

{ "people": [ {"first-name": "Bob", "age": 32, "occupation": "Plumber", "full-time": true}, {"first-name": "Alice", "age": 28, "occupation": "Programmer", "full-time": true}, {"first-name": "Bernard", "age": 36, "occupation": null, "full-time": null}, {"first-name": "El", "age": 57, "occupation": "Programmer", "full-time": false} ] }

could be represented as the following JCOF object:

Programmer;"age""first-name""full-time""occupation"; {"people"[(0,iw"Bob"b"Plumber")(0,is"Alice"b,s0)(0,iA"Bernard"n,n)(0,iV"El"B,s0)]} "

Discussions:

ASN.1

https://en.m.wikipedia.org/wiki/Abstract_Syntax_Notation_One

Older format. Considered 'heavyweight', unpopular today.

"A while ago, I started trying to write a Rust implementation of an ASN.1 library. Just parsing the message description format is a nightmare, since it has so many features, additions, and modifications over time. You can have generic types in your serialization format, which is kind of neat, but introduces even more complexity.

There is some stuff in there that is good (like the Canonical Encoding Rules which let you let you have a 1:1 binary mapping to a given piece of data, critical for crypto), but for general server-to-server RPC, it is more complicated than is generally needed." -- https://www.reddit.com/r/rust/comments/daja9b/binary_format_shootout_capn_protoflatbuffers_and/f22o7de/

Messagepack

https://github.com/msgpack/msgpack/blob/master/spec.md

Opinions:

vs JSON:

Variants and extensions:

BSON

Muon

https://github.com/vshymanskyy/muon https://docs.google.com/presentation/d/1MosK6LTy_Rr32eF6HKej6UEtf9vBzdbeSF6YPb1_e4A/present#slide=id.g13de7db3282_28_0 https://lobste.rs/s/a7ougq/on_compact_simple_binary_encoding_on_par

EDN

https://github.com/edn-format/edn

Not "a system for representing objects - there are no reference types, nor should a consumer have an expectation that two equivalent elements in some body of edn will yield distinct object identities when read" [7].

types:

extensibility through 'tags': a tag starts with '#', and indicates the semantics of the following element. E.g.:

  #myapp/Person {:first "Fred" :last "Mertz"}

"Upon encountering a tag, the reader will first read the next element (which may itself be or comprise other tagged elements), then pass the result to the corresponding handler for further interpretation, and the result of the handler will be the data value yielded by the tag + tagged element, i.e. reading a tag and tagged element yields one value."

built-in tagged elements:

comments: semicolon denotes a comment to-end-of-line

discard: #_ "indicates that the next element should be read and discarded...(note that the next element must still be a readable element)"

Links:

(see also Datalog)

Datalog

Implementation:

Fressian

https://github.com/Datomic/fressian

Protobuf (Protocol Buffers)

Supported by Google.

Version 3 drops required fields and adds maps.

Primitive types: bool, 32/64-bit integers, float, double, string. Composite types: none, but repeated elements, and maps in v3. [8]

Resilient to malicious input [9].

New fields ignored by old binaries. [10]
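A rough sketch of why old binaries can ignore new fields: on the wire, each field is preceded by a key that carries its field number and wire type, so a reader can skip over fields it does not know. The decoder below is a simplified illustration (no bounds checking, no groups); `decode_known` and its `known_fields` parameter are names made up for this example.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def decode_known(buf, known_fields):
    """Return {field_number: raw value} for known fields, skipping the rest."""
    out = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:      # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:    # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:    # length-delimited (strings, bytes, submessages)
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:    # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if field_number in known_fields:
            out[field_number] = value
        # Unknown field numbers fall through: their bytes were consumed above.
    return out


# Field 1 = varint 150, field 2 = the string "hi" (added in a newer schema).
new_message = b"\x08\x96\x01" + b"\x12\x02hi"
old_view = decode_known(new_message, known_fields={1})
new_view = decode_known(new_message, known_fields={1, 2})
```

The old reader sees only field 1, yet parses the whole message without error.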

Does not support random access reads without parsing [11].

"there were three undesirable issues with Google Protobuf: (1) IRONdb is in C and while the C++ support is good, the C support is atrocious, (2) it conflates protocol with encoding so it becomes burdensome to not adopt gRPC, and (3) it’s actually pretty slow." -- https://www.circonus.com/2017/11/some-like-it-flat/

Opinions:

Alternate RPC implementations:

Links:

MessagePack

Bencode

" Indeed, json maps are not supposed to be ordered so doing anything that depends on the order is bound to fail.

This is the exact reason bencode (https://en.wikipedia.org/wiki/Bencode) was invented, and I still believe we can replace all uses of json by bencode and be better off it, because it solves all too common issues:

" -- [12]
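To illustrate the ordering point in the quote above: bencode writes dictionary keys in sorted order, so two equal values always encode to the same bytes. The encoder below is a minimal sketch; accepting `str` and encoding it as UTF-8 is a convenience added here, since bencode itself only knows byte strings.

```python
def bencode(value):
    """Encode ints, byte strings, lists, and dicts in bencode form."""
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")  # convenience: treat str as UTF-8 bytes
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and are emitted in sorted (raw byte) order.
        items = [
            (k.encode("utf-8") if isinstance(k, str) else k, v)
            for k, v in value.items()
        ]
        items.sort(key=lambda kv: kv[0])
        body = b"".join(bencode(k) + bencode(v) for k, v in items)
        return b"d" + body + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


# Insertion order differs, encoded bytes do not.
first = bencode({"b": 1, "a": [b"x"]})
second = bencode({"a": [b"x"], "b": 1})
```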

CBOR

https://cbor.io/

Discussion:

Thrift

Primitive types: bool, byte, 16/32/64-bit integers, double, string. Composite types: list<t1>, set<t1>, map<t1, t2>. [14]

Avro

Schema is expected to always be available with the data: the reader needs exactly the schema the data was written with (it can be sent along with the data). No code generation.

Schema in JSON. Example schema fragment:

{ "type": "record", "name": "BankDepositMsg", "fields" :
  [
    {"name": "user_id", "type": "int"},
    {"name": "amount", "type": "double", "default": 0.00},
    {"name": "datestamp", "type": "long"} ]
}

Primitive types: null, boolean, int, long, float, double, bytes, string. Composite types: records, enums, arrays, maps, unions, fixed. [15]

Cap’n Proto

A sort-of successor to Protobuf: Kenton Varda, the author of Cap'n Proto, was the primary author of Protocol Buffers version 2, the rewrite of Protobuf that Google released as open source [16].

Zero-copy encoding.

Resilient to malicious input [17].

Supports random access reads without parsing [18].

New fields ignored by old binaries. Unknown fields retained. [19]

"Iirc Cap'n Proto was the only protocol which is intended for mutually distrusting client and server, and considers the RPC mechanism fairly central to it's value statement. I think the Flatbuffers equivalent to the latter is gRPC, but I don't think it or SBE have an answer to the former. That said I assume any implementation of Flatbuffers or SBE could be checking for malicious/malformed input." -- https://www.reddit.com/r/rust/comments/daja9b/binary_format_shootout_capn_protoflatbuffers_and/

"Cap'n proto is amazing, but sadly does not get nearly the support that protobuf/grpc does. The crate is updated less, and, more importantly, generates far more complex code - working with capnproto in rust, today, is extremely unpleasant with not a lot of examples or docs out there." -- [20]

"To avoid memory allocation with each iteration, you should pass a scratch buffer to MallocMessageBuilder's constructor. The scratch buffer can be allocated once outside the loop, but you need to create a new MallocMessageBuilder each time around the loop." -- https://stackoverflow.com/a/61370743

"To serialize new messages, Cap’n Proto uses a “builder” object. This builder allocates memory on the heap to hold the message content, but because builders can’t be re-used, we have to allocate a new buffer for every single message. I was able to work around this with a special builder that could re-use the buffer, but it required reading through Cap’n Proto’s benchmarks to find an example, and used std::mem::transmute to bypass Rust’s borrow checker." -- [21]

Links:

FlatBuffers

https://google.github.io/flatbuffers/index.html https://google.github.io/flatbuffers/flatbuffers_guide_tutorial.html

Zero-copy encoding.

New fields ignored by old binaries. Unknown fields cannot be copied. [22]

Supports random access reads without parsing [23].

"Protocol Buffers is indeed relatively similar to FlatBuffers, with the primary difference being that FlatBuffers does not need a parsing/ unpacking step to a secondary representation before you can access data, often coupled with per-object memory allocation." -- [24]

" FlatBuffers, like Protobuf, has the ability to leave out arbitrary fields from a table (better yet, it will automatically leave them out if they happen to have the default value). Experience with ProtoBuf has shown me that as data evolves, cases where lots of field get added/removed, or where a lot of fields are essentially empty are extremely common, making this a vital feature for efficiency, and for allowing people to add fields without instantly bloating things.

Cap'n'Proto doesn't have this functionality, and instead just writes zeroes for these fields, taking up space. They recommend using an additional zero-compression on top of the resulting buffer, but this kills the ability to read data without first unpacking it, which is a big selling point of both FlatBuffers and Cap'n'Proto. So Cap'n'Proto is either not truly "zero-parse", or it is rather inefficient/inflexible, take your pick. " -- [25]

"A major caveat of flatbuffers is that it stables a large vtable onto the front of every message, which can have unpleasant effects on size if your messages are small. It's not really intended for networking, but rather for large data files that repeat with many repeated records in them that can share a single vtable." -- https://www.reddit.com/r/rust/comments/daja9b/binary_format_shootout_capn_protoflatbuffers_and/f22p0e8/

"Simply specify the data encoding you are sending with a ‘Content-Type: x-circonus-<datatype>-flatbuffer’ header and the remote end can even avoid copying or even parsing any memory; it just accesses it. The integration into our C code is very macro-oriented and simple to understand." -- https://www.circonus.com/2017/11/some-like-it-flat/

Q: "What do you think is the best buffer protocol to use for multiplayer games? We used Protobuf for a fast-paced .io game, but encoding-decoding turned out to be pretty slow and generated a lot of garbage in JS. We were in the process to switch to FlatBuffers (before the company went bankrupt), but the syntax made it feel harder to use compared to Protobuf, not sure about the performance though (we expected it to be faster and less garbage created because of the zero-copy).

So, would you recommend Protobuf, Cap'n'Proto, FlatBuffers or FlexBuffers for multiplayer games? The usual packets are game states or user input sent at high frequency. "

A: "I originally designed FlatBuffers for games (though admittedly more for things like level data or save game data than network packets), so I'd think it is pretty suitable. I had actually used Protobuf on a game project just before, and its performance problems led directly to the no-unpacking no-allocation design that FlatBuffers has.

So FlatBuffers will make an incoming packet waaay faster to work with than Protobuf. On the downside, Protobuf tends to be a little smaller, so if bandwidth is a greater concern than (de-)serialization speed, you might still prefer it. Additionally, receiving data over the network raises the question of how you handle packets that have been corrupted (or intentionally malformatted by an attacker), and in the case of FlatBuffers you'd need to at least run the "verifier" over the packet before accessing it if you don't want your game servers to crash when this happens. That slows it down a little bit, but is still fast, i.e. still doesn't allocate etc.

Cap'n Proto will perform similarly, though does have the downside that all fields take space on the wire, regardless of whether they're set or not. So which is better depends on the kind of data you want to send and how it is likely to evolve.

Frankly, for the absolute highest performance (and lowest bandwidth) game networking you still need a custom encoding. " -- https://news.ycombinator.com/item?id=23591749

FlatBuffers claims to have a simpler encoding than Cap'n Proto, according to [26].

Links:

FlexBuffers

https://google.github.io/flatbuffers/flexbuffers.html

Part of Flatbuffers; schema-less.

"A type byte is made up of 2 components (see flexbuffers.h for exact values):

Thus, in this example 4 means 8 bit child (value 0, unused, since the value is in-line), type SL_INT (value 1). " -- https://google.github.io/flatbuffers/flatbuffers_internals.html
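The worked value in the quote (4 means an 8-bit child, type 1) is consistent with a type byte that keeps a width code in its low 2 bits and the type code in the remaining 6 bits. The sketch below assumes exactly that layout; it is inferred from the quoted example rather than checked against flexbuffers.h, and the function names are hypothetical.

```python
# Assumed layout (inferred from the quoted example, not verified against
# flexbuffers.h): low 2 bits = width code, upper 6 bits = type code.
WIDTH_BITS = {0: 8, 1: 16, 2: 32, 3: 64}


def pack_type_byte(type_code, width_code):
    """Combine a 6-bit type code and a 2-bit width code into one byte."""
    return (type_code << 2) | width_code


def unpack_type_byte(byte):
    """Split a type byte into (type_code, child width in bits)."""
    return byte >> 2, WIDTH_BITS[byte & 0b11]


# The value from the quoted example: width code 0 (8 bit), type code 1.
type_code, child_bits = unpack_type_byte(4)
```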

SBE

Simple Binary Encoding. Focus on low-latency.

Zero-copy encoding.

https://github.com/real-logic/simple-binary-encoding/wiki/Design-Principles

specifying message formats:

" SBE (Simple Binary Encoding) was designed by a former Protobuf developer for the Chicago Market Exchange group to be used for low-latency trading; its deserialization performance is therefore very much a central concern, even in the presence of forward/backward compatibility.

There's a fixed cost penalty in SBE to handle forward/backward compatibility. The format is simple:

    Messages have a dynamic size and contain first Fields and then lists of Groups.
    Groups have a dynamic size and contain Fields.
    Fields have a fixed size.

A new Field can be added at the end of the Fields of a Message or Group. A new list of Groups can be added at the end of a Message.

Detecting the presence (or absence) of a given Field in a Message or Group is done by comparing its fixed offset + fixed size to the dynamic size of the Message or Group: if it is less-than-or-equal, it is present, otherwise it is absent.

For the set of features supports (which notably excludes arbitrary nesting), SBE is the fastest protocol I've ever implemented.

If you need arbitrary nesting, however... well, you won't be able to use SBE. " -- [27]
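The presence check described in the quote (fixed offset plus fixed size compared against the dynamic size) can be sketched as follows. This is not the SBE wire format: the field names and layout are invented for illustration, and the length of the byte block stands in for the block length that a real message would carry in its header.

```python
import struct

# Hypothetical layout: field name -> (fixed offset, struct format).
# "venue_id" was appended in a later schema version.
LAYOUT = {
    "price": (0, "<q"),
    "quantity": (8, "<i"),
    "venue_id": (12, "<i"),
}


def read_field(block, name):
    """Return the field value, or None if the block is too short to hold it."""
    offset, fmt = LAYOUT[name]
    size = struct.calcsize(fmt)
    # Present only if offset + size fits within the dynamic block size.
    if offset + size > len(block):
        return None
    return struct.unpack_from(fmt, block, offset)[0]


old_block = struct.pack("<qi", 1050, 7)      # written before venue_id existed
new_block = struct.pack("<qii", 1050, 7, 3)  # written with the newer schema

old_venue = read_field(old_block, "venue_id")
new_venue = read_field(new_block, "venue_id")
```

Because every field sits at a fixed offset, the check is a single comparison and needs no parsing step.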

"...my intuition is that SBE will probably edge Cap’n Proto and FlatBuffers on performance in the average case, due to its decision to forgo support for random access" -- [28]

"Simple Binary Encoding has the simplest encoding..." between Cap’n Proto, FlatBuffers, SBE -- [29]

"Both Cap’n Proto and Flatbuffers use message offsets to handle variable-length data, unions, and various other features. In contrast, messages in SBE are essentially just structs; variable-length data is supported, but there’s no union type." -- https://speice.io/2019/09/binary-format-shootout.html

"After a series of internal bake-offs, we found that Simple Binary Encoding (SBE) performed comparably or better than other well-known formats (like Cap’n Proto and FlatBuffers) in terms of encode and decode times. In particular, SBE stood out for its low timing variability." -- https://web.archive.org/web/20190427124806/https://polysync.io/blog/session-types-for-hearty-codecs/

Links:

Transit

Core types: strings, booleans, integers (to 64 bits w/o truncation), floats, nil/null, bytearray, arrays, maps (with arbitrary scalar keys, not just strings).

Extension types: timestamps, UUIDs, URIs, arbitrary precision integers and decimals, symbols, keywords, characters, quoted values, sets, lists, arrays, hypermedia links, maps with composite keys. Also, custom extension types.

No reference types, nor identity. Not "a system for representing object graphs" [30].

An encoding on top of JSON or MessagePack.

Links:

YAML

Not Turing-complete

Popular as a configuration language.

YAML can encode cyclic data structures [31].

Indentation is significant.

"(YAML) goes to great lengths to provide human-friendly features, trading off computer-friendliness to an fairly extreme extent. Eliminating 'plain' scalars (unquoted strings-as-values), folded multiline literals, tags, anchors/aliases, and possibly directives, as a sort of reduced yaml would make the language a lot less silly for the kinds of things a lot of people end up using it for. " [32]

"is still the best 'JSON with better layout' out there, which is what we want when we want readable documents and maintainable code. EDN, Transit and TOML all fail in that somewhere, IMO, and of course popularity is actually a hugely important feature for any data interchange format." -- [33]

YAML is supposed to be a superset of JSON: "YAML can therefore be viewed as a natural superset of JSON, offering improved human readability and a more complete information model. This is also the case in practice; every JSON file is also a valid YAML file. This makes it easy to migrate from JSON to YAML if/when the additional features are required." -- YAML: Relation to JSON. However, it misses the mark a little: "Please note that YAML has hardcoded limits on (simple) object key lengths that JSON doesn't have and also has different and incompatible unicode character escape syntax... YAML also does not allow \/ sequences in strings" -- [34] via [35]

Opinions/gotchas:

Retrospectives:

StrictYAML

Simpler than YAML:

https://github.com/crdoconnor/strictyaml/blob/master/FAQ.rst

Removes:

Opinions:

TOML

https://github.com/toml-lang/toml

"essentially an extended version of INI which allows the expression of both hierarchical and typed data." [41]

Discussion:

HCL

https://github.com/hashicorp/hcl2

UCL

https://github.com/vstakhov/libucl

Opinions:

" I wish more things would adopt UCL for configuration. Like YAML, it is a representation of the JSON object model but it also has a number of features that make it more useful as a configuration language: Macros. Include files. Explicit merging rules for includes (replace objects, add properties to objects). Cryptographic signing of includes, so you can use semi-trusted transports for them. Syntactic sugar for units " -- [44]

UCG

https://ucg.marzhillstudios.com/

SAN

https://news.ycombinator.com/item?id=18023105

S-expressions (sexps)

Not Turing-complete (note: Lisp uses S-expressions but is built on top of them)

Represents trees.

"Canonical" S-expressions (csexps)

example:

(4:this22:Canonical S-expression3:has1:55:atoms)

"a binary encoding form of a subset of general S-expression (or sexp)...The particular subset of general S-expressions applicable here is composed of atoms, which are byte strings, and parentheses used to delimit lists or sub-lists. These S-expressions are fully recursive. ... While S-expressions are typically encoded as text, with spaces delimiting atoms and quotation marks used to surround atoms that contain spaces, when using the canonical encoding each atom is encoded as a length-prefixed byte string. No whitespace separating adjacent elements in a list is permitted. The length of an atom is expressed as an ASCII decimal number followed by a ":". ... A csexp includes a non-S-expression construct for indicating the encoding of a string, when that encoding is not obvious. Any atom in csexp can be prefixed by a single atom in square brackets – such as "[4:JPEG]" or "[24:text/plain;charset=utf-8]". " [46]
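The length-prefix rule described above is small enough to sketch directly. The encoder below handles only atoms and lists (no display hints in square brackets); treating Python `str` as UTF-8 bytes and nested Python lists as sub-lists are assumptions made for this example.

```python
def csexp_encode(node):
    """Encode an atom (bytes or str) or a list of nodes as a canonical S-expression."""
    if isinstance(node, str):
        node = node.encode("utf-8")  # convenience: atoms are byte strings
    if isinstance(node, bytes):
        # Atom: ASCII decimal length, a colon, then the raw bytes.
        return str(len(node)).encode("ascii") + b":" + node
    # List: parentheses around the encoded children, with no whitespace.
    return b"(" + b"".join(csexp_encode(child) for child in node) + b")"


encoded = csexp_encode(["this", "Canonical S-expression", "has", "5", "atoms"])
nested = csexp_encode(["a", ["b", "c"]])
```

The first call reproduces the example string given earlier in this section.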

pros:

Links:

Comparisons

" edn is the best choice for human-readable data. It is however, less efficient to transmit and depends on writing a high-performance parser - this is a high bar in some language environments. edn is most attractive right now to Clojure users b/c of its proximity to Clojure itself. While it has many advantages as an extensible readable literal data format, it's an uphill battle to sell that against other data formats that already have greater mindshare and tooling in other language communities.

fressian is the highest performance option - it takes full advantage of a number of compression tricks and has support for arbitrary user-extensible caching. Again, it requires a fair amount of effort to write a fressian library though so it's probably best for JVM-oriented endpoints right now. By seeking greatest performance, fressian also makes tradeoffs that narrow its range of use and interest group.

transit is a pragmatic midpoint between these two. It focuses, like fressian, on program-to-program data transmission however, the data can be made readable (like edn) via the json-verbose mode. Like fressian, transit contains caching capabilities but they are more limited and not user-extensible. transit is designed primarily to have the most high-quality implementations per lowest effort - effectively shooting for greater reach than either edn or fressian by lower the bar to implementation. The bar is lowered by reusing high-performance parser for either JSON or messagepack which exist in a large number of languages. Of particular importance is leveraging the very high performance JSON parsers available in JavaScript runtimes, making transit viable as a browser-side endpoint for a fraction of the effort required to write a high performance edn or fressian endpoint. As transit explicitly seeks reach and portability, it is naturally the format with the broadest potential usage. "

-- https://groups.google.com/forum/#!topic/clojure/9ESqyT6G5nU

"Transit sounds like an evolution of EDN and Fressian: make the bottom layer pluggable to support human-readable/browser-friendly JSON or use the well-established msgpack for compactness. Caching is still there, but it can only be used for keywords/strings/symbols/etc. instead of arbitrary values like Fressian -- probably a good trade-off for simplicity." [52]

Moon

https://github.com/jordanorelli/moon

BayesDB

links:

Alan's data modeling language

See https://alan-platform.com/pages/tuts/getting-started.html

Dhall

"A configuration language guaranteed to terminate"

https://github.com/dhall-lang/dhall-lang

Opinions:

Dhall links

Nickel

https://www.tweag.io/blog/2022-03-11-nickel-first-release/

https://github.com/tweag/nickel/

https://github.com/tweag/nickel/#related-projects-and-inspirations

https://github.com/tweag/nickel/blob/master/RATIONALE.md

https://lobste.rs/s/hskr5v/first_release_nickel

Opinions:

Typedefs

https://typedefs.com/

"programming language agnostic, algebraic data type definition language"

https://github.com/typedefs/typedefs/blob/master/TUTORIAL_UNDERSTANDING_TYPEDEFS.md

Cue

https://cuelang.org/

A schema language. Works with YAML and JSON.

Discussion/opinions:

Lua

Lua is not a data language but can be used as one:

Opinions:

Recfile

blog entry: https://labs.tomasino.org/gnu-recutils/

Jsonnet

https://jsonnet.org/

Discussion/opinions:

Starlark

Configuration language in Bazel.

Apache Arrow

https://arrow.apache.org/

"specifies a standardized language-independent columnar memory format for flat and hierarchical data,...also provides computational libraries and zero-copy streaming messaging and interprocess communication...the de-facto standard for columnar in-memory analytics...backed by key developers of 13 major open source projects, including Calcite, Cassandra, Drill, Hadoop, HBase, Ibis, Impala, Kudu, Pandas, Parquet, Phoenix, Spark, and Storm"

" Arrow is based on Flatbuffers, Parquet is based on Thrift. " -- https://news.ycombinator.com/item?id=23594874

Ion

https://amzn.github.io/ion-docs/

Opinions:

Pup

https://github.com/liljencrantz/crush/blob/master/src/crush.proto

(see also https://github.com/liljencrantz/crush )

Oheap

http://www.oilshell.org/blog/2017/01/09.html http://www.oilshell.org/blog/2018/12/16.html#toc_3

ASDL

A language for describing ASTs. Software is available to autogenerate code that implements ASTs described by ASDL.

Used in Python and in Oil shell.

http://www.oilshell.org/blog/2016/12/11.html

http://www.oilshell.org/blog/2016/12/16.html

https://www.cs.princeton.edu/research/techreps/TR-554-97

https://raw.githubusercontent.com/eliben/asdl_parser/master/docs/ASDL%20Zephyr%20-%20Wang.pdf

http://eli.thegreenplace.net/2014/06/04/using-asdl-to-describe-asts-in-compilers

https://github.com/eliben/asdl_parser

http://asdl.sourceforge.net/

" Analogies for ASDL 2016-12-16

If you haven't used Google's protocol buffer serialization technology, this analogy may be helpful:

JavaScript Data Model : JSON :: C Data Model : Protocol Buffers

Just as JSON is a language-independent serialization format extracted from JavaScript's data model (objects, heterogeneous arrays, strings, numbers, booleans), protocol buffers are a mostly language-independent serialization format extracted from C's data model:

    structs (messages)
    homogeneous arrays (repeated fields)
    strings
    enums
    double and float
    unsigned and signed integers of various widths.

A similar analogy explains Zephyr ASDL, which I explained from a few other angles in the last post:

C data model : Protocol Buffers :: ML data model : ASDL

ML is the language that introduced algebraic data types or ADTs. ADTs are a characteristic feature of strongly-typed functional languages like Standard ML, OCaml, and Haskell.

ASDL, like protocol buffers, is a domain-specific language that describes a language-independent serialization format for a particular data model -- in this case, the ML data model. It has the following constructs:

    Product types, aka records, representable by structs in C and C++.
    Sum types, aka variants, representable by tagged unions in C or subclasses in C++.
    Optional fields, representable by a pointer that may be null. In ML-like languages, they're the Option type.
    Repeated fields, representable by arrays in C++. In ML-like languages, they're lists.
    Strings.
    Integers of unspecified width.

" -- http://www.oilshell.org/blog/2016/12/16.html
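The ASDL constructs listed in the quote above map fairly directly onto typed records in most languages. The sketch below shows one such mapping using Python dataclasses; the tiny expression grammar (`Num`, `BinOp`, `Call`) is a made-up example and is not taken from any real ASDL file or generated by any ASDL tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Num:                      # product type: a record with one field
    value: int                  # integer of unspecified width


@dataclass
class BinOp:                    # product type with recursive fields
    op: str                     # string
    left: "Expr"
    right: "Expr"


@dataclass
class Call:
    name: str
    args: List["Expr"] = field(default_factory=list)  # repeated field
    comment: Optional[str] = None                      # optional field


# Sum type: an Expr is exactly one of the variants above.
Expr = Union[Num, BinOp, Call]


def count_nodes(expr):
    """Walk the tree, dispatching on the variant as a tagged union would."""
    if isinstance(expr, Num):
        return 1
    if isinstance(expr, BinOp):
        return 1 + count_nodes(expr.left) + count_nodes(expr.right)
    if isinstance(expr, Call):
        return 1 + sum(count_nodes(arg) for arg in expr.args)
    raise TypeError(f"not an Expr: {expr!r}")


tree = BinOp("+", Num(1), Call("f", [Num(2), Num(3)]))
total = count_nodes(tree)
```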

" ASTs (Abstract Syntax Trees) are an important data structure in compiler front-ends. If you've written a few parsers, you almost definitely ran into the need to describe the result of the parsing in terms of an AST. While the kinds of nodes such ASTs have and their structure is very specific to the source language, many commonalities come up. In other words, coding "yet another AST" gets really old after you've done it a few times.

Worry not, as you'd expect from the programmer crowd, this problem was "solved" by adding another level of abstraction. Yes, an abstraction over Abstract Syntax Trees, oh my! The abstraction here is some textual format (let's call it a DSL to sound smart) that describes what the AST looks like, along with machinery to auto-generate the code that implements this AST.

Most solutions in this domain are ad-hoc, but one that I've seen used more than once is ASDL - Abstract Syntax Definition Language. The self-description from the website sounds about right:

    The Zephyr Abstract Syntax Description Language (ASDL) is a language designed to describe the tree-like data structures in compilers. Its main goal is to provide a method for compiler components written in different languages to interoperate. ASDL makes it easier for applications written in a variety of programming languages to communicate complex recursive data structures.
    "
     -- https://eli.thegreenplace.net/2014/06/04/using-asdl-to-describe-asts-in-compilers

Preserves

https://preserves.gitlab.io/preserves/preserves.html

typed-wire

https://github.com/typed-wire/typed-wire

ATD

https://github.com/ahrefs/atd https://atd.readthedocs.io/en/latest/

"Adaptable Type Definitions" -- "Static Types for Json APIs"

Spot

https://github.com/airtasker/spot

"Spot ("Single Point Of Truth") is a concise, developer-friendly way to describe your API contract."

"Leveraging the TypeScript syntax, it lets you describe your API and generate other API contract formats you need (OpenAPI, Swagger, JSON Schema)."

Opinions:

OpenAPI / Swagger

OpenAPI (version 2 was formerly known as Swagger)

Relational pipes

https://relational-pipes.globalcode.info/v_0/

FIDL

https://fuchsia.dev/fuchsia-src/concepts/fidl/overview

Arrow

https://arrow.apache.org/

Opinions:

SimpleSerialize (SSZ)

https://github.com/ethereum/eth2.0-specs/blob/dev/ssz/simple-serialize.md https://github.com/ethereum/eth2.0-specs/blob/a63de3dc374148fe8adacd8718f67f8c7ba54f2e/specs/simple-serialize.md https://rauljordan.com/2019/07/02/go-lessons-from-writing-a-serialization-library-for-ethereum.html

RLP

https://eth.wiki/en/fundamentals/rlp https://github.com/ethereum/wiki/wiki/RLP

https://ethresear.ch/t/replacing-ssz-with-rlp-zip-and-sha256/5706

Kaitai

https://kaitai.io/ https://lobste.rs/s/pnfkzp/kaitai_struct_declarative_binary_format

NestedText

https://nestedtext.org/

Only strings, lists, dicts are supported.

Comparison to JSON, YAML, TOML, INI, CSV, TSV: https://nestedtext.org/en/latest/alternatives.html#yaml

Opinions:

ZSON / Zed

https://zed.brimdata.io/docs/formats/zson/

Data language concepts, patterns, and practices

Data language links

Schema exchanges

Concise-encoding

https://concise-encoding.org/

discussion:

Server-Sent Events

" The protocol is very simple. It uses the text/event-stream Content-Type and messages of the form:

data: First message

event: join
data: Second message. It has two
data: lines, a custom event type and an id.
id: 5

: comment. Can be used as keep-alive

data: Third message. I do not have more data.
data: Please retry later.
retry: 10

Each event is separated by two empty lines (\n) and consists of various optional fields.

The data field, which can be repeated to denote multiple lines in the message, is unsurprisingly used for the content of the event.

The event field allows to specify custom event types, which as we will show in the next section, can be used to fire different event handlers on the client.

The other two fields, id and retry, are used to configure the behaviour of the automatic reconnection mechanism. This is one of the most interesting features of Server-Sent Events. It ensures that when the connection is dropped or closed by the server, the client will automatically try to reconnect, without any user intervention.

The retry field is used to specify the minimum amount of time, in seconds, to wait before trying to reconnect. It can also be sent by a server, immediately before closing the client’s connection, to reduce its load when too many clients are connected.

The id field associates an identifier with the current event. When reconnecting the client will transmit to the server the last seen id, using the Last-Event-ID HTTP header. This allows the stream to be resumed from the correct point.

Finally, the server can stop the automatic reconnection mechanism altogether by returning an HTTP 204 No Content response. " -- [66]
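The message format described in the quote above can be parsed with a few lines of code. The function below is a simplified sketch, not a conforming client: it only splits a complete text into events, joins repeated data lines with newlines, treats lines starting with a colon as comments, and drops blocks that carry no data. `parse_event_stream` and `STREAM` are names invented for this example.

```python
def parse_event_stream(text):
    """Split event-stream text into a list of event dicts."""
    events = []
    current = {"data": []}
    for line in text.split("\n"):
        if line == "":
            # A blank line ends the current event; blocks without data are dropped.
            if current["data"]:
                event = dict(current)
                event["data"] = "\n".join(current["data"])
                events.append(event)
            current = {"data": []}
            continue
        if line.startswith(":"):
            continue  # comment line, e.g. used as a keep-alive
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # strip the single space after the colon
        if name == "data":
            current["data"].append(value)
        elif name in ("event", "id", "retry"):
            current[name] = value
    return events


STREAM = (
    "data: First message\n"
    "\n"
    ": keep-alive comment\n"
    "\n"
    "event: join\n"
    "data: line one\n"
    "data: line two\n"
    "id: 5\n"
    "\n"
)
events = parse_event_stream(STREAM)
```

A real client would also track the last seen id and the retry value across reconnects, which this sketch leaves out.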

FIDL (Fuchsia Interface Definition Language)

"the language used to describe interprocess communication (IPC) protocols used by programs running on Fuchsia"

https://fuchsia.dev/fuchsia-src/concepts/fidl/overview

Discussions:

Hay

https://www.oilshell.org/release/0.11.0/doc/hay.html

(part of the Oil shell project)

vs. TOML and Cue: " TOML is a data-only language. There are no functions / loops / conditionals. That is totally fine, but the minute you want to start “templating” it (like YAML/Go templates), I would say that is a smell. I mention in the doc that Hay is for the cases where you outgrow “plain old data” (which IMO happens to every system when it gets big enough). Cue is one of the more interesting config languages (i.e. it is NOT the “JSON with lambda/map/filter” design I dislike). As far as I understand, it does validation with a logic programming model. I think this could be useful for some things, but I do think “regular Python-like code” is more general – I feel like you will have to mix Cue with something else for most apps ? But I’d definitely like to hear from people who have success with Cue. " -- [67]

3D (Dependent Data Descriptions)

https://www.fstar-lang.org/papers/EverParse3D.pdf

Preserves

https://preserves.dev/

Protobuf ASCII

https://rachelbythebay.com/w/2023/10/05/config/

Opinions:

misc notes

"From the Java world comes Thrift, its successor Avro, and MessagePack. From Python we have pickle, which somehow has escaped the Python world to inflict harm upon others. From the C and C++ world we have Cap’n Proto, Flatbuffers, and perhaps the most popular, Google Protobuf (the heart of the widely adopted gRPC protocol). Now, these serialization libraries might have come from one language world, but they’d be useless without bindings in basically every other language… which they do generally boast, with the exception of pickle.

It should be noted that not all of these serialization libraries stop at serialization. Some bring protocol specification (RPC definition and calling convention) into scope. Notwithstanding that this can be useful, the fact that they are conflated within a single implementation is a tragedy. " -- https://www.circonus.com/2017/11/some-like-it-flat/

" ...for config languages...If we're limiting ourselves to just JSON, INI, XML and YAML as potential choices, I get why people cling onto one of these suboptimal choices and then fiercely defend it, but there are other options. There's libconfig, JSON5, Dhall, various interpreted languages... " -- https://news.ycombinator.com/item?id=37594549

"It's a noble goal to want a simple configuration format, toml is far from the simplest, line separated options format is simpler. The fact that it needs a parser indicates that it creates a bigger parsing problem than necessary." -- https://news.ycombinator.com/item?id=37597502

"...for configuration files...dhall" -- https://news.ycombinator.com/item?id=37598897