proj-oot-ootDataNotes4

See also [[proj-oot-plChDataLangs]].

---

'pattern matching' is a key operation here; this is a SWITCH statement on the form of the data ('form' meaning which is the outermost constructor; this is then extended to be a nested top-down pattern from the outermost constructor to some ways in), combined with a destructuring bind in each case.

in fact, we should generalize pattern matching to more general 'patterns'. First, we should replace the design where a 'pattern' consists of nested constructor applications with a design where it consists of nested applications of classes of constructors; this is like generalizing types to typeclasses. Note that there's no need for the constructors in a class to be from the same (Haskell-style) 'type'; this could come in handy for sum types in which you want to do the same thing when two different constructors within two different types within the sum type are encountered. Also, this should be recursive, in the sense that a 'class of constructors' may itself be another pattern. Second, we should make it more like graph regexes. These two steps can be thought of by analogy with regexes (and in precise analogy with graph regexes): the first step, the movement from constructors, to enumerated classes of constructors, to recursively defined patterns, is like moving from regex matching on single characters (explicitly given), to regex matching on character classes, to parenthesized expressions within a regex; and the second step, the movement to graph regexes, is like the addition of ? and * to the language of regular expressions.

---

Now there are languages like Haskell (and Coq, i think?) that put pattern matching in core languages like F-lite. Conditionals are done via 'switches' (or 'case's as they call them, or implicit switches via guarded/partial pattern-matching function definitions), and data composition is done by pattern matching. Haskell says it's a 'graph reduction' (graph redex) language, and F-lite is compilable to Reduceron 'template code', and by the name Reduceron, and its parallel-ness, you can tell that they might think of it in terms of graph reduction. If you think about it, 'pattern matching' can be envisioned in terms of graph reduction, which can be seen by thinking about the Reduceron operational semantics:

An expression in the program is a node in the graph. When an expression refers to another expression using a variable, this is an edge in the graph. As Haskell begins to execute the program, at first these nodes are just lazy thunks containing expressions to possibly be evaluated, but then as execution continues, the runtime determines that it has to actually evaluate some of these expressions. So, we visit the node corresponding to the expression, and we 'reduce' that expression, which involves traversing the local edges to other nodes (and possibly creating new intermediate nodes as we partially apply stuff, i guess). Then, because Haskell probably wants to cache/memoize already reduced expressions, we write the reduced version of the expression back to the node of the graph that corresponds to that expression. So if you had a graph visualization of this, you would see the CPU focus its attention on various graph nodes, then move its attention to their children, possibly creating new nodes for partially applied functions, and eventually reach primitive 'leaf nodes' and substitute them into the expression in the parent that it was trying to evaluate. It would then simplify that part of the graph by removing the temporary partially applied nodes that are no longer needed: a fully applied version has already been computed, the requesting parent node only needs that fully applied one, and so there are no more edges to the partially applied one.
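The 'lazy thunk that gets overwritten with its reduced value' part can be sketched in a few lines of Python. This is only a toy model under my own assumptions (the `Thunk` class and its `evaluations` counter are hypothetical), not how GHC or the Reduceron actually represent nodes:

```python
class Thunk:
    """A graph node holding an expression that may or may not get evaluated."""

    def __init__(self, compute):
        self._compute = compute  # zero-argument function: the unevaluated expression
        self._done = False
        self._value = None
        self.evaluations = 0     # only here to make the caching visible

    def force(self):
        if not self._done:
            self.evaluations += 1
            self._value = self._compute()
            self._done = True
            # Write-back step: the node now holds the reduced value, and the
            # edges to whatever the expression referred to are dropped.
            self._compute = None
        return self._value


x = Thunk(lambda: 2 + 3)
y = Thunk(lambda: x.force() * x.force())  # two edges from y to x
unused = Thunk(lambda: 1 // 0)            # never demanded, so never evaluated

result = y.force()
print(result, x.evaluations, unused.evaluations)
```

Even though `y` refers to `x` twice, `x` is reduced only once, and the node that nobody demands is never touched.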

Pattern matching refers to checking the pattern of the descendent nodes at the other side of an edge. Each node is labeled with its type, and if a constructor, with which constructor it is, and the 'pattern' just refers to these labels, except it can also go more than one hop and include the labels of further nested descendents.

So this seems great, and i bet it is great for pure expressions, but what about side effects? Now ordering matters. You can use monads to guarantee an order of evaluation, i guess. But it seems to me that you may also want to just print out debugging and logging stuff, store profiling information, and cache stuff, as nodes are evaluated, without otherwise messing up the program. I can't decide if the monads are a dirty hack, and you should confine this graph redex stuff to pure expressions and just do sequential imperative programming for interactivity, or if monads are just a beautiful way to do interactivity in a graph redex system (leaning towards the former; graph redex for expression evaluation, imperative sequence for interactivity). I do think that you should have something like Oot Statemasks to allow you to have some 'inconsequential' (literally; the constraint is that they don't change the result of the expression, except for weird special cases like disk full exceptions during logging) side-effects even within the graph redex.

---

as a control primitive, pattern matching on ADTs treats the nested query on the form of the constructor as a shortcut for a bunch of nested if-thens.
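To make the 'shortcut for nested if-thens' reading concrete, here is a sketch of what a nested pattern like Node(Leaf(a), Leaf(b)) turns into when written out by hand. The `Leaf` and `Node` classes and `sum_two_leaves` are made-up names for illustration:

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    value: int


@dataclass
class Node:
    left: object
    right: object


def sum_two_leaves(x):
    # The pattern Node(Leaf(a), Leaf(b)) spelled out as nested tests on the
    # form of the data, followed by the destructuring bind.
    if isinstance(x, Node):
        if isinstance(x.left, Leaf):
            if isinstance(x.right, Leaf):
                a, b = x.left.value, x.right.value
                return a + b
    return None  # the pattern did not match


print(sum_two_leaves(Node(Leaf(1), Leaf(2))))
print(sum_two_leaves(Leaf(1)))
```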

in this way, since ADTs and pattern matching are generally seen together, ADTs are closely related to a control structure, too.

---

as a data primitive, contrast ADTs to dicts (hashtables; associative arrays) and multidimensional arrays. ADTs can do structs pretty easily (the different fields of the struct are the different arguments to a constructor), but associative arrays seem a bit more expressive than that, and probably multidimensional arrays too. Of course you can build these structures out of ADTs but i bet it's cumbersome to operate on them that way.

---

does that 'uniform' requirement on pattern matching in f-lite prohibit having more than one 'case' that might match any given thing? Because this is still order-independent as long as you have a rule that 'the most specific one matches'. But you can't just look at them one-by-one, so maybe that's what they want to rule out with the 'uniform' requirement.

---

" The operations of an abstract type are classified as follows: · Constructors create new objects of the type. A constructor may take an object as an argument, but not an object of the type being constructed. · Producers create new objects from old objects; the terms are synonymous. The concat method of String, for example, is a producer: it takes two strings and produces a new one representing their concatenation. · Mutators change objects. The addElement method of Vector , for example, mutates a vector by adding an element to its high end. · Observers take objects of the abstract type and return objects of a different type. The size method of Vector , for example, returns an integer. " [1]

---

in Haskell etc, pattern matching against constructors is against (the set of all) fixed length tuples

how to extend that to matching patterns on (non-fixed-length) graphs? maybe graph regex?

---

stuff in Python that is like just a struct, a class with no methods:

1)

    class C:
        def __init__(self, a):
            self.a = a

    x = C(1)

2)

    x = type('C', (object,), dict(a=1))()

[2]

3)

    import types
    x = types.SimpleNamespace(a=1)

4)

    import collections
    C = collections.namedtuple('C', ['a'])
    x = C(a=1)

5)

    import attr

    @attr.s
    class C(object):
        a = attr.ib()

    x = C(1)

https://glyph.twistedmatrix.com/2016/08/attrs.html makes a case that the 'attr' library is the best out of these. In short, (1) and (2) have no __repr__ or __eq__, and (4) (a) has fields accessible as numbered indices, too, (b) compares equal to raw tuples of the same values, and (c) is (mostly?) immutable. The post doesn't cover SimpleNamespace, which seems to cover the basics. attrs does have various more advanced features than SimpleNamespace, though, including __lt__(), asdict(), validators, and optional closedness via __slots__ (no additional attributes added later).
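A quick stdlib-only check of the differences mentioned above (attrs is left out so the snippet has no third-party dependency; the class names `Plain` and `Point` are made up):

```python
import collections
import types


class Plain:
    def __init__(self, a):
        self.a = a


Point = collections.namedtuple('Point', ['a'])

# A plain class falls back to identity comparison: no useful __eq__.
plain_equal = Plain(1) == Plain(1)

# A namedtuple is still a tuple: indexable, and equal to a raw tuple.
p = Point(a=1)
by_index = p[0]
tuple_equal = p == (1,)

# ...and its fields cannot be reassigned.
try:
    p.a = 2
    mutable = True
except AttributeError:
    mutable = False

# SimpleNamespace gets a readable repr and value-based equality for free.
ns = types.SimpleNamespace(a=1)
ns_equal = ns == types.SimpleNamespace(a=1)
ns_repr = repr(ns)

print(plain_equal, by_index, tuple_equal, mutable, ns_equal, ns_repr)
```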

---

_asummers 4 days ago [-]

Const does not mean immutability, only immutable references to the outermost pointer. It is equivalent to final in Java. While that solves the issue with numbers changing state, it does not help objects e.g. For that you need something like immutable.js from Facebook.

reply

---

"Canonical" S-expressions (csexps)

example:

(4:this22:Canonical S-expression3:has1:55:atoms)
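The example above can be reproduced with a tiny encoder. This is a minimal sketch, assuming atoms are Python `bytes` and lists are Python lists; the function name `csexp_encode` is made up, and display hints (the square-bracket prefixes) are not handled:

```python
def csexp_encode(node):
    """Encode nested lists of bytes atoms as a canonical S-expression."""
    if isinstance(node, bytes):
        # An atom is its length in ASCII decimal, a colon, then the raw bytes.
        return str(len(node)).encode('ascii') + b':' + node
    # A list is its encoded children between parentheses, with no whitespace.
    return b'(' + b''.join(csexp_encode(child) for child in node) + b')'


example = [b'this', b'Canonical S-expression', b'has', b'5', b'atoms']
encoded = csexp_encode(example)
print(encoded.decode('ascii'))
```

Because every atom is length-prefixed, no quoting or escaping is ever needed, which is what makes the encoding canonical.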

"a binary encoding form of a subset of general S-expression (or sexp)...The particular subset of general S-expressions applicable here is composed of atoms, which are byte strings, and parentheses used to delimit lists or sub-lists. These S-expressions are fully recursive. ... While S-expressions are typically encoded as text, with spaces delimiting atoms and quotation marks used to surround atoms that contain spaces, when using the canonical encoding each atom is encoded as a length-prefixed byte string. No whitespace separating adjacent elements in a list is permitted. The length of an atom is expressed as an ASCII decimal number followed by a ":". ... A csexp includes a non-S-expression construct for indicating the encoding of a string, when that encoding is not obvious. Any atom in csexp can be prefixed by a single atom in square brackets – such as "[4:JPEG]" or "[24:text/plain;charset=utf-8]". " [3]

pros:

Links:

---

these XML features are cool:

"The first atom in a csexp list, by convention roughly corresponds to an XML element type name in identifying the "type" of the list. "

also http://json-ld.org/ looks cool

--- i skimmed:

my summary:

CapnProto is sort-of a successor to protobuf (the guy who made it was a principal designer of protobuf). Unlike Protobuf, it is big into 'zero-copy', meaning that instead of parsing an incoming message, it just keeps the bytes around and provides accessor functions to use it.

Protobuf is supported by Google and is cross-platform. CapnProto is made by a few guys at a startup and doesn't support Windows very well yet. One commenter found that Flatbuffers' Java implementation was 'more mature' than CapnProto's. The CapnProto guy thinks Thrift is ~categorically worse than Protobuf.

Other 'zero-copy' guys include flatbuffers and SBE. I have heard of flatbuffers.

---

I think for Oot serialization and interop, if possible, we should choose one of these sorts of guys and use it rather than coming up with our own JSON-like format and/or our own pickling format.

a list of contenders is in plChData. These are not all the same type of thing, but whatever, i'm putting them in the same list. Here they are again with a short summary:

Not in the running:

hmm.. a lot of these aren't great.. i guess what i want is a 'JSON+'. Plus dates, plus references.

EDN/Transit seem to provide this, except for references. Transit has a built-in efficient MessagePack encoding and even a JSON encoding, so that sounds good. TOML looks cool. YAML is popular but the spec is too complex, and i don't like significant indentation. StrictYAML still has significant indentation. Should check out HJSON, JSON5, SDLang. CSEXPs are a great lower layer but they don't even have a set of types so they don't solve the problem.

So far i guess Transit seems like the best. Having a JSON and a MessagePack encoding is a pretty big advantage. And, being a Clojure thing designed for language interoperability, instead of for RPC protocols, i bet it'll fit my use case better than CapnProto.

---

other random schema systems:

nitrogen 781 days ago [-]

If anyone finds JSON Schema in Ruby to be too slow, I developed a Ruby-based schema system that is much faster:

http://rubygems.org/gems/classy_hash

https://github.com/deseretbook/classy_hash

I wrote it for an internal backend system at a small ecommerce site with a large retail legacy.

Edit: Ruby Hashes (the base "language" used by Classy Hash) aren't easily serialized and shared, but if there's enough interest, it would be possible to compile most JSON Schema schemas to Classy Hash schemas.

alexatkeplar 781 days ago [-]

Have you looked contracts.ruby (https://github.com/egonSchiele/contracts.ruby)? I'm sure you could overlap some code

nitrogen 781 days ago [-]

Interesting. It looks like contracts.ruby does for method calls what Classy Hash aims to do for API data.

---

in Python, you can pack together a bunch of reused optional arguments into a namedtuple and pass that along instead, but then it's cumbersome to construct one of these while inheriting the defaults, eg:

    disassemble_raw_bytecode_file(infile,outfile, dis_opts=dis_defs._replace(allow_unknown_opcodes=args.allow_unknown_opcodes))

it would be better to have a namedtuple with defaults.
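For what it's worth, Python 3.7+ added a `defaults` argument to `collections.namedtuple`, which covers this. A small sketch; `DisOpts` and the `verbose` field are hypothetical names, only `allow_unknown_opcodes` comes from the call above:

```python
from collections import namedtuple

# defaults are applied to the rightmost fields; here every field has one.
DisOpts = namedtuple(
    'DisOpts',
    ['allow_unknown_opcodes', 'verbose'],
    defaults=[False, False],
)

default_opts = DisOpts()
custom_opts = DisOpts(allow_unknown_opcodes=True)  # other fields keep defaults

print(default_opts)
print(custom_opts)
```

So the caller can write `dis_opts=DisOpts(allow_unknown_opcodes=...)` without first fetching a defaults object and calling `_replace` on it.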

---

Typescript has some neat syntax for destructuring, merging, and choosing fields of objects, called 'object spreads' and 'object rests':

" Object Rest & Spread

We’ve been excited to deliver object rest & spread since its original proposal, and today it’s here in TypeScript 2.1. Object rest & spread is a new proposal for ES2017 that makes it much easier to partially copy, merge, and pick apart objects. The feature is already used quite a bit when using libraries like Redux.

With object spreads, making a shallow copy of an object has never been easier:

let copy = { ...original };

Similarly, we can merge several different objects so that in the following example, merged will have properties from foo, bar, and baz.

let merged = { ...foo, ...bar, ...baz };

We can even add new properties in the process:

let nowYoureHavingTooMuchFun = { hello: 100, ...foo, world: 200, ...bar, }

Keep in mind that when using object spread operators, any properties in later spreads “win out” over previously created properties. So in our last example, if bar had a property named world, then bar.world would have been used instead of the one we explicitly wrote out.

Object rests are the dual of object spreads, in that they can extract any extra properties that don’t get picked up when destructuring an element:

let { a, b, c, ...defghijklmnopqrstuvwxyz } = alphabet; "
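For comparison, a rough Python analogue using dict unpacking (the variable names mirror the TypeScript example, the values are made up). Python has no direct equivalent of object rest for dicts, so that part is written as a comprehension, which is my own workaround rather than a language feature:

```python
foo = {'a': 1, 'world': 0}
bar = {'b': 2, 'world': 300}

# Spread: shallow copy and merge. Later entries win, as in the TypeScript case.
copy = {**foo}
merged = {'hello': 100, **foo, 'world': 200, **bar}

# Rest: pick out some keys and collect whatever is left over.
alphabet = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
a, b = alphabet['a'], alphabet['b']
rest = {k: v for k, v in alphabet.items() if k not in ('a', 'b')}

print(merged)
print(rest)
```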

---

Erwin 1 day ago [-]

This might benefit from SQLite's Virtual tables: https://sqlite.org/vtab.html

With Virtual Tables you can expose any data source as a SQLite table -- then you can use every SQL feature that sqlite offers. You can just tell sqlite how to iterate through your data with a few functions, with an option to push down filtering information for efficiency.

You can also create your own aggregates, functions etc.

Here's an article where the author exposes redis as a table within sqlite: http://charlesleifer.com/blog/extending-sqlite-with-python/

reply

erydo 1 day ago [-]

My thoughts went straight to PostgreSQL Foreign Data Wrappers. Something like that would be really helpful!

reply

---

 bipvanwinkle 9 days ago [-]

Out of curiosity what progress has been made in regards to improving the ergonomics of records in Haskell? Stephen references that an answer is in the works, but it looks like it has stalled out.

reply

harpocrates 9 days ago [-]

Actually, a lot has been done and a lot is coming in the near future. GHC 8.0 brought us `DuplicateRecordFields`, so that we can finally use the same field name for two records.

There is active work done by Adam Gundry to extend this even further [1]. The key part of this is that there will be a new type class so that I can express as a constraint that a type must have a field with a certain name and with a certain type.

Further in the future, but still actively discussed is using overloaded labels as lenses [2]. Past that, I can't imagine anything else I would want records to do.

[1] https://github.com/adamgundry/ghc-proposals/blob/overloaded-... [2] http://stackoverflow.com/questions/38136144/replace-record-p...

reply

wyager 9 days ago [-]

Most people use Lenses for heavily record-oriented programming. They work quite well. They are less convenient than built-in structural syntax like in Javascript, but once you get past the initial inconvenience they are vastly more powerful.

reply

axman6 9 days ago [-]

On the contrary, lenses are far more powerful than what's available in JavaScript and all other OO languages. Traversals and Prisms give so much power that's lacking in OO

reply

dllthomas 9 days ago [-]

I don't see how that's contrary to what the parent said.

reply

---

" Specifically the immutability providing an easy way to reason about functions via only inputs and outputs.

I feel the memory model of Rust (single mutable ref or unlimited non-mutable refs) combined with the fact that there are no mutable globals (inside safe code) gives you a much easier system to reason about. "

---

" Many types, one interface

    One of Clojure’s core features is its generic data-manipulation API. A small set of functions can be used on all of Clojure’s built-in types. For example, the conj function (short for conjoin) adds an element to any collection, as shown in the following REPL session:
    user> (conj [1 2 3] 4)
    [1 2 3 4]
    user> (conj (list 1 2 3) 4)
    (4 1 2 3)
    user> (conj {:a 1, :b 2} [:c 3])
    {:c 3, :a 1, :b 2}
    user> (conj #{1 2 3} 4)
    #{1 2 3 4}
    Each data structure behaves slightly differently in response to the conj function (lists grow at the front, vectors grow at the end, and so on), but they all support the same API. This is a textbook example of polymorphism — many types accessed through one uniform interface.
    Polymorphism is a powerful feature and one of the foundations of modern programming languages. The Java language supports a particular kind of polymorphism called subtype polymorphism, which means that an instance of a type (class) can be accessed as if it were an instance of another type.
    In practical terms, this means that you can work with objects through a generic interface such as java.util.List without knowing or caring if an object is an ArrayList, LinkedList, Stack, Vector, or something else. The java.util.List interface defines a contract that any class claiming to implement java.util.List must fulfill."

---

" > Cloud Spanner uses a SQL dialect which matches the ANSI SQL:2011 standard with some extensions for Spanner-specific features. This is a SQL standard simpler than that used in non-distributed databases such as vanilla MySQL?, but still supports the relational model (e.g. JOINs). It includes data-definition language statements like CREATE TABLE. Spanner supports 7 data types: bool, int64, float64, string, bytes, date, timestamp[20].

> Cloud Spanner doesn't, however, support data manipulation language (DML) statements. DML includes SQL queries like INSERT and UPDATE. Instead, Spanner's interface definition includes RPCs for mutating rows given their primary key[21]. This is a bit annoying. You would expect a fully-featured SQL database to include DML statements. Even if you don't use DML in your application you'll almost certainly want them for one-off queries you run in a query console. "

---

"Son

A subset of JSON.

JSON contains lots of extraneous details like the difference between 10e2 and 10E2. This helps when writing it by hand, but can cause problems such as making it difficult to serialize and hash consistently.

Son is a subset of JSON intended to remove redundant options. "

https://github.com/seagreen/Son
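The hashing problem is easy to see with the stdlib. This sketch is not an implementation of Son; it only shows two spellings of the same JSON value and one ad hoc way (sorted keys, no optional whitespace) to get a consistent serialization to hash. The helper names `canonical` and `digest` are made up:

```python
import hashlib
import json

# Same value, two different byte sequences on the wire.
a = json.loads('{"x": 10e2, "y": [1, 2]}')
b = json.loads('{ "y": [1,2], "x": 10E2 }')


def canonical(obj):
    # One fixed choice among the options JSON leaves open.
    return json.dumps(obj, sort_keys=True, separators=(',', ':'))


def digest(obj):
    return hashlib.sha256(canonical(obj).encode('utf-8')).hexdigest()


print(canonical(a))
print(digest(a) == digest(b))
```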

---

"

onion2k 10 hours ago [-]

JSON doesn't have comments so it's a bad choice for human-editable config. YAML doesn't have an end marker so you can never be sure if you've got the entire file. XML is a huge pain to edit by hand if the schema is complicated, and overly verbose if it isn't. None of them are even close to being safe (for example https://arp242.net/weblog/yaml_probably_not_so_great_after_a...). All of those choices fail your "elegance" test.

TOML is my preferred config file language option where I have a choice - https://github.com/toml-lang/toml - but I suspect that suffers a lot of the same problems.

reply

note: https://arp242.net/weblog/yaml_probably_not_so_great_after_all.html says that YAML has an operator that runs other commands when parsed!

rendaw 8 hours ago [-]

I will capitalize on this derailment to promote luxem, my flexible and minimal JSON alternative: https://github.com/rendaw/luxem#what-is-luxem

reply

Sunset 10 hours ago [-]

Just add comments to JSON, Douglas Crockford can eat his heart out.

reply "

" vince14 8 hours ago [-]

StrictYAML? - https://github.com/crdoconnor/strictyaml

reply

marcoms 8 hours ago [-]

Still does not allow tabs for indentation - same problem as `make` but inverted

reply "

---

https://github.com/rendaw/luxem#what-is-luxem

" luxem is a specification for serializing structured data.

luxem is similar to JSON. The main differences are:

    You can specify a type using (typename) before any value. Ex: (direction) up.
    You can have a , after the final element in an object or array.
    Quotes are optional for simple strings (strings containing no spaces and no ambiguous symbols).
    The document is an array with implicit (excluded) [] delimiters.
    Comments (written as *comment text*) can be placed anywhere whitespace is.

All documents should be UTF-8 with 0x0A line endings (linux-style).

No basic types are defined in the parsing specification, but the following should be used as a guideline for minimum data type support:

    bool true|false
    int -?[0-9]+
    dec -?[0-9]+(\.[0-9]+)?
    string
    ascii16 ([a-p][a-p])*

ascii16 is a binary encoding that is both ugly and easy to parse, using the first 16 characters of the alphabet. "

---

hueving 1 day ago [-]

Can you provide a 'top 3' list of reasons to use Cap'n Proto over Protobufs?

reply

doh 1 day ago [-]

1) Cap'n Proto doesn't encode/decode messages thus it's much cheaper for processing and memory management

2) protobuf in the proto3 design doesn't carry default values. So if you have a bool field and want to explicitly send false, well you have to change it to some other type or use the default values all the time

3) protobuf generates incredibly large serialization/deserialization support code for each template. For some languages like Python it can be in hundreds of kilobytes. Cap'n proto messages are significantly smaller

There is more for CnP but Protobuf has much better support and is by default used in projects like gRPC. Also new CnP is lacking speed in new development in comparison to Protobuf.

But I'm using in one of my side projects and I'm very happy with it

reply

StreamBright 1 day ago [-]

Performance might be one reason to use Cap'n Proto over Protobuf.

http://dbeck.github.io/5-lessons-learnt-from-choosing-zeromq...

reply

---

in Python, it's hard to tell ahead of time if some object supports len(). There should be a method to check this dynamically.
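One way to check this dynamically is the `Sized` abstract base class in `collections.abc`, which matches any object whose class defines `__len__`. A small sketch (`supports_len` and `Bag` are made-up names); note it only tells you the method exists, not that calling it will succeed:

```python
from collections.abc import Sized


def supports_len(obj):
    # True when type(obj) defines __len__, so len(obj) is at least allowed.
    return isinstance(obj, Sized)


class Bag:
    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)


print(supports_len([1, 2, 3]))           # list
print(supports_len(x for x in [1, 2]))   # generator
print(supports_len(Bag('abc')))          # user-defined class with __len__
```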

---

yeah this is kinda weird...

---

Python's not allowing you to change 'closure' variables from a containing scope is not actually that bad b/c of the pass-by-reference stuff, eg you can change the contents of a struct even if you can't change the struct variable itself.

eg:

    In [6]: def outer():
       ...:     a = 3
       ...:     def inner():
       ...:         print a
       ...:         a = 4
       ...:         print a
       ...:     inner()
       ...:     print a
       ...:

    In [7]: outer()
    UnboundLocalError                         Traceback (most recent call last)
    <ipython-input-7-8493578e1e0e> in <module>()
    ----> 1 outer()

    <ipython-input-6-6adaf1641721> in outer()
          5         a = 4
          6         print a
    ----> 7     inner()
          8     print a
          9

    <ipython-input-6-6adaf1641721> in inner()
          2     a = 3
          3     def inner():
    ----> 4         print a
          5         a = 4
          6         print a

    UnboundLocalError: local variable 'a' referenced before assignment

    In [8]: def outer():
       ...:     a = 3
       ...:     def inner():
       ...:         print a
       ...:         b = 4
       ...:         print a
       ...:     inner()
       ...:     print a
       ...:

    In [9]: outer()
    3
    3
    3

    In [10]: def outer():
       ....:     a = {'a':3}
       ....:     def inner():
       ....:         print a
       ....:         a['a'] = 4
       ....:         print a
       ....:     inner()
       ....:     print a
       ....:

    In [11]: outer()
    {'a': 3}
    {'a': 4}
    {'a': 4}
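The session above is Python 2. In Python 3 there is also a `nonlocal` statement that lets the inner function rebind the outer variable directly. A sketch showing both routes side by side (function names are made up):

```python
def outer_rebind():
    a = 3

    def inner():
        nonlocal a  # Python 3 only: 'a' refers to the enclosing function's variable
        a = 4

    inner()
    return a


def outer_mutate():
    box = {'a': 3}

    def inner():
        box['a'] = 4  # mutates the shared dict; the name 'box' is never rebound

    inner()
    return box['a']


print(outer_rebind(), outer_mutate())
```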

---

" When I was at Google, we had a saying that "The only interesting part of MapReduce? is the phase that's not in the name: the Shuffle". [That's the phase where the outputs of the Map are sorted, written to the filesystem and eventually network, and delivered to the appropriate Reduce shard.] If you don't need a shuffle phase - either because you have no reducer, your reduce input is small enough to fit on one machine, or your reduce input comes infrequently enough that a single microservice can keep up with all the map tasks - then you don't need a MapReduce?-like framework. "

---

ToJans 12 hours ago [-]

First:

> Elm has an incredibly powerful type system

Near the end of the article:

>Want to decode some JSON? Hard, especially if the JSON is heavily nested and it must be decoded to custom types defined in your application.

IMHO the lack of typeclasses/traits is really hurting Elm. Take haskell f.e.

  {-# LANGUAGE DeriveGeneric #-}
  
  import Data.Aeson (FromJSON, ToJSON)
  import Data.Text (Text)
  import GHC.Generics
  
  data Person = Person {
        name :: Text
      , age  :: Int
      } deriving (Generic, Show)
  
  instance ToJSON Person
  instance FromJSON Person

While I understand Evan's aversion against complexity, it makes me a bit wary about using ElmLang in production. I am currently using TypeScript, but if I would need a more powerful type system, I would probably switch to Haskell/PureScript or OCaml/BuckleScript instead.

reply

fbonetti 4 hours ago [-]

I really wish people would stop spreading the meme that decoding JSON in Elm is "hard". Yes, Haskell allows you to automatically decode/encode datatypes, but this only works in the simplest of cases. For example, if your backend returns a JSON object with snake-cased fields, but your model has camel-cased fields, `instance ToJSON Person` won't work; you'll have to write a custom decoder. The automatic decoders/encoders in Haskell only work if the shape of your JSON perfectly matches your record definition.

Writing decoders in Elm is not hard. It's manual. It's explicit. It forces you to specify what should happen if the JSON object has missing fields, incorrect types, or is otherwise malformed. There's a slight learning curve and it can be time consuming at first, but it guarantees that your application won't blow up at runtime because of some bad data. Because of this, JSON decoding is frankly one of my favorite parts about Elm.

Typescript, on the other hand, offers no such guarantee. If you write a function that takes an Int and you accidentally pass it a String from a JSON response, your app will blow up and there's nothing the compiler can do to help you. Personally, I'd rather write JSON decoders than have my app blow up because of a silly mistake.

reply

Albert_Camus 3 hours ago [-]

Author here. I agree with your points, and in my article I specifically mention that there are benefits to some of the "hardness" of certain tasks in Elm (type-safety in the case of JSON decoding).

But to claim that JSON decoding in Elm is not significantly more difficult than it is in JavaScript would be misleading. A front end developer that has only written JS will be surprised when he/she cannot just drop in the equivalent of JSON.parse() and get an Elm value out of it. I call it "hard" because there is a bit of a learning curve, and it does require some thought, and quite frankly it takes quite a bit of time if you have a large application like we do.

Moreover, I am not complaining. And I do not think people should be. As I said in the article, the tradeoff is worth it.

reply

somenewacc 2 hours ago [-]

You don't need to write the decoder boilerplate manually to get all those benefits.

For example, here's how you rename a JSON field while serializing/deserializing a data type in Rust:

https://play.rust-lang.org/?gist=1b382bc1572858841d5e392435d...

You just annotate the field with #[serde(rename = "..")]. Here is a list of such annotations

https://serde.rs/field-attrs.html

Serde is also generic in the serialization format; the example I linked uses serde_json, and was adapted from its README here https://github.com/serde-rs/json

reply

MaxGabriel 2 hours ago [-]

Same for Haskell. This package provides common translations like snake_case to CamelCase:

https://www.stackage.org/haddock/lts-9.0/aeson-casing-0.1.0....

Giving you automatic encoders/decoders like so:

  instance ToJSON Person where
     toJSON = genericToJSON $ aesonPrefix snakeCase
  instance FromJSON Person where
     parseJSON = genericParseJSON $ aesonPrefix snakeCase

And the implementation of that package is like 4 simple lines for snake case; it's totally doable on your own for whatever you need https://github.com/AndrewRademacher/aeson-casing/blob/260d18...

I haven't had to do snake_case to CamelCase with Aeson before, but I have dropped a prefix before, like "userName" -> "name", "userAge" -> "age", and it was pretty easy and well supported.

Also I would note that this isn't as big of a deal for Haskell and Rust, because they're primarily backend languages, so they're more often sending out JSON in whatever form they please, rather than consuming it. In my experience the main consumers (Javascript on the web, Objective-C on iOS and Java on Android) use CamelCase anyway, so there's a natural compatibility.

reply

wraithm112 2 hours ago [-]

With respect to the snake-cased fields issue, it's actually not that hard to do that with Generic in Haskell.

  data Person = Person
       { personFirstName :: Text
       , personLastName  :: Text
       } deriving (Generic)
  instance ToJSON Person where
     toJSON = genericToJSON $ aesonPrefix snakeCase
  instance FromJSON Person where
     parseJSON = genericParseJSON $ aesonPrefix snakeCase

Which produces messages like:

  {
   "first_name": "John",
   "last_name": "Doe"
  }

reply

dmjio 11 hours ago [-]

If decoding json in Elm is considered hard, I'd recommend checking out miso (https://github.com/dmjio/miso), a Haskell re-implementation of the Elm arch. It has access to mature json libraries like aeson for that sort of thing, along with mature lens libraries for updating your model. Here's an example of decoding json with GHC.Generics using typeclasses. https://github.com/dmjio/miso/blob/master/examples/xhr/Main.hs#L130-L131

reply

enalicho 11 hours ago [-]

You don't need to switch a whole language because of JSON decoding. There are many tools that exist to aid you write JSON decoders in Elm. The language is not just about the architecture -- you can implement the architecture in any language, as Redux has proven. What people like about Elm is the compiler and design philosophy that radiates through the entire community. Switching to Haskell won't give you that, as the Haskell community has different priorities.

Here are some JSON tools for Elm:

reply

desireco42 8 hours ago [-]

Oh, this is awesome, json2elm really helps :) thanks, didn't know about it.

reply

a-saleh 10 hours ago [-]

I am kinda waiting for Purescript to mature a tiny bit more in this regard, because it seems that they have something special brewing there, with their polymorphic record type and interesting take on type-level programming.

Because this [1], even though it seems to be just a experiment so far, looks really good.

I.e: doing

type MyTestStrMap = { a :: Int , b :: StrMap Int }

and then just calling

let result = handleJSON """ { "a": 1, "b": {"asdf": 1, "c": 2} } """

let newResult = doSomething (result :: Either MultipleErrors MyTestStrMap)

is kinda all I ever wanted in these haskell inspired languages?

[1] https://github.com/justinwoo/purescript-simple-json/blob/master/test/Main.purs

reply

---

" agentm 14 hours ago [-]

Project:M36 is an implementation of the proposed design from the "Out of the Tarpit" paper."

https://github.com/agentm/project-m36

---

from [9]

Feature 5: Everything is an iterator

    In Python 3, range, zip, map, dict.values, etc. are all iterators.
    If you want a list, just wrap the result with list.
    Explicit is better than implicit.
    Harder to write code that accidentally uses too much memory, because the input was bigger than you expected.

---

from [10]

Feature 6: No more comparison of everything to everything

    It's because in Python 2, you can < compare anything to anything.
    >>> 'abc' > 123
    True
    >>> None > all
    False
    In Python 3, you can't do this:
    >>> 'one' > 2
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    TypeError: unorderable types: str() > int()
    This avoids subtle bugs, e.g., from not coercing all types from int to str or vice versa.
    Especially when you use > implicitly, like with max or sorted.
    In Python 2:
    >>> sorted(['1', 2, '3'])
    [2, '1', '3']

---

from [11]:

Feature 11: Unicode and bytes

    In Python 2, str acts like bytes of data.
    There is also unicode type to represent Unicode strings.
    In Python 3, str is a string.
    bytes are bytes.
    There is no unicode. str strings are Unicode.

---

morinted 97 days ago [-]

I really like the format strings in Python 3.6: https://docs.python.org/3/whatsnew/3.6.html#whatsnew36-pep49...

Seems that this set of slides (which were very informative!) is for up to 3.5

https://docs.python.org/3/whatsnew/3.6.html#whatsnew36-pep498

---

In [1]: L =[]

In [2]: L.append(L)

In [3]: print L

[[...]]

---

" 9. Adding new data types

There are multiple layers of complexity to adding new data types:

    Adding new metadata
    Creating dynamic dispatch rules to operator implementations in analytics
    Preserving metadata through operations

For example, a "currency" type could have a currency type as a string, with the data physically represented as a float64 or decimal. So you could treat the currency computationally like its numeric representation, but then carry through the currency metadata in numeric operations.

The rules about preserving metadata may be operator-dependent, so it can get complicated.

In Arrow we have decoupled the metadata representation from the details of computation and metadata nannying. In the C++ implementation, we have been planning ahead for user-defined types, so when we are focusing more on building an analytics engine it is a goal to enable the creation of user-defined operator dispatch and metadata promotion rules. "

---

jkabrg 28 minutes ago [-]

Slightly off-topic, but maybe we need a fully standardized and unambiguous CSV dialect with its own file extension. Or maybe just use SQLite tables or Parquet?

Some things I dislike about CSV:

These use up a bit of time whenever I get a CSV from a colleague, or even when I change operating system. Sometimes I end up clobbering the file itself.

Good things:

    Human readable.
    Simple.

I think the addition of some rules, and a standard interpretation of them, could go some way to improving the format.

reply

---

would be nice to have something like Python's 'in', but generalizable/customizable and returns the position of the found item, so that you could have a shorthand for stuff like a case-insensitive search for match within a list/iterator:

    for i in range(len(header)):
        if re.match(r'(?i)timestamp', header[i]):
            found_idx = i
            found_type = 'timestamp'
            break

that is, it would be nicer to be able to write:

found_idx = 'timestamp' findin_caseinsensitive header

(although in Python you can already do found_idx = findin_caseinsensitive('timestamp', header); so am i just asking for custom infix here? or maybe i'm just commenting that it would be nice to have infix library fns for case insensitive search)
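A minimal Python sketch of the library-function version of this idea. The names find_index and findin_caseinsensitive are hypothetical helpers, not existing library functions; the generalization is that the search takes an arbitrary predicate and returns a position (or None) rather than a boolean.

```python
import re
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def find_index(items: Iterable[T], pred: Callable[[T], bool]) -> Optional[int]:
    """Return the position of the first item satisfying pred, or None."""
    return next((i for i, item in enumerate(items) if pred(item)), None)


def findin_caseinsensitive(needle: str, items: Iterable[str]) -> Optional[int]:
    """Case-insensitive search; like re.match, anchored at the start of each item."""
    pattern = re.compile(re.escape(needle), re.IGNORECASE)
    return find_index(items, lambda s: pattern.match(s) is not None)


header = ["id", "TimeStamp", "value"]
found_idx = findin_caseinsensitive("timestamp", header)
```

With this, the customizable part is just the predicate passed to find_index; an infix spelling would be sugar on top of that.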

---

someone else's spec for tables:

https://github.com/neugram/ng/issues/1

---

 bognition 2 days ago [-]

Honestly immutable data classes that are created via builders is a must have in Java.

reply

emidln 2 days ago [-]

Make fields public final and your constructor is your "builder".

reply

positr0n 2 days ago [-]

Those two patterns aren't the same. You can pass around a half constructed builder but you can't pass around a half constructed public final object like you described.

reply

dtech 1 day ago [-]

You can in most modern or functional languages, by currying/partial evaluation/binding. It's rare for data classes.

In scala:

    case class Foo(a: Int, b: String)
    val halfBuilt: String => Foo = Foo(1, _)
    val foo = halfBuilt("foo")

reply

eropple 2 days ago [-]

Is there something that a builder gets you that Kotlin-style data classes doesn't?

reply

Matthias247 2 days ago [-]

Builders can set multiple fields of the data class at once, e.g. from a single builder parameter. They also can perform error checking whether the thing that the user actually wanted to build is consistent before returning a data class. There might be multiple fields in a class where the value of one field restricts possible values of another one.

These are already some reasons why you want users to create objects through builders instead of initializing them directly.

Another thing that builders provide is interface isolation: If your data class is only interpreted by your own libraries code you can modify its (package-private) content as you like and require. The builder can evolve (and fill in additional fields as needed), and your code which consumes the built data class can evolve without breaking a user contract. Just the builder API needs to be stable (or backward compatible). With data classes this wouldn't work that easily, since you require the user to specify more parameters directly.

reply

cle 2 days ago [-]

One thing I love about builders, besides the sibling replies, is that builders force users to explicitly name the arguments. Passing args to a constructor/method/whatever with implicit argument ordering can be error-prone in certain situations (multiple adjacent args with the same type).

    c = Class(firstName, lastName)   // are these passed in the right order?
    c = Class(lastName=firstName, firstName=lastName) // obviously incorrect

Especially when you're using other magic like auto-generating constructors based on fields (as with Lombok)...then rearranging two String fields can subtly break things with no compiler errors!

reply

---

on an article about data classes, in a thread about Java:

icedchai 2 days ago [-]

Or just install Lombok.

reply

didibus 1 day ago [-]

Can't do java without it.

reply

bherrmann7 2 days ago [-]

Lombok @Value for the win!

reply

gravypod 2 days ago [-]

@Data seems more like what they are proposing. It comes with toString, equals, etc and has a bunch of cool features [0].

[0] - https://projectlombok.org/features/Data

reply

---

a proposal for adding data classes to Java:

http://cr.openjdk.java.net/~briangoetz/amber/datum.html

has very good detailed discussion of design choices. Some points that i thought might be useful for Oot (much of this is quoted from the article):

Perhaps surprisingly, enums delivered their syntactic and semantic benefits without requiring us to give up most other degrees of freedom that classes enjoy; Java's enums are not mere enumerations of integers, as they are in many other languages, but instead are full-fledged classes, with unconstrained state and behavior, and even subtyping (though this is constrained to interface inheritance only.)"

Because the equals() method of arrays is inherited from Object, which compares instances by identity, making defensive copies of array components in read accessors would violate the invariant that destructuring an instance of a data class and reconstructing it yields an equivalent instance -- the defensive copy and the original array will not be equal to each other. (Arrays are simply a bad fit for data classes, as they are mutable, but unlike List their equals() method is based on identity.) We'd rather not distort data classes to accommodate arrays, especially as there are ample alternatives available.

On the other hand, data classes instances have identity, which supports mutability (maybe) but also supports self-reference. Unlike value types, data class instances are entirely suited to representing self-referential graphs. "

    Non-abstract data classes are final;
    Data classes can be abstract (in which case they acquire no equals(), hashCode(), or toString() methods, and all constructors must be protected);
    Data classes can extend abstract data classes;
    No restrictions on what interfaces a data class could implement.

This allows us to declare families of algebraic data types, such as the following partial hierarchy describing an arithmetic expression:

interface Node { }

abstract __data class BinaryOpNode(Node left, Node right) implements Node { }

__data class PlusNode(Node left, Node right) extends BinaryOpNode(left, right) { }

__data class MulNode(Node left, Node right) extends BinaryOpNode(left, right) { }

__data class IntNode(int constant) implements Node { }

When a data class extends an abstract data class, the state description of the superclass must be a prefix of the state description of the subclass...The arguments to the extends Base() clause is a list of names of state components of Sub, not arbitrary expressions, must be a prefix of the state description of Sub, and must match the state description of Base; "
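For comparison, a rough Python analogue of the hierarchy above using frozen dataclasses. This is a sketch, not a translation of the Java proposal: BinaryOpNode is not actually abstract here, and the evaluate function is a hypothetical consumer added only to show the "switch on the constructor, then destructure" usage.

```python
from dataclasses import dataclass


class Node:
    """Marker base class, standing in for the Node interface."""


@dataclass(frozen=True)
class BinaryOpNode(Node):
    left: Node
    right: Node


@dataclass(frozen=True)
class PlusNode(BinaryOpNode):
    pass  # state description (left, right) is inherited as a prefix


@dataclass(frozen=True)
class MulNode(BinaryOpNode):
    pass


@dataclass(frozen=True)
class IntNode(Node):
    constant: int


def evaluate(node: Node) -> int:
    """Dispatch on the concrete node class and recurse on its fields."""
    if isinstance(node, IntNode):
        return node.constant
    if isinstance(node, PlusNode):
        return evaluate(node.left) + evaluate(node.right)
    if isinstance(node, MulNode):
        return evaluate(node.left) * evaluate(node.right)
    raise TypeError(f"unknown node: {node!r}")


expr = PlusNode(IntNode(1), MulNode(IntNode(2), IntNode(3)))
result = evaluate(expr)
```

The generated equality is structural and class-sensitive, so two PlusNode values with equal fields compare equal, while a PlusNode and a MulNode with the same fields do not.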

---

techdragon 2 days ago [-]

I see that https://github.com/fish-shell/fish-shell/issues/3341 is scheduled for 3.0. This is also marked as the fix for the broken/invalid YAML format that is used for storing the shell history https://github.com/fish-shell/fish-shell/issues/2258

I implore you to NOT use JSON or Protocol Buffers, neither are appropriate formats for configuring my shell, or storing long sets of textual data. Protocol Buffers is hardly the kind of format I want to parse and read my history in, and JSON is a less than ideal configuration format for many reasons.

https://github.com/vstakhov/libucl UCL (Universal Configuration Language) is a much more appropriate format for configuring programs than JSON, and has been accepted as standard for use configuring all tools included in the FreeBSD operating system.

I also suggest you consider using a different language to store shell history, since shell history and shell configuration are two very different jobs with different goals and trade offs. I personally would suggest a simple safe format like TOML https://github.com/toml-lang/toml for storing the history.

But if you do want a single format for both configuration and history, I would implore you to pick TOML not JSON or Protocol Buffers.

reply

__sr__ 1 day ago [-]

Out of the mainstream formats, JSON is among the easiest to parse - thanks to the CLI tool jq. Does UCL have something equivalent?

Protobuf (or any other binary format) has the benefit of being faster to load, but I agree - it should not be used to store things like history. It should be easy to parse history without having to write code. And with Protobuf, you need the schema.

If a binary format is going to be used, it should be optional - or perhaps there should be tools to convert the history file to/from a plaintext format.

reply

jhoechtl 1 day ago [-]

Would like to see comments in configuration, Json doesn't support comments.

reply

---

re 2 days ago [-]

I've been toying around with Python 3 and using it for most of my personal/hack projects, but I somehow missed the unpacking improvements: https://www.python.org/dev/peps/pep-0448/

In particular, being able to create an updated copy of a dict with a single expression is pretty cool:

    return {**old, 'foo': 'bar'}
    
    # Old way
    new = old.copy()
    new['foo'] = 'bar'
    return new

reply

est 2 days ago [-]

    return {**old, 'foo': 'bar'}
    
    # Old way
    return dict(old, foo='bar')

Not much difference if you ask me.

reply

orf 2 days ago [-]

It's twice as slow, doesn't work with more than one dictionary, you can't easily control the merge precedence and you can't (easily) use expressions/variables in the keys:

    {**x, 'fo'+'o': 'bar', **y}

reply
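A small sketch of the merge-precedence point (the dictionaries x and y are made-up sample data): in a dict display, later entries win, so where a key sits relative to the ** unpackings decides which value survives.

```python
x = {"foo": "from x", "a": 1}
y = {"foo": "from y", "b": 2}

# y is unpacked last, so its "foo" overrides both x's and the computed key
merged = {**x, "fo" + "o": "literal", **y}

# moving the computed key to the end makes it win instead
literal_wins = {**x, **y, "fo" + "o": "literal"}
```

Neither x nor y is modified; each expression builds a new dict.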

---

from websites we now have a language of @person and #topic. What could @person be good for in a programming language?

The first thought is users, of course, but this differs too much between applications to be a useful core construct in a programming language.

A second idea is that it could refer to an 'entity'/'agent'/'actor'/'process' which can take action or at least change on its own. For example, a process or thread.

A third idea is that a @person could be, instead, something with an 'identity'. I'm not sure exactly what this means yet but recall that i had the idea that things with 'identity' could be, eg something shared by an API that is referenced in different ways (by different 'names', although some references could be queries rather than names) by different clients, servers, nodes. And also maybe by different versions of the same program, or different libraries within the program.

The system that gives this thing a unity even though it is accessed in different ways in different places is an 'identity system'. And @ could mark things with an identity.

This seems subtly different than &objects, somehow. But i do have my doubts about that; the 'venus vs the morning star' example shows that two seemingly separate objects can have identity; also, pronouns can refer to objects as well as people. We speak as if objects take actions, and objects can be subjects of sentences; 'The rock fell'.

Maybe the difference between objects and people/processes is that objects are typically static, without opaque internal state (think of a chair or a rock; sure, they do have internal state, but we often speak and reason as if they don't). Or, perhaps the difference is that we typically expect that objects don't take action without some external cause, whereas people (or more generally processes) have some internal state that sometimes causes them to take action. This sort of has to do with the 'frame axiom', that is, what we assume stays constant across timepoints unless we have evidence that it changed.

So maybe both objects and people have identity, but people emit 'uncaused actions' and objects don't (or maybe we just say that 'the rock fell' isn't really an action, just a declaration of a change in the state of the world; "the rock was here at some timepoint, but then after that timepoint it was over there, where 'there' is lower than 'here' and the transition trajectory went through empty air and stopped when the rock landed on a surface"). Eg with data structures we don't say "the list sorted" because lists don't sort themselves; we would say though that "thread #1 sorted the list". So maybe 'the rock fell' is really equivalent to an implicit passive voice; 'the rock was pulled down by gravity' in this case. 'The list was sorted' would be an equivalent; we may not know who sorted the list, but we know that it became sorted, and that someone did it.

otoh, online @person is mainly used for 'mentions', similar to backlinks in wikis. Which we don't really need here. So do we really need a @person sigil?

For that matter, do we really need a &object sigil? Just using it as a marker for mutation violates our 'syntax if and only if there is a syntactic difference, or core language level semantic that couldn't be expressed with a library or metaprogramming'. But otoh maybe there could be a core difference; maybe &object marking could help with that issue we had where when you apply __set__ to a nested object, you want to actually 'send' the __set__ to the base object (eg if you do a.b.c.__set__, you may want to send __set__ to 'a', rather than sending it to a.b.c). hmm i'll copy that to 'dataThoughts'

---

one issue i have with (TOML-style?) stuff is having to put everything in quotes and braces:

compare: "

    [packages]
    requests = { extras = ['socks'] }
    records = '>0.5.0'
    django = { git = 'https://github.com/django/django.git', ref = '1.11.4', editable = true }
    "e682b37" = {file = "https://github.com/divio/django-cms/archive/release/3.4.x.zip"}
    "e1839a8" = {path = ".", editable = true}
    pywinusb = { version = "*", os_name = "=='nt'", index="pypi"}

" -- [12]

to:

"

    ###### Requirements with Version Specifiers ######
    # See https://www.python.org/dev/peps/pep-0440/#version-specifiers
    docopt == 0.6.1 # Version Matching. Must be version 0.6.1
    keyring >= 4.1.1 # Minimum version 4.1.1
    coverage != 3.5 # Version Exclusion. Anything except version 3.5
    Mopidy-Dirble ~= 1.1 # Compatible release. Same as >= 1.1, == 1.*

" -- [13]

---

need a syntax to destructuring bind dicts; eg if d = {'a': 1, 'b': 2}, need a way to do something like x, y = d['a'], d['b'] (the reason that isn't good enough already is that you might actually be iterating through a larger list, eg for x, y in ll, except that you want to bind x and y to particular fields in each item within ll)
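For reference, the closest thing in today's Python standard library is operator.itemgetter, which turns a list of keys into a function returning a tuple, so it composes with ordinary tuple unpacking and with iteration. A sketch with made-up sample data:

```python
from operator import itemgetter

d = {"a": 1, "b": 2, "c": 3}

# destructure two fields of one dict
x, y = itemgetter("a", "b")(d)

# destructure the same two fields of every dict in a list
ll = [{"a": 1, "b": 2, "c": 3}, {"a": 10, "b": 20, "c": 30}]
pairs = [(p, q) for p, q in map(itemgetter("a", "b"), ll)]
```

This works but is clearly a library workaround rather than syntax. Python 3.10's match statement also accepts mapping patterns such as {"a": x, "b": y}, but only inside a case clause, not in a for target.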

---

actually we should use 'statecharts' instead of state machines; statecharts are just hierarchical state machines:

[14]

https://statecharts.github.io/on-off-delayed-exit-1-zoomed.svg

"A statechart is a state machine where each state in the state machine may define its own subordinate state machines, called substates. ...

    When a state is entered, its sub state machine starts and therefore, a substate is entered
    When a state is exited, its sub state machine is exited too, i.e. any substates also exit
    This happens in addition to normal state machine behavior, namely entry and exit actions....

Like state machines, statecharts also react to events; events are dealt with by the states and the main side effects are specified by what happens upon entering and exiting states. ... Note also that a state may define several independent subordinate state machines; in this scenario, when a state is entered, then all of the subordinate state machines “start” and enter their initial state. In the statechart terminology, this is called a “parallel” state "

mb (or mb not) also read this someday: https://statecharts.github.io/on-off-delayed-exit-1-zoomed.svg

an example of how to apply the concept of statecharts in redux (mb read or mb not): https://medium.freecodecamp.org/how-to-model-the-behavior-of-redux-apps-using-statecharts-5e342aad8f66
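A minimal sketch of the hierarchical part of the quoted definition, in Python. The State and Statechart classes and the on/off example are hypothetical illustrations, not any library's API, and "parallel" states are deliberately left out. Entering a state also enters its initial substate; exiting a state exits its active substates first; an event is offered to the innermost active state and then to its ancestors.

```python
class State:
    def __init__(self, name, initial=None, children=(), transitions=None):
        self.name = name
        self.initial = initial                  # name of the initial substate, if any
        self.children = {c.name: c for c in children}
        self.transitions = transitions or {}    # event -> target state name
        self.parent = None
        for c in children:
            c.parent = self


class Statechart:
    def __init__(self, root):
        self.log = []                           # records entry/exit "actions"
        self.index = {}
        self._register(root)
        self.active = self._enter(root)         # innermost active state

    def _register(self, state):
        self.index[state.name] = state
        for c in state.children.values():
            self._register(c)

    def _enter(self, state):
        """Enter a state and, recursively, its initial substates; return the leaf."""
        self.log.append(f"enter {state.name}")
        while state.initial is not None:
            state = state.children[state.initial]
            self.log.append(f"enter {state.name}")
        return state

    @staticmethod
    def _path(state):
        """The state followed by its ancestors, innermost first."""
        path = []
        while state is not None:
            path.append(state)
            state = state.parent
        return path

    def send(self, event):
        # the innermost active state gets the first chance to handle the event
        for handler in self._path(self.active):
            if event in handler.transitions:
                target = self.index[handler.transitions[event]]
                break
        else:
            return False                        # nobody handles this event
        ancestors = self._path(target)[1:]      # proper ancestors of the target
        state = self.active
        while state is not None and state not in ancestors:
            self.log.append(f"exit {state.name}")   # substates exit before parents
            state = state.parent
        # enter any ancestors of the target that were not already active
        cut = ancestors.index(state) if state is not None else len(ancestors)
        for s in reversed(ancestors[:cut]):
            self.log.append(f"enter {s.name}")
        self.active = self._enter(target)
        return True


idle = State("idle", transitions={"work": "busy"})
busy = State("busy", transitions={"done": "idle"})
on = State("on", initial="idle", children=[idle, busy], transitions={"flick": "off"})
off = State("off", transitions={"flick": "on"})
chart = Statechart(State("root", initial="off", children=[off, on]))

chart.send("flick")   # off -> on, which also enters on's initial substate idle
chart.send("work")    # idle -> busy, staying inside on
chart.send("flick")   # handled by on: exits busy, then on, then enters off
```

The point of the hierarchy shows in the last call: the "flick" transition is declared once on the parent state and applies whichever substate happens to be active.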

---

"

Erlang, which traditionally deals with concurrency and parsing protocol has had a built-in standard FSM module for a long while:

http://learnyousomeerlang.com/finite-state-machines

It even got a recent re-write and the new one is called gen_statem: http://erlang.org/doc/man/gen_statem.html

This new version is already used to handle some of the TLS and SSH stuff from what I've heard. Here is TLS connection code: https://github.com/erlang/otp/blob/11cd0f1d000be5849bba2466b3b54daa7727af22/lib/ssl/src/tls_connection.erl

I've done them in C and Python before as well. In C I like the table + function pointers approach when possible. Here is an example: https://stackoverflow.com/questions/133214/is-there-a-typical-state-machine-implementation-pattern?answertab=active#tab-top

reply

tonyarkles 2 days ago [-]

Even without the framework, Erlang/Elixir processes very naturally lend themselves to statemachine-like behaviour. Unless you're doing something really weird, the messaging behaviour is basically:

    new state = old state + message

reply
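The "new state = old state + message" view above amounts to a fold over the message stream with a pure transition function. A sketch in Python, using a hypothetical turnstile as the machine and a lookup table for the transitions:

```python
from functools import reduce

# (state, message) -> next state; pairs not listed leave the state unchanged
TRANSITIONS = {
    ("locked", "coin"): "unlocked",
    ("unlocked", "push"): "locked",
}


def step(state, message):
    """Pure transition function: old state plus message gives new state."""
    return TRANSITIONS.get((state, message), state)


messages = ["push", "coin", "coin", "push"]
final = reduce(step, messages, "locked")
```

In an Erlang-style process the fold is implicit in the receive loop; here reduce plays that role over a finite list.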

rdtsc 1 day ago [-]

Indeed. In Erlang I had never actually used any of the FSM modules in practice because in my case a simple gen_server works just as well.

reply

"

---

richdougherty 4 days ago [-]

Here's a tip for state machines. If you've got a transition that takes some time then you often actually need another state to represent the state that the system is in during the "transition".

This is usually modeled better on backend systems (a state while waiting for a network response to arrive) but is often modeled poorly in front-end systems (a state while waiting for an animation to finish).

reply

---

" What is a Data Class?

Most Python developers will have written many classes which look like:

    class MyClass:
        def __init__(self, var_a, var_b):
            self.var_a = var_a
            self.var_b = var_b

Data classes help you by automatically generating dunder methods for simple cases. For example, a __init__ which accepted those arguments and assigned each to self. The small example before could be rewritten like:

    @dataclass
    class MyClass:
        var_a: str
        var_b: str

A key difference is that type hints are actually required for data classes. If you’ve never used a type hint before: they allow you to mark what type a certain variable _should_ be. At runtime, these types are not checked, but you can use PyCharm or a command-line tool like mypy to check your code statically. " [15]
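A short runnable sketch of what the quoted passage describes, using the standard library's dataclasses module (Python 3.7+). The variable names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class MyClass:
    var_a: str
    var_b: str


obj = MyClass("x", "y")        # __init__ was generated from the annotations
same = MyClass("x", "y")
text = repr(obj)               # __repr__ was generated too
equal = obj == same            # __eq__ compares field by field

# the annotations are not enforced at runtime: this constructs without error
unchecked = MyClass(1, 2)
```

The last line is the reason a static checker such as mypy is needed if the hints are meant to be more than documentation.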

reaperhulk 13 hours ago [-]

As noted in the PEP data classes is a less fully-featured stdlib implementation of what attrs already provides. Unless you’re constrained to the stdlib (as those who write CPython itself of course are) you should consider taking a look at attrs first.

http://www.attrs.org/en/stable/

reply

shoyer 12 hours ago [-]

Less fully-featured, sure, but also a bit more cleanly designed (in my opinion). Are there features from attrs that you would miss with dataclasses?

reply

metalliqaz 11 hours ago [-]

This is spot on. The design of attrs reminds me a little bit of the syntax from a declarative ORM, for example. I'm sure it can do very powerful things that I've not had occasion to use, but it is heavy. The @dataclass format is very clean and seems more like the syntactic sugar that I expect from Python.

One of the prime uses of a dataclass is to be a mutable namedtuple. And the syntax can be almost identical:

    Part = make_dataclass('Part', ['part_num', 'description', 'quantity'])

(from Raymond Hettinger's twitter)

This has the added benefit of not requiring type hinting, if you don't want to bother with such things.

reply

---

https://github.com/tc39/proposal-pattern-matching

https://news.ycombinator.com/item?id=16924554

---

from "Crafting Interpreters":

" This makes the "prototype" a piece of metadata instead of data. Goblins have warty green skin and yellow teeth. They don’t have prototypes. Prototypes are a property of the data object representing the goblin, and not the goblin itself. "

---

Category Theory datatype. Augmented so that it can also hold things like (homo,iso)morphism.

So can be used to express most kinds of type computations, algebraic judgments (e.g. is there a homomorphism between this thing and that thing).

---

first-order logic datatype, they can hold propositions such as "forall x, (y = 3)"

This means we also need a datatype that can hold symbols such as '\forall'

---

https://github.com/fantasyland/fantasy-land https://github.com/rpominov/static-land https://github.com/sanctuary-js/sanctuary-type-classes https://github.com/sanctuary-js https://github.com/sanctuary-js/sanctuary https://github.com/fluture-js/Fluture

https://monet.github.io/monet.js/

---

"parallel arrays...support multidimensional indexing (where an attribute is a property of a tuple of entities, rather than just one entity)." [16]

---

" Ward Cunningham: "We're entering a period of unprecedented mobility of information. This is fueled by a de facto standard representation uniformly embraced by all important programming languages. It's not the CONS cell of Lisp, or the Struct of C, or the Record of SQL. It is the mashup of hash tables and flexible arrays at the center of Perl, Python, PHP, Ruby and JavaScript. It is {} and []. Learn these characters. "

---

chubot 38 days ago

parent [-]on: Python 3.7: Introducing Data Classes

I'm not sure what part of the article you refer to, but:

---

YAML

https://wiki.xxiivv.com/#indental

indental

" This flat-file database format is inspired by YAML, and is meant to be intelligible while remaining flexible and fast.

The parser is a mere 50 lines, and allows for neat data structures for serverless sites. The file extension is .tome and has its own syntax highlight, thanks to Colin.

Example

    NAME
      KEY : VALUE
      LIST
        ITEM1
        ITEM2

Or, {NAME:{KEY:VALUE,LIST:[ITEM1,ITEM2]}} "

https://github.com/uonai/Tome

https://github.com/XXIIVV/Oscean/blob/master/scripts/lib/indental.js

---

"subset" data types can specify underlying data types, and then validators (predicates that say whether or not some value, expressed as a value of the underlying data type, is a member of the subset being defined; think of a zipcode form input validator on a website)

---

also, we may want 'unit' data type suffixes, eg "4 meters"

i guess you could generalize this to a form of 'product' data type, eg the '4' and the 'meters' are treated as orthogonal.
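A sketch in Python of both of the last two notes: a validated 'subset' type (an underlying type plus a predicate) and a unit-suffixed value treated as a product of a number and a unit. The Subset and Quantity classes and the US-style zipcode pattern are assumptions made up for illustration, not a proposed Oot design.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Subset:
    """A subset of an underlying type, carved out by a validator predicate."""
    name: str
    base: type
    predicate: Callable[[object], bool]

    def __contains__(self, value) -> bool:
        return isinstance(value, self.base) and self.predicate(value)

    def check(self, value):
        """Return value unchanged if it is a member, else raise ValueError."""
        if value not in self:
            raise ValueError(f"{value!r} is not a valid {self.name}")
        return value


# assumed format: five digits, optionally followed by a dash and four digits
ZipCode = Subset("ZipCode", str,
                 lambda s: re.fullmatch(r"\d{5}(-\d{4})?", s) is not None)


@dataclass(frozen=True)
class Quantity:
    """A number and a unit carried side by side, eg '4 meters'."""
    value: float
    unit: str

    def __add__(self, other):
        if not isinstance(other, Quantity):
            return NotImplemented
        if other.unit != self.unit:
            raise ValueError(f"cannot add {other.unit} to {self.unit}")
        return Quantity(self.value + other.value, self.unit)


total = Quantity(4, "meters") + Quantity(2, "meters")
```

Membership in the subset reads like Python's 'in' ("94110" in ZipCode), and the arithmetic on Quantity operates on the number while the unit is only compared, which is the sense in which the two components are orthogonal.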

---

"Julia has a new canonical representation for missing values. Being able to represent and work with missing data is fundamental to statistics and data science. In typical Julian fashion, the new solution is general, composable and high-performance. Any generic collection type can efficiently support missing values simply by allowing elements to include the pre-defined value missing. The performance of such “union-typed” collections would have been too slow in previous Julia versions, but compiler improvements now allow Julia to match the speed of custom C or C++ missing data representations in other systems, while also being far more general and flexible." [17]

https://julialang.org/blog/2018/06/missing

---

"Broadcasting is already a core language feature with convenient syntax—and it’s now more powerful than ever. In Julia 1.0 it’s simple to extend broadcasting to custom types and implement efficient optimized computations on GPUs and other vectorized hardware, paving the way for even greater performance gains in the future." [18]

https://julialang.org/blog/2018/05/extensible-broadcast-fusion

---

" Named tuples are a new language feature which make representing and accessing data by name efficient and convenient. You can, for example, represent a row of data as row = (name="Julia", version=v"1.0.0", releases=8) and access the version column as row.version with the same performance as the less convenient row[2]. " [19]

---

"The dot operator can now be overloaded, allowing types to use the obj.property syntax for meanings other than getting and setting struct fields. This is especially useful for smoother interop with class-based languages such as Python and Java. Property accessor overloading also allows the syntax for getting a column of data to match named tuple syntax: you can write table.version to access the version column of a table just as row.version accesses the version field of a single row." [20]

---

"

The iteration protocol has been completely redesigned to make it easier to implement many kinds of iterables. Instead of defining methods of three different generic functions—start, next, done—one now defines one- and two-argument methods of the iterate function. This often allows iteration to be conveniently defined with a single definition with a default value for the start state. More importantly, it makes it possible to implement iterators that only know if they’re done once they’ve tried and failed to produce a value. These kinds of iterators are ubiquitous in I/O, networking, and producer/consumer patterns; Julia can now express these iterators in a straightforward and correct manner. " [21]

---

"

Scope rules have been simplified. Constructs that introduce local scopes now do so consistently, regardless of whether a global binding for a name already exists or not. This eliminates the “soft/hard scope” distinction that previously existed and means that now Julia can always statically determine whether variables are local or global. " [22]

---

The utter awkwardness of Protobuf-generated code is particularly problematic. I've had pretty good results with the TypeScript code generator, but the Go generator is just egregiously bad. There's no way to write a client or server that uses the Protobuf structs directly as first-class data types without ending up looking like a ball of spaghetti, at least not if you venture into the realm of oneofs, timestamps or the "well known types" set of types.

I think we're at a juncture where we really desperately need a better, unified way to express schemas and pass type-safe data over the network that flows through these schemas. I'm currently working on a project that involves both gRPC (which uses Protobuf), GraphQL and JSON Schema, with the backends written in Go, and the frontend in TypeScript, and the overlap between all of these is ridiculous. TypeScript is a fresh breath of air that mostly solves the brittleness of JavaScript's type system, and it's absolutely crucial to extend this type-safety all the way to the backend.

The challenge is that no language is the same, and a common schema format ends up targeting a lowest-common denominator. For example, GraphQL errs (in my opinion, wrongly) on the side of simplicity and doesn't support maps, so now you have a whole layer that needs to either use custom types (JSON as an escape hatch, basically) or emulate maps using arrays of pairs, neither of which is ideal.

reply

-- [23]

---

again someone brought up: https://github.com/edn-format/edn https://github.com/cognitect/transit-format

which i have to learn more about sometime. also, what's the difference in use-cases for these two? https://www.reddit.com/r/Clojure/comments/5eopoc/transit_vs_edn/daegbw3 seems to say that EDN wasn't sufficiently portable. https://www.reddit.com/r/Clojure/comments/5eopoc/transit_vs_edn/ also seems to say that transit is more of a protobuf-like thing (more efficient, not as human readable or expressive, less metadata). see also https://groups.google.com/forum/#!msg/clojure/9ESqyT6G5nU/2Ne9X6JBUh8J which makes similar points. So, it sounds like both transit and edn would be of interest to me -- transit b/c it's portable and EDN b/c of the metadata.

just looked at transit for a few minutes. As an encoding, it's probably a bit too 'practical' for consideration in the conceptual core of oot; it contains things like the specific way of encoding various things into JSON strings and arrays, specific 'type tag' strings to indicate whether something is an integer or a float, and a specific way to compress repeated elements. However, it may be of interest just for its selection of data types. Even there though it seems to have some data types that are not quite 'lowest common denominator' enough for me; for example, it has data types for both 'keywords' and 'symbols'. Still, it's a great place to start.

still have to look at EDN

TOML also looks good as a simple data format. And JSON (but with comments added).

---

hardwaresofton 2 days ago [-]

I'd like to humbly suggest that we use JSON please, in particular:

JSON + JSONSchema[0] +/- JSON Hyperschema[1] +/- JSON LD[2]

It's a bit to learn but I promise you, it's worth it.

[0]: https://json-schema.org/ [1]: https://datatracker.ietf.org/doc/draft-handrews-json-schema-hyperschema/ (also on json-schema.org) [2]: https://json-ld.org/

---

" Keywords

These are everywhere in Clojure. I think of these as more-or-less interned strings that are granted some special abilities.

:foo ;; a keyword

You can think of that as "foo" for most intents and purposes, and there’s a pair of functions for converting between keywords and strings.

(keyword "foo")

> :foo

(name :foo)

> "foo"

Keywords are the de facto type for keys in map literals e.g.

{:name "Bill", :age 23}

Keywords can also be used as functions of maps i.e. to get a value from the map by keyword: (:foo m). They can be namespaced to avoid naming collisions e.g. :my-namespace/foo. Note the name of the namespaced keyword is just the part after the /. " -- [24]

---

"Maps would seem to be Clojure’s commonplace record type, even though it has defrecord...Something that bugged me about this initially was there were no guarantees a record (map) would contain what you expected, or that it wouldn’t contain a bunch of stuff you didn’t expect! I came to realize this doesn’t matter in practice as often as I thought." -- [25]

we probably want both structs and maps

---

this sounds similar to what i wanted to do:

" Some core Clojure data structures can be used as functions. I didn’t know what to make of this at first but I love it now. For example, a map is a function of one (key) argument. A set is a function of one (member) argument.

(def my-set #{1 2 3}) ;; a set of integers

(my-set 1) => 1 ;; returns the member value

(my-set 3) => 3 ;; returns the member value

(my-set "no") => nil ;; returns nil

(map my-set [3 2]) => (3 2) ;; mapping the set over a vector " -- [26]

---

" Lately I find myself thinking more in terms of data than procedural code. For example, sometimes it’s nice to express program logic declaratively with data as opposed to control-flow/case/cond statements. Consider this case:

(case x "High" 100 "Medium" 50 "Low" 0)

This can also be expressed as a map, and that map can be used as a function that returns the matched key’s value. This code has roughly the same effect as the case above:

(def score {"High" 100 "Medium" 50 "Low" 0})

(score "Medium") => 50

An important detail is that case works more like a switch statement with fast equality checks/jumps, and will throw an exception by default if no match is found.

Here we’re using the map score as a function and passing a key value for which to return the value.

(score "Nope.") => nil ;; returns nil when key not found

You can also specify a third argument for a default value if the key isn’t found:

(score "Nope." -1) => -1

You can use the map as a function in higher order functions:

(map score ["Medium" "Low" "Medium"]) => (50 0 50)

And some of the same concepts apply to sets.

This approach is nice when you want to define some “rules” in one place and use them in many, or maybe it’s loaded from a database or configuration file. " -- [27]

this suggests that Oot may want to unify these things
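
A minimal Python sketch of what that unification could look like. `FnMap` and `FnSet` are hypothetical names invented for this note (not an Oot or Clojure API); they just make a map and a set callable, mirroring the Clojure behaviour quoted above.

```python
class FnMap(dict):
    """Hypothetical: a dict that can also be called like a function of its key."""

    def __call__(self, key, default=None):
        # Like Clojure's (m k) and (m k default): a missing key yields the default.
        return self.get(key, default)


class FnSet(frozenset):
    """Hypothetical: a set that, when called, returns the member or None."""

    def __call__(self, member):
        return member if member in self else None


score = FnMap({"High": 100, "Medium": 50, "Low": 0})
my_set = FnSet({1, 2, 3})

medium = score("Medium")        # lookup by calling the map
missing = score("Nope.")        # None when the key is absent
fallback = score("Nope.", -1)   # explicit default
mapped = list(map(score, ["Medium", "Low", "Medium"]))  # map used as a higher-order argument
members = [my_set(3), my_set("no")]
```

Unlike `case`, nothing here throws on a missing key; whether Oot should default to nil or to an error is left open.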

---

clojure has a special case of the more general 'view' functionality that i want:

" Metadata

Clojure can attach metadata to objects. You can use with-meta (or a prefix syntax ^) to tag objects with metadata. The object/value will still present itself to callers as before, but now it has a secret bag (map) of goodies and you can inspect it with the meta function.

For example, a function to pad the left side of a collection that also returns metadata for the number of padding items:

(defn left-pad [n p s]
  (let [diff (- n (count s))]
    (with-meta (concat (repeat diff p) s)
      {:padding diff})))

(left-pad 5 0 [1 2 3])

> (0 0 1 2 3) ;; looks like a typical result

(meta (left-pad 5 0 [1 2 3]))

> {:padding 2} ;; but it has metadata too

The metadata facilities are also commonly used for compiler type hints, docstrings, deprecation tags, etc. In fact, that left-pad function itself has been decorated with a bunch of metadata behind-the-scenes, including its definition location down to the line and column!

(meta #'left-pad) ;; #' gets the Var of the function definition

>

{:arglists ([n p s]), :line 8, :column 1, :file "/Users/Taylor/Projects/playground/src/playground/core.clj", :name left-pad, :ns #object[clojure.lang.Namespace 0x547b55b7 "playground.core"]} " -- [28]
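
A rough Python analogue of the quoted `left-pad` example, as a sketch only. `MetaList` is a hypothetical class invented here; it assumes that "metadata" means a side dictionary that does not take part in equality, which is how the quote describes it.

```python
class MetaList(list):
    """Hypothetical: a list that carries a metadata dict on the side."""

    def __init__(self, items=(), meta=None):
        super().__init__(items)
        # The metadata is not part of the list contents, so == ignores it.
        self.meta = dict(meta or {})


def left_pad(n, p, s):
    """Pad s on the left with p up to length n; record the padding count as metadata."""
    diff = n - len(s)
    return MetaList([p] * diff + list(s), meta={"padding": diff})


padded = left_pad(5, 0, [1, 2, 3])
looks_plain = padded == [0, 0, 1, 2, 3]   # callers see an ordinary list
padding = padded.meta["padding"]          # the "secret bag" is still reachable
```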

---

usgroup 14 days ago [-]

I got the Clojure bug at some point not long ago. Decided I’d write a crawler and some data munging stuff directly in Clojure since it’s all about data processing.

Crawlers naturally want to be stacks with pipelines and expressing them as tail recursions over URLs curried into transducers, etc was easy enough conceptually, but difficult in reality because you want crawlers to be stateful. I think in clojure you end up hacking state by just attaching stuff to the outputs of your functions and building of more complicated objects down the line.

I discovered that stuff which is cake in SQL or a language with data frame support is often typically hard in Clojure. E.g. joining data, aggregation, etc. The few clojure libraries for this sort of thing are complicated to use because they have a mechanical sympathy with how clojure executes. E.g. https://github.com/nathanmarz/specter/

So I wrote some of my own libraries to get over this hurdle. I then realised that there isn’t a single decent statistics library that’s still maintained, and that the java bindings were far from simple to use.

At that point I gave up on clojure for my use case. That’s not to say that there isn’t something for which it’s awesome. But in my opinion it isn’t data processing.

dm3 13 days ago [-]

For the data processing tasks my go-to tool is Manifold[1]. Depending on how the application is structured, you can keep state in atoms, the system map when using Component[2] or vars with Mount[3]. Manifold streams asynchronously join the state transformations together while allowing control over backpressure and threading.

On the data analysis side things aren't rosy on the JVM. I'd delegate that part to something which actually has all the tools built-in, e.g. Julia/Python+Numpy/R. Apache Arrow[4] is a nice project to facilitate the dataframe interop.

[1] https://github.com/ztellman/manifold

[2] https://github.com/stuartsierra/component

[3] https://github.com/tolitius/mount

[4] https://arrow.apache.org/

ken 14 days ago [-]

I think the Clojure answer to stateful data storage is "use Datomic". There's a big impedance mismatch between the functional world and the stateful (RDBMS) world. Datomic extends the functional world all the way down to data storage.

You can certainly use SQL, and it's not any worse than any other language (and in some ways a little better), but IME it's not Clojure-simple, either. It's still the most awkward part of all my Clojure programs.

Clojure excels when you can avoid mutating state, and I think that's why people especially love it for web apps. It fits almost perfectly into that model.

weavejester 14 days ago [-]

Can you explain a little more why you think state is hard to manage in Clojure? I've done a fair amount of data processing in Clojure, and I'd consider it one of Clojure's strengths.

---

If maps and functions are the same thing (both take a value and return a value), and if we represented them similarly syntactically, then how do we handle mutating an item in a map? Maybe it's like piecewise function definition with a guard.

e.g.

m = {'a': 1, 'b':2}
m['b'] = 3
m['a'] == 1

is like:

m = {'a': 1, 'b':2}
m 'b' = 3
m 'a' == 1

or mb like:

m = {'a': 1, 'b':2}
m x
  | x == 'b' = 3
m 'a' == 1

another special case of this sort of thing could be adding a type implementation in inheritance / adhoc polymorphism:

add x y
  | isa(x) == isa(y) == ComplexNumberType = ComplexNumber(x.real + y.real, x.imag + y.imag)
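
A minimal Python sketch of the idea, assuming a hypothetical `Piecewise` class (invented for this note): a map becomes a list of guarded clauses, and "mutating" an entry is just adding a newer clause that shadows the old one. The same mechanism then covers adding a type-specific implementation to `add`; complex numbers are modelled as plain `(real, imag)` tuples here for brevity.

```python
class Piecewise:
    """Hypothetical: a function defined as an ordered list of (guard, result) clauses."""

    def __init__(self, clauses=()):
        self.clauses = list(clauses)

    @classmethod
    def from_map(cls, mapping):
        # Each map entry becomes a clause guarded by equality with its key.
        return cls([(lambda x, k=k: x == k, lambda x, v=v: v)
                    for k, v in mapping.items()])

    def define(self, guard, result):
        # Newer clauses are tried first, so they shadow older ones.
        self.clauses.insert(0, (guard, result))

    def __call__(self, *args):
        for guard, result in self.clauses:
            if guard(*args):
                return result(*args)
        raise KeyError(args)


# m = {'a': 1, 'b': 2};  m x | x == 'b' = 3
m = Piecewise.from_map({"a": 1, "b": 2})
m.define(lambda x: x == "b", lambda x: 3)

# add x y, extended with a clause for (real, imag) pairs
add = Piecewise([(lambda x, y: True, lambda x, y: x + y)])
is_pair = lambda v: isinstance(v, tuple) and len(v) == 2
add.define(lambda x, y: is_pair(x) and is_pair(y),
           lambda x, y: (x[0] + y[0], x[1] + y[1]))

results = [m("a"), m("b"), add(1, 2), add((1, 2), (3, 4))]
```

Old clauses are never removed in this sketch, so a real design would need to decide whether shadowed clauses are garbage or history.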

---

" FORTRAN has multidimensional arrays, and the compilers know about them and optimize aggressively. Most other languages don't. C people just do not get multidimensional arrays. They think an array of arrays is good enough. I tried to convince the Go people to put in multidimensional arrays, with no success. "

---

" There are attributes with predefined meaning, e.g. controlling the display of the corresponding objects, in particular – that in the GUI. The GUI is managed entirely in this way, thus it is purely data driven and declarative.

Two other predefined kinds of attributes, the so called dependencies and triggers, provide a spreadsheet-like behaviour in programs by relating global variables to each other. Both dependencies and triggers are expressions, each associated (as an attribute) with a global variable.

A trigger is executed whenever its variable receives a value, and is most often used for setting the value of another global variable. Evaluation of a dependency takes place due to any of the global variables in it changing its value. The value of the dependency expression eventually becomes the value of the dependent variable. However, unlike that of a trigger, the evaluation of a dependency only occurs when the dependent variable is actually referenced. In other words, triggers evaluate eagerly and dependencies lazily. "
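
The eager/lazy distinction in that quote can be sketched in Python. `Globals` and its method names are hypothetical, invented for this note, and do not come from the quoted system; the sketch marks dependents stale on write and only re-evaluates them on read.

```python
class Globals:
    """Hypothetical store of global variables with eager triggers and lazy dependencies."""

    def __init__(self):
        self._values = {}
        self._triggers = {}   # name -> callback(store, new_value)
        self._deps = {}       # name -> (expression(store), set of input names)
        self._stale = set()   # dependent variables that need re-evaluation

    def add_trigger(self, name, callback):
        self._triggers[name] = callback

    def add_dependency(self, name, inputs, expression):
        self._deps[name] = (expression, set(inputs))
        self._stale.add(name)

    def _mark_stale(self, name):
        # Mark everything that (transitively) depends on `name`, without evaluating.
        for dep, (_, inputs) in self._deps.items():
            if name in inputs and dep not in self._stale:
                self._stale.add(dep)
                self._mark_stale(dep)

    def set(self, name, value):
        self._values[name] = value
        self._mark_stale(name)                 # dependencies: lazy
        if name in self._triggers:
            self._triggers[name](self, value)  # triggers: eager

    def get(self, name):
        if name in self._stale:
            expression, _ = self._deps[name]
            self._stale.discard(name)
            self._values[name] = expression(self)
        return self._values[name]


evaluations = []

def fahrenheit(store):
    evaluations.append("fahrenheit")
    return store.get("celsius") * 9 / 5 + 32

g = Globals()
g.add_trigger("celsius", lambda store, v: store.set("doubled", v * 2))
g.add_dependency("fahrenheit", ["celsius"], fahrenheit)

g.set("celsius", 100)
g.set("celsius", 20)
doubled = g.get("doubled")             # the trigger already ran on each set
runs_before_read = len(evaluations)    # the dependency has not run yet
temp = g.get("fahrenheit")             # evaluated now, on first reference
g.get("fahrenheit")                    # cached: no re-evaluation
runs_after_reads = len(evaluations)
```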

---

"When I create my own language (soon, hopefully), I would love for it to emulate R's paradigm where scalars are just length-1 vectors:"

---

" X-expressions. [Racket] This choice is somewhat biased by my work with Racket, which mostly involves document processing and typesetting. But related topics arise in most web programming. An X-expression is a special native data structure that Lisps use to represent HTML and other XML-ish data.

Well, not “special” in a Lispy sense—keeping with the usual policy, an X-expression is just another list—but special in the sense that other programming languages don’t have it. Usually your choice is to represent HTML either as a string or as a full XML tree. A string is wrong because it doesn’t capture the structure of the HTML, as defined by its tags and attributes. An XML tree shows this structure, but conceals the sequential nature of the data elements, and is unwieldy to work with.

An X-expression ends up being an ideal hybrid between a string and a tree. Moreover, because it’s just another list-based expression in the language, you have a lot of options for processing it. Translating an X-expression to or from a text representation using angle brackets is trivial and fast. (Details.)

Given the close kinship between XML-ish data structures and Lisp languages, I have no explanation why, during the Internet era, they’ve not been paired more often. They’re like peanut butter and jelly. "
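
To make the "just another list" point concrete, here is a small Python sketch that renders a nested-list X-expression as angle-bracket text. The encoding is an assumption made for this note: the first element is the tag, and an optional dict right after the tag holds the attributes (Racket itself uses a list of pairs for attributes). `xexpr_to_xml` is a hypothetical helper name.

```python
from html import escape


def xexpr_to_xml(x):
    """Render a nested-list X-expression as an angle-bracket string."""
    if isinstance(x, str):
        return escape(x)              # text nodes are escaped
    tag, *rest = x
    attrs = {}
    # Assumed convention: an optional dict directly after the tag holds attributes.
    if rest and isinstance(rest[0], dict):
        attrs, *rest = rest
    attr_text = "".join(f' {k}="{escape(v)}"' for k, v in attrs.items())
    body = "".join(xexpr_to_xml(child) for child in rest)
    return f"<{tag}{attr_text}>{body}</{tag}>"


doc = ["p", {"class": "note"}, "Hello ", ["em", "world"], "!"]
html_text = xexpr_to_xml(doc)
# html_text is '<p class="note">Hello <em>world</em>!</p>'
```

Because the document is an ordinary list, the sequential order of children is preserved and ordinary list operations (map, filter, slicing) apply to it directly.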

---

trevor-e on May 18, 2017 [-]

Pretty good list. I don't think the tuple comparison is accurate though, I thought a data class is basically a value type (like Swift's struct) except with some stuff like equals, hashcode, and toString calculated for you.

Minor nit: the sort example needlessly uses two lines for the Swift version.

---

tomfitz on May 18, 2017 [-]

An alternative approach is Guava's Range class:

Range.closed(1,5) == [1,5]

Range.open(1,5) == (1,5)

Range.openClosed(1,5) == (1,5]

Range.closedOpen(1,5) == [1,5)

Range.greaterThan(1) == (1,infinity)

Range.atLeast(1) == [1,infinity)

That class always strikes me as having high power-to-weight. It has methods like encloses(anotherRange), contains(aValue), and others.

https://google.github.io/guava/releases/19.0/api/docs/com/go...
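
A minimal Python sketch of such a range type, to show how little is needed for the "power-to-weight" the comment describes. The constructor and method names are snake_case renderings of the Guava names quoted above; the implementation is an assumption of this note, not Guava's.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    lower: float
    upper: float
    lower_closed: bool
    upper_closed: bool

    @classmethod
    def closed(cls, a, b):          # [a, b]
        return cls(a, b, True, True)

    @classmethod
    def open(cls, a, b):            # (a, b)
        return cls(a, b, False, False)

    @classmethod
    def open_closed(cls, a, b):     # (a, b]
        return cls(a, b, False, True)

    @classmethod
    def closed_open(cls, a, b):     # [a, b)
        return cls(a, b, True, False)

    @classmethod
    def greater_than(cls, a):       # (a, infinity)
        return cls(a, math.inf, False, False)

    @classmethod
    def at_least(cls, a):           # [a, infinity)
        return cls(a, math.inf, True, False)

    def contains(self, x):
        above = x >= self.lower if self.lower_closed else x > self.lower
        below = x <= self.upper if self.upper_closed else x < self.upper
        return above and below

    def encloses(self, other):
        # Each bound of `other` must lie inside (or coincide compatibly with) ours.
        lower_ok = self.lower < other.lower or (
            self.lower == other.lower
            and (self.lower_closed or not other.lower_closed))
        upper_ok = self.upper > other.upper or (
            self.upper == other.upper
            and (self.upper_closed or not other.upper_closed))
        return lower_ok and upper_ok


checks = [
    Range.closed(1, 5).contains(5),
    Range.open(1, 5).contains(5),
    Range.closed(1, 5).encloses(Range.open(1, 5)),
    Range.at_least(1).encloses(Range.closed(1, 5)),
]
```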

---

an OCamler says some things about OCaml that are better than Haskell, in a critique of Haskell:

" ... structural variants specifically. When you don't have to define every variant before usage, you can start using them much more cheaply to aid readability. For example, here's core's Map.merge function:

(* merges two maps *)
val merge
  :  ('k, 'v1, 'cmp) t
  -> ('k, 'v2, 'cmp) t
  -> f:(key:'k -> [ `Left of 'v1 | `Right of 'v2 | `Both of 'v1 * 'v2 ] -> 'v3 option)
  -> ('k, 'v3, 'cmp) t

Notice the 2nd argument passed to f.

    This is the same as above: "no structural types".

That's true. But I also mean just plain old records as well. In OCaml you can define as many plain records with overlapping field names in the same module as you want and then use type names to disambiguate (field accessors aren't just functions) " -- [29]
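
The shape of that `merge` can be sketched in Python, using ad hoc tagged tuples where OCaml uses structural variants. This is only an analogy under stated assumptions: the tags `"Left"`, `"Right"`, `"Both"` are plain strings here, `None` stands in for OCaml's `option`, and nothing is statically checked.

```python
def merge(left, right, f):
    """Merge two dicts; f(key, tagged) returns a value, or None to drop the key."""
    out = {}
    for key in left.keys() | right.keys():
        if key in left and key in right:
            tagged = ("Both", left[key], right[key])
        elif key in left:
            tagged = ("Left", left[key])
        else:
            tagged = ("Right", right[key])
        result = f(key, tagged)
        if result is not None:
            out[key] = result
    return out


def sum_or_keep_left(key, tagged):
    # The tags are used without declaring any type for them first.
    if tagged[0] == "Both":
        return tagged[1] + tagged[2]
    if tagged[0] == "Left":
        return tagged[1]
    return None   # drop keys that appear only on the right


merged = merge({"a": 1, "b": 2}, {"b": 10, "c": 20}, sum_or_keep_left)
```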

---

For reference types, maybe have two ways they sit in registers: as a pointer, or as a "value" (the actual pointer is implicitly dereferenced by any access)

---

ncmncm 3 days ago [-]

...Boolean true should be ~0.
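
The commenter's reasoning is elided above, so the following is only one common argument for this convention, sketched in Python with an assumed 32-bit word: when true is all ones, bitwise NOT doubles as logical NOT and a boolean can be used directly as a mask. `WIDTH`, `TRUE`, `FALSE` and `select` are names invented for this sketch.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1   # emulate a fixed-width word in Python

TRUE = ~0 & MASK          # all bits set: 0xFFFFFFFF
FALSE = 0


def bit_not(word):
    """Bitwise NOT within the fixed width."""
    return ~word & MASK


def select(cond, a, b):
    """Branchless select; cond must be TRUE or FALSE."""
    return (cond & a) | (bit_not(cond) & b)


not_true = bit_not(TRUE)          # FALSE: bitwise NOT acts as logical NOT
not_one = bit_not(1)              # with true == 1, NOT gives a nonzero (truthy) word
picked = [select(TRUE, 7, 9), select(FALSE, 7, 9)]
```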

---