proj-oot-ootTypeNotes8

---

[1]

" Protobuffers Are Wrong ...

No Compositionality

Protobuffers offer several "features", but none of them seem to work with one another. For example, look at the list of orthogonal-yet-constrained typing features that I found by skimming the documentation.

    oneof fields can't be repeated.
    map<k,v> fields have dedicated syntax for their keys and values, but this isn't used for any other types.
    Despite map fields being able to be parameterized, no user-defined types can be. This means you'll be stuck hand-rolling your own specializations of common data structures.
    map fields cannot be repeated.
    map keys can be strings, but can not be bytes. They also can't be enums, even though enums are considered to be equivalent to integers everywhere else in the protobuffer spec.
    map values cannot be other maps.

This insane list of restrictions is the result of unprincipled design choices and bolting on features after the fact. For example, oneof fields can't be repeated because rather than resulting in a coproduct type, instead the code generator will give you a product of mutually-exclusive optional fields. Such a transformation is only valid for a singular field (and, as we'll see later, not even then.)

The restriction behind map fields being unable to be repeated is related, but shows off a different limitation of the type-system. Behind the scenes, a map<k,v> is desugared into something spiritually similar to repeated Pair<k,v>. And because repeated is a magical language keyword rather than a type in its own right, it doesn't compose with itself.

Your guess is as good as mine for why an enum can't be used as a map key.

What's so frustrating about all of this is that a little understanding of how modern type-systems work would be enough to drastically simplify the protobuffer spec and simultaneously remove all of the arbitrary restrictions.

The solution is as follows:

    Make all fields in a message required. This makes messages product types.
    Promote oneof fields to instead be standalone data types. These are coproduct types.
    Give the ability to parameterize product and coproduct types by other types.

That's it! These three features are all you need in order to define any possible piece of data. With these simpler pieces, we can re-implement the rest of the protobuffer spec in terms of them.

For example, we can rebuild optional fields:

product Unit { no fields }

coproduct Optional<t> { t value = 0; Unit unset = 1; }

Building repeated fields is simple too:

coproduct List<t> { Unit empty = 0; Pair<t, List<t>> cons = 1; }
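
The three primitives above are enough to model Optional and List directly. A sketch as TypeScript tagged unions (just a hypothetical modeling for illustration, not anything a protobuf compiler emits):

```typescript
// Product type with no fields.
type Unit = {};

// Coproduct: a value either carries a payload or is explicitly unset.
type Optional<T> =
  | { tag: "value"; value: T }
  | { tag: "unset" };

// Coproduct: a list is either empty or a head paired with a tail
// (the article's Pair<t, List<t>> corresponds to the head/tail fields).
type List<T> =
  | { tag: "empty" }
  | { tag: "cons"; head: T; tail: List<T> };

const empty: List<never> = { tag: "empty" };

function cons<T>(head: T, tail: List<T>): List<T> {
  return { tag: "cons", head, tail };
}

// Fold a List back into a plain array.
function toArray<T>(xs: List<T>): T[] {
  const out: T[] = [];
  let cur = xs;
  while (cur.tag === "cons") {
    out.push(cur.head);
    cur = cur.tail;
  }
  return out;
}
```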

Of course, the actual serialization logic is allowed to do something smarter than pushing linked-lists across the network---after all, implementations and semantics don't need to align one-to-one.

Questionable Choices

In the vein of Java, protobuffers make the distinction between scalar types and message types. Scalars correspond more-or-less to machine primitives---things like int32, bool and string. Messages, on the other hand, are everything else. All library- and user-defined types are messages.

The two varieties of types have completely different semantics, of course.

Fields with scalar types are always present. Even if you don't set them. Did I mention that (at least in proto3) all protobuffers can be zero-initialized with absolutely no data in them? Scalar fields get false-y values---uint32 is initialized to 0 for example, and string is initialized as "".

It's impossible to differentiate a field that was missing in a protobuffer from one that was assigned to the default value. Presumably this decision is in place in order to allow for an optimization of not needing to send default scalar values over the wire---though the encoding guide makes no mention of this optimization actually being performed, so your guess is as good as mine.

As we'll see when we discuss protobuffers' claim to being god's gift to backwards- and forwards-compatible APIs, this inability to distinguish between unset and default values is a nightmare. Especially if indeed it's a design decision made in order to save one bit (set or not) per field.
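
The unset-vs-default ambiguity is easy to reproduce in miniature. A toy TypeScript decoder (hypothetical, mimicking proto3's zero-filling of scalar fields) shows the two cases collapsing into one:

```typescript
// Toy proto3-style decoding: missing scalar fields get zero-filled,
// so "absent" and "explicitly zero" collapse into the same value.
interface Wire { count?: number }   // what actually arrived on the wire
interface Message { count: number } // what the generated bindings hand you

function decode(wire: Wire): Message {
  return { count: wire.count ?? 0 }; // default-initialize, proto3-style
}

const explicitZero = decode({ count: 0 });
const absent = decode({});
// Both decode to the same message; the distinction is gone.
```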

Contrast this behavior against message types. While scalar fields are dumb, the behavior for message fields is outright insane. Internally, message fields are either there or they're not---but their behavior is crazy. Some pseudocode for their accessor is worth a thousand words. Pretend this is Java or something similar:

private Foo m_foo;

public Foo foo {
  // only if `foo` is used as an expression
  get {
    if (m_foo != null)
      return m_foo;
    else
      return new Foo();
  }

  // instead if `foo` is used as an lvalue
  mutable get {
    if (m_foo == null)
      m_foo = new Foo();
    return m_foo;
  }
}

The idea is that if the foo field is unset, you'll see a default-initialized copy whenever you ask for it, but won't actually modify its container. But if you modify foo, it will modify its parent as well! All of this just to avoid using a Maybe Foo type and the associated "headaches" of the nuance behind needing to figure out what an unset value should mean.

This behavior is especially egregious, because it breaks a law! We'd expect the assignment msg.foo = msg.foo; to be a no-op. Instead the implementation will actually silently change msg to have a zero-initialized copy of foo if it previously didn't have one.

Unlike scalar fields, at least it's possible to detect if a message field is unset. Language bindings for protobuffers offer something along the lines of a generated bool has_foo() method. In the frequent case of copying a message field from one proto to another, iff it was present, you'll need to write the following code:

if (src.has_foo()) { dst.set_foo(src.foo()); }

Notice that, at least in statically-typed languages, this pattern cannot be abstracted due to the nominal relationship between the methods foo(), set_foo() and has_foo(). Because all of these functions are their own identifiers, we have no means of programmatically generating them, save for a preprocessor macro:

#define COPY_IFF_SET(src, dst, field) \
  if (src.has_##field()) { \
    dst.set_##field(src.field()); \
  }

(but preprocessor macros are verboten by the Google style guide.)

If instead all optional fields were implemented as Maybes, you'd get abstract-able, referentially transparent call-sites for free.
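
A sketch of what that buys you, in TypeScript with a hypothetical Option-based message representation: the copy-iff-set pattern becomes one ordinary generic function instead of a per-field macro:

```typescript
type Option<T> = { tag: "some"; value: T } | { tag: "none" };

interface Msg { foo: Option<number>; bar: Option<string> }

// One generic helper replaces every has_x()/set_x() pair: copy `field`
// from src to dst only when it is actually set in src.
function copyIfSet<M extends Record<K, Option<unknown>>, K extends PropertyKey>(
  src: M,
  dst: M,
  field: K
): void {
  if (src[field].tag === "some") dst[field] = src[field];
}

const src: Msg = { foo: { tag: "some", value: 7 }, bar: { tag: "none" } };
const dst: Msg = { foo: { tag: "none" }, bar: { tag: "some", value: "keep" } };
copyIfSet(src, dst, "foo"); // copied: src.foo is set
copyIfSet(src, dst, "bar"); // no-op: src.bar is unset
```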

To change tack, let's talk about another questionable decision. While you can define oneof fields in protobuffers, their semantics are not of coproduct types! Rookie mistake my dudes! What you get instead is an optional field for each case of the oneof, and magic code in the setters that will just unset any other case if this one is set.

At first glance, this seems like it should be semantically equivalent to having a proper union type. But instead it is an accursed, unutterable source of bugs! When this behavior teams up with the law-breaking implementation of msg.foo = msg.foo;, it allows this benign-looking assignment to silently delete arbitrary amounts of data!
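
The difference is visible even in a toy model. In TypeScript (hypothetical types, not generated code), the product-of-optionals encoding can represent the illegal "both cases set" state, while a genuine tagged union cannot:

```typescript
// What protobuf's code generator gives you (roughly): a product of
// mutually exclusive optionals, kept consistent only by setter magic.
interface OneofAsProduct { a?: number; b?: string }
const bad: OneofAsProduct = { a: 1, b: "oops" }; // illegal state, type-checks anyway

// What a genuine coproduct looks like: the type itself makes the cases
// mutually exclusive, no setter magic required.
type OneofAsSum = { tag: "a"; a: number } | { tag: "b"; b: string };
let v: OneofAsSum = { tag: "a", a: 1 };
v = { tag: "b", b: "hi" }; // setting one case *is* unsetting the other
```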

What this means at the end of the day is that oneof fields do not form law-abiding Prisms, nor do messages form law-abiding Lenses. Which is to say good luck trying to write bug-free, non-trivial manipulations of protobuffers. It is literally impossible to write generic, bug-free, polymorphic code over protobuffers.

That's not the sort of thing anybody likes to hear, let alone those of us who have grown to love parametric polymorphism---which gives us the exact opposite promise.

The Lie of Backwards- and Forwards-Compatibility

One of the frequently cited killer features of protobuffers is their "hassle-free ability to write backwards- and forwards-compatible APIs." This is the claim that has been pulled over your eyes to blind you from the truth.

What protobuffers are is permissive. They manage to not shit the bed when receiving messages from the past or from the future because they make absolutely no promises about what your data will look like. Everything is optional! But if you need it anyway, protobuffers will happily cook up and serve you something that typechecks, regardless of whether or not it's meaningful.

This means that protobuffers achieve their promised time-traveling compatibility guarantees by silently doing the wrong thing by default. Of course, the cautious programmer can (and should) write code that performs sanity checks on received protobuffers. But if at every use-site you need to write defensive checks ensuring your data is sane, maybe that just means your deserialization step was too permissive. All you've managed to do is decentralize sanity-checking logic from a well-defined boundary and push the responsibility of doing it throughout your entire codebase. "
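
The "well-defined boundary" the author wants back is essentially the parse-don't-validate pattern. A minimal TypeScript sketch (hypothetical field names): reject bad data once at deserialization, and hand the rest of the codebase a type with nothing optional:

```typescript
// Raw decoded input: everything optional, exactly as a permissive
// serialization format delivers it.
interface RawUser { id?: number; name?: string }

// Validated type the rest of the codebase sees: nothing optional.
interface User { id: number; name: string }

// Single boundary check; downstream code never re-validates.
function parseUser(raw: RawUser): User {
  if (raw.id === undefined || raw.name === undefined) {
    throw new Error("malformed User message");
  }
  return { id: raw.id, name: raw.name };
}
```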

---

some discussion on the previous item:

"Also, required vs optional - definitely only optional, there should be nothing required. So I flat out disagree with that one. " [2]

emmanueloga_ 3 hours ago [-]

Agree with other ppl here, an unnecessarily violent rant, and the point the author makes is kind of lost in the title "Protobuffers Are Wrong". What they really mean is "I think Protobuffers type system could be better".

I recently saw a talk by Rich Hickey about Effective Programs [1]. The talk explains why Hickey favors dynamic types, with EDN [2] and transit [3] proposed as an alternative to more statically typed data exchange formats like protobufs and less structured ones like Json (the talk explains the reasons, nearing the end I think).

1: https://www.youtube.com/watch?v=2V1FtfBDsLU 2: https://github.com/edn-format/edn 3: https://github.com/cognitect/transit-format

reply

" I used to like protobufs but I wouldn't use proto3. It seems to have regressed over time instead of getting better. Partly this is because the wire format is not evolvable. Protobufs do not have a "this is protobufs v1" header in them. This also explains the type system issues: Google cannot/will not evolve the wire format so each new version of protobufs is just different tooling and APIs over it. " [3]

derefr 2 days ago [-]

> That way everyone who interacts with your service can decide how they want to represent that structure.

Protobufs aren't a message format for publicly specced standard wire protocols. They're a serialization layer for polyglot RPC request and response messages. The whole point of them is that you're transporting the same typed data around between different languages, rather than there having to be two conversions (source language -> "standard format"; "standard format" -> dest language).

In this sense, they're a lot like, say, Erlang's External Term Format—driven entirely by the needs of an RPC-oriented distributed-computation wire protocol, doing interchange between nodes that aren't necessarily running the same version of the same software. Except, unlike ETF, Protobufs can be decoded without the backing of a garbage-collected language runtime (e.g. in C++ nodes.)

I'm not saying Protobufs are the best, but you have to understand the problem they're solving—the "rivals" to Protobufs are Cap'n Proto and Thrift, not JSON. JSON can't even be used to do a lot of the stuff Protobufs do (like talk to memory-constrained embedded kernels running on SDN infra.)

deathanatos 2 days ago [-]

> you probably shouldn't be using elaborate data structures in streams/files.

I don't consider Option<T> a particularly "elaborate" data structure, but protobuf would benefit heavily from it.

 Instead, as the author alludes to, you're forced to do different things depending on which side of the message/scalar dichotomy T falls on; if it's a message, you get Option<T> essentially for free, as a submessage can always be omitted. (Which is bad in the case that you just want a T; protobufs will happily let you forget to encode that attribute, and happily decode the lack thereof at the other end, all without error, despite the message being malformed, as it lacks a way to express this — rather common — constraint.) If T is a scalar, you get to use one of the WKTs.

See this GitHub issue for more context: https://github.com/protocolbuffers/protobuf/issues/1606

I work with data where most attributes outside of the attributes that compose the key identifying some object are quite possibly unknown. Our data comes from messy datasets, and while we clean it up as best we can, sometimes an attribute's value is unintelligible in the source data, and it is more pragmatic to move on without it. In SQL, we represent this as NULL, in JSON, null, but protobuf makes it rather difficult, and this frustration is evident from the other posters in that GitHub issue.

---

more discussion on the 'protobufs are bad':

docker_up 2 days ago [-]

Put me firmly in the camp of "optional fields are bad." I believe all fields should be required.

I come from an ONC/RPC background, which is the original UNIX RPC. Every iteration of an rpc would get versioned, and then you could write a conversion between versions, ex from V1 to V2, from V2 to V3, etc. This allowed for true backwards compatibility.

The idea of "forwards compatibility" is a pipedream, in my opinion. Just because something won't crash because a field is or isn't there doesn't mean that it will actually work. Code ages over time, and instead of looking at the spec, which will tell you which fields are required, you need to read the documentation, which may or may not tell you what is required. And especially when dealing with an older client, it simply may not work because what used to be optional-optional fields become required-optional fields.

It's the same argument as schemaless NoSQL databases like Mongo vs MySQL. Having a schema is a pain in the ass, but it's better over time and ages much more gracefully than schemaless ones, because things won't just break.

reply

jeremyjh 2 days ago [-]

If all fields are required, you cannot have middleware that processes multiple versions of the same protobuf, and every application has to be updated when a field is added, even if they do not use that field. This is one of the more important design goals underlying not only protobufs, but most of the non-language specific binary formatters.

reply

derefr 2 days ago [-]

> every application has to be updated when a field is added, even if they do not use that field

This is when, in a protocol, you reach for the hammer called "extensibility."

In this case (decoding to native structs), you'd probably have your FooMessage product-type have an "extensions" field, which is a list of zero or more FooExtensionStructs, where a FooExtensionStruct is a sum type of the known extensions to FooMessage.

Then, just make the semantics of your format such that a sum type can have a pragma in its IDL indicating that it isn't required to be decodable for the product-type containing it to be successfully decoded. I.e., the sum type becomes a (DecodeResult = Decoded(sum type) | Opaque(raw wire data)).

Opaque (unrecognized) extension data can't be manipulated by the program, but it can losslessly survive a trip back through the wire encoder. So you can choose to pass it through in your middleware, or not.

Boom: you've reinvented the "chunks" concept of the https://en.wikipedia.org/wiki/Interchange_File_Format, as seen in PNG and ELF.

reply

fixermark 1 day ago [-]

I believe you've also reinvented optional fields in a more generalizable way. Generalizability can be good or bad, depending on how much complexity it adds.

reply

derefr 1 day ago [-]

Nope—there's a very important difference in the two approaches. You have to choose to add the kind of extensibility I was describing to a particular type, in advance, as part of that type's original specification. You have no choice in having optional fields.

With Protobuf optional fields, even if you formally specify a protocol where some particular message is absolutely "final" and will never be extended, anyone can just throw an optional member on there when sending one to you, and your implementation won't reject the struct. This is a big problem if you're trying to specify types that really map one-to-one to fixed-size structs you'll use in the business logic of your non-memory-managed language.

With explicit extension points, you get to dictate where and when extension happens; and, as well, you get to formalize exactly what extensions look like (i.e. you don't have to allow for un-recognized extensions; you can simply have extensions be a finite set of known new extensions, where everything else is a real decode error.)

reply


more discussion on 'protobufs are wrong'

" There is no metadata anywhere. A protobuf with a single tagged int32 field encodes to just three bytes. There's no version header. There's no embedded schema. There's nothing that can be used to even mark a message as using an upgraded version of the format. There is however a rather complex varint encoding.

This makes a ton of sense given it was refactored out of a protocol for inter-server communication in a web search engine (Google search being largely a machine that spends its time decoding varints). But having come into existence this way, protobufs quickly proliferated everywhere including things like log files which may have to be read years later. So almost immediately the format became permanently fixed.

And what can't you do if the format is fixed? You can't add any sort of schema metadata that might help you analyse where data is going or what versions of a data structure are deployed in production.

As a consequence, when I was at Google, nobody had any way to know whether there was something running in production still producing or consuming a protobuf that they were trying to remove a field from. With lack of visibility comes fear of change... " [4]
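
For reference, the varint encoding mentioned above is easy to sketch (the base-128 scheme is documented in the official protobuf encoding guide; this simplified version handles small non-negative numbers only):

```typescript
// Base-128 varint: 7 payload bits per byte, high bit set on every byte
// except the last. This is why a tagged int32 field can be ~3 bytes total.
function encodeVarint(n: number): number[] {
  const out: number[] = [];
  do {
    let byte = n & 0x7f;
    n >>>= 7;
    if (n !== 0) byte |= 0x80; // continuation bit: more bytes follow
    out.push(byte);
  } while (n !== 0);
  return out;
}

function decodeVarint(bytes: number[]): number {
  let result = 0;
  let shift = 0;
  for (const b of bytes) {
    result |= (b & 0x7f) << shift; // accumulate 7 bits at a time
    if ((b & 0x80) === 0) break;   // last byte reached
    shift += 7;
  }
  return result;
}
```

The encoding guide's standard example: 300 encodes to the two bytes 0xAC 0x02.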

" > There is no metadata anywhere.

Specifically, you mean there is no header / container around a protobuf. This is one of the best properties of protobufs, because it makes them compose nicely with other systems that already have their own metadata, or where metadata is irrelevant. Adding required metadata wastes bytes and creates ugly redundancy. For example, if you're sending a protobuf in an HTTP response, the obvious place to put metadata is in the headers -- encoding metadata in the protobuf body would be redundant and wasteful. " [5]


todo read https://news.ycombinator.com/item?id=18189339

todo read https://news.ycombinator.com/item?id=18188519 , starting with the comment https://news.ycombinator.com/item?id=18190005 " kentonv 2 days ago [-] Hello. I didn't invent Protocol Buffers, but I did write version 2 and was responsible for open sourcing it...." and continuing for the rest of the thread (i've already read the other comments above that one, except for 18189339, namely, i've already read dweis 2 days ago [+11] atombender 2 days ago [+41] malkia 2 days ago [+10] TeMPOraL 1 day ago [+3] twic 2 days ago [+8] _dmurph 2 days ago [+9] rdtsc 2 days ago [+1] skylark 2 days ago [+2] emmanueloga_ 3 hours ago [+1] patrec 1 day ago [+5] docker_up 2 days ago [+27] )

the https://news.ycombinator.com/item?id=18190005 comment and its thread looks particularly interesting -- i skimmed it but didn't read it yet

---

so far my take-homes from the 'protobufs are wrong' article and discussion (before reading the other to-reads above, though) are:

"Protobufs aren't a message format for publicly specced standard wire protocols. They're a serialization layer for polyglot RPC request and response messages." [6]

the original article also complains about how protobufs prioritizes efficiency over other stuff.

A lot of people's complaints about protobufs, eg lack of required fields (which i guess was changed later in v3, not sure? but the capnproto guy, kentonv, disagrees), seem to be relevant for an actual 'standardized'/interoperable wire protocol. But protobufs should be thought of as just a utility to help serialize stuff. What protobufs seems to be for is more like when one client/server application, or one internal service, wants to communicate over the network with clients/servers that are part of the same codebase in a really efficient, but not necessarily interoperable, manner; and yet to allow the clients to be written in multiple programming languages.

So, for that use-case, yes we want flexibility to evolve the messages rather than strictness, and yes we want dirty-yet-efficient stuff.

"If all fields are required, you cannot have middleware that processes multiple versions of the same protobuf, and every application has to be updated when a field is added, even if they do not use that field." [7]

"... As implemented, required field checking occurred every time a message was serialized, not just when it was produced or consumed. Many systems involve middlemen who receive a message and then send it on to another system without looking at the content -- except that it would check that all required fields were sent, because that's baked into the protobuf implementation.

What happened over and over again is that some obscure project would redefine a "required" field to "optional", update both the producer and the consumer of the message, and then push it to production. But, once in production, the middlemen would start rejecting any message where these fields were missing. Often, the middleman servers were operating on "omnibus" messages containing bits of data originating from many different servers and projects -- for example, a set of search results might contain annotations from dozens of Search Quality algorithms. Normally, those annotations are considered non-essential, and Google's systems are carefully architected so that the failure of one back-end doesn't lead to overall failure of the system. However, when an optional backend sent a message missing required fields, the entire omnibus message would be rejected, leading to a total production outage. This problem repeatedly affected the search engine, gmail, and many other projects.

The fundamental lesson here is: A piece of data should be validated by the consumer, but should not be validated by pass-through middlemen. However, because required fields are baked into the protobuf implementation, it was unclear how they could follow this principle. Hence, the solution: Don't use required fields. Validate your data in application code, at consumption time, where you can handle errors gracefully.

Could you design some other version of "required" that doesn't have this particular problem? Probably. " [8]

in other words, at the level of the actual application layer protocol, it's often great to have things be required, but for protobufs, which is a library used for serialization, you don't want that, because sometimes the library users want to work with data that doesn't conform to any particular version of the application layer protocol; and sometimes you even want the real protocol to be 'omnibus'.

"...in a distributed system, you cannot update both sides of a protocol simultaneously. I have found that type theorists tend to promote "version negotiation" schemes where the two sides agree on one rigid protocol to follow, but this is extremely painful in practice: you end up needing to maintain parallel code paths, leading to ugly and hard-to-test code. Inevitably, developers are pushed towards hacks in order to avoid protocol changes, which makes things worse." [9]

again, the point here is that the application layer protocol itself may have required fields, but the utility used in implementations should not mandate this because you want to relax that requirement sometimes eg. when upgrading.

similarly with the metadata comment; required metadata may be great for the application layer, but not for a utility layer, because the messages may already contain, or may be inside of containers that contain the metadata.

"I don't consider Option<T> a particular "elaborate" data structure, but protobuf would benefit heavily from." [10]

a lot of people made the point that instead of having required fields or having permissiveness, you could use option types.

the original article made the point that in protobufs, not only is nothing required, but sometimes it's difficult for the client to tell if something was present or not.

and further, people made the point that in a protocol description, even if every field isn't required, some should be -- however, as per above, i think this is misunderstanding the purpose of protobuf; it is not supposed to be a prescriptive language for describing application protocols, it's supposed to be a lower-level utility for implementing serialization.

" derefr 2 days ago [-]

> every application has to be updated when a field is added, even if they do not use that field

This is when, in a protocol, you reach for the hammer called "extensibility."

In this case (decoding to native structs), you'd probably have your FooMessage product-type have an "extensions" field, which is a list of zero or more FooExtensionStructs, where a FooExtensionStruct is a sum type of the known extensions to FooMessage.

Then, just make the semantics of your format such that a sum type can have a pragma in its IDL indicating that it isn't required to be decodable for the product-type containing it to be successfully decoded. I.e., the sum type becomes a (DecodeResult = Decoded(sum type) | Opaque(raw wire data)).

Opaque (unrecognized) extension data can't be manipulated by the program, but it can losslessly survive a trip back through the wire encoder. So you can choose to pass it through in your middleware, or not.

Boom: you've reinvented the "chunks" concept of the https://en.wikipedia.org/wiki/Interchange_File_Format, as seen in PNG and ELF.

reply

fixermark 1 day ago [-]

I believe you've also reinvented optional fields in a more generalizable way. Generalizability can be good or bad, depending on how much complexity it adds.

reply

derefr 1 day ago [-]

Nope—there's a very important difference in the two approaches. You have to choose to add the kind of extensibility I was describing to a particular type, in advance, as part of that type's original specification. You have no choice in having optional fields.

With Protobuf optional fields, even if you formally specify a protocol where some particular message is absolutely "final" and will never be extended, anyone can just throw an optional member on there when sending one to you, and your implementation won't reject the struct. This is a big problem if you're trying to specify types that really map one-to-one to fixed-size structs you'll use in the business logic of your non-memory-managed language.

With explicit extension points, you get to dictate where and when extension happens; and, as well, you get to formalize exactly what extensions look like (i.e. you don't have to allow for un-recognized extensions; you can simply have extensions be a finite set of known new extensions, where everything else is a real decode error.)

reply " [11]

this is another idea that sounds good for extensibility of application-level protocols
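
the Decoded-or-Opaque idea from this thread can be sketched in TypeScript (hypothetical wire rule and type names): unrecognized extension data is carried as raw bytes and survives re-encoding losslessly:

```typescript
// Known extensions decode to structured data; anything unrecognized is
// carried as opaque bytes and survives a round trip untouched.
type DecodeResult =
  | { tag: "decoded"; ext: { kind: "knownExt"; payload: number } }
  | { tag: "opaque"; raw: Uint8Array };

function decodeExt(raw: Uint8Array): DecodeResult {
  // Hypothetical wire rule: first byte 0x01 marks the one known extension.
  if (raw[0] === 0x01) {
    return { tag: "decoded", ext: { kind: "knownExt", payload: raw[1] } };
  }
  return { tag: "opaque", raw };
}

function encodeExt(r: DecodeResult): Uint8Array {
  if (r.tag === "decoded") {
    return new Uint8Array([0x01, r.ext.payload]);
  }
  return r.raw; // pass the unrecognized chunk through losslessly
}
```

middleware can then forward opaque chunks without understanding them, which is exactly the IFF "chunks" behavior derefr describes.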

a lot of people say, look, in most applications, the right thing to do is to deserialize and serialize at application boundaries. Just because protobufs lets you operate directly on the received data object doesn't mean that you should.

the original article has a lot of good-sounding points that i don't completely understand about how composable product and coproduct types would be simple. i should re-read it and understand it more sometime.

---

" Next, we add another use line: use rand::Rng. The Rng trait defines methods that random number generators implement, and this trait must be in scope for us to use those methods. Chapter 10 will cover traits in detail. " [12]

---

(not about the type system so much as language implementation itself but...)

" Actual Formalization / Metatheory

Speaking of formal logic: an unfortunate fact about most languages -- even simple ones with very pedestrian type systems -- is that they usually ship with next-to-no formalization of their semantics, nor proofs that any such formalizations have any interesting metatheoretic properties (eg. soundness). We tend to just make do with testing and informal, whiteboard-and-email level reasoning about the systems, hoping that'll get us close enough to correct that it'll be ok.

This approach is, to say the least, wearing thin. Compilers still have serious bugs decades after they ship, failing to implement the languages they're supposed to. Worse, languages in the field still have serious design flaws decades after they ship, failing to uphold safety properties when subject to formal analysis. Long term, we have to get to the point where we ship languages -- and implementations -- with strong, proven foundations. There are promising moves in this direction, both in designed-for-language-designer tools like K framework or Redex, and in the general set of libraries and projects being undertaken in general proof assistants like Isabelle and Coq. " [13]

---

mcguire on Aug 19, 2017 [-]

versioning.... Supporting multiple different versions of the same module at the same time is a wee-bit hairy and has only been done well enough (OSGi) to give an idea of the potential.

mcguire on Aug 20, 2017 [-]

As far as I know, most of the version work comes from Java-land, particularly OSGi, along with "semantic versioning".

For example, the difference between a minor version increment and a micro version increment is whether client code implements an interface provided by the module or uses an implementation coming from the module. (If you use an interface that has had methods added, you won't know about those methods; but if you implement the interface, you have to provide implementations for those methods.)

So, for example, if you are using "modules" A and B, where A uses a module C and B uses C' (which has a bumped micro number), you won't have a problem simultaneously using an instance c from C (returned by A) and an instance c' from C' (returned by B).

On the other hand, if you create an instance c (as from C) and pass it to C' through B, things don't work out. (Execution blows up. Did I mention that the rest of Java's modularization story sucks?)

---

mcguire on Aug 19, 2017 [-]

[Aside: Why do I have the Whiley (http://whiley.org/about/overview/) link marked seen?]

I was mildly curious why Graydon didn't mention my current, mildly passionate affair, Pony (https://www.ponylang.org/), and its use of capabilities (and actors, and per-actor garbage collection, etc.). Then, I saw,

"I had some extended notes here about "less-mainstream paradigms" and/or "things I wouldn't even recommend pursuing", but on reflection, I think it's kinda a bummer to draw too much attention to them. So I'll just leave it at a short list: actors, software transactional memory, lazy evaluation, backtracking, memoizing, "graphical" and/or two-dimensional languages, and user-extensible syntax."

Which is mildly upsetting, given that Graydon is one of my spirit animals for programming languages.

On the other hand, his bit on ESC/dependent typing/verification tech. covers all my bases: "If you want to play in this space, you ought to study at least Sage, Stardust, Whiley, Frama-C, SPARK-2014, Dafny, F∗, ATS, Xanadu, Idris, Zombie-Trellys, Dependent Haskell, and Liquid Haskell."

So I'm mostly as happy as a pig in a blanket. (Specifically, take a look at Dafny (https://github.com/Microsoft/dafny) (probably the poster child for the verification approach) and Idris (https://www.idris-lang.org/) (voted most likely to be generally usable of the dependently typed languages).)

---

rtpg on Aug 19, 2017 [-]

The blurring of types and values as part of the static checking very much speaks to me.

I've been using Typescript a lot recently with union types, guards, and other tools. It's clear to me that the type system is very complex and powerful! But sometimes I would like to make assertions that are hard to express in the limited syntax of types. Haskell has similar issues when trying to do type-level programming.

Having ways to generate types dynamically and hook into typechecking to check properties more deeply would be super useful for a lot of web tools like ORMs.

mcguire on Aug 19, 2017 [-]

Check Idris (https://www.idris-lang.org/) and if you have a chance, work through the Idris book, Type Driven Development with Idris (https://www.manning.com/books/type-driven-development-with-i...).

charlieflowers on Aug 19, 2017 [-]

I believe F# is rich in features like this. I am only an F# outsider who has read about such features, but check out F# type providers.

---

dom96 on Aug 19, 2017 [-]

Interesting to see the mention of effect systems. However, I am disappointed that the Nim programming language wasn't mentioned. Perhaps Eff and Koka have effect systems that are far more extensive, but as a language that doesn't make effect systems its primary feature I think Nim stands out.

Here is some more info about Nim's effect system: https://nim-lang.org/docs/manual.html#effect-system

---

DonbunEf7 on Aug 19, 2017 [-]

Indeed, I would say that capability-safety is the next plateau after memory-safety, but most languages are only barely glimpsing the capability horizon.

jerf on Aug 20, 2017 [-]

This is personally one of my major pet peeves, because I like to write secure code, and I think you very rapidly end up half-reconstructing capabilities work if you do, and it is not hard to notice that you basically get no help whatsoever from languages. It is still clearly a nearly-invisible problem to the vast majority of developers. You need something almost Haskell-class simply to assert "Please, at least provide me some evidence that you ran some sort of permission check" before accessing a resource.
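One way a type system can supply that evidence is an unforgeable token: a value that can only be constructed by the permission check itself. A minimal Rust sketch (my illustration; the names are hypothetical):

```rust
// ReadToken's only constructor sits behind check_permission, so holding
// one is evidence the check ran. (In a real program the struct would live
// in its own module so its private field can't be forged from outside.)
pub struct ReadToken(());

pub fn check_permission(user_is_admin: bool) -> Option<ReadToken> {
    if user_is_admin { Some(ReadToken(())) } else { None }
}

// The accessor demands the evidence in its signature: no token, no call.
pub fn read_resource(_proof: &ReadToken) -> &'static str {
    "secret-contents"
}
```

Callers have to thread the token from the check to the access site, which is exactly the "show me you ran a permission check" discipline being asked for.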

zerker2000 on Aug 30, 2017 [-]

For "you ran some sort of permission check" I'm a great fan of the React.js `dangerouslySetInnerHTML` interface, which accepts a corresponding `{__html: ...}` object that has at least nominally been properly vetted.

---

snrji 14 minutes ago [-]

I enjoy coding in Rust, and I think it's a step in the right direction, but I'm still looking for a general-purpose programming language which models IO/State effects the same way languages supporting functional programming do with pure functions. Haskell monads are not granular enough. The closest thing I've seen is PureScript, but it's still not general-purpose enough. Solidity's "pure" annotations are a step in the right direction, but I'm speaking about something much more expressive. Also, it should have a Rust/C-like syntax in order to be able to go mainstream, but I'm happy with Haskell-like syntaxes.


---

i guess

https://clojure.org/about/spec

is like an optional/gradual runtime type system-ish thing for clojure

---

i get the sense that a bunch of languages now have single inheritance but multiple 'interface inheritance/traits'
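Rust is one data point here (my sketch below): it drops implementation inheritance entirely, but a single type can implement any number of traits:

```rust
// Two independent traits with default method bodies.
trait Fly {
    fn fly(&self) -> &'static str { "flying" }
}
trait Swim {
    fn swim(&self) -> &'static str { "swimming" }
}

// No base class anywhere: Duck just opts into both traits.
struct Duck;

impl Fly for Duck {}
impl Swim for Duck {}
```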

---

i get the sense that a bunch of languages now have type inference within functions but require explicit types for function definitions. eg. rust does this. [14] says that Swift has "type inference, that doesn't include method definition, but I guess it's fine", and that golang "has type inference for variable declarations, but not function definitions".
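The Rust version of this split looks like the following (my sketch): signatures are fully explicit, while everything inside the body is inferred:

```rust
// Parameter and return types are mandatory in the signature...
fn double_all(xs: &[i32]) -> Vec<i32> {
    // ...but local types are inferred: `out` is deduced to be Vec<i32>
    // from the later push of an i32.
    let mut out = Vec::new();
    for &x in xs {
        out.push(x * 2);
    }
    out
}
```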

---

throwaway487549 2 days ago [-]

I would love to have type-classes a la Haskell (implicits with parametric polymorphism, which is dead-simple and well understood) and universal pattern matching everywhere, but this is, of course, just a dream.

I would love to have ML/Scala-style syntax for curried functions and function definition via pattern matching with guards, which is also, it seems, out of the question.

Actually, the more of ML a strict language takes in, the better.

What is really funny is that Bell Labs did a lot of ML research, especially on stdlib, but Go team is ignoring everything which is not of the C flavour. Pity.

Again, ML is absolutely wonderful, and type-classes are the biggest single major innovation since Smalltalk.

It is better to lean towards ML instead of heading towards Javascript.


---

lwansbrough 5 days ago [-]

Java is not an example of a forward thinking language anymore. It has lagged behind C# in every major feature pretty much since C# arrived in 2000. The JVM has had solid performance improvements since forever, and may still be faster than the CLR, but the JVM still has some pretty glaring shortcomings coming from the .NET world. For example, Java generics (and their underlying implementation in the JVM) suffer from type erasure which makes for an awful developer experience in many scenarios. Java/JVM's story for asynchronous code is also pretty pathetic. This is stuff I don't even think about anymore when writing C# but I groan every time I have to do any "modern" programming in Java.


bad_user 5 days ago [-]

My opinion is from the other side of the fence: type erasure is perceived as a weakness of Java, but I believe it is one of the JVM's best features, because it didn't cripple the runtime for other languages. This is one of the reasons why other languages have been flourishing on the JVM (that, and the tooling and the open-source culture).

To understand why, we need to talk about 2 separate things:

---

1. Specialization is important for performance. When speaking of .NET, specialization for value types comes bundled with reification, but that's not necessarily the case: you can have compiler-driven or even runtime-driven specialization without reification.

In Scala, for example, we've had the "@specialized" annotation for a long time, with the compiler being able to specialize generic code just fine. It's not perfect; a better implementation eventually happened in Miniboxing [1], but it withered away due to lack of interest and the ongoing work happening in Dotty / Scala 3 and its newer TASTy distribution format, which should make specialization easier to accomplish.

Also there is on-going work to bring value types to the JVM and it's happening: http://mail.openjdk.java.net/pipermail/valhalla-spec-experts...

That said, having specialization is pretty cool for fine control of the memory layout, and Java developers have to resort to a lot of unsafe hacks to achieve the same thing. But if that was the cost paid so that languages like Scala or Haskell could happen on top of the JVM, then until they figure out how to do it such that everybody benefits, I think it was a cost worth paying.

Also consider that the lack of specialization forced the JVM's engineers to get creative in other areas. For example, the JVM has always been great at inlining code at runtime, even for megamorphic call sites. And the new GraalVM has super impressive abilities to eliminate boxing at runtime, which works for dynamic languages too: https://www.graalvm.org/

---

2. Reification is in fact about adding info about type parameters at runtime. This aids in using reflection to tell the difference between List<int> and List<string>, but people miss the forest for the trees.

As a matter of fact such reflection is only needed because languages like Java or C# have very weak static type systems, compared with other languages in the ML family. With an expressive type system, you never need reflection capabilities.

In Haskell, for example, the question of whether something is a List<string> or List<int> never, ever comes up. In Scala you sometimes need it, but much more rarely, and you can get it via a compiler-generated `ClassTag`, which is actually a much better approach because it makes the capability clear in the signature, this being compile-time reflection. People also like being able to do "new T", but that need completely goes away with proper support for type classes, which both Haskell and Scala have; "new T" is another special-purpose band-aid.

And reification is actually a bad feature to have in the runtime, because it makes it hard for languages to support higher-kinded types, or to build dynamic languages.

F# does not do higher-kinded types and is less expressive for that reason than OCaml, Scala or Haskell and the primary reason for why it doesn't do higher-kinded types is because it would have to do type erasure by itself, thus forgoing the performance benefits and the interoperability it has with C#.

Ironically, it is support for higher-kinded types in a language that increases its expressive capabilities to the point where you no longer need runtime reflection. In other words, you're blaming Java for not having a band-aid that happened in C# because their static type system is basically unsound and thus needs runtime guards and reflection. You quickly get over this once you start using a more expressive language ;-)

[1] http://scala-miniboxing.org/


migueldeicaza 4 days ago [-]

Reflection is the least of the problems this solves.

The majority of the issues and gotchas listed in the generics FAQ for Java do not happen in C# at all. It is liberating.

Reflection is an advanced use case that some people use, but it is rare.

The lack of flourishing in .NET over time owes more to what happens when a company takes the reins of a stack and later stops working on it.

Microsoft for a while invested in and designed JavaScript, Python and Ruby versions for .NET. But when conditions changed, there was no interest in keeping the projects going. These are all observations from an outsider at the time the projects were defunded.

Because of the siloed approach to development at the time, and the lack of an external community around those efforts, bootstrapping a community to drive those on their own proved to be very hard. Ruby mostly died, Python is barely surviving.

The mood in the ecosystem went from "we can build these and speed them up" to "this is an ongoing cost; let us rather interop with the real implementations than play constant catch-up".

Meanwhile, languages that Microsoft did not build did flourish, like PHP.

This corporate phenomenon deserves a blog post of its own.


i_s 4 days ago [-]

> The lack of flourishing in .NET over time owes more to what happens when a company takes the reins of a stack and later stops working on it.

But couldn't .NET not being a good target for dynamic languages be one of the reasons to stop working on it?

I keep hearing that .NET is a good platform for other languages, but there doesn't seem to be much empirical evidence for it (at least in dynamic languages).


lwansbrough 5 days ago [-]

I don’t know about all that in all honesty. I’m not a language designer so I suppose I don’t care about the why. All I know is Java is really bad at fulfilling my generic typing needs. Pragmatism is really important for languages, because without it all you have is a complicated excuse for a bad user experience. It’s sort of like how if you were to optimize English by cutting out redundant letters and combining words instead of making new words, you basically end up with German, which many English speakers would agree is not as nice a language to look at. (Germans may disagree of course because they’re used to it.)


bad_user 5 days ago [-]

I understand your point, however people complaining about Java are complaining about the language instead of the platform and the platform has evolved to support multiple languages.

Scala, Kotlin, Clojure, Eta (Haskell), JRuby are the ones I care about and they've been flourishing.

I'm a pragmatist at heart, btw, but pragmatism sometimes leads to ignorance. It's important to see what's available on the other side of the fence without bringing the preconceptions bag with you. Because you can then actually use the metaphorical "best tool for the job".

It's what I'm doing myself, or at least trying and .NET is without a doubt evolving in a direction I like. And I can also write entire paragraphs of what's better on .NET, I just don't find the generics reification to be one of those things.


bpicolo 5 days ago [-]

Is the suggestion that those languages can't easily be written on the CLR? There's a direct, maintained (by the one-man-army that is https://github.com/dmiller) port of Clojure to the CLR:

https://github.com/clojure/clojure-clr


solarkraft 5 days ago [-]

This means that C# could be ported to the JVM, doesn't it?


louthy 5 days ago [-]

> As a matter of fact such reflection is only needed because languages like Java or C# have very weak static type systems, compared with other languages in the ML family. With an expressive type system, you never need reflection capabilities.

The implication here is that you never need runtime type checking, which clearly isn't the case in ML family languages. Sum types for example have an abstract representation that must be resolved at runtime (for pattern matching).

C# also doesn't need reflection to resolve types (other than through the virtual route, i.e. inherited method resolution); this is a core feature of the CLR (via `callvirt` [1] if I remember right) and not a weakness of C#'s type system. C# bakes the concrete generic types into the generated IL.

[1] https://docs.microsoft.com/en-us/dotnet/api/system.reflectio...
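Concretely (my example, not from the thread): a Rust enum stores a discriminant at runtime, and pattern matching branches on that tag:

```rust
// The enum's runtime representation includes a tag saying which
// variant is present; `match` inspects it to pick a branch.
enum Shape {
    Circle(f64),      // radius
    Rect(f64, f64),   // width, height
}

fn area(s: &Shape) -> f64 {
    match s {
        Shape::Circle(r) => std::f64::consts::PI * r * r,
        Shape::Rect(w, h) => w * h,
    }
}
```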


bad_user 5 days ago [-]

> Sum types for example have an abstract representation that must be resolved at runtime (for pattern matching).

Indeed, but that's plain tagging, which is basically an int and not the same thing.

"callvirt" is basically doing a virtual method lookup in a vtable. It's still at runtime, but we are talking about OOP polymorphism now and how late binding works. Languages with type classes don't need that either, but that's a separate discussion and note that I still like OOP.

> C# also doesn't need reflection to resolve types

No, the developers do, because the language is not expressive enough to express generic pieces of code without losing type info, so you need reflection for constraints (e.g. "new T" or "instanceOf" checks) and runtime guards for downcasting.

Classic example: express a "sum" function that works on any Array<A>.

And note that what F# is doing to solve this particular problem is a hack to workaround both the lack of higher kinded types and of type classes and that does not interoperate with C#, since it requires "inline functions" which are doing type erasure (and it's a big and ugly hack, speaking of ML, because inline functions are no longer values ;-)).

> C# bakes the concrete generic types into the generated IL.

Of course, but that too is part of the problem.
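For comparison, here is the same "sum over any array" example in a language with built-in type-class-style dispatch; this Rust trait sketch is my illustration, not code from the thread:

```rust
// A minimal Num "type class": a zero and a binary addition.
trait Num: Copy {
    fn zero() -> Self;
    fn add(self, other: Self) -> Self;
}

impl Num for i32 {
    fn zero() -> i32 { 0 }
    fn add(self, other: i32) -> i32 { self + other }
}

impl Num for f64 {
    fn zero() -> f64 { 0.0 }
    fn add(self, other: f64) -> f64 { self + other }
}

// One generic sum; the instance is picked from the element type alone,
// with no reflection and no second "dictionary" type parameter.
fn sum<A: Num>(xs: &[A]) -> A {
    xs.iter().fold(A::zero(), |acc, &x| acc.add(x))
}
```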


louthy 4 days ago [-]

> No, the developers do, because the language is not expressive enough to express generic pieces of code without losing type info, so you need reflection for constraints (e.g. "new T" or "instanceOf" checks) and runtime guards for downcasting.

The type system is totally capable of it. The language doesn't make it easy, but the code below is essentially type-classes and class-instances if you squint. It's something the C# team are looking into now anyway, but the main point is the type system doesn't preclude it.

    public interface Newable<A>
    {
        A New();
    }
    public struct NewableThing : Newable<Thing>
    {
        public Thing New() => new Thing();
    }
    public class Foo
    {
        public A Bar<NewableA, A>() where NewableA : struct, Newable<A> =>
            default(NewableA).New();
    }

> Classic example: express a "sum" function that works on any Array<A>.

    public interface Num<A>
    {
        A Zero();
        A Add(A x, A y);
    }
    public struct NumInt : Num<int>
    {
        public int Zero() => 0;
        public int Add(int x, int y) => x + y;
    }
    public struct NumFloat : Num<float>
    {
        public float Zero() => 0;
        public float Add(float x, float y) => x + y;
    }
    public static A Sum<NumA, A>(A[] xs) where NumA : struct, Num<A> =>
        xs.Aggregate(default(NumA).Zero(), (s, x) => default(NumA).Add(s, x)); // Aggregate is LINQ's fold
    int sumInt = Sum<NumInt, int>(new [] {1, 2, 3, 4});
    float sumFlt = Sum<NumFloat, float>(new [] {1.0f, 2.0f, 3.0f, 4.0f});

> Of course, but that too is part of the problem.

That's a very hand-wavy statement. It's the goal of all statically typed compilers to resolve to concrete types at compile time. How is that part of any problem? The CLR has the capability to generate new concrete types from generic types at runtime if required, but obviously tries to do as much of that work up front at compile time.


jdoe2018 4 days ago [-]

You seem to mix up C# and the CLR. The CLR can do type erasure if you want; it has an entire runtime called the DLR to handle this stuff. Did you even know that? Type erasure was the only way before C# 2.0, when generics were introduced. The CLR can run Java just fine; we used to have J# before Oracle killed it through legal means. It has nothing to do with CLR technical limitations. You can run any fully fledged dynamic language on the CLR, such as Python. Where is the practical problem here?

Next, you can't say that C# has a "weak" type system. It is mainly strongly and statically typed by design. There are elements of weak typing in C#, such as the dynamic keyword, and all C# type-safety features can be bypassed if desired. It seems you prefer "weak" to mean "poor", but that is your definition, not the generally accepted one.

Further, it is completely false to suggest that C#/CLR can't let other languages flourish on the platform. The CLR stands for "Common Language Runtime" and can run any language you can imagine, including weakly typed languages. It is a plain old finite stack machine at heart. Unlike the JVM, the CLR is an ECMA standard, anyone can implement it, and the .NET sources are not GPL-licensed.

Scala used to be available on the CLR, and the reason it is not supported is more political than technical. You are again trying to make a technical justification for a political decision. Besides, Scala still only runs on JDK 8, which is funny because Oracle JDK will need a commercial license from 2019 to get patches, so you're forced to switch to OpenJDK or pay for a Scala runtime.

The Java community has talked about value types forever, we know that. Good luck unwinding that decision, it is not that easy.

The cost of unsafe hacks to let Scala and Haskell work had more to do with the fact that .NET was not cross-platform at that time. That was not the ecosystem impetus for having Scala run on the JVM; it was more because Java as a language is aging and lacks innovation, which is well recognized. The cross-platform argument against the CLR is gone now, so there is little reason why we shouldn't be able to run Scala and Haskell on the CLR instead if you are after those languages.

Also, the point about "forcing JVM engineers to get creative" is not really a solid argument at all. They just had to work around the mistakes they made in the past (no value types + runtime type erasure). We also know for a fact that escape analysis in the JVM doesn't really cover that many cases in practice; you need to look at the generated code to tell if it works or not. Besides, it is not something that the CLR cannot implement; on the contrary, there is active effort in the CLR towards Object Stack Allocation (you can Google for it).

The general point that reflection is "only" needed because of a "weak static type system" is also not true. You seem to confuse the shape of the API for getting runtime type information with the need for getting such information in the first place. You can try to build a polymorphic serializer in C++ which also has run time type erasure, it is not as easy as in C#. Eventually we need to carry over type information anyway, no matter what language.

Reflection is not about "adding info about type parameters at runtime". It is much more than that. First, we don't "add information" at runtime for existing types, we can inquire about it. This is super useful for many scenarios. Second, we can generate code at runtime as well with reflection, that is, MSIL op codes. This is also super useful.

Your use case of "express sum function over Array<A>" is not the direction C# is heading. We have LINQ for the expressiveness part, which Java doesn't have apart from some lookalike hacks. Also, for performance, your sum function is going to run at glacial speed compared with .NET Core 3.0, where we will have SIMD instructions and can execute dedicated AVX code paths depending on memory alignment, in managed code. This stuff used to be C++-only until now. I don't think your Scala and Haskell implementations stand a chance of performing close to that. And reusing a sum function over a read-only span is a matter of a one-line call. I like the pureness argument, but in practice we need to consider performance as well.


---

cryptonector 2 days ago [-]

...

And sometimes innovations can have simplifying effects. For example, the various "effects" libraries for Haskell simplify I/O code compared to not having them.


wasted_intel 2 days ago [-]

Rust 2018's non-lexical lifetimes are a good example of this.

jules 1 day ago [-]

It's quite difficult to simplify an existing language, even when adding a feature could theoretically simplify a language. For instance, higher kinded types are simple in principle: it's just removing an arbitrary restriction about what you're allowed to pass to a type constructor. However, when adding them to an existing language it doesn't always work that way. In the limit of removing type system restrictions you end up with dependent types, which are very simple in terms of the number of type system rules that they require, but try adding them to Java and you'll get a monstrously complicated beast.


cryptonector 1 day ago [-]

You're not wrong. Rust is a very fluid language though, and now is the time to get these things right for Rust. Java feels like a lost cause.


---

Ericson2314 2 days ago [-]

Also the Dependent Haskell work of Richard Eisenberg I'd argue is more important as an attempt to shrink the language than add expressive power. The main thing it does is combine existing language features, which only yields more constructs insofar as the old features had friction. You can think of it roughly as taking the convex hull (or probably better bounding parallelepiped) of the existing feature space.

Seen in this light, I've never seen such a thing attempted at this scale (GHC's size and GHC/Haskell's age), and I very much commend it. I hope other projects can do the same someday.


---

fgonzag 30 days ago [-]

.net 2.0 had breaking changes, and it was absolutely justified. The ecosystem would not be where it is now without it. (.net 2.0 introduced reified generics, and required a major overhaul of the internals, with quite a bit of breaking changes)

what are reified generics vs. generics with erasure? This explains it pretty well: http://gafter.blogspot.com/2006/11/reified-generics-for-java.html

seems like they are saying that reified generics are when the runtime has (reflective-ish) access to the type of instances of a generic data structure (e.g. you can test if X is an instanceof List<String>), whereas with generics with erasure it doesn't. The point of having generics with erasure is apparently mostly just that you can upgrade a language without generics at all to one with generics with erasure without breaking backwards compatibility.

so it sounds like in a new language with generics, you'd probably want reified generics?

Rust by contrast has limited runtime reflection anyway so maybe this distinction doesn't apply? However at least one commentator says that type erasure means actually using a type superclass for various subclasses, which is not what Rust does:

" In Java, type erasure is relevant because it's as if the type parameters were never there. For example, you can store a String in an ArrayList<Integer>, because there is no such thing as an ArrayList<Integer> at runtime; it's just an ArrayList. The only advantage to using generics in Java is that the compiler will insert casts for you where it's safe, whereas without generics, you'd have to type the casts yourself (and you could get them wrong). – Francis Gagné Sep 13 '15 at 4:19 " [15]

" ((in Rust)) When you use a generic function or a generic type, the compiler generates a separate instance for each distinct set of type parameters (I believe lifetime parameters are ignored, as they have no influence on the generated code). This process is called monomorphization. For instance, Vec<i32> and Vec<String> are different types, and therefore Vec<i32>::len() and Vec<String>::len() are different functions. This is necessary, because Vec<i32> and Vec<String> have different memory layouts, and thus need different machine code! Therefore, no, there is no type erasure.

If we use Any::get_type_id() (unstable for now, so must be compiled with a nightly compiler), as in the following example:

    #![feature(get_type_id)]

    use std::any::Any;

    fn main() {
        let v1: Vec<i32> = Vec::new();
        let v2: Vec<String> = Vec::new();

        let a1 = &v1 as &Any;
        let a2 = &v2 as &Any;
        println!("{:?}", a1.get_type_id());
        println!("{:?}", a2.get_type_id());
    }

we obtain different type IDs for two instances of Vec. This supports the fact that Vec<i32> and Vec<String> are distinct types.

However, reflection capabilities in Rust are limited; Any is pretty much all we've got for now. You cannot obtain more information about the type of a runtime value, such as its name or its members. In order to be able to work with Any, you must cast it (using Any::downcast_ref() or Any::downcast_mut()) to a type that is known at compile time.

" [16]

---

Julia is fast because...type-stability through specialization via multiple-dispatch

...

Type stability is the idea that there is only one possible type which can be outputted from a method. For example, the reasonable type to output from *(::Float64,::Float64) is a Float64.

...

The upside is that Julia's functions, when type stable, are essentially C/Fortran functions. Thus ^ (exponentiation) is fast. However, ^(::Int64,::Int64) is type-stable, so what type should it output?

((Int64))

...

2^-5

Here we get an error. In order to guarantee to the compiler that ^ will give an Int64 back, it has to throw an error. If you do this in MATLAB, Python, or R, it will not throw an error. That is because those languages do not have their entire language built around type stability.
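Rust's standard library makes a similar commitment at the type level rather than by throwing at runtime (my comparison, not the article's): integer pow takes an unsigned exponent, so the case that would force a non-integer result cannot even be written:

```rust
// i32::pow takes a u32 exponent, so the result can always stay an i32;
// something like base.pow(-5) is simply a type error at compile time.
fn int_pow(base: i32, exp: u32) -> i32 {
    base.pow(exp)
}

// For a fractional result you opt in to floating point explicitly.
fn float_pow(base: i32, exp: i32) -> f64 {
    (base as f64).powi(exp)
}
```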

...

Now let's define our own exponentiation on integers. Let's make it "safe" like the form seen in other scripting languages:

    function expo(x, y)
        if y > 0
            return x^y
        else
            x = convert(Float64, x)
            return x^y
        end
    end

((article shows that this generates much longer, less efficient assembly code than the built-in, type-stable ^))

...

Core Idea: Multiple Dispatch + Type Stability => Speed + Readability¶

...

There are a few things we learn from this. For one, in order to achieve this level of optimization, you must have type stability. This is not featured in the standard libraries of most languages, and was a choice made to make the experience a little easier for users. Secondly, multiple dispatch was required to be able to specialize the functions for types, which allows the scripting-language syntax to be "more explicit than meets the eye". Lastly, a robust type system is required: in order to build the type-unstable exponentiation (which may be needed) we needed functionality like convert. Thus the language must be designed to be type-stable with multiple dispatch and centered around a robust type system in order to achieve this raw performance while maintaining the syntax/ease-of-use of a scripting language. You can put a JIT on Python, but to really make it Julia, you would have to design it to be Julia.

...

Small Expansion of the Idea: Strict Typing

Type-stability is not the only necessity. You also need strict typing. In Python you can put anything into an array. In Julia, you can only put types of T into a Vector{T}. To give generality, Julia offers various non-strict forms of types. The biggest example is Any. Anything satisfies T <: Any (hence the name). Therefore, if you need it, you can create a Vector{Any}.

...

This leads to the performance principle: use strict typing whenever possible. There are other advantages: a strictly typed Vector{Float64} is actually byte-compatible with C/Fortran, and so it can be used directly by C/Fortran programs without conversion.

"

-- [17]

---

" In Rust, the Fn trait and friends (FnMut, FnOnce) are automatically implemented on eligible functions and closures. In case of a simple Fn case the restriction is that they must not modify their captured environment and can only borrow from it. "
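A small sketch of the Fn case (mine, for illustration): a closure that only borrows its environment immutably gets Fn, so it can be invoked repeatedly:

```rust
// Accepts anything implementing Fn() -> i32 and calls it twice;
// FnMut or FnOnce closures would not satisfy this bound.
fn call_twice<F: Fn() -> i32>(f: F) -> i32 {
    f() + f()
}

fn make_sum() -> i32 {
    let x = 10;
    let closure = || x + 1; // reads x by shared borrow, so it's Fn
    call_twice(closure)
}
```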

---

https://areknawo.com/how-to-think-and-type-in-typescript/

---

[18]

---

https://ucsd-progsys.github.io/liquidhaskell-blog/

---

chriswarbo on Feb 16, 2016 [-]

> Rust (and Haskell) is not easy to _type-check_ though.

Is that really the case? It seems to me that a Prolog-like resolution system, coupled with a constraint solver, would get most of the job done with little effort.

There are certainly many rules to keep track of, but in some cases the newer rules are strict generalisations of the older rules. For example, Haskell's original type classes are just a special-case of modern multi-parameter, flexible instances/contexts, etc. type classes. Likewise, we can implement constraint kinds, type class instance resolution and type equalities using one constraint solver.

sklogic on Feb 16, 2016 [-]

Hindley-Milner is already quite Prolog-like. But you're right, CLP(fd) rocks for typing.

My usual approach to implementing a type system is to derive a flat list of Prolog equations out of the AST and leave it to Prolog for solving. If you ask what to do with error messages, I've got a comprehensive answer, but it's not something I can type on a mobile phone.

another_user on Feb 17, 2016 [-]

I'd really appreciate further comments on what to do with type errors for prolog/minikanren-like tools as typecheckers

sklogic on Feb 18, 2016 [-]

Ok, I've got a proper keyboard now.

My approach to typing-error reporting is quite compatible with the Prolog codegen. What I do: for each AST node which produces one or more type equations, I record the location data (cache it and give it an index), and then issue a Prolog term like this: with_location(Id, Term), for each of the terms generated by this location.

The semantics of with_location is simple, but may depend on the presence of backtracking. If it's plain Hindley-Milner, there is no backtracking, and it simply does call/1(Term), continues if true, and stores Id and the failed term with all the current bindings and fails (fail/0) if false.

If there is a backtracking, I store a stack of location ids and failure reasons instead, but this required patching the Prolog execution engine a little bit (luckily, I'm using my own Prolog for this).

Now, the next step is to decipher the failed term. Most of the equations are in a trivial form: equals(A, B), so I simply report that prolog_to_type(A) failed to unify with prolog_to_type(B). For the other (rare) kinds of equations I define custom pretty-printing rules.
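A toy version of the flat-equation-list idea, with each equation tagged by a location id so the first failing one can be reported, might look like this in Rust (my sketch, not sklogic's code; a real Hindley-Milner solver also needs occurs checks and compound types):

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
enum Ty { Int, Bool, Var(u32) }

// One equation per AST node, tagged with a location id,
// in the spirit of with_location(Id, Term).
struct TyEq { loc: u32, a: Ty, b: Ty }

// Follow variable bindings to a representative type.
fn resolve(t: &Ty, subst: &HashMap<u32, Ty>) -> Ty {
    match t {
        Ty::Var(v) => match subst.get(v) {
            Some(bound) => resolve(bound, subst),
            None => t.clone(),
        },
        _ => t.clone(),
    }
}

// Solve the flat list; on failure return the offending location id,
// mirroring "store Id and the failed term ... and fail".
fn solve(eqs: &[TyEq]) -> Result<HashMap<u32, Ty>, u32> {
    let mut subst = HashMap::new();
    for eq in eqs {
        let a = resolve(&eq.a, &subst);
        let b = resolve(&eq.b, &subst);
        match (a, b) {
            (x, y) if x == y => {}
            (Ty::Var(v), t) | (t, Ty::Var(v)) => { subst.insert(v, t); }
            _ => return Err(eq.loc),
        }
    }
    Ok(subst)
}
```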

another_user on Feb 18, 2016 [-]

Thanks, looks nice and pretty straightforward, will try it out

---

some kinds of tricky dynamicity:

" Dynamic constant references

Sorbet does not support resolving constants through expressions. For example, foo.bar::A is not supported—all constants must be resolvable without knowing what type an expression has. In most cases, it’s possible to rewrite this code to use method calls (like, foo.bar.get_A).

Dynamic includes

Sorbet cannot statically analyze a codebase that dynamically includes code. For example, code like this is impossible to statically analyze.

Dynamic includes must be rewritten so that all includes are constant literals.

Missing constants

For Sorbet to be effective, it has to know about every class, module, and constant that’s defined in a codebase, including all gems. Constants are central to understanding the inheritance hierarchy and thus knowing which types can be used in place of which other types. "

---

Sorbet's type system (these are all links at [19]):

Type System

A quick table of contents for the complete type system reference:

    Integer, String, T::Boolean – Class Types
    T.nilable – Nilable Types
    T.any – Union Types
    T.let, T.cast, T.must, T.assert_type! – Type Assertions
    [Type1, Type2] – Tuple Types
    {key1: Type1, key2: Type2} – Shape Types
    T.untyped
    T.noreturn
    T.type_alias – Type Aliases
    T::Array, T::Hash, T::Set, T::Enumerable – Generics in the Standard Library
    T.proc – Proc Types
    T.class_of
    T.self_type
    T.all – Intersection Types

---

 losvedir 1 day ago [-]

> Your Ruby program will fail to run for stupid reasons the first time you run it, which is not the experience Java and Go and Rust programmers have.

I know exactly the feeling, but I don't think it's just the types that give it - it's that it's compiled. I similarly moved on from Ruby after 5 years, but to Elixir, and was similarly surprised to (re-)learn that a simple compile step catches so many braindead typos, misnamed variables, calls to functions that weren't imported, and the like.

Types go above and beyond this, but I think there's a huge step up just going from interpreted to compiled.

reply

---

I regard the ML featureset as table stakes for a language these days; while most languages (all serious languages?) have first-class functions, proper sum types are so useful that I don't want to use a language without them. That cuts the list of possibilities down quite a lot: in terms of having enough maturity/popularity for production use it probably comes down to OCaml (which becomes the default option by seniority), Haskell, F#, Scala, Rust, Swift, or Kotlin. D or Nim just don't offer anything compelling that's worth giving up sum types for, and while Swift or Kotlin can more or less equal OCaml they don't offer a compelling advantage.

Any of OCaml/Haskell/F#/Scala/Rust is a defensible choice. Haskell and Scala offer higher-kinded types, which are immensely useful once you get used to them. F# and Scala offer decent IDEs/tooling in a way that the others mostly don't. Rust offers a limited form of linear typing, but as a special case built into the language rather than as functionality emerging from a general type system; you can achieve a similar level of resource safety with a rank-2 type trick (famously used in ST) in a language that supports those (i.e. Haskell or Scala).

I use Scala for everything these days. Better-than-Python dev time, better-than-Rust type system, Nim/Java/C#/Go/D/Kotlin-like performance, and first-class Java interop (which I consider better than C/C++ interop, because using C/C++ interop makes your whole program memory-unsafe). Bonus of a compile-to-JS implementation that just works with the same code that you can run in your IDE.

reply
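For reference while reading this thread, a minimal sketch of what "proper sum types" buy you, written as a TypeScript discriminated union (the Shape example is mine, not the commenter's):

```typescript
// Illustrative discriminated union -- the Shape example is mine, not the
// commenter's. The compiler checks that the match below is exhaustive.

type Shape =
  | { kind: "circle"; radius: number }
  | { kind: "rect"; width: number; height: number };

function area(s: Shape): number {
  switch (s.kind) {
    case "circle":
      return Math.PI * s.radius ** 2;
    case "rect":
      return s.width * s.height;
    // no default branch: the compiler knows these are the only variants,
    // and adding a new one later makes this function fail to typecheck
    // until the new case is handled
  }
}
```

This is the property the comment calls "table stakes": the set of cases is closed and checked, so a forgotten case is a compile error rather than a runtime surprise.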

gdy 8 hours ago [-]

"higher-kinded types, which are immensely useful once you get used to them"

Could you give an example of such immense usefulness?

reply

lmm 7 hours ago [-]

All of the problems that aspect-oriented programming tries to solve, but without any reflectioney magic. So for example I have a custom type to represent database operations that need to happen within a transaction; the type system ensures that there will be a transaction boundary around them eventually, but I can still compose together several different such functions and run a single transaction around them all. In theory you could do this with a "command object", but in a language without HKT you'd have to reimplement all the utility functions that make it practical to use those objects (e.g. traverse, which takes a list of must-happen-in-transaction commands and combines them into a single command that will evaluate to a list) and in practice people doing this kind of thing in Java/C#/Python/... give up and fall back to reflection/AOP/metaclasses/decorators, because it's too cumbersome to manage the transaction boundaries explicitly. But then you get things like http://thecodelesscode.com/case/211 happening.

With HKT you can have reusable, generic functions that work on any kind of effect - all the examples on https://philipnilsson.github.io/Badness10k/escaping-hell-wit... and more. E.g. a tree structure library will offer a traversal method that can already handle doing an effectful operation at each level and composing together the effects correctly - even if it's an effect that didn't exist when that library was written. That makes it practical to manage these effects/cross-cutting concerns explicitly, which enables fearless refactoring (it's always clear whether you can e.g. reorder some statements without changing behaviour or not), so you need much less test coverage for the same level of confidence which in turn makes refactoring even easier, and you end up with a virtuous cycle where your code stays clear because it's always easy to refactor for clarity.

reply
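The transaction-command idea above can be sketched even in a language without HKT; note how `sequence` below is exactly the traverse-style helper that must be re-implemented per effect when HKT is missing. A minimal sketch, with no real database (all names here are illustrative, not from any library):

```typescript
// Tx<A>: a deferred database operation that must run inside a transaction.
// Illustrative only -- there is no real database here.
type Tx<A> = { run: () => A };

// Wrap a pure value as a command.
const pure = <A>(a: A): Tx<A> => ({ run: () => a });

// Transform the eventual result without running anything yet.
const mapTx = <A, B>(tx: Tx<A>, f: (a: A) => B): Tx<B> =>
  ({ run: () => f(tx.run()) });

// The traverse-style helper: combine many commands into one command
// evaluating to a list. With HKT this comes for free from a generic
// library; without HKT it must be re-written for each effect type.
const sequence = <A>(txs: Tx<A>[]): Tx<A[]> =>
  ({ run: () => txs.map(tx => tx.run()) });

// The single transaction boundary: effects only run here.
// begin()/commit() are stand-ins for real transaction calls.
function runInTransaction<A>(tx: Tx<A>): A {
  // begin();
  const result = tx.run();
  // commit();
  return result;
}
```

For example, `runInTransaction(sequence([pure(1), pure(2)]))` evaluates both commands inside one boundary and yields `[1, 2]`; the type system can then be arranged so that `run` is only reachable through `runInTransaction`.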

---

Ruby Steep type annotations (contrast with Sorbet type annotations):

https://github.com/soutaro/steep

---

jasonhansel 15 hours ago [-]

IMHO if C had generics then so many problems could be fixed (e.g. you could easily have a struct Slice<T> for a fat pointer). Existential generics would even give you OOP (and, in a roundabout way, closures) for free. Hmm...

reply

JoachimSchipper 13 hours ago [-]

C11 has _Generic.

reply

jasonhansel 13 hours ago [-]

_Generic is something of a misnomer; it's more like a special form of overloading than a general way of adding type parameters.

C++ is a bit better with templates, but lacks type erasure (hence long compile times) and existential generics (so OOP isn't just syntactic sugar).

reply
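A sketch of the Slice<T> fat pointer the parent wishes for, transliterated into TypeScript since C cannot express it generically (all names here are hypothetical):

```typescript
// Hypothetical Slice<T>: a fat pointer is a base pointer plus a length.
// In C this would be roughly: struct Slice { T *ptr; size_t len; } --
// which C cannot write generically. TypeScript stands in for the generics.

interface Slice<T> {
  readonly data: T[];     // stands in for the base pointer
  readonly start: number;
  readonly len: number;
}

const slice = <T>(data: T[], start = 0, len = data.length - start): Slice<T> =>
  ({ data, start, len });

// Bounds-checked element access.
const at = <T>(s: Slice<T>, i: number): T => {
  if (i < 0 || i >= s.len) throw new RangeError("slice index out of bounds");
  return s.data[s.start + i];
};

// Sub-slicing is O(1): no copy, just new bounds over the same buffer.
const sub = <T>(s: Slice<T>, from: number, len: number): Slice<T> =>
  slice(s.data, s.start + from, len);
```

The point of the thread is that `_Generic` gives you none of this: it dispatches among a fixed set of hand-written monomorphic variants, whereas a true type parameter lets one definition cover every element type.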

---

existential types vs classes:

" Let’s create a type for stacks. What operations do we need? We need push and pop, of course, but we also need the ability to create an empty stack, so we have a base to work with. A more complete implementation would include further operations, such as peek and reverse.

Here’s the type we’ll use:

exists 'a t. { emp: 'a t, push: 'a * 'a t -> 'a t, pop: 'a t -> 'a option }

Here, I’m specifying that the hidden type is in fact a type constructor t parameterized by another type 'a. This means that the t is still hidden, but any 'a can be supplied by the caller, to get e.g. a stack of integers, or a stack of strings, or even a stack of stacks of strings!

Here’s a stupid-simple example implementation:

pack 'a list as { emp: [],
                  push: fn (elem, stack) => elem::stack,
                  pop: fn stack => case stack of x::xs => Some x | [] => None }

Here, our implementation type is 'a list (which I’m assuming for brevity is built into the language, though we could define it ourselves in a few lines of code). Push is simply cons, and pop is pattern-matching. Note that fn is the notation for creating an anonymous function.

To create the stack equivalent of “Hello, world!”, we could write:

my_list.push ("world", my_list.push ("Hello", my_list.emp))

Now, why would anyone want to do this? Why are these types and constructs useful? Well, the astute reader may have noticed that they look suspiciously like objects. If you did, I award you one (existent but hidden) cookie. The fact is that existential types and classes are both ways to define abstract data types, as I mentioned earlier.

But I believe existential types are nicer, and here’s why (here comes the opinion part of the article — if you‘re not interested, you may skip the next few paragraphs.). Classes are existential types bundled up with several other concepts that ought to be separate. In my mind, and to the best I can summarize many wildly different interpretations and implementations of objects in numerous languages, they are:

There’s nothing wrong with any of these ideas (besides inheritance). They are all useful tools in the programming toolkit. But tying them all up together doesn’t provide any benefit in my eyes. This is akin to how many languages combine the definition of a function with the naming of a function. But often we just want to define a function without naming it (anonymous functions are quite useful in functional programming), and many languages now allow this. Analogously, sometimes I would like to use abstract data types, and just abstract types.

Anyway, now that I’ve stepped off my soapbox, let’s get to real-world programming. What languages implement existential types (in a form similar to that given here, and not as objects)? The ML family does (including SML and OCaml). The ML module system allows us to create existential types, but “one level up”. This means that existentials in ML are not first-class: namely, they can’t be passed to or returned from functions. However, there are in fact also functions “one level up” in ML called functors (no relation to the concept from category theory) that do allow such manipulation.

Below, I’ll write out the existential type for a stack as an OCaml signature:

module type STACK = sig
  type 'a t
  val emp : 'a t
  val push : 'a * 'a t -> 'a t
  val pop : 'a t -> 'a option
end

And an implementation:

module Stack : STACK = struct
  type 'a t = 'a list
  let emp = []
  let push (elem, stack) = elem :: stack
  let pop stack = match stack with
    | x :: _ -> Some x
    | [] -> None
end

Note that in ML, modules not only allow the creation of existential types, but also perform the usual function of organizing code into hierarchical namespaces.

And actually, I lied before: while SML doesn’t support first-class modules, OCaml does. This means we can use existentials in much the same way I have laid out in this post. To create a first-class module from a normal module, we would write:

let my_stack = (module Stack: STACK) " -- [20]
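The pack/unpack machinery in the quote can also be encoded in TypeScript with a rank-2 style continuation; the hidden type Rep plays the role of the existentially quantified implementation type (element type fixed to number for brevity; all names here are mine, not the article's):

```typescript
// Existential encoding sketch: Rep stays hidden because clients can only
// use it through the supplied operations. Element type is fixed to number
// for brevity; a fully parameterized version needs HKT-style machinery.

interface StackOps<Rep> {
  emp: Rep;
  push: (elem: number, stack: Rep) => Rep;
  pop: (stack: Rep) => number | undefined;
}

// "exists Rep. StackOps<Rep>" as a function that hands its hidden
// implementation to any polymorphic consumer (the rank-2 trick).
type PackedStack = <R>(use: <Rep>(ops: StackOps<Rep>) => R) => R;

// pack: the hidden representation is number[]
const listStack: PackedStack = use =>
  use<number[]>({
    emp: [],
    push: (elem, stack) => [elem, ...stack],
    pop: stack => stack[0],
  });

// unpack: inside the callback, Rep is abstract, so only ops can touch it
const top = listStack(ops => ops.pop(ops.push(2, ops.push(1, ops.emp))));
// top === 2
```

As in the article, the client cannot observe that the representation is an array; swapping in a different backing structure cannot break any caller.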

---

i disagree with this post:

https://lukepalmer.wordpress.com/2010/01/24/haskell-antipattern-existential-typeclass/

(i guess my disagreement is along the lines of what commenter Navid said: "The whole point of type classes is, that you can have functions that can process any type with an instance of that class *and* return that type too! In this example, once Window has become a Widget, it will never be a Window again.")

but it makes a useful point in demonstrating how verbose the typeclass syntax is in Haskell when you want it to accomplish the same thing as a 'data'.

---

" It is confusing but there are three, not two types of immutability:

    Mutable - you are supposed to change the collection (Kotlin's MutableList)
    Readonly - you are NOT supposed to change it (Kotlin's List) but something may (cast to Mutable, or change from Java)
    Immutable - no one can change it (Guava's immutable collections)"

also note that with immutability, you generally want the language to implement transparent structural sharing
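The three flavors, plus the structural sharing just mentioned, sketched in TypeScript terms (readonly arrays model the read-only view; the cons list shows sharing; all names are illustrative):

```typescript
// The three flavors of (im)mutability, in TypeScript terms.

// 1. Mutable: callers may change it (Kotlin's MutableList).
const mutableList: number[] = [1, 2, 3];

// 2. Readonly view: this reference cannot mutate, but the data still can
//    (Kotlin's List: a cast, or Java code, may change it underneath).
const view: readonly number[] = mutableList;
// view.push(4);       // compile error
mutableList.push(4);   // ...yet view now observes [1, 2, 3, 4]

// 3. Immutable with structural sharing: "updating" the head of a cons
//    list reuses the entire tail instead of copying it.
type List<A> = null | { head: A; tail: List<A> };
const cons = <A>(head: A, tail: List<A>) => ({ head, tail });

const shared = cons(2, cons(3, null));
const xs = cons(1, shared);
const ys = cons(9, shared); // new head, same tail
// xs.tail === ys.tail holds: no copy was made
```

The sharing in case 3 is what makes pervasive immutability affordable: each "update" allocates only the changed spine, which is why languages with immutable collections implement it transparently.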

---