proj-oot-ootNotes41

draft

What are some global properties of programs/programming languages to consider during programming language design? For example, temporal memory safety.

I'm particularly interested in concurrency-related global properties. Possible examples:

I feel like the zoo of concurrency paradigms could be organized according to which paradigms guarantee which properties.

---

"A simile that just struck me: A fascination with 6502 assembly or CP/M is akin to building your own suit of armor and re-enacting medieval jousts (a la the SCA.) Designing your own “clean-slate” virtual machine and applications, or building atop someone else’s, is more like escaping into medieval fantasy worlds (a la Lord Of The Rings.)

Neither of those are bad, of course! I love me some escapism. But neither has anything to do with the world today or the future, or has any real purpose other than fun. The future, even post-apocalyptic, is not going to be like the Middle Ages nor Middle Earth, and your homemade plate armor will not save you from a survivalist toting a rifle. Nor is Rivendell or Narnia a guideline for a better tomorrow. " -- snej

---

" When you’re building string parsers, such as flags from a command line, it’s useful to be able to use untagged unions composed of all possible flags. The Derw compiler, written in TypeScript, uses these in a couple of places: for example, when parsing target output for the compile (ts/js/derw/english/elm). So this feature was pretty nice to have, especially during a rewrite of the compiler that is currently underway (currently at 42% Derw, 58% TypeScript). "-- [1]
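
A rough Python analogue of the idea in the quote above, as a sketch only: `typing.Literal` plays the role of the union of string flags. The names `Target`, `TARGETS`, and `parse_target` are made up for illustration; only the five target strings come from the quote. Python checks nothing here at runtime by itself, so the membership test is explicit.

```python
from typing import Literal, get_args

# Hypothetical names; the five strings are the targets listed in the quote.
Target = Literal["ts", "js", "derw", "english", "elm"]
TARGETS = get_args(Target)  # the allowed strings, as a tuple


def parse_target(flag: str) -> Target:
    """Accept only strings that are members of the Target union."""
    if flag not in TARGETS:
        raise ValueError(
            f"unknown target {flag!r}; expected one of {', '.join(TARGETS)}"
        )
    return flag  # type: ignore[return-value]  # membership was checked above
```

A static type checker can then reject a call site that passes a string literal outside the union, which is the convenience the quote describes.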

---

i only skimmed this but it's a plea for statically linked compiled languages for dev tooling, so that the end user's tooling isn't broken by dependency problems when there is an update:

https://borud.no/dev/2022/embedded-tooling/

"

21 Student 18 hours ago

Tldr the author likes tools that are statically compiled executables and (reading between the lines) seems to be installing all their Python tools in a single Python installation rather than one installation per tool (aka a venv or its equivalent).

In fairness to the author if you use a Linux distribution that attempts to shove everything into your system installation, you will have a bad time. Python also doesn’t come built in with a tool that neatly solves this problem, soluble though it is; and there are a bunch of third party tools to solve this very problem.

    6
    kornel 3 hours ago

For someone who isn’t into Python, the whole thing with environments is alien. It adds extra steps to installation, and extra steps when using Python.

I suspect it doesn’t seem like a big deal if it’s your standard procedure, but when I need to use Python once every couple of years, this is my experience: https://xkcd.com/1987/

    ~
    Student 1 hour ago

Right, it’s not completely unfounded

"

---

"Rust's ownership checker is probably the most common example of "no analogue in a more common language," but this can also mean Mercury's determinism checks, or Pony's "reference guarantees")." https://www.mercurylang.org/information/doc-latest/mercury_ref/Determinism.html#Determinism-categories https://tutorial.ponylang.io/reference-capabilities/guarantees.html#rights-are-part-of-a-capability -- [2]

---

why Rust is used for CRUD apps sometimes:

"On balance, I would say Rust’s “quality of implementation” story is ahead of mature languages, and that’s exactly what creates this “let’s write CRUD in Rust” pressure...Really, it seems that there’s “Go with enums / OCaml with channels and Cargo”-shaped hole in our language-space these days, which is pretty imperfectly covered by existing options. I really wish that such language existed, this would relieve a lot of design pressure from Rust and allow it to focus on systems use-case more." -- [3]

---

[6]

---

nil edited 1 month ago

Some examples of “ML-like” features include…

    Sum types as a primary data modeling tool, in some cases even displacing records for small-ish data structures
    First-class functions and some use of recursion, often used to constrain and thereby elucidate control flow. This is sometimes in lieu of imperative-/procedural-style constructs such as…
        If/else statements (not if/else expressions)
        Switch/case statements (not if/else expressions)
        Goto, which has basically been a strawman against this style of programming for the past decade or so
        Defer, which though it allows code to be written out-of-order, still introduces a discrete procedural “step” to computation
    In languages like Javascript and Gleam, this extends to the use of a syntactic construct you may know as a callback function.
    Type inference (usually Damas–Hindley–Milner type inference)
    Other idioms that, like the above, help to obviate and/or discourage the use of global state in implementations as they scale

There are plenty of ML-like languages for different runtimes, including a few that are used for systems programming.† Languages often described as “ML-like” include…

    Scala, ‘an ML for the JVM’
    F#, ‘an ML for the CLR’
    Elm, which also takes some inspiration from Haskell while not going quite as far with the generics, apparently for sake of error message readability
    Facebook’s Reason, which is sometimes even called ReasonML for clarity
    Rust, one of the only systems programming languages to have this distinction. Check for the features above if you don’t believe me!

Haskell is explicitly inspired by ML, but is often considered its own category due to the radical departures it makes in the directions of a) purely functional programming and b) requiring (in most cases) the use of monads to represent effects in the type system.

† My educated guess: This is largely because the core syntax and feature-set is relatively well understood at this point. As such syntactic sugar is rarely necessary in order to express intent directly in efficient code. This is unlike in “dynamic” languages such as Python, Ruby, and Elixir, which tend to make liberal use of metaprogramming in order to make the syntax more directly express intent. This can often make it unclear what is actually happening to make a given piece of code run.

    3
    Casperin 1 month ago

I find it interesting that nothing on that list is inherent to functional languages. All these sweet goodies might as well exist in an imperative language, but outside of Rust, they don’t.

I’m still sad Go completely missed the boat on that one.

    1
    nil edited 1 month ago

Yup. Maybe someday we’ll have an approach in between those of Go and Rust; that’s some of what I’m looking to Gleam for, even if it’s not primarily a systems programming language.† In the meantime, we have the sometimes-insightful, sometimes-regressive minimalism of Go; and the sometimes-insightful, sometimes-overwhelming maximalism of Rust.

† It is my broad understanding that the BEAM VM—on which I expect most* Gleam code will run—helps concurrent applications scale by denying granular control of threaded execution. This can be thought of as vaguely similar to the Go runtime’s decision to have a blessed implementation of cooperative multitasking, namely goroutines. In contrast, the Erlang ecosystem benefits from having a blessed provider and model (respectively) for all concurrent computation, thanks to the OTP supervision tree combined with the actor model of concurrent computation. It takes power away from the developer, but to the benefit of the system at large. Boy, is it ever exciting that we might actually have a statically-typed language built atop that excellent multitasking infrastructure!

singpolyma 1 month ago

It honestly depends what one even means by “functional language” lots of ML and ML-like things exist which are not particularly functional including possibly: SML, OCaml, Swift, Scala, Kotlin, Haxe

    2
    nil 1 month ago

What’s not-functional about SML, OCaml, and Scala? Are you perhaps comparing them to the purely functional Haskell?

    1
    singpolyma 1 month ago
            What’s functional about them? Every language I listed is equally functional and not-functional depending how you use it. They’re all ML variants after all.

2 agent281 1 month ago

    Facebook’s Reason, which is sometimes even called ReasonML for clarity

Credit where credit is due: Reason is a syntax extension for OCaml. It’s not a whole cloth creation by Facebook.

---

https://gleam.run/news/v0.25-introducing-use-expressions/

10 elldritch 1 month ago

Oh, this is monadic do-syntax! It’s always awesome to see a more conventional language adopt generalized monads.

(For the uninitiated, the basic premise of monads is “a thing that is then-able”, and it looks like Gleam’s flavor is “a function that takes a callback as its final argument”.)

In JavaScript, the equivalent is async/await, which is a monadic do-syntax that’s hardcoded to the Promise monad. It’s awesome that Gleam has chosen to provide a general do-syntax here rather than taking the common route of only providing hard-coded support for specific common monads.

6 rs86 1 month ago

Do-notation! Wonderful

3 quad edited 1 month ago

My knee jerk reaction is to lament the loss of indentation to visually represent scope.

But deep indentation is also a readability problem. For example, Python’s solution to have special case syntax to initialise immediately adjacent contexts in one long statement. In contrast, perhaps Gleam is right and a variable’s lifetime rather than indentation is the way here!

2 nil 1 month ago

Huh. Do-notation but adapted to callbacks. Fascinating. Loved the examples, too. Excited to see what this opens up!
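
To make the "function that takes a callback as its final argument" reading from the thread concrete, here is a small Python sketch. In Gleam, `use a <- f(x)` is sugar for `f(x, fn(a) { ... })`; Python has no such sugar, so the callbacks nest visibly, which is the shape `use` flattens. The helpers `parse_int`, `safe_div`, and `compute` are hypothetical, and `None` stands in for a failed step.

```python
from typing import Callable, Optional


def parse_int(text: str, then: Callable[[int], Optional[int]]) -> Optional[int]:
    """Parse `text`; on success pass the integer to `then`, else stop with None."""
    try:
        value = int(text)
    except ValueError:
        return None
    return then(value)


def safe_div(a: int, b: int, then: Callable[[int], Optional[int]]) -> Optional[int]:
    """Integer-divide a by b; on success pass the quotient to `then`, else None."""
    if b == 0:
        return None
    return then(a // b)


def compute(a_text: str, b_text: str) -> Optional[int]:
    # Each step receives "the rest of the computation" as its last argument.
    # With a use-style syntax these three steps would read as three flat lines.
    return parse_int(a_text, lambda a:
           parse_int(b_text, lambda b:
           safe_div(a, b, lambda quotient:
           quotient + 1)))
```

Any step that fails simply never calls its callback, so the rest of the chain is skipped. That is the "then-able" behavior elldritch describes, here specialized to optional values.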

---

" A lot of the pain in the netaddr.IP article is caused by:

    Go not having sum types — making it really awkward to have a type that is "either an IPv4 address or an IPv6 address"
    Go choosing which data structures you need — in this case, it's the one-size-fits-all slice, for which you pay 24 bytes on 64-bit machines.
    Go not letting you do operator overloading, harkening back to the Java days where a == b isn't the same as a.equals(b)
    Go's lack of support for immutable data — the only way to prevent something from being mutated is to only hand out copies of it, and to be very careful to not mutate it in the code that actually has access to the inner bits.
    Go's unwillingness to let you make an opaque "newtype". The only way to do it is to make a separate package and use interfaces for indirection, which is costly and awkward.

" ... Because function signatures don't tell you much of anything (does this mutate data? does it hold onto it? is a zero value there okay? does it start a goroutine? can that channel be nil? what types can I really pass for this interface{} param?), you rely on documentation, which is costly to update, and costlier still not to update, resulting in more and more bugs. -- [7]
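
For contrast with the list above, a sketch of the "either an IPv4 address or an IPv6 address" type in Python, using frozen dataclasses as an approximation of a sum type with immutable variants. The names `IPv4`, `IPv6`, `IP`, and `describe` are illustrative assumptions, not taken from the article. Note that Python only enforces the union through a static type checker; at runtime only the immutability and value equality are guaranteed.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Union


@dataclass(frozen=True)
class IPv4:
    octets: bytes  # expected to be exactly 4 bytes

    def __post_init__(self) -> None:
        if len(self.octets) != 4:
            raise ValueError("IPv4 needs 4 bytes")


@dataclass(frozen=True)
class IPv6:
    octets: bytes  # expected to be exactly 16 bytes
    zone: str = ""

    def __post_init__(self) -> None:
        if len(self.octets) != 16:
            raise ValueError("IPv6 needs 16 bytes")


# The "sum type": a value is one variant or the other, never both.
IP = Union[IPv4, IPv6]


def describe(ip: IP) -> str:
    """Handle each variant explicitly."""
    if isinstance(ip, IPv4):
        return ".".join(str(b) for b in ip.octets)
    if isinstance(ip, IPv6):
        groups = [ip.octets[i:i + 2].hex() for i in range(0, 16, 2)]
        text = ":".join(groups)
        return f"{text}%{ip.zone}" if ip.zone else text
    raise TypeError(f"not an IP: {ip!r}")
```

Because the dataclasses are frozen and compare by value, `IPv4(bytes([127, 0, 0, 1])) == IPv4(b"\x7f\x00\x00\x01")` holds and assigning to `octets` raises `FrozenInstanceError`, which covers the equality and immutability complaints in a small way.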

---

matklad 1 month ago

Oh, a topic near and dear to my heart. My experience here is:

    building IntelliJ Rust at JetBrains, using industrial state-of-the-art tools
    building rust-analyzer using the tools built from-scratch specifically for rust-analyzer

My conclusion so far is:

    if you want to support many different languages in the same tool quickly, go for tree-sitter
    if you want to support a single language in many tools, go for hand-written everything

The difference between compiler backend (LLVM) and frontend (hypothetical meta tool) is that, from the back, all the languages look kinda the same, so you can re-use a lot of code, while from the front languages are meaningfully different in interesting ways, so opportunities for code-sharing are smaller. From rust-analyzer/intellij-rust experience, doing everything from the first principles is not much more labor intensive than using libraries. In particular, writing an error-resilient lossless parser is not hard! Anyone can do that if they know how. It’s a lot of code, but it’s very simple, repetitive code which you just type once, test, and then forget about.

And when we get to stuff like name resolution or type-systems, the languages are different enough that you only can share libraries (fast hashmap and DSU) rather than frameworks. In particular, depending on the intrinsic properties of the language, you might want to go for three completely different architectures for a language server: https://rust-analyzer.github.io/blog/2020/07/20/three-architectures-for-responsive-ide.html.

At the same time, if you are writing, eg, a code hosting platform, and want to support “best effort” functionality for a lot of languages, shoveling them into the single parser -> indexer -> name resolution pipeline for an “average language” is very effective. Basically, we should have this.

    13
    matklad 1 month ago

I guess, an extra set of conclusions is:

    someone needs to write an “IDE dragon book”, as know-how/culture seems to be the main bottleneck. It’s not like it was only five years ago that we realized how to do tooling, it was just confined to JetBrains until MS did the LSP.
    the tooling requirements should permeate back into language design. The shape of the language you are working with really defines what you could do at the tooling layer. Most of complexity of rust-analyzer is to work-around the hard language. Carbon would be (if it actually would be a thing) an order of magnitude easier to write an IDE for.

-- https://lobste.rs/s/myyznl/tooling_for_tooling

---

" Web 2.0 never happened: it ended up being a trap that swallowed the best minds of a generation to web (ad) agencies with countless millions of hours lost fighting CSS rules and Javascript build tools to replicate functionality that was readily available in 1980s MS Word and desktop publishing apps. It should have been something like single-threaded blocking logic distributed on Paxos/Raft with an event database like Firebase/RethinkDB and layout rules inspired by iOS's auto layout constraint solver with progressive enhancement via HTMX, finally making #nocode a reality. Oh well.

" -- [8]

---

Cool program from https://stackoverflow.com/questions/1261557/whats-the-most-useful-thing-youve-done-in-less-than-50-lines-of-clojure?rq=1 :

(use '(clojure.contrib java-utils))

(defn make-thumbnail
  "Given an input image (File, URL, InputStream, ImageInputStream), output a
  smaller, scaled copy of the image to the given filename. The output format
  is derived from the output filename if possible. Width should be given in
  pixels."
  ([image out-filename width]
   (if-let [format (re-find #"\.(\w+)$" out-filename)]
     (make-thumbnail image out-filename width (nth format 1))
     (throw (Exception. "Can't determine output file format based on filename."))))
  ([image out-filename width format]
   (let [img (javax.imageio.ImageIO/read image)
         imgtype (java.awt.image.BufferedImage/TYPE_INT_RGB)
         width (min (.getWidth img) width)
         height (* (/ width (.getWidth img)) (.getHeight img))
         simg (java.awt.image.BufferedImage. width height imgtype)
         g (.createGraphics simg)]
     (.drawImage g img 0 0 width height nil)
     (.dispose g)
     (javax.imageio.ImageIO/write simg format (as-file out-filename)))))

this other answer to that question (about short useful Clojure programs you've written) is also interesting:

" I'm often frustrated by the REPL stacktrace function only displaying eight lines. So this is now in the development file for all my projects:

(defn stack [n] (clojure.stacktrace/print-stack-trace (clojure.stacktrace/root-cause *e) n))

I want argmin and argmax all the time.

(defn argmin
  ([f x] x)
  ([f x y] (if (< (f x) (f y)) x y))
  ([f x y & more] (reduce (partial argmin f) (argmin f x y) more)))

(defn argmax
  ([f x] x)
  ([f x y] (if (> (f x) (f y)) x y))
  ([f x y & more] (reduce (partial argmax f) (argmax f x y) more)))

And, finally, for evaluating algorithms robustly I use this function on datasets a lot:

(defn kfolds
  "Given an integer k and a collection of data, this partitions the data into
  k non-overlapping collections, then returns a list of length k, where the
  ith item is itself a list of two items:
    (1) the union of all but the ith partition
    (2) the ith partition.
  If (count data) is not divisible by k, a few points (< k) will be left out."
  [k data]
  (let [total (count data)
        fold-size (int (/ total k))
        folds-test (take k (partition fold-size fold-size [] data))
        folds-train (map #(apply concat %)
                         (map #(take (dec k) (drop % (cycle folds-test)))
                              (range 1 (inc k))))]
    (map list folds-train folds-test)))

"

---

"Need to embed your code anywhere? GC is probably not an option for you." ... "While JavaScript is great in some aspects (it’s the first mainstream language with lambdas!), it surely isn’t hard to imagine a trivially better version of it (for example, without two different nulls)." ... "Null-terminated strings are just a bad design" ... "Let’s look at the 90’s popular languages...On this list, Java is the only non-dynamic cross-platform memory safe language. That is, Java is both memory safe (no manual error-prone memory management) and can be implemented reasonably efficiently (field access is a load and not a dictionary lookup). This seems like a pretty compelling reason to choose Java, irrespective of what the language itself actually looks like." ... "One can argue whether focus on simplicity at the expense of everything else is good or bad, but statically linked zero dependency binaries definitely were a reason for Go popularity in the devops sphere. In a sense, Go is an upgrade over “memory safe & reasonably fast” Java runtime, when you no longer need to install JVM separately." -- https://matklad.github.io/2020/09/13/your-language-sucks.html

---

" Language Design for Locality

There’s a very important language property that an IDE can leverage to massively improve performance:

What happens inside a function, stays inside the function

If it is possible to type-check the body of a function without looking at the bodies of other functions, you can speed up an IDE by drastically reducing the amount of work it needs to do.

Rust mostly conforms to this property, but there are a couple of annoying violations:

    local inherent impls with publicly visible methods.
    local trait impls for non-local types.
    #[macro_export] local macros.
    local out-of-line modules.

If we want to have fast & correct IDE support, we should phase out those from the language via edition mechanism.

" -- https://rust-analyzer.github.io/blog/2020/05/18/next-few-years.html

---

"tip: mark variables with icons representing their function: fixed value, stepper, flag, walker, most recent holder, most wanted holder, gatherer, container, follower, organizer, temporary" -- https://gitlab.com/dahjelle/notes-from-the-programmers-brain/-/blob/main/README.md https://en.wikibooks.org/wiki/A-level_Computing/AQA/Problem_Solving,_Programming,_Data_Representation_and_Practical_Exercise/Fundamentals_of_Programming/The_Role_of_Variables
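
A small illustration of the tip, with comments standing in for the icons. The function `summarize` and its data are made up; the role names are the ones from the quoted list, applied as I understand them.

```python
def summarize(readings):
    """Walk a sequence of numbers once and report a few facts about it."""
    count = 0        # stepper: advances by one in a fixed pattern
    total = 0.0      # gatherer: accumulates the effect of every element
    highest = None   # most wanted holder: best value seen so far
    previous = None  # follower: always trails `reading` by one step
    rising = True    # flag (one-way): flips to False once and stays there

    for reading in readings:  # reading: most recent holder
        count += 1
        total += reading
        if highest is None or reading > highest:
            highest = reading
        if previous is not None and reading < previous:
            rising = False
        previous = reading

    mean = total / count if count else None  # temporary: used only to build the result
    return {"count": count, "mean": mean, "max": highest, "rising": rising}
```

For example, `summarize([1, 3, 2])` reports a count of 3, a mean of 2.0, a max of 3, and `rising` as False because of the drop from 3 to 2.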

---

a few good ideas in here:

https://nim-lang.org/blog/2023/08/01/nim-v20-released.html

---

model-view-controller: cleanly decompose program into state/storage, pure functions, and control
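
A minimal sketch of that three-way split, under the reading given in the note (state/storage, pure functions, control). The counter example and all names are hypothetical.

```python
from dataclasses import dataclass, replace


# State/storage: plain immutable data, no behavior.
@dataclass(frozen=True)
class Counter:
    value: int = 0


# Pure functions: old state in, new state (or a rendering) out, no side effects.
def add(counter: Counter, amount: int) -> Counter:
    return replace(counter, value=counter.value + amount)


def render(counter: Counter) -> str:
    return f"count = {counter.value}"


# Control: the only part that sequences events and threads the state through.
def run(commands):
    state = Counter()
    outputs = []
    for command in commands:
        if command == "inc":
            state = add(state, 1)
        elif command == "dec":
            state = add(state, -1)
        outputs.append(render(state))
    return outputs
```

Only `run` knows about ordering; `add` and `render` can be tested in isolation, and `Counter` could be swapped for something persistent without touching either.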

---

doug-moen 30 hours ago

While trawling through 1960’s programming history, I recently realized that BCPL, from 1967, the ancestor of C, had a lot of “advanced” features that were omitted from C, such as:

    Length prefixed character strings,
    The ability to write function definitions in any order, so you can call a function that is defined later in the file,
    Nested functions,
    Block expressions, which comprise a sequence of statements and local variable definitions, followed by an expression whose value is the result of the block expression.

These features weren’t exactly rare in the 1960s. BCPL got them from CPL, which got them from Algol. LISP had them too, but see, BCPL was a low level systems programming language, used to write compilers and operating systems, just like C.

But C omitted these features, and we’ve had to wait for Rust in order to once again have a popular systems language with these kind of features. So this is one of the negative effects that Unix had on computing culture: it popularized C, and held back the evolution of systems programming languages for many decades.

~ lproven 16 hours ago

This is very odd. I read your comment:

    But C omitted these features

And thought “hang on, someone was talking about BCPL just a day or 2 ago on Lobsters, and they explained why they were omitted – they wouldn’t fit into the 8kB of RAM in the original hardware Unix was implemented on.

So, I looked up that comment…

And it was you!

Why would you post such an ostensibly critical comment while leaving out the vital context?

    ~
    doug-moen 12 hours ago

C omitted the features for a reason, but the outcome still sucked. Once C became popular, it became the template for how new languages should be designed. I remember being really disgusted by how Python picked up so many of C’s bad design decisions, for example. And these older features were no longer well known by a new generation of language implementors, so they were often left out of C’s successors.
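
To illustrate the first item in the thread's list, here is a sketch of length-prefixed strings in Python. The one-byte length prefix and the names `encode` and `decode` are assumptions for the example, not a claim about BCPL's exact layout.

```python
def encode(text: str) -> bytes:
    """Store the byte length first, then the bytes themselves."""
    data = text.encode("utf-8")
    if len(data) > 255:
        raise ValueError("a one-byte prefix can only describe up to 255 bytes")
    return bytes([len(data)]) + data


def decode(buffer: bytes, offset: int = 0):
    """Read one string starting at `offset`; return it and the next offset."""
    length = buffer[offset]
    start = offset + 1
    end = start + length
    if end > len(buffer):
        raise ValueError("buffer is shorter than the prefix claims")
    return buffer[start:end].decode("utf-8"), end
```

Because the length is stored up front, finding the end of a string needs no scan, and a zero byte inside the text is just data: `decode(encode("a\x00b"))` returns the full three-character string. The cost is the fixed cap implied by the width of the prefix.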

---

"Learning different programming languages is one of the best way to improve your programming skills...In general, you want to cover big families of languages: Python, Java, Haskell, C, Rust, Clojure would be a good baseline. Erlang, Forth, and Prolog would be good additions afterwards." -- https://matklad.github.io/2023/08/06/fantastic-learning-resources.html

" Chat Server

    An exercise in networking and asynchronous programming. Multiple client programs connect to a server program. A client can send a message either to a specific different client, or to all other clients (broadcast). There are many variations on how to implement this: blocking read/write calls, epoll, io_uring, threads, callbacks, futures, manually-coded state machines."
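
A sketch of just the routing core of that exercise (direct message vs. broadcast), using `asyncio` queues in place of real sockets so it stays self-contained. `Hub` and `demo` are hypothetical names; a real server would replace each inbox with a network connection and pick one of the I/O strategies listed in the quote.

```python
import asyncio


class Hub:
    """Keeps one inbox per connected client and routes messages between them."""

    def __init__(self):
        self.inboxes = {}

    def join(self, name):
        self.inboxes[name] = asyncio.Queue()
        return self.inboxes[name]

    async def send(self, sender, recipient, text):
        # Direct message: exactly one inbox receives it.
        await self.inboxes[recipient].put((sender, text))

    async def broadcast(self, sender, text):
        # Broadcast: every inbox except the sender's receives it.
        for name, inbox in self.inboxes.items():
            if name != sender:
                await inbox.put((sender, text))


async def demo():
    hub = Hub()
    inboxes = {name: hub.join(name) for name in ("alice", "bob", "carol")}
    await hub.send("alice", "bob", "hi bob")
    await hub.broadcast("carol", "hello all")
    # Drain each inbox so the delivered messages can be inspected.
    return {
        name: [inbox.get_nowait() for _ in range(inbox.qsize())]
        for name, inbox in inboxes.items()
    }
```

Running `asyncio.run(demo())` delivers the direct message only to bob and the broadcast to everyone except carol.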

---

"just learn to use maps and vectors, and learn to divide programs into functions" -- https://lobste.rs/s/sra8zc/fantastic_learning_resources#c_kjjefh

---

"[?] Function properties (const, pure, noalloc, etc)" -- [9]

---

https://eluaproject.net/doc/v0.9/en_arch_ltr.html

" LTR (Lua Tiny RAM) is a Lua patch (written specifically for eLua by Bogdan Marinescu) that significantly decreases the RAM usage of Lua scripts, thus making it possible to run large Lua programs on systems with limited RAM ... Details

The patch adds two new data types to Lua. Both of them are based on the lightuserdata type already found in Lua, and they share the same basic attributes: they don't need to be dynamically allocated (as they're just pointers on steroids) and they're compared in the same way lightuserdatas are compared (by value). And of course, they are not collectable, so the garbage collector won't have anything to do with them. The new types are:

    lightfunctions: these are "simple" functions, in the sense that they can't have upvalues or environments. They are just pointers to regular C functions. Other than that, you can use them from Lua just as you'd use any other function.
    rotables: these are read-only tables, but unlike the read-only tables that one can already implement in Lua with metamethods, they have a very specific property: they don't need any RAM at all. They are fully constant, so they can be read directly from ROM. They have a number of special features and limitations when compared with a regular table:
        rotables can only contain values of type "lightfunction", lua_Number or pointers to other rotables.
        you can't add/delete/modify elements from rotables (obviously). However, rotables will honour the "__newindex" metamethod.
        you can use rotables as metatables for both "regular" tables and for Lua types (via debug.setmetatable)
        a rotable can have another rotable (or itself) as a metatable
        you can iterate over rotables with pairs/ipairs/next just as you do with "regular" tables.

Just as with lightuserdata, you can only create lightfunctions and rotables from C code and never from Lua itself. "

---

" When you do language design, you basically have an (M operator x N data types) problem. When you subtract 1 from N, it gives you M less work :) And that’s significant.

I have a blog post draft explaining this concept, requires login - https://oilshell.zulipchat.com/#narrow/stream/266575-blog-ideas/topic/N.20Perlis-Thompson.20Problems.20in.20Language.20Design

Examples:

    float/int, or just float
    signed/unsigned, or just signed
        Java doesn’t have unsigned, which is painful to C programmers in the same way that having only floats is painful here
        on the other hand, unsigned in C++ causes design mistakes! Bjarne and many other people agree that std::vector::size() returning an unsigned type is a mistake, even though it can’t be negative. These choices are hard.
    string/unicode, or just string
    dict vs. list, or just list (Scheme)
    control flow: async/sync, or just sync
    for YSH: procs/func, or just proc

" -- https://lobste.rs/s/cnpaup/was_javascript_really_made_10_days#c_xewini

---

i love most of graydon's "i would have done this if i were BDFL" items:

https://graydon2.dreamwidth.org/307291.html

---

" So let's say we have a language that provides at least the basic affordances for a performant implementation:

    Ability to use static dispatch, static field offsets and stack allocation.
    Control over memory layout - at minimum being able to express arrays-of-structs without pointer soup." -- https://www.scattered-thoughts.net/writing/implementing-interactive-languages/

"There's not much to gain by compiling a language like python - it's all hash-table lookups and dynamic dispatch. I can see all the bodies on that hill." -- https://www.scattered-thoughts.net/writing/implementing-interactive-languages/

---

"Right? Where can I find a language with no garbage collection, nice sum types and no over complicated async syntax? Where? Nowhere. This is what people want." -- https://news.ycombinator.com/item?id=37892756

above, e also says e wants namespaces

withoutboats replies that GC is required to provide high-perf async without special syntax

"Rust-like but without all the low-level requirements...There is essentially no widely-adopted programming language out that feels like a modern ML with a good tooling situation. Until that happens, Rust will continue to awkwardly serve the audience of such a language while never truly being what they want it to be." -- https://news.ycombinator.com/item?id=37892666 other ppl respond: what about F#? ReasonML? Swift?

---

"Green threads done right strikes me as something like Windows' user mode scheduling (or Google's switchto patch that sadly never made it upstream), in which threads really are 1:1 as far as the kernel is concerned, but manually scheduled by userspace. This requires kernel support, but it fixes every issue except preemption, which is better to just not fix." -- https://news.ycombinator.com/item?id=15146750

---

random interesting idea: safe recursion (guaranteeing there will never be a stack overflow, by eg proving that most functions are not recursive or have bounded recursion below some statically known limit, and allocating the stack frames for the rest of them dynamically on the heap)

https://github.com/ziglang/zig/issues/1006

see also https://lobste.rs/s/6fjkeh/why_async_rust#c_ej9rzo and https://lobste.rs/s/6fjkeh/why_async_rust#c_veu7ht and https://lobste.rs/s/6fjkeh/why_async_rust#c_qlerqc and https://lobste.rs/s/6fjkeh/why_async_rust#c_2bstgx
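
A toy version of the analysis that the safe-recursion idea would need, as a sketch under strong assumptions: the call graph is fully known, each frame size is known, and every recursive function's frame is placed on the heap (so it costs nothing on the native stack). The names `find_recursive` and `stack_bound`, and the example numbers, are made up; this is not how Zig or any particular compiler does it.

```python
def find_recursive(calls):
    """Return the functions that can reach themselves through the call graph."""
    def reachable_from(start):
        seen, todo = set(), list(calls.get(start, ()))
        while todo:
            fn = todo.pop()
            if fn not in seen:
                seen.add(fn)
                todo.extend(calls.get(fn, ()))
        return seen

    return {fn for fn in calls if fn in reachable_from(fn)}


def stack_bound(entry, calls, frame_bytes):
    """Worst-case native-stack bytes for a call to `entry`.

    Assumption: frames of recursive functions are heap-allocated, so they
    contribute 0 bytes here; only non-recursive frames are counted.
    """
    recursive = find_recursive(calls)
    memo = {}

    def need(fn, on_path):
        if fn in on_path:  # going around a cycle again adds only heap frames
            return 0
        if fn in memo:
            return memo[fn]
        own = 0 if fn in recursive else frame_bytes[fn]
        below = max(
            (need(callee, on_path | {fn}) for callee in calls.get(fn, ())),
            default=0,
        )
        if fn not in recursive:  # not on any cycle, so the result is path-independent
            memo[fn] = own + below
        return own + below

    return need(entry, frozenset())
```

With `calls = {"main": ["parse", "log"], "parse": ["parse_expr"], "parse_expr": ["parse_expr", "log"], "log": []}` and frame sizes of 64, 32, 48, and 16 bytes respectively, only `parse_expr` is flagged as recursive and the bound for `main` is 64 + 32 + 16 = 112 bytes, however deep the recursion goes.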

---