proj-oot-ootNotes34

kragen 2 hours ago [-]

> In chapter 4, you write your own Lisp interpreter. If they had chosen C++, would you be writing a C++ compiler? Or a Lisp interpreter in C++? Either way, it would be ugly.

It's an interpreter for a small subset of Scheme, but Scheme is of course itself small and simple†, which means that it's not hard to mentally extrapolate from the small subset to the rest of the language. C++ is a large language, and so it would be less compelling to implement a small subset of it. And you could of course use C++ to implement Lisp, as you say; chapter 4 actually covers several programming paradigms quite alien to Scheme itself, so implementing a functional programming language in C++ would be quite reasonable. As I read it, the main point of chapter 4 is that it's not true that "Most languages give you some abstractions, and others simply can't be built." Metacircularity makes chapter 4 very effective at convincing you that what you've written is actually a real programming language, but it has confused you into missing (what I read as) the main point of the chapter. Perhaps it was counterproductive.

If not, though, it would be quite reasonable to achieve the same pedagogical metacircularity in Lua, Prolog, Forth, ML, Smalltalk, GHC Core, my own Bicicleta, or any number of other small, simple languages. I don't know about you, but those languages feel very different (from Lisp and from one another) to me.

† "Simple" here refers, for better or for worse, to the observed behavior of the language, not the difficulty of its implementation. Much of this is easy to gloss over in a metacircular interpreter, where you can inherit garbage collection, procedure call semantics, dynamic typing, threading, call/cc, and even parsing, lazy evaluation, and deterministic evaluation order, from the host language. (The book points out many of these points.) If you were going to design a language at Scheme's programming level with an eye to making the implementation as simple as possible, Scheme would not be that language.

reply

---

YeGoblynQueenne 2 hours ago [-]

>> They do this even though Lisp is now the second-oldest programming language in widespread use, younger only than Fortran, and even then by just one year.

And the third-oldest is COBOL (it's at least as widespread as FORTRAN; arguably, it's even more widespread than both FORTRAN and LISP together, considering that it's used by pretty much every financial org on the planet).

It seems that, already from such an early time, the kind of languages we would end up creating was already pretty much set in stone: FORTRAN, as the granddaddy of languages aimed at scientists and mathematicians, that modern-day R, Python, Julia etc draw their heritage from; LISP as the grandmother of languages aimed at computer scientists and AI researchers, still spawning an unending multitude of LISP variants, including Scheme, ML and Haskell; and COBOL, the amorphous blob sitting gibbering and spitting at the center of the universe of enterprise programmers, that begat the Javas, VBs and Adas of modern years.

(Note that I'm referring to language philosophy and intended uses- not syntax or semantics).

(I'm also leaving out large swaths of the programming community: the Perl users and the C hackers etc. It's a limited simile, OK?).

reply

truculent 1 hour ago [-]

I would say that R (and probably Julia as well, although I'm much less familiar) draws its heritage from Lisp more than FORTRAN.

reply

---

an investigation into why a Hello World executable in Rust is 650k, and what can be done about it:

https://lifthrasiir.github.io/rustlog/why-is-a-rust-executable-large.html

(the takeaway is that it's mostly (a) debug symbols and (b) standard library, and some library-y stuff like an allocator)

---

https://news.ycombinator.com/item?id=18227837

---

---

Chabs 1 day ago [-]

One thing that I'm surprised this doesn't cover, especially since it's so C++-centric, is that modern C++ OOP is much more defined by lifetime/scope management than anything else. What defines something as an object is the fact that it doesn't exist until it is constructed, and doesn't exist after it gets destructed (which is the case even for fundamental types, with the exception of char/std::byte, btw).

Hot take: RAII has basically taken over everything else as far as structural design foundation goes. Type erasure and encapsulation still play a role, but it's not nearly as fundamental anymore.

reply
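
a minimal C++ sketch of the RAII idea Chabs describes (the File class and path are invented for illustration): the resource exists exactly as long as the enclosing scope, because construction acquires it and destruction releases it.

    #include <cstdio>

    // one owner, one lifetime: copying is forbidden so the FILE* cannot
    // outlive or escape its owning scope
    class File {
    public:
        explicit File(const char* path) : f_(std::fopen(path, "r")) {}
        ~File() { if (f_) std::fclose(f_); }  // runs on scope exit, even via exceptions
        File(const File&) = delete;
        File& operator=(const File&) = delete;
        bool ok() const { return f_ != nullptr; }
    private:
        std::FILE* f_;
    };

    int main() {
        File f("/etc/hostname");  // constructed: now the object (and resource) exists
        std::puts(f.ok() ? "open" : "missing");
    }  // destructed: the file handle is released here, deterministically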

stochastic_monk 22 hours ago [-]

RAII is my primary use for objects in C++, and I believe Rust is similar. Inheritance has fewer uses for me.

reply

Ace17 13 hours ago [-]

RAII is a very powerful technique. However, it actually has very little to do with object-orientation. To my knowledge, the only other languages having it are D (multi-paradigm) and Rust (explicitly non-object-oriented).

reply

---

p2t2p 21 hours ago [-]

All my attempts to jump off the OOP train break at the exact same moment: when I try to write a unit test.

Clojure: either use this monstrous component pattern where 70% of your code is boilerplate, or hack into namespaces and override functions in them at runtime. And don’t forget to do it in the right order!

JavaScript: yeah, simply re-define ‘require’ before importing dependencies in tests. Yeah, do it in the right order.

Recent example - I was researching how to mock calls to functions in packages in Go... Well, the best thing you can do is to have a package-private variable, assign the function to it, and use it throughout the code so you can swap it with a mock/stub in a test.

There is none of that bs when I write Java or C#. I have a mechanism to decouple contracts from implementations - interfaces. I have a mechanism to supply dependencies to modules - it’s called constructor parameters. I can replace implementations easily with mocks or stubs in tests without the target even noticing.

Can somebody provide me with an example of this kind of decoupling achieved in other paradigms _without_ hacking the runtime of a language or ugly tricks like in the Go case?

reply

loup-vaillant 19 hours ago [-]

Give it up. Mocks are mostly useless.

If you want testable code, the first step is to separate computations from effects. Most of your program should be immutable. Ideally you'd have a mostly functional core, used by an imperative shell.

Now to test a function, you just give it inputs, and check the outputs. Simple as that.

Oh you're worried that your function might use some other function, and you still want to test it in isolation? I said give it up. Instead, test that other function first, and when you're confident it's bug free (at least for the relevant use cases), then test your first function.

And in the rare cases where you still need to inject dependencies, remember that most languages support some form of currying.

reply
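
a tiny C++ sketch of the style loup-vaillant is pointing at (the thread is language-agnostic; all names here are invented): the effectful dependency is just a parameter, and partially applying it is the "currying" he mentions.

    #include <functional>
    #include <iostream>
    #include <string>

    // the dependency is an ordinary value; no mocking framework needed
    using Fetch = std::function<std::string(const std::string&)>;

    // pure given its inputs: feed it data, check the output
    std::string greet(const Fetch& fetch, const std::string& user) {
        return "Hello, " + fetch(user) + "!";
    }

    int main() {
        // in a test, inject a stub instead of a real network/database call
        Fetch stub = [](const std::string& id) { return "user-" + id; };
        // partial application ("currying"): bake the dependency in
        auto greet_stubbed = [&](const std::string& u) { return greet(stub, u); };
        std::cout << greet_stubbed("42") << "\n";  // Hello, user-42!
    }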

humanrebar 17 hours ago [-]

That's a solid plan for acceptance testing and a great way to make sure you can never test diagnostic, recovery, rollback, and other something-abnormal-happened-here logic.

For small programs with few interfaces to worry about, that might be fine, but as your number of users goes up, the odds go up that you'll be ensuring rollback bits get flipped when filesystems fail. None of that is simple to test without some sort of dependency injection or other heavyweight design pattern.

reply

---

great article on some best practices:

[1]

---

oot intends to be editable in a normal text editor; it does not sort-of-almost-need an IDE

---

hguhghuff 1 day ago [-]

One of the things that really put me off Lua was lack of a strong central control and organization for the language, resulting in fragmentation.

How does lisp stand in this regard?

reply

jorams 1 day ago [-]

It's interesting that you say that about Lua, because it actually does have a strong central control and organization for the language itself. The "problem" is that it doesn't go further than the language, large parts of the community disagree with their choices, and they tend to not care about backwards compatibility too much.

Lisp is very different in that area. I think that is because the language standard is large and doesn't change. As long as people want to conform to that standard, splitting up too much doesn't really make sense.

reply

---

https://graydon2.dreamwidth.org/5785.html is a great read, should read it again sometime in the future. Graydon (Rust guy) comments on some things in Swift. His comments are pretty concise so there's no point in summarizing.

---

just noting here that i've completely read through [2] and copied the relevant parts to the relevant notes files. No need to read it again.

---

fulafel on Aug 19, 2017 [-]

Again my pet ignored language/compiler technology issue goes unmentioned: data layout optimizations.

Control flow and computation optimizations have enabled use of higher level abstractions with little or no performance penalty, but at the same time it's almost unheard of to automatically perform (or even facilitate) the data structure transformations that are daily bread and butter for programmers doing performance work. Things like AoS->SoA conversion, compressed object references, shrinking fields based on range analysis, flattening/denormalizing data that is used together, converting cold struct members to indirect lookups, compiling different versions of the code for different call sites based on input data, etc.

It's baffling considering that everyone agrees memory access and cache footprint are the current primary perf bottlenecks, to the point that experts recommend treating on-die computation as free and counting only memory accesses in first-order performance approximations.

chubot on Aug 19, 2017 [-]

FWIW Scala has research/support for this, and it indeed seems to require a lot of support from the language, not just the compiler:

http://scala-miniboxing.org/ldl/

mpweiher on Aug 19, 2017 [-]

More generally, optimization has to become 1st class, not something that the compiler may or may not do on a whim.

modeless on Aug 19, 2017 [-]

http://halide-lang.org/ should be much better known than it is. It has programmer-specified optimizations and data layout separated from algorithm definitions.

sitkack on Aug 19, 2017 [-]

Memory layout should be library/protocol. Look at the gigawatt hours that have been spent on serialization, and the resulting work one has to put into having a serialization-free format like Capnproto. The heap should be like a well formed database and those accessors should be portable across languages and systems.

One should be able to specify a compile time macro that controls memory layout.

AaronFriel on Aug 19, 2017 [-]

I think you may be talking past each other? I think the sort of thing fulafel is referring to is the problem of data locality.

Let's say I have a Vec<User>. Perhaps it would be more efficient to have a tuple of: Vec<User::Name>, Vec<User::Address>, Vec<User::Siblings>...

That is, can we design a language where the fields of a type can be analyzed by a container, and deconstructed to be able to implicitly reconstitute an object from its fields?

This is what entity component systems try to do, but some of it can be clumsy, and you have to build these entities and components with the system at the beginning. Could we achieve more if we baked some of this into the language and allowed more "introspective" generic types?

Edit: Here's a trivial example of an optimization that could be possible. In systems like C#'s LINQ, or JavaScript's lodash, we might sometimes like to take a set of objects and group them by a key. So we do listOfThings.GroupBy(x => x.key). Why do we need to store the full "x" and the key separately? Or if we, in a subsequent step, return a sum over a field, could we intelligently remove the data shuffling of all the other fields, and construct an intersection-type consisting of only the fields we care about?
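
a hand-written C++ sketch of the AoS->SoA transformation under discussion (User and its fields are invented); the thread's point is that a language or compiler could derive the second form mechanically:

    #include <string>
    #include <utility>
    #include <vector>

    // array-of-structs: a scan over just `age` drags names and
    // addresses through the cache as well
    struct User { std::string name; std::string address; int age; };
    using UsersAoS = std::vector<User>;

    // struct-of-arrays: each field is contiguous, so a scan over one
    // field touches only that field's memory
    struct UsersSoA {
        std::vector<std::string> name;
        std::vector<std::string> address;
        std::vector<int> age;
        void push(User u) {
            name.push_back(std::move(u.name));
            address.push_back(std::move(u.address));
            age.push_back(u.age);
        }
    };

    long sum_ages(const UsersSoA& users) {
        long total = 0;
        for (int a : users.age) total += a;  // dense, cache-friendly loop
        return total;
    }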

---

 Animats on Aug 19, 2017 [-]

One big problem we're now backing into is having incompatible paradigms in the same language. Pure callback, like Javascript, is fine. Pure threading with locks is fine. But having async/await and blocking locks in the same program gets painful fast and leads to deadlocks. Especially if both systems don't understand each other's locking. (Go tries to get this right, with unified locking; Python doesn't.)

The same is true of functional programming. Pure functional is fine. Pure imperative is fine. Both in the same language get complicated. (Rust may have overdone it here.)

More elaborate type systems may not be helpful. We've been there in other contexts, with SOAP-type RPC and XML schemas, superseded by the more casual JSON.

Mechanisms for attaching software unit A to software unit B usually involve one being the master defining the interface and the other being the slave written to the interface. If A calls B and A defines the interface, A is a "framework". If B defines the interface, B is a "library" or "API". We don't know how to do this symmetrically, other than by much manually written glue code.

Doing user-defined work at compile time is still not going well. Generics and templates keep growing in complexity. Making templates Turing-complete didn't help.

 Animats on Aug 19, 2017 [-]

See Architectural Mismatch or, Why it's hard to build systems out of existing parts

If you want to study that, look at ROS, the Robot Operating System. ROS is a piece of middleware for interprocess communication on Linux, plus a huge collection of existing robotics, image processing, and machine learning tools which have been hammered into using that middleware. The dents show. There's so much dependency and version pinning that just installing it without breaking an Ubuntu distribution is tough. It does sort of work, and it's used by many academic projects.

In a more general sense, we don't have a good handle on "big objects". Examples of "big objects" are a spreadsheet embedded in a word processor document or a SSL/TLS system. Big objects have things of their own to do and may have internal threads of their own. We don't even have a good name for these. Microsoft has Object Linking and Embedding and the Common Object Model, which date from the early 1990s and live on in .NET and the newer Windows Runtime. These are usually implemented, somewhat painfully, through the DLL mechanism, shared memory, and through inter-process communication. All this is somewhat alien to the Unix/Linux world, which never really had things like that except as emulations of what Microsoft did.

"Big object" concepts barely exist at the language level. Maybe they should.

mercer on Aug 19, 2017 [-]

So this might just be me over-fitting my new obsession to everything in the world, or alternatively I might just be out of my depth here, but could it be argued that Elixir's (or rather Erlang's) OTP approach solves/sidesteps most if not all the issues you mention?

Starting a separate 'Erlang process' for all async stuff, for example, seems so wonderfully simple to me compared to the async mess I find in JS, and applying various patterns(?) to that (Task, GenServer, Supervisor) still provides a lot of freedom without incompatibility.

Please correct me if I'm wrong though. I'm still in the research phase so I haven't even written much Elixir/Erlang yet...

---

borplk on Aug 19, 2017 [-]

I'd say the elephant in the room is graduating beyond plaintext (projectional editor, model-based editor).

If you think about it so many of our problems are a direct result of representing software as a bunch of files and folders with plaintext.

Our "fancy" editors and "intellisense" only goes so far.

Language evolution is slowed down because syntax is fragile and parsing is hard.

A "software as data model" approach takes a lot of that away.

You can cut down so much boilerplate and noise because you can have certain behaviours and attributes of the software be hidden from immediate view or condensed down into a colour or an icon.

Plaintext forces you to have a visually distracting element in front of you for every little thing. So as a result you end up with obscure characters and generally noisy code.

If your software is always in a rich data model format your editor can show you different views of it depending on the context.

So how you view your software when you are in "debug mode" could be wildly different from how you view it in "documentation mode" or "development mode".

You can also pull things from arbitrary places into a single view at will.

Thinking of software as a "bunch of files stored in folders" comes with a lot of baggage and a lot of assumptions. It inherently biases how you organise things. And it forces you to do things that are not always in your interest. For example you may be "forced" to break things into smaller pieces more than you would like because things get visually too distracting or the file gets too big.

All of those things are arbitrary side effects of this ancient view of software, and they will immediately go away as soon as you treat AND ALWAYS KEEP your software as a rich data model.

Hell, all of the problems with parsing text and ambiguity in syntax and so on will also disappear.

beagle3 on Aug 19, 2017 [-]

This claim is often repeated, but I haven't seen it substantiated even once. It's possible that no one has yet come up with the right answer that's "obviously there". But it is also possible that this claim is not true, and I tend to believe the latter more as time passes.

Every attempt that I've seen, e.g. Lamdu, Subtext, any "visual app builder", all fail miserably at delivering ANY benefit except for extremely simple programs -- while at the same time, taking away most of the useful tools we already have like "grep", "diff", etc. Sure, they can be re-implemented in the "rich data model", perhaps even better than their textual ancestors - but the thing is that they HAVE to be re-implemented, independently for each such "rich data model", or you can't have their functionality at all -- whereas a 1972 "diff" implementation is still useful for 2017 "pony", a language with textual representation.

regarding your example, the "breaking things into smaller pieces" was solved long ago by folding editors (I used one on an IBM mainframe in 1990, I suspect Emacs already had it at the same time, it did for sure in 1996).

the problems with "parsing and ambiguity" are self inflicted, independent of whether the representation is textual. Lisp has no ambiguity, Q (the K syntax sugar) has no ambiguity. Both languages eschew operator precedence, by the way, because THAT is the real issue that underlies modern syntax ambiguities.

---

[3] says "As it turns out, Linux and SmartOS? make slightly different guarantees with respect to the interaction of vfork and signals, and our code was fatally failing on a condition that should be impossible. Any old Unix hand (or quick study!) will tell you that vfork and signal disposition are each semantic superfund sites in their own right — and that their horrific (and ill-defined) confluence can only be unimaginably toxic. But the real problem is that actual software implicitly depends on these semantics — and any operating system that is going to want to run existing software will itself have to mimic them. You don’t want to write this code, because no one wants to write this code."

cleaning up vfork semantics sounds like something that Oot doesn't need to be too concerned about. But signals are pretty fundamental, so i was wondering what's wrong with them.

i found some comments on https://jvns.ca/blog/2016/06/13/should-you-be-scared-of-signals/

pm215 on June 13, 2016 [-]

If you've ever read the Lions' Book commentary on 6th Edition Unix, you'll notice that many parts of the API as implemented back there are pretty solid -- quality, well designed interfaces that have stood the test of time.

Signals are not one of those parts. The 6th Ed signal handling code reads to me as somewhat of an afterthought whose use cases were mostly "kill the process for a fatal signal or terminal ^C", "ptrace for a debugger" and maybe SIGALRM. The data structures don't allow a process to have more than one pending signal -- if a new one comes along the fact an old one was pending is simply dropped. Running a signal handler automatically deregistered it, leaving a race condition if two signals arrived in close succession (this is a well known bug fixed by BSD later). And EINTR is an irrelevance if signals are generally fatal but its effects spread like poison through every other kernel API if you need your program to be reliable even with signals being delivered.

The worst bugs and races were fixed up by the BSD folks and others, but the underlying concept is an unfortunate combination of "basically irredeemable", "indispensable" (you have to have some kind of "kernel tells you something has happened" API, and signals are what we got) and "insidious" (thanks to EINTR). I think they're a strong candidate for "worst design decision in unix".

(PS: one of the reasons they stand out in 6th Ed is that so much of the rest of that code is so good!)

Koromix on June 14, 2016 [-]

Neil Brown published an interesting series about Unix design mistakes a few years ago: https://lwn.net/Articles/414618/

The first part of the "Unfixable designs" article is about Unix signals.

gpderetta on June 14, 2016 [-]

The big problem with unix signals is that they have been abused to deliver some messages that should really be delivered via a message pipe (e.g. SIGCHLD, all the terminal/tty specific signals). The other is that the set of signals is limited and signal handlers are a process (or thread) wide resource, so it is hard to make use of them in a library.

Other than that, the general ability of interrupting and delivering a message to a thread no matter what is doing is necessary and signals are a way to implement that. Exceptions are another way, but that can be implemented on top of signals.

edit: but there is really no excuse for EINTR. The "Worse is Better" essay has something to say about this.

Qwertious on June 14, 2016 [-]

>The worst bugs and races were fixed up by the BSD folks and others, but the underlying concept is an unfortunate combination of "basically irredeemable", "indispensable" (you have to have some kind of "kernel tells you something has happened" API, and signals are what we got) and "insidious" (thanks to EINTR). I think they're a strong candidate for "worst design decision in unix".

Suppose you were to throw the whole thing out and write a good replacement (and backwards-compatibility be damned), what would it be like?

(.. a lot of people answered that signals were used for a lot of non-urgent stuff, so for those uses replace it with something that doesn't interrupt... but i still get the feeling that the dtrace guy thinks that even for cases needing signals/interrupts there are some flaws in the API, so what are those?)

---

 sdegutis 14 days ago [-]

> "With function specs, I could assert the inputs and outputs to specific functions at runtime; not quite as reassuring as compile-time but better than nothing!"

For 10 years I've seen many developers who have only really used statically typed languages like C++ and Java, now finding the "joy" of dynamically typed languages and all the "freedom" they give you, only to realize within a year or two that, wait a second, verifying those types at compile-time was actually pretty useful. I've also seen mass migrations away from languages like Ruby, Python and Clojure towards OCaml, Haskell, and F#, because people really enjoyed the functional programming aspects but wanted their static typing back. After writing the 10,000th unit test that only exists to ensure that you spelled some function/method/hash-key correctly, it gets a little tiresome. Personally I've settled on TypeScript, which has nice IDE integration in VS Code so that it validates your types live while you code, while still giving you the flexibility of a dynamically typed language (it's all still JS at the end of the day).

---

"As you can probably tell, I would like my programming future to have a weird mix of Crystal, Go and ReasonML?." [4]

---

" Windows NT is like a microkernel in the sense that it has a core Kernel (KE) that does very little and uses the Executive layer (Ex) to perform all the higher-level policy. Note that EX is still kernel mode, so it's not a true microkernel. The kernel is responsible for thread dispatching, multiprocessor synchronization, hardware exception handling, and the implementation of low-level machine dependent functions. The EX layer contains various subsystems which provide the bulk of the functionality traditionally thought of as kernel such as IO, Object Manager, Memory Manager, Process Subsystem, etc. "

---

some notes on secure/correct programming from [5]:

" Unknown said...

    "offset + x < length" is not a safe form for a bounds check, if x can be a value decoded from the network data. If x can have a value close to the maximum positive value for its type, then "offset + x" may overflow and compare less than length, even though x is not within bounds. Instead, compare x to length - offset, knowing that offset <= length is an invariant."

---

https://cacm.acm.org/magazines/2018/11/232214-a-look-at-the-design-of-lua/fulltext

read through " Figure 2. A simple module in Lua." and up to the heading "Environments" below that

notes so far:

"Lua offers exactly one general mechanism for each major aspect of programming: tables for data; functions for abstraction; and coroutines for control. On top of these building blocks, programmers implement several other features, including modules, objects, and environments, with the aid of minimal additions (such as syntactic sugar) to the language"

"Like other scripting languages, Lua has dynamic types, dynamic data structures, garbage collection, and an eval-like functionality"

goals: Simplicity, Small size (also important for portability and embedding), Portability, Embeddability

embeddability: "Lua is thus implemented not as a standalone program but as a library with a C API. This library exports functions that create a new Lua state, load code into a state, call functions loaded into a state, access global variables in a state, and perform other basic tasks. The"

"Portability restricts what the standard libraries can offer to what is available in ISO C, including date and time, file and string manipulation, and basic mathematical functions. Everything else must be provided by external libraries

...

Embeddability has a subtler influence. To improve embeddability, Lua favors mechanisms that can be represented naturally in the Lua-C API. For instance, Lua tries to avoid or reduce the use of special syntax for a new mechanism, as syntax is not accessible through an API. On the other hand, mechanisms exposed as functions are naturally mapped to the API. "

"Lua supports eight data types: nil, boolean, number, string, userdata, table, function, and thread, which represents coroutines. The first five are no surprise. The last three give Lua its flavor and are the ones we discuss here. However, given the importance of embeddability in the design of Lua, we first briefly introduce the interface between Lua and its host language."

" The Lua-C API

To illustrate the concept of embedding in Lua, consider a simple example of a C program using the Lua library. Take this tiny Lua script, stored in a file

    pi = 4 * math.atan(1)

Figure 1 shows a C program that runs the script and prints the value of pi. The first task is to create a new state and populate it with the functions from the standard libraries (such as math.atan). The program then calls luaL_loadfile to load (precompile) the given source file into this state. In the absence of errors, this call produces a Lua function that is then executed by lua_pcall. If either loadfile or pcall raises an error, it produces an error message that is printed to the terminal. Otherwise, the program gets the value of the global variable pi and prints its value.

Figure 1. A C program using the Lua library.

The data exchange among these API calls is done through an implicit stack in the Lua state. The call to luaL_loadfile pushes on the stack either a function or an error message. The call to lua_pcall pops the function from the stack and calls it. The call to lua_getglobal pushes the value of the global variable. The call to lua_tonumber projects the Lua value on top of the stack to a double. The stack ensures these values remain visible to Lua while being manipulated by the C code so they cannot be collected by Lua's garbage collector.

Besides the functions used in this simple example, the Lua-C API (or "C API" for short) offers functions for all kinds of manipulation of Lua values, including pushing C values (such as numbers and strings) onto the stack, calling functions defined by the script, and setting variables in the state. "
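
Figure 1 itself didn't survive the copy-paste; here's my hedged reconstruction of a program along the lines the article describes, using the standard Lua 5.x C API (compiled as C++ via lua.hpp; error handling simplified):

    #include <cstdio>
    #include "lua.hpp"  // wraps lua.h, lauxlib.h, lualib.h for C++

    int main() {
        lua_State* L = luaL_newstate();  // create a new Lua state
        luaL_openlibs(L);                // populate it with the standard libraries
        // load (precompile) the script, then call the resulting function
        if (luaL_loadfile(L, "script.lua") != 0 || lua_pcall(L, 0, 0, 0) != 0) {
            std::fprintf(stderr, "lua: %s\n", lua_tostring(L, -1));  // error message is on the stack
            lua_close(L);
            return 1;
        }
        lua_getglobal(L, "pi");                         // push the global 'pi' onto the stack
        std::printf("pi = %f\n", lua_tonumber(L, -1));  // project top of stack to a double
        lua_close(L);
        return 0;
    }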

" "Table" is the Lua term for associative arrays, or "maps." A table is just a collection of entries, which are pairs (key, value).

Tables are the sole data-structuring mechanism in Lua. "

" Lua supports records with syntactic sugar, translating a field reference like t.x to a table-indexing operation t["x"]. "

" Arrays have no special status in the semantics of Lua; they are just ordinary tables. However, arrays pervade programming. Therefore, implementation of tables in Lua gives special attention to their use as arrays. The internal representation of a table in Lua has two parts: an array and a hash.7 If the array part has size N, all entries with integer keys between 1 and N are stored in the array part; all other entries are stored in the hash part. The keys in the array part are implicit and do not need to be stored. The size N of the array part is computed dynamically, every time the table has to rehash as the largest power of two such that at least half the elements in the array part will be filled. A generic access (such as t[i]) first checks whether i is an integer in the range [1, N]; this is the most common case and the one programmers expect to be fast. If so, the operation gets the value in the array; otherwise, it accesses the hash. When accessing record fields (such as t.x) the Lua core knows the key is a string and so skips the array test, going directly to the hash.

An interesting property of this implementation is that it gives sparse arrays for free. For instance, when a programmer creates a table with three entries at indices 5, 100, and 3421, Lua automatically stores them in the hash part, instead of creating a large array with thousands of empty slots.

Lua also uses tables to implement weak references. In languages with garbage collection, a weak reference is a reference to an object that does not prevent its collection as garbage.10 In Lua, weak references are implemented in weak tables. A weak table is thus a table that does not prevent its contents from being collected. If a key or a value in an entry is collected, that entry is simply removed from the table; we discuss later how to signal that a table is weak. Weak tables in Lua also subsume ephemerons.4 "
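
a toy C++ sketch of the two-part table representation described above (grossly simplified: values are plain doubles and non-array keys are strings, whereas real Lua keys and values can be any Lua value, and N is resized dynamically):

    #include <cstddef>
    #include <string>
    #include <unordered_map>
    #include <vector>

    struct Table {
        std::vector<double> array;                     // integer keys 1..N; keys are implicit
        std::unordered_map<std::string, double> hash;  // everything else

        double get_index(std::size_t i) const {        // t[i]
            if (i >= 1 && i <= array.size())           // the common case, expected to be fast
                return array[i - 1];
            auto it = hash.find(std::to_string(i));    // sparse indices land in the hash part
            return it == hash.end() ? 0.0 : it->second;  // 0.0 standing in for nil
        }
        double get_field(const std::string& k) const { // t.x: key is known to be a string,
            auto it = hash.find(k);                    // so skip the array test entirely
            return it == hash.end() ? 0.0 : it->second;
        }
    };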

" Lua supports first-class anonymous functions with lexical scoping, informally known as closures.13 Several non-functional languages nowadays (such as Go, Swift, Python, and JavaScript?) offer first-class functions. However, to our knowledge, none uses this mechanism as pervasively as Lua.

All functions in Lua are anonymous. This is not immediately clear in the standard syntax for defining a function

    function add (x, y)
        return x + y
    end

Nevertheless, this syntax is just syntactic sugar for an assignment of an anonymous function to a variable

    add = function (x, y)
        return x + y
    end

Most dynamic languages offer some kind of eval function that evaluates a piece of code produced at runtime. Instead of eval, Lua offers a load function that, given a piece of source code, returns a function equivalent to that code. We saw a variant of load in the C API in the form of luaL_loadfile. Consider the following piece of code

    local id = 0
    function genid ()
        id = id + 1
        return id
    end

When one loads it, the function load returns an anonymous function equivalent to the following code

    function ()
        local id = 0
        function genid ()
            id = id + 1
            return id
        end
    end

...

The function load simplifies the semantics of Lua in two ways: First, unlike eval, load is pure and total; it has no side effects and it always returns a value, either a function or an error message; second, it eliminates the distinction between "global" code and "function" code, as in the previous chunk of code. The variable id, which in the original code appears outside any function, is seen by Lua as a local variable in the enclosing anonymous function representing the script. Through lexical scoping, id is visible to the function genid and preserves its value between successive calls to that function. Thus, id works like a static variable in C or a class variable in Java.

"

" Modules. The construction of modules in Lua is a nice example of the use of first-class functions and tables as a basis for other mechanisms. At runtime, a module in Lua is a regular table populated with functions, as well as possibly other values (such as constants). Consider this Lua fragment

    print(math.sin(math.pi/6)) --> 0.5

Abstractly, programmers read this code as calling the sin function from the standard math module, using the constant pi from that same module. Concretely, the language sees math as a variable (created when Lua loaded its standard libraries) containing a reference to a table. That table has an entry with the key "sin" containing the sine function and an entry "pi" with the value of π. "

module example (fig. 2):

" local M = {}

function M.new (x, y) return {x = x, y = y} end

function M.add (u, v) return M.new(u.x+v.x, u.y+v.y) end "

" local vec = require "mymodule" "

" require is a regular function from the standard library; when the single argument to a function is a literal string, the code can omit the parentheses in the call. If the module is not already loaded, require searches for an appropriate source for the given name (such as by looking for files in a list of paths), then loads and runs that code, and finally returns what the code returns. In this example, require returns the table M created by the chunk.

Lua leverages tables, first-class functions, and load to support modules. The only addition to the language is the function require. This economy is particularly relevant for an embedded language like Lua. Because require is a regular function, it cannot create local variables in the caller's scope. Thus, in the example using "mymodule", the programmer had to define explicitly the local variable vec. Yet this limitation gives programmers the ability to give a local name to the module. ... it has an easy integration with the C API: One can easily create modules in C; create mixed modules with some functions defined in Lua and others in C; and for C code call functions inside modules. The API needs no additional mechanisms to do these tasks; all it needs is the existing Lua mechanisms to manipulate tables and functions. "

"

---

[6]

---

" Include Guards

When you include a header, there is usually a #ifndef and #define statement at the top of the file and a corresponding #endif at the bottom. We call this an include guard. It is responsible for setting a variable the first time it is run so that including the same file a second time doesn't redefine things that already exist and cause the compiler to panic.

    #ifndef FILENAME_INCLUDED
    #define FILENAME_INCLUDED

    code

    #endif

This is a very useful trick but it's also one of the more fundamental problems of C:

    you include a file a first time,
    it modifies the compiler state,
    you include the same file a second time,
    based on the compiler state, it pretends to be empty.

That is completely crazy - the file you include can change based on the state of the compiler. Not only that but the included files themselves can modify the state of the compiler (windows.h is infamous for doing this).

Because of this, compiling becomes slow and complex. Suppose that we want to compile two files which both include <string.h> and that <string.h> itself includes about 50 other files. We are not able to cache <string.h> without proving that the compiler state is the same when we include it!

So what started out as a simple, easy to implement solution turns out to scale really poorly. This wasn't an issue in 1972 when the computers limited the complexity but almost 50 years later, it's a big problem. The C++ standards committee has been trying to introduce a module system to fix this but it's a difficult task to change such a fundamental system in an established language. " [7]
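
a small invented example of the problem described above: what a header "contains" depends on compiler state, so it can't naively be compiled once and cached.

    // config.h -- its effective contents depend on prior compiler state
    #ifndef CONFIG_H
    #define CONFIG_H
    #ifdef USE_FAST_MATH
    typedef float real;
    #else
    typedef double real;
    #endif
    #endif

    // a.cpp
    #define USE_FAST_MATH
    #include "config.h"  // here real is float...

    // b.cpp
    #include "config.h"  // ...here real is double: a cached compilation of
                         // config.h is only reusable if the compiler can prove
                         // the relevant state (USE_FAST_MATH, CONFIG_H) matches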

---

" The following source file gives us an idea of mangling:

extern "C" int add_c(int a, int b) { return a + b; }

int add(int a, int b) { return a + b; }

int add(const int *a, const int &b) { return *a + b; }

float add(float a, float b) { return a + b; }

namespace manu { int add(int a, int b) { return a + b; } }

If we look at it with nm:

    nm mangling.o
    c++filt _Z3addff
    ...

    Position  Type  Name             Signature
    0         Text  add_c            int add_c(int, int)
    44        Text  _Z3addff         float add(float, float)
    14        Text  _Z3addii         int add(int, int)
    28        Text  _Z3addPKiRS_     int add(const int *, const int &)
    5e        Text  _ZN4manu3addEii  int manu::add(int, int)

Basically, in C, functions are simply identified by their names. This prevents us from having namespaces and having a function with the same name but different arguments. C++ gets around this by using mangling. extern "C" turns off mangling so that C++ can be compatible with C.

Unfortunately, many compilers do mangling differently and so are incompatible. Luckily, most compilers have recently standardized on the Itanium C++ ABI that you see above.

    start with _Z since underscore capital letter is reserved in C,
    an N after the Z indicates nested names,
    put numbers that indicate the length of the next argument,
    this gives us a list of strings,
    the last string is the function, class or struct name,
    the previous ones are the namespaces or outer classes,
    if our names were nested, we insert an E,
    we indicate the type and modifiers of our arguments.

mangling details

    Even with mangling, we don't have enough size information for function calls to forgo headers. We are missing the size of the return value and the size of structures.

" [8]

---

 chadaustin 2 days ago [-]

Compiling C++ to asm.js with Emscripten is another weird architecture you might actually use these days.

Unaligned accesses don't trap with a SIGBUS or anything, they just round the pointer value down to an aligned address and read whatever that is.

Reading from and writing to NULL will generally succeed (just as on SGI).

Function pointers are just indices into tables of functions, one table per "function type" (number of arguments, whether the arguments are int or float, whether it returns an argument or not). Thus, two different function pointers may have the same bit pattern.

reply

---

sephware 3 days ago [-]

Go was originally intended as a systems level language, suitable for writing extremely efficient servers, dealing directly with raw bytes where needed, therefore enabling writing code at both high and low levels. VMs typically take advantage of the lowest level of code that their language supports to be as efficient as possible. There's no reason Go code can't be as fast as C++ code with enough time for optimizations, and there's no reason a VM written in Go has to be slower than any other low-level program written in Go.

reply

skybrian 3 days ago [-]

I believe there are compiler improvements that can be made without changing the Go language, like computed goto's for large switch statements. That will help.

However, to really do as well as C, Go would need a more space-efficient union type. A Go interface is always two pointers due to how the garbage collector works, and that's not going to be as efficient as using something like NaN-encoding for a union between a pointer and a float64.

reply
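
a rough C++ sketch of the NaN-encoding idea skybrian mentions (not how Go or any particular VM actually does it; the tag layout is invented). Every value fits in 64 bits: real doubles are stored as-is, and pointers hide in the payload of a quiet NaN that arithmetic never produces.

    #include <cstdint>
    #include <cstring>

    const std::uint64_t kQNaN   = 0x7ff8000000000000ull;  // quiet-NaN bit pattern
    const std::uint64_t kPtrTag = 0x0001000000000000ull;  // bit 48; x86-64 pointers fit in bits 0..47

    std::uint64_t box_double(double d) {
        std::uint64_t u;
        std::memcpy(&u, &d, sizeof u);  // doubles are stored unmodified
        return u;
    }
    std::uint64_t box_ptr(void* p) {
        return kQNaN | kPtrTag | (std::uint64_t)(std::uintptr_t)p;
    }
    bool is_ptr(std::uint64_t v) {
        return (v & (kQNaN | kPtrTag)) == (kQNaN | kPtrTag);
    }
    void* unbox_ptr(std::uint64_t v) {
        return (void*)(std::uintptr_t)(v & 0x0000ffffffffffffull);
    }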

ComputerGuru 3 days ago [-]

For go code to be as fast as C it would necessarily not look like go code as you’d have to avoid anything that would involve non-deterministic allocation or garbage collection. Sure, it’s possible. But it doesn’t make sense.

reply

---

(fortran still "excels in the good old structured programming. It has features that mainstream C-like languages lack. For instance, it can exit or continue a loop from a nested loop.

    rows: do i = 1, 10
        columns: do j = 1, 10
            if (a(i, j) == 0.) exit rows
            ...
        enddo columns
    enddo rows

It has a case statement with ranges.

    integer :: temp_c

    ! Temperature in Celsius
    select case (temp_c)
    case (:-1)
        write (*,*) 'Below freezing'
    case (0)
        write (*,*) 'Freezing point'
    case (1:20)
        write (*,*) 'It is cool'
    case (21:33)
        write (*,*) 'It is warm'
    case (34:)
        write (*,*) 'This is Texas!'
    end select

And it can use an array of indexes to access another array.

    real, dimension(5) :: a = [ 2, 4, 6, 8, 10 ]
    integer, dimension(2) :: i = [ 2, 4 ]

    print *, a(i) ! prints 4. 8.

"

---

"int" should be arbitrary precision -- int32 int64 etc for modular arithmetic

golang is adding:

" 2. #19308, #28493 Binary integer literals and support for_ in number literals: These are relatively minor changes that seem hugely popular among many programmers. They may not quite reach the threshold of solving an “important issue” (hexadecimal numbers have worked well so far) but they bring Go up to par with most other languages in this respect and relieve a pain point for some programmers. They have minimal impact on others who don’t care about binary integer literals or number formatting, and the implementation is well understood.

3. #19113 Permit signed integers as shift counts: An estimated 38% of all non-constant shifts require an (artificial) uint conversion (see the issue for a more detailed break-down). This proposal will clean up a lot of code, get shift expressions better in sync with index expressions and the built-in functions cap and len. It will mostly have a positive impact on code. The implementation is well understood. " "

i don't like binary integer literals or support for _. but i guess i do like signed integers as shift counts? not sure exactly what the semantics are there, but why not.

and of course we want generics

nickcw 2 days ago [-]

I've been reading this proposal

> #19113 Permit signed integers as shift counts: An estimated 38% of all non-constant shifts require an (artificial) uint conversion (see the issue for a more detailed break-down). This proposal will clean up a lot of code, get shift expressions better in sync with index expressions and the built-in functions cap and len. It will mostly have a positive impact on code. The implementation is well understood.

The proposal as far as I can make out says allow signed integers for shifts but panic if they are negative.

This seems like a step backwards to me pushing checking which the compiler made you do to runtime.

Personally I'd expect a negative shift to shift the other way, but that doesn't seem to be a popular option with the team.

reply
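
to make the two candidate semantics concrete, a C++ sketch (the accepted Go proposal panics on a negative count; nickcw's preference would shift the other way):

    #include <cstdint>

    // shift left by a signed count; a negative count shifts right instead
    // (counts at or beyond the width shift everything out)
    std::uint64_t shl(std::uint64_t x, int n) {
        if (n >= 64 || n <= -64) return 0;
        return n >= 0 ? x << n : x >> -n;
    }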

---

" Particularly thunderous applause for providing both backwards compatibility (published packages will keep working, won't require anything from maintainers, and we can mix old and new) and autofix features (for when maintainers are willing to make the breaking changes).

Glad Rust learned from breaking changes hiccups of other programming languages :) "

---

"With this year’s push, Rust has really good support for ARM Cortex-M family of microprocessor cores, which are used in a lot of devices. However, there are lots of architectures used on embedded devices, and those aren’t as well supported. Rust needs to expand to have the same level of support for these other architectures."

so, if we support embedded at all, focus on ARM Cortex-M first?

so as far as CPU ISAs go, sounds like the big ones for us to support are x86, ARM (may as well target Cortex-M first; since those implement a subset of normal ARM, it should be easier to go from there to regular ARM than in the other direction? not sure), RISC-V?

---

not sure if this is relevant, but wow:

https://repl.it/site/blog/multi

---

losvedir 4 days ago [-]

It didn't click until just now that rust is very naturally suited to WebAssembly due to its lack of garbage collection. As I understand it, there are no current plans for WebAssembly to include a garbage collector, so higher level languages that want to compile to it will need to bundle their own, increasing the size of the download quite a bit.

reply

---

the_duke 15 hours ago [-]

I was drawn to Rust because, behind the apparent complexity of the borrow checker, the language is/was actually very simple and easy to understand.

I have been using Rust extensively for over 2 years.

In those two years, the language complexity has increased substantially. It's already at a stage where I'm worried about losing track.

Also, there is quite a large amount of changes in the pipeline that were already accepted but are either not finished or not implemented yet. All of those can/will have a significant impact on idiomatic code and API design and will increase the complexity burden:

So I do believe Rust needs to slow down considerably and get much stricter with accepting new features. Both to not overwhelm the existing userbase and to not make the language another C++ in terms of complexity.

reply

steveklabnik 15 hours ago [-]

One thing that’s come out of this years’ posts by various people is that, after those features have landed, people generally do want things to be done. These features shore up major weaknesses in the language for various domains, but there aren't really things not in that list that many people still extremely desire.

reply

z3phyr 9 hours ago [-]

Try Zig too. It's at the stage rust was at 0.4 alpha, but it has some very great ideas and it's really simple.

reply

---

baq 1 day ago [-]

Rust both greatly benefits from and is encumbered by LLVM. Maybe there’s room for a different LLVM-like or perhaps LLVM compatible project.

reply

dullgiulio 1 day ago [-]

Interesting, what do you think makes it encumbered by LLVM?

I thought that by squeezing MIR in the middle, Rust could get both Rust-specific transformations and then the benefits of the LLVM generic code optimizations...

reply

steveklabnik 1 day ago [-]

It can!

LLVM is fantastic, but it's also a huge C++ component, and it's not necessarily optimized for speed. It would be nice to end up with something in Rust to simplify the toolchain needed, and to maybe have something that could be even faster for debug builds.

reply

---

jniedrauer 1 day ago [-]

Go is still a very simple language. It's not just social engineering. Compare it to Python, which is another "simple" language that you can be dangerous in on day one.

As you gain experience in Go, you learn idiomatic ways to do things and memorize the core library, but the fundamental mechanics don't change.

As you gain experience in Python, you develop a preference for virtualenvwrapper and experiment with 300 different ways to build a distributable package, then start extending __getitem__ and __setitem__ and adding decorators to everything and before you realize what you've done, you've torn apart the laws of physics and your code has nightmarish side effects lurking around every corner.

reply

sevensor 1 day ago [-]

> As you gain experience in Python, you develop a preference for virtualenvwrapper and experiment with 300 different ways to build a distributable package, then start extending __getitem__ and __setitem__ and adding decorators to everything and before you realize what you've done, you've torn apart the laws of physics and your code has nightmarish side effects lurking around every corner.

Alternatively, you glimpse the chthonic horrors lurking at the end of that path, purge your code of hidden mutable state, and use Python to write straightforward procedural programs.

reply

---

cageface 2 days ago [-]

What tends to happen is that you bring out a simplified, cleaned-up version of the popular language of the day. People then flock to that new language and start making feature requests. Over time that language becomes as complex and inelegant as the language it replaced and you can start the whole cycle all over again.

The move from C++ -> Java -> Go is a perfect example of this cycle.

reply

---

klodolph 1 day ago [-]

I would disagree with the claim that “C++98 isn’t all that complex.” It was so complex that any project I found using it would restrict itself to some arbitrary subset, just to keep complexity under control. Java has gone through years of backwards-compatible changes, and is still wonderfully simple compared to early versions of C++. C++ started with a mess of different features and hasn’t really gotten worse. If anything, C++11 is simpler than C++98 from a user’s perspective, even though it’s not from an implementer’s perspective.

One of the big differences is that Java and Go have straightforward context-free grammars, mostly, and you can whip up a working parser in no time. C++ is a bit of a beast, by comparison (hence keywords like “typename”).

C++ also has the complex overloads and template system. I think people underestimate how complex these things are when they are learning C++, and how complex their interactions are. Then there’s the preprocessor.

You can kind of argue that these are just an accumulation of changes, but other languages contemporary with C and C++ do not suffer from these complexities, so the argument falls flat. By comparison, Go and Java rely on reflection or code generation more, and these are a bit simpler.

reply

---

 pjmlp 1 day ago [-]

Just like wanting to learn Python 3.7 means learning about 500 PEPs and how they interact across each Python release (major.minor) during the last 25 years, and alternatives to CPython.

reply

adrianN 1 day ago [-]

Well Python 3 broke compatibility so you don't have to start at Python 0.1 alpha, but yes, completely understanding Python is also no easy task. I write Python almost every day and have taught courses using it, but I know of features that I don't understand. There are probably more that I'm not even aware of.

reply

---

"For instance, Fortran is very performant in numerical computing, even outperforming C, but it has difficulty in accessing I/O mapped registers or implementing an interrupt handler, a jump table, or just cleaning a particular chunk of memory addresses. "

---

"By 1960, we had a long list of amazing languages: Lisp, APL, Fortran, COBOL, Algol 60. These are higher-level than C. We have seriously regressed, since C developed. C has destroyed our ability to advance the state of the art in automatic optimization, automatic parallelization, automatic mapping of a high-level language to the machine."

---

duncanawoods 1 day ago [-]

It's been a while since I watched this but my current thoughts on "Growing a language" is that I fear it is an unhelpful simplification of what makes language design hard. IMHO language design is a holistic design problem that needs global optimisation. You cannot incrementally hill-climb yourself anywhere useful when you can't backtrack over past decisions.

It is like designing a workshop. You only have so much space within arms reach. Here you place your most frequently used and valuable tools. You can "grow your language" by adding tools but they can't replace what you have in this limited and privileged position - instead, new tools have to go in cupboards or on another table. The new tools will have higher friction than the first priority tools you added.

Maybe you can have a workshop where you imagine a tool and it appears in your hand thereby removing the constraint of "tools within reach" but I think this then makes it too ephemeral and abstract. It is the constraints that makes a language tactile and ergonomic - remove the constraints and you have no structure at all.

In this case, the "tools within reach" are your core keywords and syntax. Growing features via libraries or syntactic extensions generally incur more ceremony and less elegance. Having totally flexible syntax extensions/keywords doesn't solve the problem, it just moves up a level abstraction/generality and means you have given your users the "design a language" problem instead of solving it for them.

reply

---

" I think Rust has already made a lot of "just get it out there, don't think about the ramifications" decisions in its language design, especially around its grammar. On one hand there is still logic to what glyphs do what and how control flow is laid out - particularly concerning minimizing the number of retraced parses the source needs to go through - but it doesn't change the syntax of lifetimes ('), the use of double colon namespacing (::), or the turbofish but being an unintitive disgusting hack.

In my experience the RFC process though has helped temper that. All those blemishes are predominantly from the origin of the language up through ~2014. Once the ecosystem developed and decisions required months or years of hanging around on the issue tracker to see stabilization Rust mostly stopped grafting on arcane behavior and started reusing its syntax in intuitive ways (like impl trait).

I don't think any of the current crop of outstanding wants really makes the language more complex. A lot of it is just increasing the compiler's complexity to handle more generic code, be it in terms of kindedness or by type delegation or inference. Those kinds of complexities aren't generally something for users to explicitly learn by rote memorization to use the language - if Rust introduced overloading and default arguments, all those who don't know it exists won't have the language made any harder to use, but those that want it can take advantage of it.

Thiez 1 day ago [-]

> if Rust introduced overloading and default arguments all those who don't know it exists won't have the language made any harder to use, but those that want it can take advantage of it.

The following program might no longer compile if `f` could be overloaded, additional type annotations would be required.

    fn f(n: u8) -> u8 {
        n + 5
    }
    let o = Some(Default::default());
    println!("{:?}", o.map(f));

In general I would expect overloading to make type inference less reliable (not so much with the compiler getting it wrong, but needing more annotations), which users of Rust that do not want overloading would be right to object to.

reply

 zanny 1 day ago [-]

I don't want to really derail the thread into a discussion on overloading - I was citing it more as an example of a common desire that also introduces a lot of compiler level complexity without really modifying the language syntax - but I imagine any first-pass conservative overload implementation would be restrictive enough to let the compiler know if there's only one definition of a function. Default would only break if there were 2+ impls of the same signature and would just require a type annotation then.

I'd argue you could not implement overloading without enough support being in place to let the current single-signature inference work as it does, which is also probably a good chunk of why it's never really been proposed in earnest. It's a hard problem to approach.

reply

"

---

lambda 2 days ago [-]

The Rust borrow checker allows for one mutable reference, or multiple immutable references. There are also primitives for allowing interior mutability through shared references, which enforce single access at a time by other means (mutexes, copying data in and out, etc).

References are always guaranteed to point to something that lives longer than them. What they refer to could either be on the stack, or it can be allocated and deallocated on the heap via smart pointers. Box<T> is similar to C++'s unique_ptr; there is a single owner, and when that owner is dropped, the referenced value is dropped. Arc<T> is similar to C++'s shared_ptr; it's an atomically reference counted pointer, so the value will be deallocated when the last clone is deleted.

These combine in ways that allow you to use most of those "best practice" patterns from C and C++, but with actual language guarantees that you will follow them and the compiler will catch it if you don't. In the internals of data structure implementation, you may need to go beyond what this allows with use of unsafe, but you can limit that just to a few small critical pieces of code that can be more thoroughly audited and tested.

In a large project "try to avoid doing that" doesn't really fly; in a large codebase, with dozens of developers, developed over multiple years, people are going to make mistakes. They will need to do some kind of refactoring, and one of the guarantees will get missed.

If you've done a decent amount of work in a large C or C++ codebase, you might appreciate this description of the process of storing a reference to a slice of a Vec in Rust, without having to manually reason about all of the places in which that slice could be invalidated: https://manishearth.github.io/blog/2015/05/03/where-rust-rea....

This article, written as a way to help introduce Rust 1.0, provides a few more examples, though they are artificial rather than from a real-world project: https://blog.rust-lang.org/2015/04/10/Fearless-Concurrency.h...

reply

wasted_intel 2 days ago [-]

For vanilla references (&), Rust will prevent you from having multiple mutable references. Beyond that, there are several types in the stdlib[0] that progressively increase flexibility, but always safely.

[0] Cell, RefCell, Mutex, Rc, Arc

---

gpderetta 2 days ago [-]

There are only two types of languages, those that are too complex and those that aren't yet.

reply

augustk 1 day ago [-]

There is at least one language that evolves in the opposite direction. The philosophy of Oberon is that "Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away."

reply

cryptonector 1 day ago [-]

So we should be using Forth then, got it :)

reply

---

Groxx 2 days ago [-]

JS especially has been a rocket-ship of new language features lately. It used to be my go-to language for new people since it was so simple (when used with a few boundaries (like every language needs)) that it could be picked up and the basics understood fairly quickly, and you can set people free to look at other people's code, and gradually learn DOM APIs and whatnot as needed.

Not any more. Not even close. The amount of stuff you have to deal with immediately is far larger now.

---

tspiteri 1 day ago [-]

> Same way an actual, legitimate newbie to Rust is going to have much more difficulty processing why :: is the namespace delimiter or why you need curly braces to delineate scope than how having the option to declare your function const would.

I disagree. :: and curly braces are just a shallow bit of syntax: you see it, learn it and it doesn't interact with anything. It is the easy part of learning the language. For example Go has curly braces too, and it is one of the fastest languages to get started with.

Maybe const functions are not visible immediately, but then as a beginner you would start looking at some documentation, and you'll see const there in API documentation and not understand it, so that makes the language look like it has these opaque features you do not understand. I'm not saying const functions are not worth the cost, I'm just saying that they do have a cost and add complexity in a way :: and curly braces do not.

reply

---

leoc 1 day ago [-]

Allow me to summarise the pitfalls of any sweeping 'keeping features out of the languge spec will prevent users from having to deal with the complexity burden of those features' assumption with a story:

Once, long ago, there was a new programming language. It was lean and mean. Partly to keep codebases understandable to all, and to make the language easy to learn and straightforward, it eschewed both having too many features, and having any feature that was too abstruse. People marvelled that the whole language, standard library and all, could fit in a nutshell. The name of the language was Java. The end.

reply

Ericson2314 1 day ago [-]

Yeah understanding the forest is more important than understanding the trees!

reply

---

"Goal of programmers is to ship, on time, on budget. It’s not “to produce code.” IMO most modern C++ proponents 1) overassign importance to source code over 2) compile times, debugability, cognitive load for new concepts and extra complexity, project needs, etc. 2 is what matters." [9]

---

" If C++ had something like a “coroutine” concept, it would be possible to implement the triples generator that would be as clear as the original nested for loops, yet not have any of the “problems” (Jason Meisel points out exactly that in “Ranges, Code Quality, and the Future of C++” post); something like (tentative syntax, as coroutines aren’t part of any C++ standard yet):

    generator<std::tuple<int,int,int>> pytriples() {
        for (int z = 1; ; ++z)
            for (int x = 1; x <= z; ++x)
                for (int y = x; y <= z; ++y)
                    if (x*x + y*y == z*z)
                        co_yield std::make_tuple(x, y, z);
    }

...

Compare C#:

    var triples = from z in Enumerable.Range(1, int.MaxValue)
                  from x in Enumerable.Range(1, z)
                  from y in Enumerable.Range(x, z)
                  where x*x + y*y == z*z
                  select (x: x, y: y, z: z);

"

---

should aim for this much, or less, complexity:

http://cslibrary.stanford.edu/101/EssentialC.pdf

---