proj-oot-ootLibrariesNotes13

---

" Math functions

For many years, users literally begged SQLite devs to add basic functions like sqrt(), log() and pow(). The answer was always about the same:

    SQLite is called ‘lite’ for a reason. If you need functions, add them yourself.

An understandable position indeed. But refusing to add the square root? At the same time implementing window functions, recursive queries and other advanced SQL magic? Seriously?

Maybe SQLite developers prefer to focus on features that large customers are willing to pay for. Anyway, after 20 years we now have mathematical functions!

Here is the full list:

acos(X) acosh(X) asin(X) asinh(X) atan(X) atan2(X,Y) ceil(X) ceiling(X) cos(X) cosh(X) degrees(X) exp(X) floor(X) ln(X) log(B,X) log(X) log10(X) log2(X) mod(X,Y) pi() pow(X,Y) power(X,Y) radians(X) sin(X) sinh(X) sqrt(X) tan(X) tanh(X) trunc(X)

"

---

"

samatman 8 hours ago [–]

`RETURNING` will substantially clean up my code, and I already have one migration which could have just been a `DROP COLUMN`, so this is great news.

On the subject of "it's called 'lite' for a reason", my wishlist does include library functions for working with RFC 3339 timestamps. SQLite already ships with a fairly large suite of JSON tools, which are optional to compile into the library, so there's precedent.

Datetimes are one of those things which are incredibly annoying to get right, and really belong inside the database. RFC 3339 timestamps are already well designed, since if you stick to UTC (and if you don't store timezone data separately you deserve those problems), lexical order is temporal order, but queries which would be rendered in English as "return all accounts where last payment is ninety days prior to `now`" aren't really possible with string comparisons.

Also, with the JSON library, you can use a check constraint to fail if a string isn't valid JSON, another affordance I would love to have for datetimes.

Grateful for what we just got, though! Just daydreaming...

reply

nalgeon 8 hours ago [–]

SQLite has ISO-8601 compatible date functions, isn't that enough?

  sqlite> select datetime('now', '-90 days');
  2020-12-12 21:44:22

https://sqlite.org/lang_datefunc.html

reply

samatman 4 hours ago [–]

Beats a swift kick in the pants!

You're right, that was a bad example. Maybe it's just me, but I've never figured out how to get SQLite to do a query like "select from orders where the order was on a Tuesday in Pacific time". I don't think you can; that requires predicates, and all I see are strftime and some useful pre-cooked variations on it.

reply "

---

ritchie46 7 hours ago [–]

Hi, author here.

Polars is not an alternative to PyArrow. Polars merely uses Arrow as its in-memory representation of data, similar to how pandas uses numpy.

Arrow provides the efficient data structures and some compute kernels, like a SUM, a FILTER, a MAX, etc. Arrow is not a query engine. Polars is a DataFrame library on top of Arrow that has implemented efficient algorithms for JOINS, GROUPBY, PIVOTs, MELTs, QUERY OPTIMIZATION, etc. (the things you expect from a DF lib).

Polars could be best described as an in-memory DataFrame library with a query optimizer.

Because it uses Rust Arrow, it can easily swap pointers around to pyarrow and get zero-copy data interop.

DataFusion is another query engine on top of Arrow. They both use Arrow as the lower-level memory layout, but both have a different implementation of their query engine and their API. I would say that DataFusion is more focused on being a query engine and Polars is more focused on being a DataFrame lib, but this is subjective.

Maybe it's like comparing Rust Tokio vs Rust async-std: just different implementations striving for the same goal. (Only Polars and DataFusion can easily be mixed, as they use the same memory structures.)

reply
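For context, a rough sketch of the DataFrame-level operations described above, using Polars' Python bindings. Method names (groupby/group_by, the lazy API) have shifted between Polars versions, so treat this as illustrative rather than as the definitive API:

    import polars as pl

    df = pl.DataFrame({
        "animal": ["dog", "dog", "cat"],
        "weight": [10.0, 12.0, 4.0],
    })

    # eager groupby/aggregation: one of the algorithms Polars layers on top of
    # Arrow's memory format and compute kernels
    print(df.groupby("animal").agg(pl.col("weight").sum()))

    # the lazy API builds a logical plan that the query optimizer can rewrite
    # (predicate pushdown etc.) before anything is executed
    print(
        df.lazy()
          .filter(pl.col("weight") > 5.0)
          .groupby("animal")
          .agg(pl.col("weight").mean())
          .collect()
    )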

---

on Debian, installing npm installs:

  gyp libc-ares2 libjs-inherits libjs-is-typedarray libjs-psl libjs-typedarray-to-buffer libnode-dev
  libnode72 libssl-dev libssl1.1 libssl1.1:i386 libuv1-dev node-abbrev node-ajv node-ansi
  node-ansi-align node-ansi-regex node-ansi-styles node-ansistyles node-aproba node-archy
  node-are-we-there-yet node-asap node-asn1 node-assert-plus node-asynckit node-aws-sign2 node-aws4
  node-balanced-match node-bcrypt-pbkdf node-bl node-bluebird node-boxen node-brace-expansion
  node-builtin-modules node-builtins node-cacache node-call-limit node-camelcase node-caseless
  node-chalk node-chownr node-ci-info node-cli-boxes node-cliui node-clone node-co node-color-convert
  node-color-name node-colors node-columnify node-combined-stream node-concat-map node-concat-stream
  node-config-chain node-configstore node-console-control-strings node-copy-concurrently
  node-core-util-is node-cross-spawn node-crypto-random-string node-cyclist node-dashdash
  node-debbundle-es-to-primitive node-debug node-decamelize node-decompress-response node-deep-extend
  node-defaults node-define-properties node-delayed-stream node-delegates node-detect-indent
  node-detect-newline node-dot-prop node-duplexer3 node-duplexify node-ecc-jsbn node-editor
  node-encoding node-end-of-stream node-err-code node-errno node-es6-promise node-escape-string-regexp
  node-execa node-extend node-extsprintf node-fast-deep-equal node-find-up node-flush-write-stream
  node-forever-agent node-form-data node-from2 node-fs-vacuum node-fs-write-stream-atomic
  node-fs.realpath node-function-bind node-gauge node-genfun node-get-caller-file node-get-stream
  node-getpass node-glob node-got node-graceful-fs node-gyp node-har-schema node-har-validator
  node-has-flag node-has-symbol-support-x node-has-to-string-tag-x node-has-unicode node-hosted-git-info
  node-http-signature node-iconv-lite node-iferr node-import-lazy node-imurmurhash node-inflight
  node-inherits node-ini node-invert-kv node-ip node-ip-regex node-is-npm node-is-obj node-is-object
  node-is-path-inside node-is-plain-obj node-is-retry-allowed node-is-stream node-is-typedarray
  node-isarray node-isexe node-isstream node-isurl node-jsbn node-json-parse-better-errors
  node-json-schema node-json-schema-traverse node-json-stable-stringify node-json-stringify-safe
  node-jsonify node-jsonparse node-jsonstream node-jsprim node-latest-version node-lazy-property
  node-lcid node-libnpx node-locate-path node-lockfile node-lodash node-lodash-packages
  node-lowercase-keys node-lru-cache node-make-dir node-mem node-mime node-mime-types node-mimic-fn
  node-mimic-response node-minimatch node-minimist node-mississippi node-mkdirp node-move-concurrently
  node-ms node-mute-stream node-nopt node-normalize-package-data node-npm-bundled node-npm-package-arg
  node-npm-run-path node-npmlog node-number-is-nan node-oauth-sign node-object-assign node-once
  node-opener node-os-locale node-os-tmpdir node-osenv node-p-cancelable node-p-finally
  node-p-is-promise node-p-limit node-p-locate node-p-timeout node-package-json node-parallel-transform
  node-path-exists node-path-is-absolute node-path-is-inside node-performance-now node-pify
  node-prepend-http node-process-nextick-args node-promise-inflight node-promise-retry node-promzard
  node-proto-list node-prr node-pseudomap node-psl node-pump node-pumpify node-punycode node-qs node-qw
  node-rc node-read node-read-package-json node-readable-stream node-registry-auth-token
  node-registry-url node-request node-require-directory node-require-main-filename node-resolve
  node-resolve-from node-retry node-rimraf node-run-queue node-safe-buffer node-semver node-semver-diff
  node-set-blocking node-sha node-shebang-command node-shebang-regex node-signal-exit node-slash
  node-slide node-sorted-object node-spdx-correct node-spdx-exceptions node-spdx-expression-parse
  node-spdx-license-ids node-sshpk node-ssri node-stream-each node-stream-iterate node-stream-shift
  node-strict-uri-encode node-string-decoder node-string-width node-strip-ansi node-strip-eof
  node-strip-json-comments node-supports-color node-tar node-term-size node-text-table node-through
  node-through2 node-timed-out node-tough-cookie node-tunnel-agent node-tweetnacl node-typedarray
  node-typedarray-to-buffer node-uid-number node-unique-filename node-unique-string node-unpipe
  node-uri-js node-url-parse-lax node-url-to-options node-util-deprecate node-uuid
  node-validate-npm-package-license node-validate-npm-package-name node-verror node-wcwidth.js
  node-which node-which-module node-wide-align node-widest-line node-wrap-ansi node-wrappy
  node-write-file-atomic node-xdg-basedir node-xtend node-y18n node-yallist node-yargs node-yargs-parser
  nodejs nodejs-doc

---

https://github.com/gruns/icecream

kissgyorgy 18 hours ago [–]

You can do the same thing with Python 3.8+ by using f-strings and just appending "=" to the variable name:

    >>> print(f"{d['key'][1]=}")
    d['key'][1]='one'

reply

grun 8 hours ago [–]

Hey! I'm Ansgar. I wrote Icecream.

f-strings's `=` is awesome. I'm overjoyed it was added to Python. I use it all the time.

That said, IceCream does bring more to the table, like:

etc

I hope that helps!

reply
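For comparison with the f-string trick above, the basic ic() call looks like this (the output format shown in the comments is approximate):

    from icecream import ic

    d = {"key": ["zero", "one"]}
    ic(d["key"][1])    # prints something like: ic| d["key"][1]: 'one'

    def double(x):
        return 2 * x

    ic(double(21))     # prints the expression and its value: ic| double(21): 42
    ic()               # with no arguments, prints file, line and enclosing function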

est 16 hours ago [–]

You can also use the breakpoint() introduced in py3.7 via PEP 553

https://www.python.org/dev/peps/pep-0553/

reply
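A minimal reminder of how breakpoint() behaves (PEP 553; the environment-variable override is part of the same PEP):

    def total(items):
        s = 0
        for item in items:
            breakpoint()   # drops into pdb by default, same as `import pdb; pdb.set_trace()`
            s += item
        return s

    # The hook is configurable via the PYTHONBREAKPOINT environment variable, e.g.
    #   PYTHONBREAKPOINT=0               -> breakpoint() calls become no-ops
    #   PYTHONBREAKPOINT=ipdb.set_trace  -> use ipdb (if installed) instead of pdb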

---

https://github.com/nothings/stb

---

http://smtlib.cs.uiowa.edu/index.shtml

---

a paper about the cons of fork:

A fork() in the road https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf https://dl.acm.org/doi/abs/10.1145/3317550.3321435

what they suggest instead:

" High-level: Spawn.In our opinion, most uses of fork andexec would be best served by a spawn API...Theposix_spawn()API can ease such refactoring. Ratherthan requiring that all parameters affecting a new pro-cess be provided at a single call-site (as is the case forCreateProcess()), spawn attributes are set by extensibly-defined helper functions. A post-forkclose(), for example,can be replaced by a pre-spawn call that records a “closeaction” to occur in the child. Unfortunately, this means thatthe API is specified as if it were implemented by fork andexec, although it is not actually required [32].The main drawback ofposix_spawn()is that it is not acomplete replacement for fork and exec. Some less-commonoperations, such as setting terminal attributes or switchingto an isolated namespace, are not yet supported. It also lacksan effective error-reporting mechanism: failures occurring inthe context of the child before it begins execution (such as in-valid file descriptor parameters) are reported asynchronouslyand are indistinguishable from the child’s termination. Theseshortcomings can and should be corrected. ... Low-level: Cross-process operations....an alternative model where system calls thatmodify per-process state are not constrained to merely thecurrent process, but rather can manipulate any process towhich the caller has access. This yields the flexibility andorthogonality of the fork/exec model, without most of itsdrawbacks: a new process starts as an empty address space,and an advanced user may manipulate it in a piecemeal fash-ion, populating its address-space and kernel context prior toexecution, without needing to clone the parent nor run codein the context of the child. ExOS? [43] implemented fork inuser-mode atop such a primitive. Retrofitting cross-processAPIs into Unix seems at first glance challenging, but mayalso be productive for future research. ... Copy-on-write memory...POSIX would benefit from an API for using copy-on-writememory independently of forking a new process. Bittau[16]proposedcheckpoint()andresume()calls to take copy-on-write snapshots of an address space, thus reducing the over-head of security isolation. More recently, Xu et al.[82]ob-served that fork time dominates the performance of fuzzingtools, and proposed a similarsnapshot()API. These designsare not yet general enough to cover all the use-cases outlinedabove, but perhaps can serve as a starting point "

related discussion: https://gist.github.com/nicowilliams/a8a07b0fc75df05f684c23c18d7db234 https://news.ycombinator.com/item?id=30502392 https://news.ycombinator.com/item?id=19621799

the author of the gist comments in https://news.ycombinator.com/item?id=30502392:

" I vehemently disagree with those who say that vfork() is much more difficult to use correctly than fork(). Neither is particularly easy to use though. Both have issues to do with, e.g., signals. posix_spawn() is not exactly trivial to use, but it is easier to use it correctly than fork() or vfork(). And posix_spawn() is extensible -- it is not a dead end.

My main points are that vfork() has been unjustly vilified, fork() is really not good, vfork() is better than fork(), and we can do better than vfork(). That said, posix_spawn() is the better answer whenever it's applicable. "

so.. sounds like posix_spawn is what we should focus on?
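To get a concrete feel for the spawn style (including the "close action" the paper mentions), here's a sketch using Python's os.posix_spawn wrapper (POSIX only, Python 3.8+). The file names and descriptors are just for illustration:

    import os

    # an fd the parent owns that the child should not inherit
    log_fd = os.open("parent-only.log", os.O_WRONLY | os.O_CREAT, 0o644)

    file_actions = [
        (os.POSIX_SPAWN_CLOSE, log_fd),                  # pre-spawn "close action", applied in the child
        (os.POSIX_SPAWN_OPEN, 1, "child-out.txt",        # redirect the child's stdout
         os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644),
    ]

    pid = os.posix_spawn("/bin/echo", ["echo", "hello from the child"],
                         os.environ, file_actions=file_actions)
    _, status = os.waitpid(pid, 0)
    print("child exit status:", os.WEXITSTATUS(status))
    os.close(log_fd)

All per-child configuration is declared up front instead of being done in code running between fork and exec, which is exactly the trade-off the paper describes.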

---

more opinions on the 'fork' model:

https://news.ycombinator.com/item?id=26984986

---

" Rust is the only programming language I've used that attempts to expose operating system primitives like environment variables, command arguments, and filesystem paths and doesn't completely mess it up. Truth be told, this is kind of a niche topic. But as I help maintain a version control tool which needs to care about preserving content identically across systems, this topic is near and dear to my heart.

In POSIX land, primitives like environment variables, command arguments, and filesystem paths are char*, or a bag of null-terminated bytes.

On Windows, these primitives are wchar_t*, or wide bytes.

On both POSIX and Windows, the encoding of the raw bytes can be... complicated.

Nearly every programming language / standard library in existence attempts to normalize these values to its native string type, which is typically Unicode or UTF-8. That's doable and correct a lot of the time. Until it isn't.

Rust, by contrast, has standard library APIs like std::env::vars() that will coerce operating system values to Rust's UTF-8 backed String type. But Rust also exposes the OsString type, which represents operating system native strings. And there are function variants like std::env::vars_os() to access the raw values instead of the UTF-8 normalized ones.

Rust paths are internally stored as OsString, as that is the value passed to the C API to perform filesystem I/O. However, you can coerce paths to String easily enough or define paths in terms of String without jumping through hoops. " [1]

---

" It is easy to forget that JavaScript? comes with a lot of batteries included (despite the claim that it doesn’t have a standard library). For example: You can handle arrays, objects, iterate over keys and values, split strings, filter, map, have prototypical inheritance and so on and so forth. All that is built into the JavaScript? engine. WebAssembly? comes with nothing, except arithmetic. "

---

https://observablehq.com/@observablehq/plot

---

mitchs 6 minutes ago [–]

Or just cast the pointer to uint##_t and use be##toh and htobe## from <endian.h>? I think this is making a mountain out of a molehill. I've spent tons of time doing wire (de)serialization in C for network protocols and endian swaps are far from the most pressing issue I see. The big problem imo is the unsafe practices around buffer handling allowing buffer overruns.

reply

Animats 2 hours ago [–]

Rust gets this right. These primitives are available for all the numeric types.

    u32::from_le_bytes(bytes) // u32 from 4 bytes, little endian
    u32::from_be_bytes(bytes) // u32 from 4 bytes, big endian
    u32::to_le_bytes(num) // u32 to 4 bytes, little endian
    u32::to_be_bytes(num) // u32 to 4 bytes, big endian

This was very useful to me recently as I had to write the marshaling and un-marshaling for a game networking format with hundreds of messages. With primitives like this, you can see what's going on.

reply

(those are comments on https://justine.lol/endian.html )

---

https://github.com/golang/go/issues/45955 proposal: slices: new package to provide generic slice functions #45955

---

https://github.com/skullchap/chadstr

---

https://github.com/FrozenVoid/C-headers

---

https://arrayfire.org/docs/index.htm

---

https://github.com/SerenityOS/serenity/tree/master/AK AK is SerenityOS's C++ stdlib analog

---

"LLVM’s built-in unordered containers have a mode where iteration order is reversed" to catch bugs where someone accidentally/unknowingly used an unordered container when they needed an ordered one

---

various flaws in Golang's libraries, having to do with filesystem file attributes, filesystem paths, clocks, cross-platform ifdefs; and suggestions about how it should be (usually based on how Rust does the same thing): https://fasterthanli.me/articles/i-want-off-mr-golangs-wild-ride

---

twic on Feb 28, 2020 [–]

A post about fixing Date in JavaScript got me thinking about why it took so long for languages to get good date/time APIs.

I think it's because it took so long to accept that date and time really is complicated.

If you sit down and work it out carefully, you end up with Joda-Time (more or less - not in all the details, but in the set of abstractions). If you balk at that and make something simpler, you make a subtly but fundamentally broken API.

It took a long time for us to get comfortable with the level of complexity in Joda-Time, but now nobody thinks a serious date/time API can be substantially simpler.

It sounds to me like you and the author are saying that Go does this balking systematically.

alexhutcheson on Feb 28, 2020 [–]

The author of Joda-Time actually thinks that even Joda-Time didn't get it quite right, and believes the java.time libraries in Java 8 and above (aka JSR-310[1]) are better than Joda-Time: https://blog.joda.org/2009/11/why-jsr-310-isn-joda-time_4941...

It turns out that abstractions for time are really hard to get right.

[1] https://jcp.org/en/jsr/detail?id=310

dehrmann on Feb 29, 2020 [–]

Props to him for not having an ego with that. Jodatime even recommends using JSR-310 time for new development.

nogridbag on Feb 29, 2020 [–]

The author of Joda was the primary person responsible for the new Java Date and Time API (JSR-310).

---

" sql.Result (the return value of Exec) has a LastInsertId?() that's an int64, so if you're using uuids, you can't use that at all and have to call Query instead and manage generated IDs yourself. "

-- https://news.ycombinator.com/item?id=22445165

uhoh-itsmaciek on Feb 28, 2020 [–]

Not to mention that sql.Result (the return value of Exec) has a LastInsertId() that's an int64, so if you're using uuids, you can't use that at all and have to call Query instead and manage generated IDs yourself.

HelloNurse on Feb 28, 2020 [–]

This is a more ridiculous symptom of bad library design than the filesystem trouble mentioned by the article.

In the real world, executing most SQL statements could be made to return a semi-useful integer according to simple and consistent rules (e.g. affected row count, -1 if there's no meaningful integer).

But the official Go documentation

https://golang.org/pkg/database/sql/#Result

makes it quite clear that the Go design committee decided to imitate a remarkably limited and inelegant MySQL function that returns the value of an auto-increment column, not even realizing that only a few statements have auto-increment columns to begin with. I'd call this a negative amount of design effort.

  LastInsertId returns the integer generated by the database
  in response to a command. Typically this will be from an
  "auto increment" column when inserting a new row. Not all
  databases support this feature, and the syntax of such
  statements varies.

(Of course, MySQL's LAST_INSERT_ID() is only bad as a building block and inspiration for a general API; in SQL queries assumptions aren't a problem and overspecialized tools can be occasionally very useful)

Cthulhu_ on March 6, 2020 [–]

In a lot of cases - esp. distributed systems - it's not up to a database to generate a UUID, but the application. In theory you can have a hundred servers that generate records and send them to a central storage platform (which may or may not be a database, or event bus, etc).

UUIDs are not meant to be generated by databases.

whateveracct on Feb 28, 2020 [–]

Haha yes I've long since resigned myself to using Query + explicit RETURNING for inserts

Cut 2 of 1000

---

whateveracct on Feb 28, 2020 [–]

Here's an example of why Go's simplicity is complicated:

Say I want to take a uuid.UUID [1] and use it as my id type for some database structs.

At first, I just use naked UUIDs as the struct field types, but as my project grows, I find that it would be nice to give them all unique types to both avoid mixups and to make all my query functions clearer as to which id they are using.

    type DogId uuid.UUID
    type CatId uuid.UUID

I go to run my tests (thank goodness I have tests for my queries) and everything breaks! Postgres is complaining that I'm trying to use bytes as a UUID. What gives? When I remove the type definition and use naked UUIDs, it works fine!

The issue is Go encourages reflection for this use-case. The Scan() and Value() methods of a type tell the sql driver how to (de)serialize the type. uuid.UUID has those methods, but when I use a type definition around UUID, it loses those methods.

So the correct way to wrap a UUID to use in your DB is this:

    type DogId struct { uuid.UUID }
    type CatId struct { uuid.UUID }

Go promised me that I wouldn't have to deal with such weird specific knowledge of its semantics. But alas I always do.

[1] https://github.com/google/uuid

EDIT: This issue also affects encoding/json. You can see it in this playground for yourself! https://play.golang.org/p/erfcSIe-Z7b

EDIT: I wrongly used type aliases in the original example, but my issue is with type definitions (`type X Y` instead of `type X = Y`). So all you commenters saying that I did the wrong thing, have another look!

whateveracct on Feb 28, 2020 [–]

...

Haskell's json/sql marshalling does not use runtime reflection but instead ad hoc polymorphism, so when I create (or even derive automatically!) a marshalling instance, it is pretty easy for me to reason about what will happen statically. Haskell's Generic & newtype-deriving go a long way here, and are good examples of principled abstractions that do not leak.

Haskell's conduit (and other streaming libraries) is another good example. I use them to create programs that process things in constant memory, and when I compose them (e.g. with operators like =$= or .| in conduit), the resulting program streams in constant memory. I have built entire systems (CLIs, batch jobs, event processors, etc) on top of this abstraction and conduit itself has never leaked.

---

" In this way, most of libc in the glibc case resides on the target file system. But not all of it! There are still the "C runtime start files":

    Scrt1.o
    crti.o
    crtn.o

These are statically compiled into every binary that dynamically links glibc, and their ABI is therefore Very Very Stable.

And so, Zig bundles a small subset of glibc's source files needed to build these object files from source for every target. The total size of this comes out to 1.4 MiB (252 KB gzipped). I do think there is some room for improvement here, but I digress.

There are a couple of patches to this small subset of glibc source files, which simplify them to avoid including too many .h files, since the end result that we need is some bare bones object files, and not all of glibc. " [2]

---

some random stuff about cross-compilation and ELF linkloaders:

fao_ on March 25, 2020 [–]

You can do that in clang/gcc but you need to pass: -static and -static-plt(? I can't find what it's called). The second option is to ensure it's loader-independent, otherwise you get problems when compiling and running across musl/glibc platforms

nh2 on March 25, 2020 [–]

Could you elaborate/link on the loader-independency topic?

fao_ on March 25, 2020 [–]

In brief, most programs these days are position-independent, which means you need a runtime loader to load sections(?) and symbols of the code into memory and tell other parts of the code where they've put it. Because of differences between musl libc and gnu libc, in effect for the user this means that a program compiled on gnu libc can be marked as executable, but when they try to run it the user is told it is "not executable", because the binary is looking in the wrong place for the dynamic loader, which is named differently across the libraries. There are also some archaic symbols that gnu libc describes that are non-standard, which musl libc has a problem with, that can cause a problem for the end-user.

e: I didn't realise it was 5am, so I'm sorry if it's not very coherent.

acqq on March 25, 2020 [–]

I would also appreciate if you manage to be even more specific once more "coherency" is possible. I'm also interested what you specifically can say more about "The second option is to ensure it's loader-independent, otherwise you get problems when compiling and running across musl/glibc platforms"

fao_ on March 25, 2020 [–]

Ok so, it's been a year or so since I was buggering around with the ELF internals (I wrote a simpler header in assembly so I could make a ridiculously small binary...). Let's take a look at an ELF program. If you run `readelf -l $(which gcc)` you get a bunch of output, among that is:

    alx@foo:~$ readelf -l $(which gcc)
    Elf file type is EXEC (Executable file)
    Entry point 0x467de0
    There are 10 program headers, starting at offset 64
    Program Headers:
      Type           Offset             VirtAddr           PhysAddr
                     FileSiz            MemSiz              Flags  Align
      PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                     0x0000000000000230 0x0000000000000230  R      0x8
      INTERP         0x0000000000000270 0x0000000000400270 0x0000000000400270
                     0x000000000000001c 0x000000000000001c  R      0x1
          [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
      LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                     0x00000000000fa8f4 0x00000000000fa8f4  R E    0x200000

you can see that in the ELF header is a field called "INTERP" that requests the loader. This is because the program has been compiled with the -fPIE flag, which requests a "Position Independent Executable". This means that each section in the code has been compiled so that they don't expect a set position in memory for the other sections. In other words, you can't just run it on a UNIX computer and expect it to work, it relies on another library, to load each section, and tell the other sections where to load it.

The problem with this is that the musl loader (I don't have my x200 available right now to copy some output from it to illustrate the difference) is usually at a different place in memory. What this means is that when the program is run, the ELF loader tries to find the program interpreter to execute the program, because musl libc's program interpreter is at a different place and name in the filesystem hierarchy, it fails to execute the program, and returns "Not a valid executable".

Now you would think a naive solution would be to symlink the musl libc loader to the expected position in the filesystem hierarchy. The problem with this is illustrated when you look at the other dependencies and symbols exported in the program. Let's have a look:

    alx@foo:~$ readelf -s $(which gcc)
    Symbol table '.dynsym' contains 153 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
         0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
         1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __strcat_chk@GLIBC_2.3.4 (2)
         2: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __uflow@GLIBC_2.2.5 (3)
         3: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND mkstemps@GLIBC_2.11 (4)
         4: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND getenv@GLIBC_2.2.5 (3)
         5: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND dl_iterate_phdr@GLIBC_2.2.5 (3)
         6: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __snprintf_chk@GLIBC_2.3.4 (2)
         7: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __pthread_key_create
         8: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND putchar@GLIBC_2.2.5 (3)
         9: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND strcasecmp@GLIBC_2.2.5 (3)

As you can see, the program not only expects a GNU program interpreter, but the symbols the program has been linked against expect GLIBC_2.2.5 version numbers as part of the exported symbols (Although I cannot recall if this causes a problem or not, memory says it does, but you'd be better off reading the ELF specification at this point, which you can find here: https://refspecs.linuxfoundation.org/LSB_2.1.0/LSB-Core-gene...). So the ultimate result of trying to run this program on a musl libc system is that it fails to run, because the symbols are 'missing'. On top of this, you can see with `readelf -d` that it relies on the libc library:

    alx@foo:~$ readelf -d $(which gcc)
    Dynamic section at offset 0xfddd8 contains 25 entries:
      Tag        Type                         Name/Value
     0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
     0x0000000000000001 (NEEDED)             Shared library: [ld-linux-x86-64.so.2]
     0x000000000000000c (INIT)               0x4026a8

Unfortunately for us, the libc.so.6 binary produced by the GNU system is also symbolically incompatible with the one produced by musl, also GNU LibC defines some functions and symbols that are not in the C standard. The ultimate result of this is that you need to link statically against libc, and against the program loader, for this binary to have a chance at running on a musl system.

acqq on March 25, 2020 [–]

Wow. Your answer is really a good fit to the details provided by the author of the original article.

Many, many thanks for the answer! I've already done some experimenting myself and wanted to do more, so it really means a lot to me.

fao_ on March 28, 2020 [–]

For further interest you might want to take a look at:

http://www.muppetlabs.com/~breadbox/software/tiny/somewhat.h...

I altered a version of that ELF64 header for 64 bit, and then modified it to work under grsec's kernel patches: https://gitlab.com/snippets/1749660

bregma on March 25, 2020 [–]

One example of a description of how the Linux linkloader works is here [0]. Other OSes are similar.

[0] https://lwn.net/Articles/631631/

---

Sietsebb 3 days ago

The article mentions that Coreutils used to be 3 separate bundles, and I fell down a little bunnyhole. Here is the composition of Fileutils (2001), Textutils (2003), and Shellutils (2000), with thanks to the Tucows collection as preserved by the Internet Archive:

    Textutils: cat, cksum, comm, csplit, cut, expand, fmt, fold, head, join, md5sum, nl, od, paste, pr, sort, split, sum, tac, tail, tr, unexpand, uniq, wc.
    Fileutils: ansi2knr, chgrp, chown-core, chmod, chown, cp, copy, cp-hash, dd, df, ls, ls-dir, dircolors, du, install, ln, ls-ls, mkdir, mkfifo, mknod, mv, remove, rm, rmdir, shred, sync, touch, ls-vdir
    Shutils: basename, chroot, date, dirname, echo, env, expr, factor, false, groups, hostname, id, logname, nice, nohup, pathchk, printenv, printf, pwd, seq, sleep, stty, su, tee, test, true, tty, uname, users, who, whoami, yes.

---

from comments on "Rewriting the GNU Coreutils in Rust" https://lobste.rs/s/m0npll/rewriting_gnu_coreutils_rust:

epilys 3 days ago

I wish Rust had a bigger standard library (“batteries” included, like python in some degree)

See for example sort. I realise all of us download and run programs with lots of dependencies most days but I feel like core utils should not pull non-standard dependencies.

[dependencies] binary-heap-plus clap compare fnv itertools memchr ouroboros rand rayon semver tempfile unicode-width uucore uucore_procs

matklad 3 days ago

Note that among those 12 direct dependencies, Python’s stdlib has direct equivalents only to 4: clap, itertools, rand, tempfile. Things like unicode-width, rayon, semver, binary-heap-plus are not provided by Python. compare, fnv, memchr and ouroboros are somewhat hard to qualify Rust-isms.

worr 2 days ago

In addition, it’s worth noting that a lot of projects eschew argparse (what the alternative to clap would be) for click. If a similar project was done in python, I’d almost bet money that they’d use click.

rand being separate has some advantages, largely that it is able to move at a pace that’s not tied to language releases. I look at this as a similar situation that golang’s syscall package has (had? the current situation is unclear to me rn). If an OS introduces a new random primitive (getrandom(2), getentropy(2)), a separate package is a lot easier to update than the stdlib, which is tied to language releases.

Golang’s syscall package has (had?) a similar problem, which led to big changes being locked down, and the recommended pkg being golang.org/x/sys. There’s a lot more agility to be had to leverage features of the underlying OS if you don’t tie certain core features to the same cadence as a language release. (this is not to say that this is the only problem with the syscall package being in the stdlib, but it’s definitely one of them. more info on the move here: https://docs.google.com/document/d/1QXzI9I1pOfZPujQzxhyRy6EeHYTQitKKjHfpq0zpxZs/edit)

Forty-Bot 2 days ago

    argparse

I'd use getopt over argparse. argparse just has really abysmal parsing which is different from other shell tools, especially when dealing with subcommands.

epilys 3 days ago

True. Could be just rust-lang crates like futures-rs or cargo instead of being in the stdlib.

legoktm 3 days ago

Same. A big part of the learning curve for me was discovering modules like serde, tokio, anyhow/thiserror, and so on that seem necessary in just about every Rust program I write.

epilys 3 days ago

Not providing a standard executor was the only complaint I had from async Rust.

kornel 2 days ago

I like that there is no built-in blessed executor - it keeps Rust runtime-free.

I’ve worked on projects where using an in-house executor was a necessity.

Also gtk-rs supports using GTK’s event loop as the executor and it’s very cool to await button clicks :)

epilys 2 days ago

Yeah, I've used it and it felt refreshing :) But for other small tools perhaps having a reference and minimal implementation would be good. I like the smol crate (https://crates.io/crates/smol) and I think it would be perfect for this.

proctrap 3 days ago (edited)

All of them developed over time and became a de-facto standard. But it was always the intention that the std doesn't try to develop these tools, as you need some iterations, which won't work with a stability guarantee. tokio just went to 1.0 this(?) year; I've got code lying around using 0.1, 0.2 and some 0.3 (and don't forget futures etc).

anyhow/thiserror? Well, there is failure, error-chain, quick-error, snafu, eyre (stable-eyre, color-eyre), simple-error... And yes, some of them are still active as they solve different problems (I specifically had to move away from thiserror) and some are long deprecated. So there was a big amount of iteration (and some changes to the std Error trait as a result).

You don't want to end up like C++ (video) where everybody treats the std implementation of regex as something you don't ever want to use.

alexwennerberg 3 days ago

The rust ecosystem, IMO, is far too eager to pull in third party dependencies. I haven’t looked deep into this tool, but a quick glance leads me to believe that many of these dependencies could be replaced with the standard library and/or slimmed down alternative libraries and a little extra effort.

epilys 2 days ago

Unfortunately it’s not always that simple. Let’s see the third party dependencies I pulled for meli, an email client, which was a project I started with the intention of implementing as much as possible myself, for fun.

xdg = "2.1.0" crossbeam = "0.7.2" signal-hook = "0.1.12" signal-hook-registry = "1.2.0" nix = "0.17.0" serde = "1.0.71" serde_derive = "1.0.71" serde_json = "1.0" toml = { version = "0.5.6", features = ["preserve_order", ] } indexmap = { version = "^1.6", features = ["serde-1", ] } linkify = "0.4.0" notify = "4.0.1" termion = "1.5.1" bincode = "^1.3.0" uuid = { version = "0.8.1", features = ["serde", "v4"] } unicode-segmentation = "1.2.1" smallvec = { version = "^1.5.0", features = ["serde", ] } bitflags = "1.0" pcre2 = { version = "0.2.3", optional = true } structopt = { version = "0.3.14", default-features = false } futures = "0.3.5" async-task = "3.0.0" num_cpus = "1.12.0" flate2 = { version = "1.0.16", optional = true }

From a quick glance, only nix, linkify, notify, uuid, bitflags could be easily replaced by invented here code because the part of the crates I use is small.

I cannot reasonably rewrite:

    serde
    flate2
    crossbeam
    structopt
    pcre2

alexwennerberg 2 days ago

You could reduce transitive dependencies with:

serde -> nanoserde

structopt -> pico-args

Definitely agree that it isn’t that simple, and each project is different (and often it’s not worth the energy, esp for applications, not libraries), but it’s something I notice in the Rust ecosystem in general.

kornel 2 days ago (edited)

But then you’re getting less popular deps, with fewer eyeballs on them, from less-known authors.

Using bare-bones pico-args is a poor deal here — for these CLI tools the args are their primary user interface. The fancy polished features of clap make a difference.

tobin_baker 2 days ago

Why do you think an external merge sort should be part of the Rust stdlib? I don't think it's part of the Python stdlib either. Rust already has sort() and sort_unstable() in its stdlib (unstable sort should have been the default, but that ship has sailed).

---

interesting reputation system (with transitive trust) for package reviews:

https://github.com/crev-dev/crev/ https://github.com/crev-dev/cargo-crev/blob/master/cargo-crev/src/doc/getting_started.md https://docs.rs/cargo-crev/0.20.1/cargo_crev/doc/user/index.html

---

syscalls in jonesforth:

EXIT OPEN CLOSE READ WRITE CREAT BRK

---

" Stability

We promise to maintain a stable API in Deno. Deno has a lot of interfaces and components, so it's important to be transparent about what we mean by "stable". The JavaScript APIs that we have invented to interact with the operating system are all found inside the "Deno" namespace (e.g. Deno.open()). These have been carefully examined and we will not be making backwards incompatible changes to them. "

---

https://github.com/halfer53/winix

Supported Commands

    snake
    bash
    cat
    cp
    echo
    grep
    history
    ls
    mkdir
    mv
    ps
    pwd
    rm
    stat
    test
    touch
    uptime
    wc
    df
    du
    ln

Supported System Call

    times
    exit
    fork
    vfork
    execve
    brk
    alarm
    sigaction
    sigret
    waitpid
    kill
    getpid
    winfo
    strerror
    dprintf
    sysconf
    sigsuspend
    sigpending
    sigprocmask
    setpgid
    getpgid
    open
    read
    write
    close
    creat
    pipe
    mknod
    chdir
    chown
    chmod
    stat
    fstat
    dup
    dup2
    link
    unlink
    getdent
    access
    mkdir
    sync
    lseek
    umask
    fcntl
    ioctl
    setsid
    csleep
    getppid
    signal
    sbrk
    statfs
    getcwd
    tfork

(as of 210628)

older list of supported system calls, from commit de5790224b1ea9c0b37d9e8b8d75d958476b8481 at https://github.com/halfer53/winix.git , which is from 180928, the last commit (out of those commits that changed the README.md, I think) in 2018:

TIMES 1, EXIT 2, FORK 3, VFORK 4, EXECVE 5, BRK 6, ALARM 7, SIGACTION 8, SIGRET 9, WAITPID 10, KILL 11, GETPID 12, WINFO 13, GETC 14, PRINTF 15, SYSCONF 16, SIGSUSPEND 17, SIGPENDING 18, SIGPROCMASK 19, SETPGID 20, GETPGID 21

that same list persisted in commit 433186765c0b49358849f0278f0c89aab2919599, the first commit (out of those commits that changed the README.md, I think) in 2020. The list stayed the same for the next few commits (except that the name of PRINTF was changed; also the commit description talks about CSLEEP but it isn't actually added to the list afaict), and then in commit 0686d08c1e2c8ec4e583b85a88418f8e37333462, tagged 2.0, at 200903, the list is changed to:

Name / syscall number: TIMES 1, EXIT 2, FORK 3, VFORK 4, EXECVE 5, BRK 6, ALARM 7, SIGACTION 8, SIGRET 9, WAITPID 10, KILL 11, GETPID 12, WINFO 13, STRERROR 14, DPRINTF 15, SYSCONF 16, SIGSUSPEND 17, SIGPENDING 18, SIGPROCMASK 19, SETPGID 20, GETPGID 21, OPEN 22, READ 23, WRITE 24, CLOSE 25, CREAT 26, PIPE 27, MKNOD 28, CHDIR 29, CHOWN 30, CHMOD 31, STAT 32, FSTAT 33, DUP 34, DUP2 35, LINK 36, UNLINK 37, GETDENT 38, ACCESS 39, MKDIR 40, SYNC 41, LSEEK 42, UMASK 43, FCNTL 44, IOCTL 45, SETSID 46, CSLEEP 47, GETPPID 48, SIGNAL 49, SBRK 50, STATFS 51

---

http://lars.nocrew.org/forth2012/core/

ABORT ABORTq ABS ACCEPT ACTION-OF AGAIN ALIGN ALIGNED ALLOT AND BASE BEGIN BL Bracket BracketCHAR? BracketCOMPILE? BracketTick? bs BUFFERColon CASE CComma CELLPlus CELLS CFetch CHAR CHARPlus CHARS Colon ColonNONAME? Comma COMPILEComma CONSTANT COUNT Cq CR CREATE CStore d DECIMAL DEFER DEFERFetch DEFERStore DEPTH Div DivMOD? DO DOES Dotp Dotq DotR? DROP DUP ELSE EMIT ENDCASE ENDOF ENVIRONMENTq Equal ERASE EVALUATE EXECUTE EXIT FALSE Fetch FILL FIND FMDivMOD? HERE HEX HOLD HOLDS I IF IMMEDIATE INVERT IS J KEY LEAVE less LITERAL LOOP LSHIFT MARKER MAX MIN Minus MOD more MOVE MTimes ne NEGATE NIP num-end num-start num numS OF OneMinus? OnePlus? OR OVER p PAD PARSE-NAME PARSE PICK Plus PlusLOOP? PlusStore? POSTPONE qDO qDUP QUIT RECURSE REFILL REPEAT RESTORE-INPUT RFetch Rfrom ROLL ROT RSHIFT SAVE-INPUT Semi Seq SIGN SMDivREM? SOURCE-ID SOURCE SPACE SPACES Sq STATE StoD? Store SWAP THEN Tick Times TimesDiv? TimesDivMOD? TO toBODY toIN toNUMBER toR TRUE TUCK TwoDiv? TwoDROP? TwoDUP? TwoFetch? TwoOVER? TwoRFetch? TwoRfrom? TwoStore? TwoSWAP? TwoTimes? TwotoR? TYPE Ud UDotR? Uless UMDivMOD? Umore UMTimes UNLOOP UNTIL UNUSED VALUE VARIABLE WHILE WITHIN WORD XOR ZeroEqual? Zeroless Zeromore Zerone

---

http://www.oilshell.org/blog/2022/03/backlog-arch.html

---