Table of Contents for Programming Languages: a survey

Standard libraries

Case studies of the complete standard libraries of various languages

C standard library

The C standard library, 'libc', offers a unique opportunity for study because C is so popular. As a result, libc has been reimplemented on a variety of platforms, and there also exist various 'stripped-down' libcs which provide less functionality than the spec requires. This allows us to study not only the standard C library and its implementations, but also which subsets of the library the authors of the various stripped-down libcs considered most essential. In addition, there are some in-progress implementations of libc, which let us study which parts their authors felt needed to be implemented first.

Many "libc" implementations combine the C standard library with an implementation of the POSIX standard.

"Whereas the kernel (Linux) governs access to hardware, memory, filesystems, and the privileges for accessing these resources, the C library is responsible for providing the actual C function interfaces userspace applications see, and for constructing higher-level buffered stdio, memory allocation management, thread creation and synchronization operations, and so on using the lower-level interfaces the kernel provides, as well as for implementing pure library routines of the C language like strstr, snprintf, strtol, exp, sqrt, etc. " -- http://www.etalabs.net/musl/faq.html

See also plChImplementationTools, which has a section on libc (from the point of view of a language implementor making use of it).

Following are some suggestions for both standards-compliant and 'alternative' libcs to look at for inspiration:

C standard lib and related standards

C99 standard (document WG14 N1256 with TC1/2/3 applied; according to [1] this is C99)
POSIX 2008 with TC1 applied
other standards

Full libcs

musl (permissive license)
- http://www.etalabs.net/musl/faq.html
  - section "What is musl?" is well-written
  - "Unlike glibc, musl is lightweight and simple. Compiling musl from source takes less than 40 seconds on very modest hardware (Intel Atom D510 1.666 GHz). Simple programs can get by with as little as 8kb of overhead per process, and trivial startup time. Locale support is limited to UTF-8 encoding only, which allows much more direct code paths and better performance. Stdio is also much simpler, smaller, and in many cases significantly faster. musl tends to match glibc's big-O performance for most operations, with constant factors varying between moderately worse and significantly better. Unlike glibc, all algorithms used by musl have optimal space big-O; many operations which can fail on glibc due to out-of-memory conditions never fail with musl, due to superior O(1)-space implementations. Unlike dietlibc, musl does not strive to eliminate every last “wasted” byte at the expense of cutting corners on performance and correctness. Small size instead follows as a natural consequence of clean, simple imlementation. musl does not try to discourage you from using stdio, threads, regular expressions, and so on, but rather makes these components sufficiently efficient (in binary size, runtime memory usage, and performance) that you don't have to think twice about using them. Unlike uClibc, musl has a stable, well-defined ABI. Aside from major library components like stdio and locale, uClibc derives (often with changes to reduce size) a lot of code from glibc and other implementations. The vast majority of musl, on the other hand, has been written from the ground up to be simple and efficient. There are no options to omit certain functionality to reduce library size; instead, very strict care has been taken not to introduce unnecessary dependencies between components of the library, so that static linking will only pull in code which is strictly necessary. 
Unlike Google Bionic (the Android libc), musl does not arbitrarily omit functionality that's required by the standards in order to achieve a simpler and lighter library, and is not based on old BSD code. Otherwise, Bionic and musl share a goal of simplicity, and in fact musl would make a good choice of libc for mobile devices. Unlike all of the above, musl unifies all standard functions into a single library (.a or .so) file. This greatly reduces startup time and memory overhead when using shared libraries, and eliminates much of the complexity of upgrades and the risk of version mismatches. " -- section "What sets musl apart from other implementations?"
  - "Musl's default thread stack size is 12k (+4k guard page), which might come as a shock for programmers used to glibc's 2MB-10MB defaults" (section "What should I know about musl and pthreads?")
  - custom malloc (design is described in section "What malloc implementation does musl use?")
  - "musl only supports UTF-8 as the locale's character encoding" (section "What is the status of locale support?")
  - missing: Some wide character interfaces, POSIX priority scheduling options (section "What major interfaces are still missing?")

Summaries: musl only supports UTF-8 as the locale's character encoding

glibc (LGPL; considered by some to be 'bloated' compared to eg musl)

popular alternative libcs

newlib libc (permissive license but only on non-Linux targets):

(is this actually a full libc?)

dietlibc (GPL!):

list of the exported symbols

bionic:

uClibc (LGPL) (is this actually a full libc?):

what is in uclibc/libc
what is in uclibc/
http://git.uclibc.org/uClibc/tree/docs/Glibc_vs_uClibc_Differences.txt
http://git.uclibc.org/uClibc/tree/docs/uClibc_vs_SuSv3.txt
http://git.uclibc.org/uClibc/plain/extra/Configs/Config.in.arch
" In other cases, uClibc leaves certain features (such as full C99 Math library support, wordexp, IPV6, and RPC support) disabled by default. Those features can be enabled for people that need them, but are otherwise disabled to save space. Some of the space savings in uClibc is obtained at the cost of performance, and some is due to sacrificing features. Much of it comes from aggressive refactoring of code to eliminate redundancy. In regards to locale data, elimination of redundant data storage resulted in substantial space savings. The result is a libc that currently includes the features needed by nearly all applications and yet is considerably smaller than glibc. To compare "apples to apples", if you take uClibc and compile in locale data for about 170 UTF-8 locales, then uClibc will take up about 570k. If you take glibc and add in locale data for the same 170 UTF-8 locales, you will need over 30MB!!! " -- http://www.uclibc.org/FAQ.html
"uClibc was originally created to support µClinux, a port of Linux for MMU-less microcontrollers such as the Dragonball, Coldfire, and ARM7TDMI. These days, uClibc also works just fine on normal Linux systems (such as i386, ARM, and PowerPC), but we couldn't think of a better name. " -- http://www.uclibc.org/FAQ.html

under construction, or less popular but notable because more stripped/minimalistic

klibc (GPL! though as Wikipedia notes "(This only applies to klibc as a whole due to embedding some Linux kernel derived files; most of the library source code is actually[5] available under a BSD licence from UCB or the Historical Permission Notice and Disclaimer.)" citing https://archive.today/20120710002157/http://git.kernel.org/?p=libs/klibc/klibc.git;a=blob;f=usr/klibc/LICENSE;h=aa6d7a7e37015f95f852ceb29cc417519919d806;hb=75216b5c62b3b3635ae0c6cf7ee47757d7d99100 ):

PDCLib (permissive license):

libc11 (permissive license):

what is included in libc11 (/src)
https://www.codeproject.com/Articles/15156/Tiny-C-Runtime-Library
newlib-nano
- https://community.arm.com/developer/ip-products/system/b/embedded-blog/posts/shrink-your-mcu-code-size-with-gcc-arm-embedded-4-7
ARM Microlib
- https://developer.arm.com/docs/100073/0613/the-arm-c-micro-library
- https://os.mbed.com/docs/mbed-os/v6.2/bare-metal/using-small-c-libraries.html
nuttX's libc
- https://nuttx.yahoogroups.narkive.com/XnWb6BC4/newbie-nuttx-on-stm32-mcu
Zephyr's minimal C library
- https://docs.zephyrproject.org/latest/guides/c_library.html
Embedded Artistry libc
IBM Netezza pre 5.0
Cosmopolitan
- "Compared with glibc, you should expect Cosmopolitan to be almost as fast, but with an order of a magnitude tinier code size. Compared with Musl or Newlib, you can expect that Cosmopolitan will generally go much faster, while having roughly the same code size, if not tinier."
- Apparently runs on x86 bare metal, as well as "every Linux distro in addition to Mac OS X, Windows NT, FreeBSD, and OpenBSD"

pedagogical/non-production-use minimal libc-like libraries:

there are more on plChImplementationTools under 'Comparisons and lists of libcs', and links to lists which may have added more

what Golang did

"Gc uses a custom C library to keep the footprint under control; it is compiled with a version of the Plan 9 C compiler that supports resizable stacks for goroutines" -- [2].

minimalistic and roll-your-own libcs

rt0:
- https://news.ycombinator.com/item?id=8974024
- provides only "argc, argv, envp, __environ, _exit, and syscall0/1/2/3/4/5/6"
a tutorial for doing without a libc
http://wiki.osdev.org/Creating_a_C_Library
https://github.com/selfsigned/libft

libc comparisons and discussions

C POSIX Library

Links:

Haskell: prelude

Redesigns of the Haskell-prelude

Haskell: other libraries

C++ standard libraries

Discussion of LLVM's libc++ implementing the C++11 standard using C++11 language extensions provided by Clang added onto C++03

Python std libraries

Comprehensions preferred over map, filter, reduce

http://www.artima.com/weblogs/viewpost.jsp?thread=98196

By topic

Pseudo-random number generation (PRNG, or RNG for short)

Criteria for PRNGs

Quality:

cycle length: how many random numbers you can request from the PRNG before it starts repeating itself
various statistical tests of randomness (see below)

Performance:

speed
size of memory required to hold the state

CSPRNG vs not

CSPRNG stands for "cryptographically secure pseudorandom number generator". "The requirements of an ordinary PRNG are also satisfied by a cryptographically secure PRNG, but the reverse is not true." -- https://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator

Both ordinary PRNGs and CSPRNGs are supposed to pass statistical randomness tests, but CSPRNGs in addition should have the following two properties:

Given all previous outputs of the CSPRNG (but not the current state), it should be difficult to guess the next output (a formalization of this criterion is the next-bit test)
Given the current state of the CSPRNG (but not the previous outputs), it should be difficult to guess the previous outputs.

Some CSPRNGs are designed to be given external entropy on an ongoing basis, rather than just once as a random seed.

CSPRNGs do tend to have disadvantages over other PRNGs, however.

Some CSPRNGs don't support being given a single random seed (making them unsuitable for applications demanding the ability to exactly recreate previous behavior by simply reusing the seed) [3].
CSPRNGs tend to be much slower than PRNGs [4], making them relatively unsuitable for computations which can be bottlenecked by random number generation, such as simulation and hashing (note however that .
CSPRNGs sometimes do worse than PRNGs at non-cryptographic criteria such as cycle length and statistical tests of randomness
CSPRNGs are often harder to formally prove things about than some PRNGs, meaning fewer proven guarantees about things like cycle length and statistical tests of randomness, although on the other hand a popular CSPRNG has likely been subjected to intense scrutiny by cryptographic experts.
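
The seeding difference listed above can be sketched with Python's standard library. This is only an illustration under stated assumptions: `random.Random` is documented as a Mersenne Twister (an ordinary PRNG), while the `secrets` module draws from the operating system's entropy source; the helper names `seeded_run` and `secure_run` are hypothetical.

```python
import random
import secrets


def seeded_run(seed, n=5):
    """Ordinary PRNG: reusing the seed reproduces the exact same sequence."""
    rng = random.Random(seed)
    return [rng.randrange(100) for _ in range(n)]


def secure_run(n=5):
    """CSPRNG: the secrets module takes no seed, so a run cannot be replayed."""
    return [secrets.randbelow(100) for _ in range(n)]


if __name__ == "__main__":
    print(seeded_run(42) == seeded_run(42))  # same seed, same sequence
    print(secure_run())                      # differs from run to run
```

A simulation that must be replayable would use the first form; anything security-sensitive would use the second.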

In the context of programming language design, sometimes programming language implementation benchmarks are bottlenecked by the speed of random number generation [5].

Test suites and comparisons

Test suites:

TestU01 seems to be the most popular test suite framework. Among other things, it contains two popular test suites, SmallCrush and BigCrush (BigCrush is more rigorous).
NIST tests
rngtest
Comparisons:
http://www.pcg-random.org/other-rngs.html
http://xorshift.di.unimi.it/ compares various xorshift, xorshift*, xorshift+s, and MT19937-64 (Mersenne Twister), and WELL1024a
http://www.iro.umontreal.ca/~lecuyer/myftp/papers/testu01.pdf

Some examples of the sorts of things that are tested in 'statistical randomness tests'

" An initial battery of statistical tests for uniform RNGs was offered by the 1969 first edition of The Art of Computer Programming, by Donald Knuth. In testing of RNGs, Knuth's tests were supplanted by George Marsaglia's (1996) Diehard tests, which consisted of fifteen different tests. The inability to modify the test parameters or add new tests led to the development of the TestU01 library. "

Wikipedia says:

" In practice, the output from many common PRNGs exhibit artifacts that cause them to fail statistical pattern-detection tests. These include:

    Shorter than expected periods for some seed states (such seed states may be called 'weak' in this context);
    Lack of uniformity of distribution for large numbers of generated numbers;
    Correlation of successive values;
    Poor dimensional distribution of the output sequence;
    The distances between where certain values occur are distributed differently from those in a random sequence distribution.

Defects exhibited by flawed PRNGs range from unnoticeable (and unknown) to very obvious. An example was the RANDU random number algorithm used for decades on mainframe computers. It was seriously flawed, but its inadequacy went undetected for a very long time. "-- https://en.wikipedia.org/wiki/Pseudorandom_number_generator#Potential_problems_with_deterministic_generators
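
One way to see why RANDU is considered flawed: assuming the recurrence as it is commonly stated, x_{n+1} = 65539 * x_n mod 2^31, the multiplier 65539 = 2^16 + 3 forces every three consecutive outputs to satisfy x_{k+2} = 6*x_{k+1} - 9*x_k (mod 2^31). A minimal sketch that checks this relation (the function names are hypothetical):

```python
M = 2**31


def randu(seed, count):
    """Return `count` successive outputs of x_{n+1} = 65539 * x_n mod 2**31."""
    x = seed
    out = []
    for _ in range(count):
        x = (65539 * x) % M
        out.append(x)
    return out


def randu_relation_holds(values):
    """Check x[k+2] == 6*x[k+1] - 9*x[k] (mod 2**31) for every consecutive triple."""
    return all(
        (values[k + 2] - 6 * values[k + 1] + 9 * values[k]) % M == 0
        for k in range(len(values) - 2)
    )


if __name__ == "__main__":
    print(randu_relation_holds(randu(1, 1000)))
```

Because each output is a fixed linear combination of the two before it, this is an instance of the "correlation of successive values" and "poor dimensional distribution" defects in the list above.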

" Kendall and Smith's original four tests were hypothesis tests, which took as their null hypothesis the idea that each number in a given random sequence had an equal chance of occurring, and that various other patterns in the data should be also distributed equiprobably.

    The frequency test, was very basic: checking to make sure that there were roughly the same number of 0s, 1s, 2s, 3s, etc.

    The serial test, did the same thing but for sequences of two digits at a time (00, 01, 02, etc.), comparing their observed frequencies with their hypothetical predictions were they equally distributed.
    The poker test, tested for certain sequences of five numbers at a time (aaaaa, aaaab, aaabb, etc.) based on hands in the game poker.
    The gap test, looked at the distances between zeroes (00 would be a distance of 0, 030 would be a distance of 1, 02250 would be a distance of 3, etc.).

If a given sequence was able to pass all of these tests within a given degree of significance (generally 5%), then it was judged to be, in their words "locally random". "
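
Two of the four tests above can be sketched in a few lines. This is an illustrative reading of the description, not Kendall and Smith's exact procedure: the frequency test is expressed as a chi-square statistic over digit counts, and the gap test only collects the gaps between zeroes (turning them into a significance decision is left out). The function names are hypothetical.

```python
from collections import Counter


def frequency_chi_square(digits, base=10):
    """Chi-square statistic comparing digit counts against a uniform expectation."""
    expected = len(digits) / base
    counts = Counter(digits)
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(base))


def gaps_between_zeroes(digits):
    """Distances between successive zeroes: 00 -> 0, 030 -> 1, 02250 -> 3."""
    gaps = []
    last_zero = None
    for i, d in enumerate(digits):
        if d == 0:
            if last_zero is not None:
                gaps.append(i - last_zero - 1)
            last_zero = i
    return gaps


if __name__ == "__main__":
    # For base 10 there are 9 degrees of freedom; the 5% critical value is about 16.92.
    print(frequency_chi_square([7] * 100))
    print(gaps_between_zeroes([0, 0, 3, 0, 2, 2, 5, 0]))
```

A perfectly balanced sequence gives a statistic of 0, while a constant sequence gives a very large one; a real test would compare the statistic against the chi-square distribution at the chosen significance level.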

" Other tests:

    The Monobit test treats each output bit of the random number generator as a coin flip test, and determine if the observed number of heads and tails are close to the expected 50% frequency. The number of heads in a coin flip trail forms a binomial distribution.
    The Wald–Wolfowitz runs test tests for the number of bit transitions between 0 bits, and 1 bits, comparing the observed frequencies with expected frequency of a random bit sequence.
    Information entropy
    Autocorrelation test
    Kolmogorov–Smirnov test
    The Spectral Test[3]
    Maurer's Universal Statistical Test
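
The Monobit test described above can be sketched as follows. The p-value formula (normalized sum of +1/-1 bits passed through the complementary error function) follows the form used for the frequency test in the NIST suite as I understand it; treat the exact formula and the function name as assumptions rather than a reference implementation.

```python
import math


def monobit_p_value(bits):
    """Treat each bit as a coin flip and return a p-value for 'heads ~ tails'."""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)  # +1 per one bit, -1 per zero bit
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


if __name__ == "__main__":
    print(monobit_p_value([0, 1] * 500))  # perfectly balanced
    print(monobit_p_value([1] * 1000))    # all ones
```

Note that the strictly alternating sequence 0101... passes this test despite being obviously non-random, which is why tests such as the runs test look at bit transitions as well.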

generate white noise using http://bl.ocks.org/mmalone/bf59aa2e44c44dde78ac and then eyeball it (example provided in section "V8’s PRNG is Comparatively Unsatisfactory" of https://medium.com/@betable/tifu-by-using-math-random-f1c308c4fd9d )
do Monte-Carlo Estimate of PI after 10^10 iterations using https://gist.github.com/mmalone/796d959dcf5b780106f4
see also https://en.wikipedia.org/wiki/Randomness_tests
"Though there are commonly used statistical testing techniques such as NIST standards, Yongge Wang showed that NIST standards are not sufficient. Furthermore, Yongge Wang [4] designed statistical distance based and the law of the iterated logarithm based testing techniques. Using this technique, Yongge Wang and Tony Nicol [5] detect the weakness in commonly used pseudorandom generators such as the well known Debian version of OpenSSL pseudorandom generator." -- https://en.wikipedia.org/wiki/Randomness_tests
https://en.wikipedia.org/wiki/Algorithmically_random_sequence
https://en.wikipedia.org/wiki/Pseudorandom_number_generator#BSI_evaluation_criteria
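
The Monte-Carlo estimate of PI mentioned in the list above can be sketched like this. It is not the linked gist; it is a minimal stand-in using Python's `random.Random`, with far fewer than 10^10 iterations, and the function name `estimate_pi` is hypothetical.

```python
import math
import random


def estimate_pi(samples, seed=0):
    """Estimate pi as 4 * (fraction of random points in the unit square
    that fall inside the quarter circle of radius 1)."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


if __name__ == "__main__":
    estimate = estimate_pi(100_000)
    print(estimate, abs(estimate - math.pi))
```

With a sound generator the error should shrink roughly in proportion to one over the square root of the sample count; an estimate that stalls far from pi as the sample count grows is a hint that the generator's output is not uniform in two dimensions.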

Popular PRNGs

The most popular PRNG is currently MT19937, "Mersenne Twister".

The V8 Javascript implementation recently switched from an older PRNG to xorshift128+ [7] [8]
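
For orientation, here is a minimal sketch of an xorshift128+ step in Python. The shift constants 23, 17 and 26 are the ones I recall from published descriptions of the variant V8 adopted; treat them, and the class name, as assumptions rather than a transcription of V8's source. Python integers are unbounded, so 64-bit wraparound is emulated with a mask.

```python
MASK64 = (1 << 64) - 1


class XorShift128Plus:
    """Sketch of xorshift128+: 128 bits of state, 64-bit outputs."""

    def __init__(self, s0, s1):
        self.s0 = s0 & MASK64
        self.s1 = s1 & MASK64
        if self.s0 == 0 and self.s1 == 0:
            raise ValueError("state must not be all zero")

    def next(self):
        s1 = self.s0
        s0 = self.s1
        self.s0 = s0
        s1 ^= (s1 << 23) & MASK64  # left shift, truncated to 64 bits
        s1 ^= s1 >> 17
        s1 ^= s0
        s1 ^= s0 >> 26
        self.s1 = s1
        return (self.s0 + self.s1) & MASK64  # the "+" in xorshift128+


if __name__ == "__main__":
    rng = XorShift128Plus(1, 2)
    print([rng.next() for _ in range(3)])
```

The whole generator is a handful of shifts, XORs and one addition, which is the main reason this family is attractive when random number generation is a bottleneck.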

Some other PRNGs I've heard of are xorshift*1024, xorshift+64.

Some other (families of?) CSPRNGs I've heard of are ChaCha20, AES, Fortuna.

There is a (family?) called PCG which is supposedly good for simulation but which is very new [9].

Todo: Some PRNGs are said to have a feature called "multiple streams" which is good for some applications (simulation?); I don't yet know what this is.

Misc

To scale random numbers (eg to get a uniform random integer from 1 to 10 from a uniform random number generator on [0, 1)): Math.floor(Math.random() * scale) + 1; without the + 1 the result ranges from 0 to scale - 1.

"(Quick note: it’s subtle, but in general this method is slightly biased if your scaled range doesn’t evenly divide your PRNG’s output range. A general solution should use rejection sampling like this, which is part of the standard library in other languages.) "

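The rejection sampling mentioned in the quote above can be sketched as follows. This is a minimal illustration, not any particular library's implementation: it assumes a source of fixed-width random integers (`random.getrandbits` by default), and the names `unbiased_below` and `uniform_1_to` are hypothetical.

```python
import random


def unbiased_below(n, rand_bits=random.getrandbits, bits=32):
    """Return a uniform integer in [0, n) from a source of `bits`-bit integers.

    Raw values at or above the largest multiple of n are thrown away and
    redrawn, so every residue modulo n is equally likely.
    """
    span = 1 << bits
    if not 0 < n <= span:
        raise ValueError("n must be in 1..2**bits")
    limit = span - (span % n)  # largest multiple of n that fits in the span
    while True:
        r = rand_bits(bits)
        if r < limit:
            return r % n


def uniform_1_to(n):
    """Uniform integer from 1 to n inclusive."""
    return unbiased_below(n) + 1


if __name__ == "__main__":
    print([uniform_1_to(10) for _ in range(10)])
```

With a 3-bit source (values 0 to 7) and n = 3, for example, the values 6 and 7 are rejected, leaving exactly two raw values for each of the results 0, 1 and 2.
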
"A common trap people fall into with standard libraries is filling them up with trivia. Trivia is sand clogging the gears and just dead weight that has to be carried around forever. My general rule is if the explanation for what the function does is more lines than the implementation code, then the function is likely trivia and should be booted out." -- So You Want To Write Your Own Language? By Walter Bright

PRNG Links

Hash tables

Links:

https://en.m.wikibooks.org/wiki/Data_Structures/Hash_Tables
[10] is a discussion on hashing functions for programming languages which protect against 'hash flooding', a denial of service attack where the attacker chooses inputs that cause hash collisions, causing hash table performance to drop from average-case to worst-case. One such function is SipHash, now used in Python and Ruby.
Classical Data Structures That Can Outperform Learned Indexes
- Lobsters discussion on: Classical Data Structures That Can Outperform Learned Indexes
https://www.andreinc.net/2021/10/02/implementing-hash-tables-in-c-part-1
- discussion: https://news.ycombinator.com/item?id=28889442
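
The hash flooding attack described in the links above can be illustrated with a toy example. Everything here is a sketch under assumptions: `weak_hash` is a deliberately bad hypothetical hash, and since Python's `hashlib` does not expose SipHash, keyed BLAKE2b stands in for a keyed hash with a per-process secret.

```python
import hashlib
import itertools
import os


def weak_hash(key: bytes) -> int:
    """Hypothetical weak hash: the byte sum, so all permutations of a key collide."""
    return sum(key)


def keyed_hash(key: bytes, secret: bytes) -> int:
    """Keyed hash (BLAKE2b as a stand-in for SipHash); the attacker lacks `secret`."""
    digest = hashlib.blake2b(key, key=secret, digest_size=8).digest()
    return int.from_bytes(digest, "little")


def buckets_used(keys, hash_fn, table_size=64):
    """Number of distinct buckets the keys land in for a table of `table_size`."""
    return len({hash_fn(k) % table_size for k in keys})


# Attacker-chosen inputs: 120 permutations of the same five bytes.
attack_keys = [bytes(p) for p in itertools.permutations(b"abcde")]
secret = os.urandom(16)

if __name__ == "__main__":
    print(buckets_used(attack_keys, weak_hash))
    print(buckets_used(attack_keys, lambda k: keyed_hash(k, secret)))
```

Under the weak hash every attack key falls into one bucket, so lookups degrade to a linear scan; with a secret key the attacker cannot predict which inputs collide.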

Rust

https://doc.rust-lang.org/core/

Time

Java

Jodatime

Some flaws in Joda-time, by the author of Joda-time: https://blog.joda.org/2009/11/why-jsr-310-isn-joda-time_4941.html

Misc tips

http://www.johndcook.com/blog/2010/06/07/math-library-functions-that-seem-unnecessary/ explains why, because of floating point roundoff, log1p isn't redundant with log, expm1 isn't redundant with exp, erfc isn't redundant with erf, and lgamma isn't redundant with tgamma.
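
The roundoff argument can be checked directly with Python's `math` module, which wraps the same families of functions. A minimal sketch (the helper `gamma_or_none` is hypothetical, added only so the overflow case does not abort the script):

```python
import math

x = 1e-20

# 1 + x rounds to exactly 1.0 in double precision, so the naive forms lose x entirely.
naive_log = math.log(1 + x)
good_log = math.log1p(x)

naive_exp = math.exp(x) - 1
good_exp = math.expm1(x)

# erf(10) rounds to exactly 1.0, so 1 - erf(10) cancels to zero.
naive_erfc = 1 - math.erf(10)
good_erfc = math.erfc(10)


def gamma_or_none(v):
    """Return gamma(v), or None when the result does not fit in a double."""
    try:
        result = math.gamma(v)
    except OverflowError:
        return None
    return result if math.isfinite(result) else None


big_gamma = gamma_or_none(200.0)   # too large for a double
log_gamma = math.lgamma(200.0)     # its logarithm is an ordinary number

if __name__ == "__main__":
    print(naive_log, good_log)
    print(naive_exp, good_exp)
    print(naive_erfc, good_erfc)
    print(big_gamma, log_gamma)
```

In each pair the "redundant" function keeps the information that the naive composition destroys, either through cancellation near 1 or through overflow.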

https://github.com/brson/rust-api-guidelines

on which regex libs might actually implement efficient/non-backtracking regular language recognition:

Todo

Introduction to POSIX which speculates in which situations something else may be needed (cites much related work):

Transcending POSIX: The End of an Era?

Paper that profiles various applications to determine which parts of POSIX are most used (some examples: memory allocation, pthreads, files) and which appear to be becoming obsolete/often superseded by later abstractions (for example, IPC): POSIX Abstractions in Modern Operating Systems: The Old, the New, and the Missing
