proj-plbook-plChStdLibraries

Table of Contents for Programming Languages: a survey

Standard libraries

Case studies of the complete standard libraries of various languages

C standard library

The C standard libary, 'libc', provides us with a unique opportunity to study since C is so popular. Due to this, libc has been reimplemented on a variety of platforms, and there also exist various 'stripped-down' libcs which provide less functionality than is required by the spec. This allows us to not only study the standard C library and its implementation, but also to study which subsets of this library were considered most essential by the authors of the various stripped-down alternative libcs. In addition, there are some in-progress implementations of libc, which allows us to study which parts the authors felt were necessary to implement first.

Many "libc" implementations combine the C standard library with an implementation of the POSIX standard.

"Whereas the kernel (Linux) governs access to hardware, memory, filesystems, and the privileges for accessing these resources, the C library is responsible for providing the actual C function interfaces userspace applications see, and for constructing higher-level buffered stdio, memory allocation management, thread creation and synchronization operations, and so on using the lower-level interfaces the kernel provides, as well as for implementing pure library routines of the C language like strstr, snprintf, strtol, exp, sqrt, etc. " -- http://www.etalabs.net/musl/faq.html

See also plChImplementationTools, which has a section on libc (from the point of view of a language implementor making use of it).

Following are some suggestions for both standards-compliant and 'alternative' libcs to look at for inspiration:

C standard lib and related standards

Full libcs

Summaries: musl only supports UTF-8 as the locale's character encoding

popular alternative libcs

newlib libc (permissive license but only on non-Linux targets):

dietlibc (GPL!):

bionic:

uClibc (LGPL) (is this actualyl a full libc?):

under construction, or less popular but notable because more stripped/minimalistic

klibc (GPL! though as Wikipedia notes "(This only applies to klibc as a whole due to embedding some Linux kernel derived files; most of the library source code is actually[5] available under a BSD licence from UCB or the Historical Permission Notice and Disclaimer.)" citing https://archive.today/20120710002157/http://git.kernel.org/?p=libs/klibc/klibc.git;a=blob;f=usr/klibc/LICENSE;h=aa6d7a7e37015f95f852ceb29cc417519919d806;hb=75216b5c62b3b3635ae0c6cf7ee47757d7d99100 ):

PDCLib (permissive license):

libc11 (permissive license):

pedagogical/non-production-use minimal libc-like libraries:

there are more on plChImplementationTools under 'Comparisons and lists of libcs', and links to lists which may have added more

what Golang did

minimalistic and roll-your-own libcs

libc comparisons and discussions

C POSIX Library

Links:

Haskell: prelude

Redesigns of the Haskell-prelude

Haskell: other libraries

C++ standard libraries

Some CSPRNGs are designed to be given external entropy on an ongoing basis, rather than just once as a random seed.

CSPRNGs do tend to have disadvantages over other PRNGs, however.

In the context of programming language design, sometimes programming language implementation benchmarks are bottlenecked by the speed of random number generation [5].

See also [6].

Test suites and comparisons

Test suites:

Some examples of the sorts of things that are tested in 'statistical randomness tests'

" An initial battery of statistical tests for uniform RNGs was offered by the 1969 first edition of The Art of Computer Programming, by Donald Knuth. In testing of RNGs, Knuth's tests were supplanted by George Marsaglia's (1996) Diehard tests, which consisted of fifteen different tests. The inability to modify the test parameters or add new tests led to the development of the TestU?01 library. "

Wikipedia says:

" In practice, the output from many common PRNGs exhibit artifacts that cause them to fail statistical pattern-detection tests. These include:

    Shorter than expected periods for some seed states (such seed states may be called 'weak' in this context);
    Lack of uniformity of distribution for large numbers of generated numbers;
    Correlation of successive values;
    Poor dimensional distribution of the output sequence;
    The distances between where certain values occur are distributed differently from those in a random sequence distribution.

Defects exhibited by flawed PRNGs range from unnoticeable (and unknown) to very obvious. An example was the RANDU random number algorithm used for decades on mainframe computers. It was seriously flawed, but its inadequacy went undetected for a very long time. "-- https://en.wikipedia.org/wiki/Pseudorandom_number_generator#Potential_problems_with_deterministic_generators

"

Kendall and Smith's original four tests were hypothesis tests, which took as their null hypothesis the idea that each number in a given random sequence had an equal chance of occurring, and that various other patterns in the data should be also distributed equiprobably.

    The frequency test, was very basic: checking to make sure that there were roughly the same number of 0s, 1s, 2s, 3s, etc.
    The serial test, did the same thing but for sequences of two digits at a time (00, 01, 02, etc.), comparing their observed frequencies with their hypothetical predictions were they equally distributed.
    The poker test, tested for certain sequences of five numbers at a time (aaaaa, aaaab, aaabb, etc.) based on hands in the game poker.
    The gap test, looked at the distances between zeroes (00 would be a distance of 0, 030 would be a distance of 1, 02250 would be a distance of 3, etc.).

If a given sequence was able to pass all of these tests within a given degree of significance (generally 5%), then it was judged to be, in their words "locally random". "

" Other tests:

    The Monobit test treats each output bit of the random number generator as a coin flip test, and determine if the observed number of heads and tails are close to the expected 50% frequency. The number of heads in a coin flip trail forms a binomial distribution.
    The Wald–Wolfowitz runs test tests for the number of bit transitions between 0 bits, and 1 bits, comparing the observed frequencies with expected frequency of a random bit sequence.
    Information entropy
    Autocorrelation test
    Kolmogorov–Smirnov test
    The Spectral Test[3]
    Maurer's Universal Statistical Test

"

Popular PRNGs

The most popular PRNG is currently MT19937, "Mersenne Twister".

The V8 Javascript implementation recently switched from an older PRNG to xorshift128+ [7] [8]

Some other PRNGs i've heard of are xorshift*1024, xorshift+64.

Some other (families of?) CSPRNGs i've heard of are ChaCha?20, AES, Fortuna.

There is a (family?) called PCG which is supposedly good for simulation but which is very new [9].

Tombdo: Some PRNGs are said to have a feature called "multiple streams" which is good for some applications (simulation?), i don't yet know what this is.

Misc

To scale random numbers (eg to get a uniform random from 1 to 10 from a uniform random number generator from 0 to 1): Math.floor(Math.random() * scale);

"(Quick note: it’s subtle, but in general this method is slightly biased if your scaled range doesn’t evenly divide your PRNG’s output range. A general solution should use rejection sampling like this, which is part of the standard library in other languages.) "

"A common trap people fall into with standard libraries is filling them up with trivia. Trivia is sand clogging the gears and just dead weight that has to be carried around forever. My general rule is if the explanation for what the function does is more lines than the implementation code, then the function is likely trivia and should be booted out." -- So You Want To Write Your Own Language? By Walter Bright

PRNG Links

Hash tables

Links:

Rust

https://doc.rust-lang.org/core/

Time

Java

Jodatime

Misc tips

http://www.johndcook.com/blog/2010/06/07/math-library-functions-that-seem-unnecessary/ explains why, because of floating point roundoff, log1p isn't redundant with log, expm1 isn't redundant with exp, erfc isn't redundant with erf, and lgamma isn't redundant with tgamma.

https://github.com/brson/rust-api-guidelines

on which regex libs might actually implement efficient/non-backtracking regular language recognition:

Todo

Introduction to POSIX which speculates in which situations something else may be needed (cites much related work):

Paper that profiles various applications to determine which parts of POSIX are most used (some examples: memory allocation, pthreads, files) and which appear to be becoming obsolete/often superceded by later abstractions (for example, IPC): POSIX Abstractions in Modern Operating Systems: The Old, the New, and the Missing