proj-plbook-plChWebBrowserIntermedLangs

Table of Contents for Programming Languages: a survey

Web Browser ILs


WebAssembly

http://webassembly.org/

https://webassembly.github.io/spec/ https://webassembly.github.io/spec/_download/WebAssembly.pdf

https://www.w3.org/TR/wasm-core-1/

announcement: https://brendaneich.com/2015/06/from-asm-js-to-webassembly/

tutorials:

tools:

misc related links:

The language

from https://webassembly.github.io/spec/core/binary/instructions.html :

"Function-pointer values are comparable for equality and the addressof operator is monomorphic. Function-pointer values can be explicitly coerced to and from integers (which, in particular, is necessary when loading/storing to the heap since the heap only provides integer types)."

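a minimal sketch (TypeScript) of how the quoted properties could fit together, assuming function pointers are modeled as indices into a per-module table. `FunctionTable`, `addressOf`, `callIndirect` and the signature strings are hypothetical names made up for illustration; they are not part of any WebAssembly API:

```typescript
// Hypothetical model: a "function pointer" is just an index into a table.
type Signature = string; // e.g. "i32,i32->i32" (made-up notation)

interface TableEntry {
  sig: Signature;
  fn: (...args: number[]) => number;
}

class FunctionTable {
  private entries: TableEntry[] = [];

  // "addressof": returns an abstract integer index, never a machine address.
  // The same function always yields the same index, so indices are comparable.
  addressOf(sig: Signature, fn: (...args: number[]) => number): number {
    const existing = this.entries.findIndex((e) => e.fn === fn);
    if (existing !== -1) return existing;
    this.entries.push({ sig, fn });
    return this.entries.length - 1;
  }

  // Indirect call: index and signature are checked before the call is made.
  callIndirect(index: number, sig: Signature, ...args: number[]): number {
    const entry = this.entries[index];
    if (entry === undefined || entry.sig !== sig) {
      throw new Error("trap: bad function pointer or signature mismatch");
    }
    return entry.fn(...args);
  }
}

const table = new FunctionTable();
const add = (a: number, b: number): number => (a + b) | 0;
const sub = (a: number, b: number): number => (a - b) | 0;

const pAdd = table.addressOf("i32,i32->i32", add);
const pSub = table.addressOf("i32,i32->i32", sub);

// The heap only holds integers, so the pointer is stored as its index.
const heap = new Int32Array(4);
heap[0] = pSub;
const result = table.callIndirect(heap[0], "i32,i32->i32", 7, 2); // calls sub
```

because the 'pointer' is only an index, storing it in an integer heap and loading it back is enough to round-trip it, and a mismatched signature is caught at the call.
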
Multiple return value calls will be possible,

other language comments

https://github.com/WebAssembly/design/blob/master/AstSemantics.md :

" Why not a stack-, register- or SSA-based bytecode?

    Smaller binary encoding: JSZap, Slim Binaries.
    Polyfill prototype shows simple and efficient translation to asm.js."

" Break and continue statements can only target blocks or loops in which they are nested. This guarantees that all resulting control flow graphs are reducible, which leads to the following advantages:

    Simple and size-efficient binary encoding and compilation.
    Any control flow—even irreducible—can be transformed into structured control flow with the Relooper algorithm, with guaranteed low code size overhead, and typically minimal throughput overhead (except for pathological cases of irreducible control flow). Alternative approaches can generate reducible control flow via node splitting, which can reduce throughput overhead, at the cost of increasing code size (potentially very significantly in pathological cases).
    The signature-restricted proper tail-call feature would allow efficient compilation of arbitrary irreducible control flow."
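
for intuition, JavaScript/TypeScript labeled statements obey the same nesting rule: a break or continue may only name a label that encloses it. A small sketch (the function and label names are made up for the example):

```typescript
// Returns the index of the first row containing a negative number, or -1.
// Rows that contain NaN before any negative value are skipped.
function firstRowWithNegative(rows: number[][]): number {
  let found = -1;
  search: {
    rowLoop: for (let r = 0; r < rows.length; r++) {
      const row = rows[r];
      for (let c = 0; c < row.length; c++) {
        if (Number.isNaN(row[c])) continue rowLoop; // targets an enclosing loop
        if (row[c] < 0) {
          found = r;
          break search; // targets an enclosing block
        }
      }
    }
  }
  return found;
}
```

there is no way to write a jump into the middle of `rowLoop` from outside it, which is the property that keeps the control flow graph reducible.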

" Expression trees offer significant size reduction by avoiding the need for set_local/get_local pairs in the common case of an expression with only one, immediate use. The ('comma' and 'conditional') primitives provide AST nodes that express control flow and thus allow more opportunities to build bigger expression trees and further reduce set_local/get_local usage (which constitute 30-40% of total bytes in the polyfill prototype). "

other language details

https://github.com/WebAssembly/design/blob/master/AstSemantics.md :

" In the MVP, trapping means that execution in the WebAssembly module is terminated and abnormal termination is reported to the outside environment. In a JS environment, such as a browser, this would translate into a JS exception being thrown.

...

Individual storage locations in WebAssembly are typed, including global variables, local variables, and parameters. The heap itself is not typed, but all accesses to the heap are annotated with a type. The legal types for global variables and heap accesses are called Memory types....The legal types for parameters and local variables, called Local types are a subset of the Memory types:"

...

Parameters are addressed as local variables. Local variables do not have addresses and are not aliased in the globals or the heap. Each function has a fixed, pre-declared number of local variables. Local variables have Local types and are initialized to the appropriate zero value for their type at the beginning of the function, except parameters which are initialized to the values of the arguments passed to the function.

...

Break and continue statements can only target blocks or loops in which they are nested.

...

The addition of the offset and index is specified to use infinite precision such that an out-of-bounds access never wraps around to an in-bounds access.

...

Global variables are not aliased to the heap.

...

The specification will add atomicity annotations in the future.

...

Each function has a signature in terms of local types, and calls must match the function signature exactly.

...

For security and safety reasons, the integer value of a coerced function-pointer value is an abstract index and does not reveal the actual machine code address of the target function.

In the MVP, function pointer values are local to a single module. The dynamic linking feature is necessary for two modules to pass function pointers back and forth.

Multiple return value calls will be possible, though possibly not in the MVP. The details of multiple-return-value calls needs clarification. Calling a function that returns multiple values will likely have to be a statement that specifies multiple local variables to which to assign the corresponding return values.

...

All basic data types allow literal values of that data type

...

Explicitly signed and unsigned operations trap whenever the result cannot be represented in the result type. This includes division and remainder by zero, and signed division overflow (INT32_MIN / -1). Signed remainder with a non-zero denominator always returns the correct value, even when the corresponding division would trap. Signed-less operations never trap.

Shifts interpret their shift count operand as an unsigned value. When the shift count is at least the bitwidth of the shift, shl and shr return zero, and sar returns zero if the value being shifted is non-negative, and negative one otherwise.

Note that greater-than and greater-than-or-equal operations are not required, since a < b is equivalent to b > a and a <= b is equivalent to b >= a. Such equalities also hold for floating point comparisons, even considering NaN."
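
a rough TypeScript model of the trapping and shift rules in the draft text quoted above. The function names and the `Trap` class are hypothetical, and the behavior follows this quoted draft rather than any particular shipped engine:

```typescript
const INT32_MIN = -2147483648;

class Trap extends Error {}

// Signed division: traps on a zero divisor and on INT32_MIN / -1,
// because neither result is representable as a signed 32-bit integer.
function divS32(a: number, b: number): number {
  if (b === 0) throw new Trap("integer divide by zero");
  if (a === INT32_MIN && b === -1) throw new Trap("integer overflow");
  return Math.trunc(a / b) | 0;
}

// Signed remainder: traps only on a zero divisor. INT32_MIN % -1 is 0,
// even though the corresponding division would trap.
function remS32(a: number, b: number): number {
  if (b === 0) throw new Trap("integer divide by zero");
  return (a % b) | 0;
}

// Shifts: the count is read as unsigned; a count of 32 or more
// shifts every bit out instead of being masked.
function shl32(x: number, count: number): number {
  const n = count >>> 0;
  return n >= 32 ? 0 : (x << n) | 0;
}

// Logical shift right; the result is returned as an unsigned value.
function shrU32(x: number, count: number): number {
  const n = count >>> 0;
  return n >= 32 ? 0 : x >>> n;
}

// Arithmetic shift right: an oversized count leaves only copies of the sign bit.
function sar32(x: number, count: number): number {
  const n = count >>> 0;
  if (n >= 32) return (x | 0) < 0 ? -1 : 0;
  return x >> n;
}
```

note that plain JavaScript `<<`, `>>` and `>>>` mask the count to its low five bits, which is why the explicit `n >= 32` checks are needed here.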

API not included

"

...

WebAssembly does not specify any APIs or syscalls, only an import mechanism where the set of available imports is defined by the host environment. In a Web environment, functionality is accessed through the Web APIs defined by the Web Platform. Non-Web environments can choose to implement standard Web APIs, standard non-Web APIs (e.g. POSIX), or invent their own. " -- https://github.com/WebAssembly/design/blob/master/Portability.md

Future features

https://github.com/WebAssembly/design/blob/master/PostMVP.md#threads

    Compiler transforms throw to abort().
    Compiler-enforced -fno-exceptions mode (note caveats).
    Compiler conversion of exceptions to branching at all callsites.
    In a Web environment exception handling can be emulated using JavaScript exception handling, which can provide correct semantics but isn't fast.

These modes are suboptimal for code bases which rely on C++ exception handling, but are perfectly acceptable for C code, or for C++ code which avoids exceptions. This doesn't prevent developers from using the C++ standard library: their code will function correctly (albeit slower at times) as long as it doesn't encounter exceptional cases.

Post-MVP, WebAssembly will gain support for developer access to stack unwinding, inspection, and limited manipulation. These are critical to supporting zero-cost exception handling by exposing low-level capabilities.

In turn, stack unwinding, inspection, and limited manipulation will be used to implement setjmp/longjmp. This can enable all of the defined behavior of setjmp/longjmp, namely unwinding the stack without calling C++ destructors. It does not, however, allow the undefined behavior case of jumping forward to a stack that was already unwound which is sometimes used to implement coroutines. Coroutine support is being considered separately."
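
as an illustration of the third strategy in the list above (conversion of exceptions to branching at all callsites), here is a hand-written sketch of what such a lowering might look like if done at source level. `Outcome`, `parseDigit` and `parseNumber` are invented for the example; a real compiler would presumably do this on its own intermediate representation rather than on source:

```typescript
// Every call returns a tagged result instead of throwing.
type Outcome<T> = { ok: true; value: T } | { ok: false; error: string };

function parseDigit(ch: string): Outcome<number> {
  const code = ch.charCodeAt(0) - 48; // 48 is the character code of "0"
  if (ch.length !== 1 || code < 0 || code > 9) {
    return { ok: false, error: `not a digit: ${ch}` }; // was: throw
  }
  return { ok: true, value: code };
}

function parseNumber(text: string): Outcome<number> {
  let total = 0;
  for (const ch of text.split("")) {
    const r = parseDigit(ch);
    // Branch inserted at the callsite: propagate the "exception" upward.
    if (r.ok === false) return r;
    total = total * 10 + r.value;
  }
  return { ok: true, value: total };
}
```

the cost is visible in the shape of the code: every call that could fail is followed by a check, even on the path where nothing goes wrong, which is the overhead that zero-cost exception handling is meant to avoid.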

https://github.com/WebAssembly/design/blob/master/FutureFeatures.md :

to reiterate some of that: https://github.com/WebAssembly/design/blob/master/CAndC%2B%2B.md :

" Porting C and C++ code to WebAssembly

Platform features

WebAssembly has a pretty conventional ISA: 8-bit bytes, two's complement integers, little-endian, and a lot of other normal properties. Reasonably portable C/C++ code should port to WebAssembly without difficultly.

In the MVP, WebAssembly will have an ILP32 data model, meaning that int, long, and pointer types are all 32-bit. The long long type is 64-bit.

In the future, WebAssembly will be extended to support 64-bit address spaces. This will enable an LP64 data model as well, meaning that long and pointer types will be 64-bit, while int is 32-bit. From a C/C++ perspective, this will be a separate mode from ILP32, with a separate ABI.

Language Support

C and C++ language conformance is largely determined by individual compiler support, but WebAssembly includes all the functionality that popular C and C++ compilers need to support high-quality implementations.

While the MVP will be fully functional, additional features enabling greater performance will be added soon after, including:

    Support for multi-threaded execution with shared memory.
    Zero-cost C++ exception handling. C++ exceptions can be implemented without this, but this feature will enable them to have lower runtime overhead.
    Support for 128-bit SIMD. SIMD will be exposed to C/C++ though explicit APIs such as LLVM's vector extensions and GCC's vector extensions, auto-vectorization, and emulated APIs from other platforms such as <xmmintrin.h>.

""

The implementation

Platform differences

ways in which WebAssembly might behave differently on different platforms ('nondeterminism'):

" Applications can't access data outside the sandbox without going through appropriate APIs, or otherwise escape the sandbox. WebAssembly always maintains valid, trusted callstacks; stray pointer writes cannot corrupt return addresses or spilled variables on the stack. Calls and branches always have valid destinations ensuring Control Flow Integrity. WebAssembly has no (undefined behavior)

Ideally, WebAssembly would be fully deterministic (except where nondeterminism was essential to the API, like random number generators, date/time functions or input events). Nondeterminism is only specified as a compromise when there is no other practical way to achieve portable native performance.

The following is a list of the places where the WebAssembly specification currently admits nondeterminism:

    When threads are added as a feature, even without shared memory, nondeterminism will be visible through the global sequence of API calls. With shared memory, the result of load operations is nondeterministic.
    Out of bounds heap accesses may want some flexibility ("The ideal semantics is for out-of-bounds accesses to trap. A module may optionally define that "out of bounds" includes low-memory accesses" or, "Loads return an unspecified value and "Stores are either ignored or store to an unspecified location in the heap", and/or "Either tooling or an explicit opt-in "debug mode" in the spec should allow execution of a module in a mode that threw exceptions on out-of-bounds access."  -- https://github.com/WebAssembly/design/blob/master/AstSemantics.md#out-of-bounds)
    NaN bit patterns
    Fixed-width SIMD may want some flexibility
        In SIMD.js, floating point values may or may not have subnormals flushed to zero.
        In SIMD.js, operations ending in "Approximation" return approximations that may vary between platforms.
    Environment-dependent resource limits may be exhausted. A few examples:
        Memory allocation may fail.
        Program stack may get exhausted.
        Resources such as handles may get exhausted." -- https://github.com/WebAssembly/design/blob/master/Nondeterminism.md
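
a sketch of the trapping option for out-of-bounds accesses, combined with the non-wrapping address computation quoted earlier ("the addition of the offset and index is specified to use infinite precision"). `loadI32` and `OutOfBoundsTrap` are hypothetical names, and the heap is modeled as a Uint8Array purely for illustration:

```typescript
class OutOfBoundsTrap extends Error {}

// Loads a little-endian 32-bit integer from heap[index + offset].
function loadI32(heap: Uint8Array, index: number, offset: number): number {
  // Read both operands as unsigned 32-bit values and add without wrapping.
  // The sum is below 2^33, so a JavaScript number holds it exactly.
  const effective = (index >>> 0) + (offset >>> 0);
  if (effective + 4 > heap.length) {
    throw new OutOfBoundsTrap(`out-of-bounds load at ${effective}`);
  }
  return (
    heap[effective] |
    (heap[effective + 1] << 8) |
    (heap[effective + 2] << 16) |
    (heap[effective + 3] << 24)
  );
}

// What a 32-bit wrapping add would have produced for the same operands:
const wrappedAddress = (0xffffffff + 5) >>> 0; // 4, which looks in-bounds
```

with a wrapping add, index 0xFFFFFFFF plus offset 5 would land on address 4 and silently succeed; computing the sum at full precision makes the same access trap.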

in addition, other differences arising from polyfills:

" Polyfill Deviations

An efficient polyfill may purposely diverge from the specified WebAssembly semantics: a polyfill doesn't need to be 100% correct with respect to the WebAssembly specification to be useful in practice. There are corner cases (often undefined behavior in C/C++) where JavaScript and asm.js don't have ideal semantics to maintain correctness efficiently.

If needed, a polyfill could provide an option to ensure full correctness at the expense of performance, though this is not expected to be necessary for portable C/C++ code.

Some divergences that we've identified as potentially desirable:

    Misaligned heap access: Since misaligned loads/stores are guaranteed to produce correct results and heap accesses in asm.js force alignment (e.g., HEAP32[i>>2] masks off the low two bits), an asm.js polyfill would need to translate all loads/stores into byte accesses (regardless of specified alignment) to be correct. To achieve competitive performance, the polyfill prototype defaults to incorrect behavior by emitting full-size accesses as if the index was never misaligned. Providing correct alignment information is important for portable WebAssembly performance in general; that information also guarantees that the polyfill is both correct and fast.
    Out of bounds heap access: Regardless of semantics chosen for out of bounds access in WebAssembly, an asm.js polyfill will follow standard asm.js behavior:
        Out of bound stores are ignored (treated as no-op);
        Out of bound loads return zero for integer loads or NaN for floating point.
    32-bit integer operations: Regardless of WebAssembly behavior, an asm.js polyfill will follow its standard behavior:
        Division by zero returns zero;
        INT32_MIN / -1 returns INT32_MIN;
        Shift counts are implicitly masked.
    Datatype conversions: Regardless of WebAssembly behavior, an asm.js polyfill will follow its standard behavior:
        Return zero when conversion from floating point to integer fails;
        Optionally canonicalize NaN values.

" -- https://github.com/WebAssembly/design/blob/master/Polyfill.md

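a few of these deviations can be reproduced directly, since they are just what plain JavaScript integer coercions and typed arrays do. A sketch in TypeScript (the helper names are made up, and the 16-byte heap is only for illustration):

```typescript
// 32-bit division the asm.js way: divide as doubles, then coerce with |0.
// Infinity, NaN and 2^31 all coerce without trapping.
function asmDiv(a: number, b: number): number {
  return ((a | 0) / (b | 0)) | 0;
}

// JavaScript masks the shift count to its low five bits.
function asmShl(x: number, count: number): number {
  return x << count;
}

// One buffer viewed as bytes and as 32-bit words, like asm.js heap views.
const heapBuffer = new ArrayBuffer(16);
const HEAPU8 = new Uint8Array(heapBuffer);
const HEAP32 = new Int32Array(heapBuffer);
HEAPU8.set([1, 2, 3, 4, 5, 6, 7, 8]);

// addr >> 2 drops the low two bits, so a misaligned address
// silently reads the aligned word that contains it.
function asmLoad32(addr: number): number {
  return HEAP32[addr >> 2] | 0;
}

// Out-of-bounds element reads yield undefined, which |0 turns into 0;
// out-of-bounds element writes are ignored.
function asmStore32(addr: number, value: number): void {
  HEAP32[addr >> 2] = value | 0;
}
```

for example, `asmDiv(7, 0)` gives 0, `asmDiv(-2147483648, -1)` gives -2147483648, `asmShl(1, 32)` gives 1, and `asmLoad32(1)` returns the same word as `asmLoad32(0)`.
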
Target platforms

" Assumptions for Efficient Execution

Execution environments which, despite limited, local, nondeterminism, don't offer the following characteristics may be able to execute WebAssembly modules nonetheless. In some cases they may have to emulate behavior that the host hardware or operating system don't offer so that WebAssembly modules execute as-if the behavior were supported. This sometimes will lead to poor performance.

As WebAssembly's standardization goes forward we expect to formalize these requirements, and how WebAssembly will adapt to new platforms that didn't necessarily exist when WebAssembly was first designed.

WebAssembly portability assumes that execution environments offer the following characteristics:

    8-bit bytes.
    Addressable at a byte memory granularity.
    Support unaligned memory accesses or reliable trapping that allows software emulation thereof.
    Two's complement signed integers in 32 bits and optionally 64 bits.
    IEEE-754 32-bit and 64-bit floating point, except for a few exceptions.
    Little-endian byte ordering.
    Memory regions which can be efficiently addressed with 32-bit pointers or indices.
    Heaps bigger than 4GiB with 64-bit addressing may be added later, though it will be done under a feature test so it won't be required for all WebAssembly implementations.
    Enforce secure isolation between WebAssembly modules and other modules or processes executing on the same machine.
    An execution environment which offers forward progress guarantees to all threads of execution (even when executing in a non-parallel manner).

" -- https://github.com/WebAssembly/design/blob/master/Portability.md

" WebAssembly has a pretty conventional ISA: 8-bit bytes, two’s complement integers, little-endian, and a lot of other normal properties. Reasonably portable C/C++ code should port to WebAssembly without difficultly.

WebAssembly has 32-bit and 64-bit architecture variants, called wasm32 and wasm64. wasm32 has an ILP32 data model, meaning that int, long, and pointer types are all 32-bit, while the long long type is 64-bit. wasm64 has an LP64 data model, meaning that long and pointer types will be 64-bit, while int is 32-bit. " -- [1]

Implementation details

on the binary encoding: https://github.com/WebAssembly/design/blob/master/BinaryEncoding.md

Older proposals

from an old version of https://web.archive.org/web/20151220201908/github.com/WebAssembly/design/blob/master/AstSemantics.md :

" The Abstract Syntax Tree (AST) has a basic division between statements and expressions. Expressions are typed; validation consists of simple, bottom-up, O(1) type checking.

...

Each function body consists of exactly one statement.

...

Some operations may trap under some conditions, as noted below. "

Memory types:

control flow:

variable access:

function calling:

arithmetic:

Variants and implementations

Textual syntaxes on top of WebAssembly:

Implementations:

Correctness:

WebAssembly Links


asm.js

using asm.js, emscripten supports LLVM -> JS

types: signed, fixnum (subtype of 'signed'; integers in the range [0, 2^31)), double, "arbitrary JavaScript values that may flow freely between asm.js code and external JavaScript code. " ('extern' omitted b/c abstract) other value types: void, unsigned, int, float, float?, floatish (float? and floatish are supertypes of float), double? (supertype of double), intish (supertype of int)
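
a short sketch of how these types show up in source: in asm.js the coercion operator applied to a value acts as its type annotation. It is written as TypeScript here so that it type-checks; the function names are invented, and this is not a complete validating asm.js module (there is no "use asm" wrapper):

```typescript
function asmAddInt(a: number, b: number): number {
  a = a | 0; // parameter annotated as int
  b = b | 0;
  return (a + b) | 0; // a + b is only "intish"; |0 coerces it back to signed
}

function asmToUnsigned(x: number): number {
  return x >>> 0; // >>> 0 yields unsigned
}

function asmHalf(x: number): number {
  x = +x; // parameter annotated as double
  return +(x / 2.0);
}

function asmToFloat(x: number): number {
  return Math.fround(x); // fround yields float (32-bit precision)
}
```

the "-ish" supertypes exist because an intermediate result such as `a + b` may fall outside the 32-bit range until it is coerced; for instance `asmAddInt(2147483647, 1)` wraps to -2147483648.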

unary operators:

binary operators:

, &, ^, 1, >>>

stdlib:


PNaCl

note: PNaCl is now (mostly?) deprecated in favor of WebAssembly: https://blog.chromium.org/2017/05/goodbye-pnacl-hello-webassembly.html

PNaCl is (mostly) a subset of LLVM that Google wants to use as a portable low-level browser 'assembly language'. Because it is (mostly) a subset, yet still complete enough to write programs in, I figure it might be useful as a description of a 'simpler' LLVM, especially for someone first learning LLVM (like me).

spec: https://developer.chrome.com/native-client/reference/pnacl-bitcode-abi

below, we summarize many of PNaCl's restrictions on LLVM; a lot of the following text is quoted from the PNaCl spec (even though links are to the LLVM reference):

instructions:

intrinsics:

NaCl intrinsics:

types:

LLVM Links

AssemblyScript

https://github.com/AssemblyScript/assemblyscript

"a subset of TypeScript that compiles to WebAssembly"

" Instead of reimplementing TypeScript as closely as possible at the expense of performance, AssemblyScript tries to support its features as closely as reasonable while not supporting certain dynamic constructs intentionally:

null representing a nullable), any and undefined are not supported by design
"
expressions is always bool

Footnotes:

1. ,