Cross-platform concordance of instructions from seven intermediate- or assembly- language architectures: RISC-V, WASM, LLVM, ARM Cortex M0, JVM, LuaJIT2, CLI

Introduction

This document surveys the instruction sets of seven intermediate-language, virtual machine, or assembly-language platforms: RISC-V, WASM, ARM Cortex M0, LLVM, JVM, LuaJIT2, CIL. It groups instructions together by purpose, referencing their analogs in each of those systems (hence the word 'concordance'). In this way, it shows which instructions in one platform correspond to which instructions in other platforms.

The aim is to provide insight into which sorts of instruction are 'popular' (common to many platforms), and what the most 'popular' choices for the semantics of those instructions are.

The organization of this document is:

introduction to each of the platforms
concordance of instructions, grouped into sections in decreasing order of popularity (so we start with those instruction classes common to all 7 platforms, and end with instructions found on only 1 platform). Some of the most popular instructions have some details provided on their semantics in these sections (not in the full concordance below).
full concordance of instructions (all instructions on all platforms, grouped by purpose)
for each platform, a concise listing of all instructions on that platform
discussion

Introduction to the platforms

Introduction to RISC-V

RISC-V is an instruction set architecture (ISA) for hardware microprocessors [1].

As a general-purpose hardware microprocessor ISA, RISC-V is intended to efficiently execute any high-level programming language.

RISC-V is a register machine with a set of integer registers with a fixed bitwidth (in this document we'll look at the 64-bit variant of RISC-V), and also another set of floating point registers (if the floating point extension is being used).

Introduction to WASM

WASM is a virtual machine designed to enable high performance applications on the Web [2]. It is designed to support many different high-level-languages.

WASM is considered to be higher-level than RISC-V and ARM, but lower-level than JVM, CIL, and LuaJIT2. WASM is a sandbox that allows untrusted code to be executed without giving that code arbitrary access to the host machine.

WASM is a stack machine.

WASM has loads and stores for working with memory, but it also has local and global variables.

Unusually for a virtual machine that seeks to be a low-level target platform for many different high-level-languages, WASM has some block-structured control flow, and corresponding restrictions on jump and branch targets.

Introduction to LLVM

LLVM is considered to be higher-level than RISC-V and ARM, but lower-level than JVM, CIL, and LuaJIT2.

Although LLVM was always intended to be a low-level target platform for many different high-level-languages, it was first used to support the compilation of C and C++.

LLVM code is structured into basic blocks. A basic block is "a straight-line code sequence with no branches in except to the entry and no branches out except at the exit." [3]. Some of the LLVM instructions (for example, branch or return) are called 'terminator instructions', and each LLVM basic block must end with one of these 'terminator instructions'.

Instead of registers (or a stack), LLVM has an infinite set of variables. These variables must be accessed in SSA (static single assignment) form, which means that each variable is assigned exactly once and cannot be reassigned.

Introduction to ARM Cortex

ARM is a family of hardware microprocessors. We will look at the instruction set architecture (ISA) for the ARM Cortex M0, which is the ARM processor with the smallest instruction set.

As a general-purpose hardware microprocessor ISA, ARM is intended to efficiently execute any high-level programming language.

ARM Cortex M0 has 16 numbered 32-bit registers, of which 13 are GPRs and 3 are special (PC, link register, stack pointer). There are three other misc. registers (Program Status Register (PSR), PRIMASK (exception mask), CONTROL (stack control)). [4]

Introduction to JVM

Although the JVM is now host to multiple high-level languages, it was originally designed to run Java.

The JVM is considered a 'higher-level' virtual machine; it provides high-level-language data structures. The JVM is a sandbox that allows untrusted code to be executed without giving that code arbitrary access to the host machine.

The JVM is a stack machine.

Introduction to LuaJIT2

LuaJIT2 is considered a 'higher-level' virtual machine.

LuaJIT2 is a register machine.

Note: It was a difficult choice whether to analyze the Lua 5.1 bytecode instruction set or the LuaJIT 2 bytecode instruction set. The Lua 5.1 bytecode instruction set is smaller and better documented (see the excellent http://luaforge.net/docman/83/98/ANoFrillsIntroToLua51VMInstructions.pdf ). However, as the LuaJIT 2 instruction set is optimized and had the opportunity to learn from the Lua 5.1 bytecode, I was interested in seeing what choices were made with regards to things like: which instructions should have immediate constant forms? which branching conditionals should exist?

The LuaJIT 2 instruction set is clearly based on the Lua 5.1 instruction set, and seems to be for the most part an elaboration of it, with various additional specialization instructions added, various immediate constants added, some branching conditionals added, various 'marker' instructions to help with the JIT process added, etc.

Introduction to CLI

The CLI is considered a 'higher-level' virtual machine. It provides high-level-language data structures and enables interoperation between high-level languages. The CLI provides facilities to allow untrusted code to be executed without giving that code arbitrary access to the host machine.

The CLI is a stack machine.

The instruction set of the CLI is called the CIL (Common Intermediate Language).

Minutiae

What is an instruction?

Generally in this document we consider the dividing line between instructions to be separate mnemonics.

Sometimes if a platform (e.g. LLVM) lists a group of related instructions together in its documentation, then we consider that whole group as one instruction (for instance some LLVM intrinsics have a type name in their mnemonic, but in the LLVM documentation, rather than having a separate section for each type variant, they are all listed together with an asterisk representing the type name).

RISC-V conventions

RISC-V variants

RISC-V has immediate variants. WASM doesn't have immediate variants (todo: is this correct?).

RISC-V is presented as a base integer instruction set (either RV32I (with 32-bit registers), RV32E (slimmed down embedded variant of RV32I), or RV64I (with 64-bit registers)), on top of which may be layered various extensions (such as M for multiplication, A for atomics, F for single-precision floating point, and D for double-precision floating point).

We will be assuming a processor with RV64IMAFD (also called RV64G), that is, a RISC-V processor (RV) with 64-bit integer registers (RV64I) with 64-bit multiply, atomic, single-precision floating point, and double-precision floating point extensions (IMAFD; the IMAFD selection of extensions can be abbreviated as "G" because IMAFD represents the "standard general-purpose ISA", according to the RISC-V spec).

We will be mainly discussing RV64I, in which the registers are 64-bits, but RISC-V also has a 32-bit variant (RV32I) in which the registers are all 32-bits instead of 64. Many operations are only available in 64-bit form in RV64I, but since RV32I would provide them in 32-bit form, we will usually mark these as supporting both 32- and 64-bits, for easier comparison with instruction sets like WASM which provide explicit 32- and 64- bit forms throughout.

RISC-V floating point operations are only included in the F (single) and D (double) extensions, not the base integer instruction set.

RISC-V typing

The RISC-V instructions are untyped and operate on memory and registers, so since i want to give types for everything (to facilitate comparison), here is the convention that i will use. The types indicated in the following will be i64 by default (64 because the RV64I registers are 64-bits). However, if the operation is clearly intended to work with unsigned quantities, the type shown will be 'u' instead of 'i'; and if the operation is clearly intended to work with less than 64 bits, 32, 16, or 8 will be shown instead of 64. TODO enforce this convention.

ARM conventions

Below, we'll say 'ARM' although we only mean ARM Cortex M0.

Polymorphic

Sometimes i write '(polymorphic)' next to an instruction. I usually only do that when the same instruction will be appearing multiple times in the same section. There are other instructions which are also polymorphic which do not get this marking.

Warnings, excuses, and qualifications

This document is probably full of errors. You would probably want this sort of comparison to be written by someone who has written code in each of the platform instruction sets being compared, or who at least has carefully read the documentation and specifications. Instead, what you are reading was written by someone who has merely glanced through the documentation, and who in many cases is making assumptions based only on the names of the instruction, without reading their definitions or using them in code!

This is, to some extent, an apples-to-oranges comparison. The different platforms being compared have different purposes and are different kinds of things.

These particular instruction sets were chosen for comparison because they possess the following properties:

They are general purpose
They have a relatively small number of instructions
They were developed or reimagined relatively recently (so presumably their designers learned from the lessons of the past) (the JVM is the exception here; it was introduced in 1994)
They are relatively widely-used (or at least much talked-about) (LuaJIT2 is the exception here, but when people do talk about it, it seems to me to be widely regarded as well-designed)
Their description is free and open, and i find it easy to read

There is necessarily some subjective preference involved in grouping similar functionality together. In this document the choice of grouping affects how widely functionality is considered to be shared, and hence how 'popular' it is considered to be.

For example, LLVM 'constrained' arithmetic is somewhat similar to RISC-V arithmetic in that both offer a way of choosing rounding and exception modes; however LLVM offers a constrained floating-point remainder function whereas RISC-V does not offer a floating-point remainder function. Does that mean that LLVM's constrained.frem intrinsic should be counted as something supported by only one platform? I have instead chosen to only consider the larger grouping 'floating-point arithmetic with rounding/exception mode control', which is listed as something supported by two platforms, with a note that RISC-V does not offer a floating-point remainder instruction.

For example, i have chosen to group ARM and CIL's bitwise NOT with LuaJIT2's boolean NOT.

For example, i have chosen to group RISC-V's 'classify' operation with CIL's 'ckfinite', even though 'classify' provides much more functionality.

For example, i have chosen to group RISC-V's load-reserved/store-conditional with LLVM's compare-and-swap, because even though these are very different operations, they can each be used as a primitive upon which to build synchronization/consensus/atomicity.

For example, i have chosen to break out 'constant loads from a constant pool/table' as separate functionality, but then also consider 'constant loads' as something supported by all 7 platforms (mixing together immediate constants and constant pools), even though there are many differences in the details (particularly bitwidths).

For example, i have chosen to group ARM's 'reverse subtraction' (rsbs) with integer negation.

For example, i have chosen to group together a wide variety of fence, synchronization, and volatile instructions/prefixes.

For example, i have chosen to group together a wide variety of memory-allocation-related instructions.

Sometimes i have chosen to list the same instruction in multiple places, causing it to be 'double-counted' in a sense. For example, RISC-V's unconditional jumps are listed once in the 'instructions supported by two platforms' in a section noting the link register functionality, but listed again later for their unconditional jump functionality.

References

Instruction lists were collected from these references in May 2019.

Short excerpts of text describing the semantics of instructions are taken, without quoting, from these references.

RISC-V:

https://content.riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf

Web Assembly (WASM):

LLVM:

https://llvm.org/docs/LangRef.html

ARM Cortex M0:

JVM:

https://en.wikipedia.org/wiki/Java_bytecode_instruction_listings
http://www.cs.vsb.cz/benes/vyuka/pre/lab/jvm/ (not sure if this is authoritative but it's informative)

CIL:

LuaJIT2:

http://wiki.luajit.org/Bytecode-2.0
and, although this discusses a different (but related and better documented) instruction set, for context you may wish to read: http://luaforge.net/docman/83/98/ANoFrillsIntroToLua51VMInstructions.pdf .

Concordance, divided into sections by popularity

To follow. Full concordance is at the end.

Concordance of instructions supported by all seven platforms in this study

Every platform in this study has:

some mechanism for specifying constants/literals
addition, subtraction, multiplication
instructions for a jump/unconditional branch to an immediate or label
some form of unconditional indirect branching
some form of branch-if-not-equal-to-zero/branch-if-true

Arithmetic

Constant loads

Every platform in this study has some mechanism for specifying constants/literals. However, in terms of INSTRUCTIONS, LLVM does not have a constant load instruction because it doesn't need one; constants can be assigned to a variable in the LLVM IR without an instruction. For this reason, constant load instructions are listed below in the section on instruction classes supported by six platforms.

Add, subtract, multiply

Every platform in this study has instructions for addition, subtraction, and multiplication. However, one platform (ARM) is integer-only, whereas another platform (LuaJIT2) is floating-point only. For this reason, these instructions are listed below in the section on instruction classes supported by six platforms.

Control flow

Jumps / unconditional branch

Jump to immediate / direct branch

Every platform in this study provides unconditional jumps/direct branches to an immediate or a label.

RISC-V and ARM provide link register variants of their direct branch instructions.

JAL (Jump and Link) (32b) (RISC-V RV32I)
BR (WASM)
BR (unconditional form) (LLVM)
B BL (ARM)
goto goto_w (JVM)
jsr jsr_w (JVM) (deprecated)
jmp (LuaJIT2)
br br.s (CIL)
jmp (CIL)

NOTE: when i say that a jump is 'PC-relative', often it is actually relative to the instruction following the jump instruction, rather than to the jump instruction itself. Sometimes i provide that level of detail, but often i omit it.

In RISC-V, the JAL instruction is a PC-relative jump with a 20-bit signed immediate. The units of the immediate are multiples of 2 bytes, so RISC-V's JAL can encode a jump offset of +-1MiB. However, jumps to anywhere in a 32-bit absolute address range are available using fixed two-instruction sequences involving a 20-bit immediate constant load followed by an indirect jump (JALR), which contains a base register and a 12-bit displacement. RISC-V JAL writes the address following the jump instruction into a destination register.

In WASM, i'm not quite sure that i understand the BR instruction, but i think its immediate argument is a label index; label indices do NOT "reference program positions in the instruction stream but instead reference outer control constructs by relative nesting depth" [5]. Recall that WASM only has structured control flow. In other words, a WASM BR can only target an enclosing block (continuing at the end of that block, or at its beginning if the block is a loop), and the immediate argument is how many enclosing blocks to break out of. The immediate argument is 32 bits.
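The relative-nesting-depth idea can be sketched in a few lines. This is only an illustration, not how any WASM engine is implemented: the function name resolve_branch and the (kind, start, end) tuples are hypothetical, and the sketch assumes that a branch to a loop label continues at the loop's start while a branch to any other block continues after its end.

```python
def resolve_branch(control_stack, depth):
    """Return (kind, position) that a `br depth` would transfer control to.

    control_stack lists the enclosing constructs, outermost first, as
    (kind, start, end) tuples. This encoding is hypothetical.
    """
    if depth < 0 or depth >= len(control_stack):
        raise ValueError("label index exceeds nesting depth")
    # Depth 0 names the innermost enclosing construct.
    kind, start, end = control_stack[-1 - depth]
    # A loop label continues at the loop header; a block label
    # continues just after the block's end.
    return (kind, start if kind == "loop" else end)


# Three nested constructs: block { loop { block { br N } } }
stack = [("block", 0, 40), ("loop", 4, 30), ("block", 10, 20)]
inner = resolve_branch(stack, 0)
middle = resolve_branch(stack, 1)
outer = resolve_branch(stack, 2)
```

The point of the design is that the branch never names an instruction address, only how many constructs to leave, so a branch cannot land in the middle of another block.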

In LLVM, the immediate of the BR instruction is of type 'label'. I couldn't find any specification on how many labels are allowed in a program, and LLVM's bitcode representation has variable-width integers, so presumably these labels can be represented by these variable length integers and there is no limit to how many there can be, although i didn't look into it too closely.

In ARM Cortex M0, the branch-with-link variant of the B instruction has a range of +-~16MiB, relative to the PC register [6].

In the JVM, the goto instruction has a PC-relative signed range of 16 bits (+-~32KiB), but must be confined to the same method. goto_w has a PC-relative signed range of 32 bits (+-~2GiB), HOWEVER currently the JVM has a limit on method size of 64KiB, and goto_w also has the restriction that it must not cross methods, so effectively the limit of goto_w is +-~64KiB [7].

In LuaJIT2, the JMP instruction has a PC-relative signed range of 16 bits (+-~32KiB) [8] (interestingly, this seems to have been 18-bits in the original Lua that LuaJIT2 is based upon [9]).

In CIL, the BR instruction has a PC-relative signed range of 32 bits (+-~2GiB), and the BR.S instruction has a PC-relative signed range of 8 bits (+-~128 bytes). BR instructions may not jump into or out of exception-handling blocks ("try, catch, filter, and finally blocks").

CIL's JMP is a higher-level unconditional direct jump which exits the current method and jumps to another method with the same calling convention, number and type of arguments as the current method.
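The reach of a signed PC-relative offset follows directly from its bit width and its unit size. A minimal sketch of that arithmetic, using the widths and units quoted in this section (the helper name signed_reach is made up for illustration, and the unit of 1 byte for the JVM and CIL offsets is taken from the descriptions above):

```python
def signed_reach(bits, unit_bytes=1):
    """Return (min_offset, max_offset) in bytes for a signed offset field."""
    lowest = -(1 << (bits - 1)) * unit_bytes
    highest = ((1 << (bits - 1)) - 1) * unit_bytes
    return lowest, highest


KIB, MIB, GIB = 1 << 10, 1 << 20, 1 << 30

reach = {
    "RISC-V JAL (20 bits, 2-byte units)": signed_reach(20, 2),
    "JVM goto (16 bits)": signed_reach(16),
    "JVM goto_w / CIL br (32 bits)": signed_reach(32),
    "CIL br.s (8 bits)": signed_reach(8),
}

for name, (lowest, highest) in reach.items():
    print(f"{name}: {lowest} .. {highest} bytes")
```

Note that the ranges are slightly asymmetric (one more step backwards than forwards), which is why they are written with a '~' above.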

Unconditional indirect branches

Every platform in this study offers some form of unconditional indirect branching. However, LuaJIT2 only offers a higher-level form of indirect branching via its function CALL instructions. For this reason, low-level unconditional indirect branch instructions are listed below in the section on instruction classes supported by six platforms.

Conditional branches

Every platform in this study offers some form of branch-if-true, or branch-if-not-equal-to-zero, or branch-if-non-null. However, RISC-V does not need a separate instruction for this; it can use the 'zero register', which is a 'register' which always holds the value constant zero, as an operand to its BNE (branch-if-not-equal) instruction. For this reason, branch-if-not-equal-to-zero instructions are listed below in the section on instruction classes supported by six platforms.

Comparisons

Above, we discussed conditional branches. However, some platforms offer few compare-and-branch instructions, but also offer non-control-flow-altering compare instructions. In this section, we consider both compare and compare-and-branch instructions, and look at which types of comparisons are shared by all platforms, whether that comparison is offered as a (non-control-flow-altering) compare, or as a compare-and-branch.

Every platform in this study offers both of the following comparisons on integers if they support integers, and on floats if they support floats:

equality comparison
less-than comparison

Concordance of instructions supported by six platforms

In addition to the above, instructions for each of the following are provided by all but one platform in this study:

Arithmetic:

both 32-bit and 64-bit arithmetic
loading constants/literals
signed integer addition, subtraction, multiplication
floating-point addition, subtraction, multiplication, division
bitwise shifts: left shift, right shift signed (shift right arithmetic), right shift unsigned (shift right logical)
bitwise logical: and, or, xor

Control flow:

unconditional indirect branch
branch: branch-if-not-equal-to-zero/branch-if-true/branch-if-non-null
NOP (note: only an intrinsic in LLVM)

Arithmetic

Constant loads

WASM, JVM, CIL have instructions to directly load i32, i64, f32, f64 constants (but not unsigned?). RISC-V has various instructions that can be specialized to directly load integer constants the size of the registers (i64 if RV64I, or i32 if the chip is RV32I instead of RV64I); everything else must be synthesized/coerced.

Lua has f64 constants only (in addition to nils, booleans, strings), but LuaJIT2 does have an instruction to load 16-bit immediate constants.

JVM does provide immediate constants of 8- and 16-bits (extended to ints of 32 bits) via bipush and sipush.

ARM has up to 32-bit constant loads via immediate mode of some load instructions.

LLVM doesn't need constant loads because constants can be assigned to a variable in the LLVM IR without an instruction.

(pseudoinstruction using ADDI or ORI) (i64, i32) (RISC-V RV32I/RV64I)
i32.const, i64.const, f32.const, f64.const (WASM)
MOV/MVN or LDR (ARM)
bipush sipush (JVM)
kcdata kshort knum kpri knil (LuaJIT2)
ldc.i4 ldc.i4.s ldc.i8 ldc.r4 ldc.r8 (CIL)

In RISC-V, ADDI and ORI can be used with the zero register to load 12-bit immediates. LUI loads a 20-bit immediate in the most-significant bits and zeros the other 12 bits. AUIPC is similar to LUI but then it also adds this offset to the PC.
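Combining LUI with ADDI to build a full 32-bit constant needs one adjustment, because ADDI sign-extends its 12-bit immediate. The sketch below shows the usual split; the function names are hypothetical, and it models a 32-bit register only (on RV64 the LUI result is additionally sign-extended to 64 bits, which this sketch ignores).

```python
def split_hi_lo(value):
    """Split a 32-bit constant into (hi20, lo12) for a LUI + ADDI pair."""
    value &= 0xFFFFFFFF
    # ADDI sign-extends its 12-bit immediate, so the upper part is
    # rounded up whenever bit 11 of the constant is set.
    hi20 = ((value + 0x800) >> 12) & 0xFFFFF
    lo12 = value & 0xFFF
    if lo12 >= 0x800:
        lo12 -= 0x1000  # reinterpret as a signed 12-bit immediate
    return hi20, lo12


def rebuild(hi20, lo12):
    """Model LUI followed by ADDI on a 32-bit register."""
    return ((hi20 << 12) + lo12) & 0xFFFFFFFF


hi20, lo12 = split_hi_lo(0xDEADBEEF)
print(hex(hi20), lo12, hex(rebuild(hi20, lo12)))
```

For 0xDEADBEEF the low 12 bits (0xEEF) have bit 11 set, so the upper part becomes 0xDEADC and the immediate becomes -0x111.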

In WASM, there are instructions to push integer and floating point 32-bit and 64-bit constants onto the stack.

In ARM, a single-byte constant can be loaded using the MOVS instruction with an immediate constant. Otherwise, the constant must be placed in a literal pool within the instruction stream, and loaded from that literal pool using LDR (load) with a PC-relative offset. Literal pools must be manually placed using the LTORG assembler directive [10], and must be within +1020 bytes of the current instruction [11]. The ARM assembler pseudo-instruction "LDR Rd, =const" chooses between MOV and LDR and places the constant into the literal pool as needed (an error is generated if a literal pool is required but none can be found near enough).

In JVM, there is a constant pool and instructions ldc, ldc_w, ldc2_w to load from it. There are also short instructions iconst_m1..iconst_5 to push the values -1..5 onto the stack as integers, as well as aconst_null to push the null reference, dconst_0 and dconst_1 to push 0.0 and 1.0 doubles, fconst_0..fconst_2 to push 0.0, 1.0, 2.0 floats, lconst_0, lconst_1 to push 0, 1 longs. There are also bipush and sipush instructions for pushing 8-bit and 16-bit integers onto the stack.
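The practical effect of having several ways to push an int is that a compiler can pick the shortest encoding that fits. A rough sketch of that choice, based only on the value ranges described above (the function jvm_int_push is hypothetical and the constant-pool index is left as a placeholder, since assigning pool indices is outside the scope of this sketch):

```python
def jvm_int_push(value):
    """Pick a JVM instruction for pushing a 32-bit int constant."""
    if -1 <= value <= 5:
        # One-byte instructions with the constant baked into the opcode.
        return "iconst_m1" if value == -1 else f"iconst_{value}"
    if -128 <= value <= 127:
        return f"bipush {value}"  # 8-bit signed immediate
    if -32768 <= value <= 32767:
        return f"sipush {value}"  # 16-bit signed immediate
    # Anything wider has to come from the constant pool.
    return "ldc <constant-pool index>"


for v in (-1, 5, 6, 128, 40000):
    print(v, "->", jvm_int_push(v))
```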

In LuaJIT2, there is a constant pool, and instructions KSTR, KCDATA, KNUM, KPRI to load from it. KPRI can load nil, false, true. There is also a KSHORT instruction to load a 16-bit signed immediate integer.

In CIL, there are the short instructions ldc.i4.m1..ldc.i4.8 to push the 32-bit integers -1..8 onto the stack. There are also ldc.i4.s to push an immediate byte onto the stack as a 32-bit integer, and ldc.i4, ldc.i8, ldc.r4, ldc.r8 to push immediate constant integers and floats, both 32-bit and 64-bit. There is also ldnull to push the null reference, and ldstr to push an immediate constant string (it's not clear to me if this is actually stored in the instruction stream, or if it's in a constant pool, although it seems to me that the spec says that the constant pool is not available at runtime).

Add, subtract, multiply, divide

Every platform in this study provides integer operations, except for LuaJIT2, which only provides floating point. Every platform in this study provides floating-point operations, except for ARM, which only provides integers.

Every integer platform in this study provides both 32-bit and 64-bit integers, except for ARM, which only provides 32-bit integers, and RISC-V, which only provides integers of its register size (either 32-bit or 64-bit).

Every floating-point platform in this study provides both 32-bit and 64-bit floating-point, except for LuaJIT2, which only provides 64-bit floating-point.

Every integer platform in this study provides signed integer addition, subtraction, and multiplication (this multiplication returns the lower half of the resulting bits; the "low order bits"; equivalently, the result mod 2^bitwidth). Every floating-point platform in this study provides floating-point addition, subtraction, multiplication, and division.

Andreas Olofsson of Adapteva noted in a blog post that RISC-V's FDIV (floating point division) instruction is "expensive" and that it was a "tough call" whether to include such an instruction in his Epiphany ISA [12]. In the same blog post, he noted that Epiphany did not have integer division or remainder because they didn't fit Epiphany's intended use cases.

RISC-V floating point operations are only included in the floating point extensions, not the base integer instruction set.

Integer addition

ADD, ADDI, ADDW, ADDIW (i64, i64 immediate, i32, i32 immediate) (RISC-V RV32I/RV64I)
i32.add, i64.add (WASM)
add (LLVM)
add adds adcs cmn (ARM)
iadd ladd (JVM)
add (polymorphic) (CIL)

In RISC-V (64-bit), ADD is 64-bit addition. ADDI adds a 12-bit sign-extended immediate. ADDW and ADDIW are 32-bit variants.

In WASM, i32.add and i64.add are 32-bit and 64-bit integer addition.

In LLVM, add is polymorphic integer add.

In ARM, ADD adds two 32-bit numbers. The ADCS variant also adds one if the carry flag is set. The ADDS and ADCS variants update the flags (and so indicate overflow). The CMN variant is like ADDS except the result is discarded (but the flags are still updated). For some of the variants, the destination register must be the same as one of the source registers. For some of the variants, an immediate constant may be added also; depending on the variant, this constant may range from 0..7 all the way to 1020. [13].

In JVM, iadd is 32-bit integer addition and ladd is 64-bit integer addition.

In CIL, add is polymorphic integer and float add.

All of these truncate the most-significant bits upon overflow. This has the effect of performing unsigned addition mod 2^N, where N is the bitwidth. All of these use twos-complement representation and use the same operation for unsigned and signed addition. This means that in case of overflow, the sign of the result is not the same as the sign of the true (mathematical) sum. Overflow is silent except on ARM, where the ADCS, ADDS, CMN instruction variants set the flags accordingly.
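The wraparound behaviour described above can be modelled with plain integers. This is a sketch under the assumption of a 32-bit width; the helper names are invented for illustration, and add_flags only models the carry and signed-overflow information that a flag-setting add (such as ARM's ADDS) would report.

```python
def add_wrapped(a, b, bits=32):
    """Add two integers the way an N-bit adder does, keeping the N low bits."""
    mask = (1 << bits) - 1
    return (a + b) & mask


def as_signed(x, bits=32):
    """Reinterpret an N-bit pattern as a two's-complement signed value."""
    sign_bit = 1 << (bits - 1)
    return (x ^ sign_bit) - sign_bit


def add_flags(a, b, bits=32):
    """Return (carry, signed_overflow) for an N-bit addition."""
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    result = (a + b) & mask
    carry = (a + b) > mask  # unsigned result did not fit
    overflow = as_signed(a, bits) + as_signed(b, bits) != as_signed(result, bits)
    return carry, overflow


# Largest positive i32 plus one wraps to the most negative i32.
wrapped = add_wrapped(0x7FFFFFFF, 1)
print(hex(wrapped), as_signed(wrapped), add_flags(0x7FFFFFFF, 1))
```

The same bit pattern comes out whether the operands are read as signed or unsigned; only the interpretation of the result (and which flag matters) differs.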

LuaJIT2 (not listed) only provides floating-point addition.

Integer subtraction

Subtract:

SUB, SUBW (i64, i32) (RISC-V RV32I/RV64I)
i32.sub, i64.sub (WASM)
sub (LLVM)
sub subs cmp (ARM)
isub lsub (JVM)
sub (polymorphic) (CIL)

RISC-V does not offer a subtraction operation with an immediate, as it does for addition with ADDI.

ARM does not offer subtraction instructions with an immediate range of 1020, as it does for addition; the largest immediate offered is 508.

Otherwise, the comments on the integer addition operations in the previous section apply to these subtraction operations as well, particularly in that only the least-significant bits of the mathematical result are returned.

Integer multiplication

Multiply, lower bits:

MUL, MULW (i64, i32) (RISC-V M)
i32.mul, i64.mul (WASM)
mul (LLVM)
muls (ARM)
imul lmul (JVM)
mul (polymorphic) (CIL)

For all of these, if overflow occurs, the sign may or may not be correct. Neither RISC-V nor ARM offers an immediate-constant-accepting form of multiplication instruction. The only ARM Cortex M0 multiply instruction, MULS, updates the N and Z flags from the result but does not indicate overflow. Otherwise, the comments on addition in the above section apply to these multiplication operations as well, particularly in that only the least-significant bits of the mathematical result are returned.
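A small model shows why one 'low bits' multiply serves both signed and unsigned operands, and why the sign can come out wrong on overflow. This is a sketch assuming a 32-bit width; mul_low and as_signed are names made up for the example.

```python
def mul_low(a, b, bits=32):
    """Multiply and keep only the N least-significant bits of the product."""
    mask = (1 << bits) - 1
    return (a * b) & mask


def as_signed(x, bits=32):
    """Reinterpret an N-bit pattern as a two's-complement signed value."""
    sign_bit = 1 << (bits - 1)
    return (x ^ sign_bit) - sign_bit


# -3 and 0xFFFFFFFD are the same 32-bit pattern, so the low bits agree.
signed_view = mul_low(-3, 5)
unsigned_view = mul_low(0xFFFFFFFD, 5)

# Two positive operands whose product is 2**31: the low 32 bits read
# as a negative number when interpreted as signed.
overflowed = as_signed(mul_low(0x10000, 0x8000))
print(hex(signed_view), hex(unsigned_view), overflowed)
```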

Floating point add, sub, mul, div

Add:

FADD.S, FADD.D (f32, f64) (RISC-V F)
f32.add, f64.add (WASM)
fadd (LLVM)
dadd fadd (JVM)
addvn addnv addvv (LuaJIT2)
add (polymorphic) (CIL)

RISC-V supports multiple floating-point rounding modes and exception modes. RISC-V follows IEEE 754-2008. In RISC-V,

" Except when otherwise stated, if the result of a floating-point operation is NaN, it is the canonical NaN. The canonical NaN has a positive sign and all significand bits clear except the MSB, a.k.a. the quiet bit. For single-precision floating-point, this corresponds to the pattern 0x7fc00000. ... We considered propagating NaN payloads, as is recommended by the standard, but this decision would have increased hardware cost. Moreover, since this feature is optional in the standard, it cannot be used in portable code. ... Operations on subnormal numbers are handled in accordance with the IEEE 754-2008 standard. In the parlance of the IEEE standard, tininess is detected after rounding. Detecting tininess after rounding results in fewer spurious underflow signals. " -- https://content.riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf

In LLVM, " The default LLVM floating-point environment assumes that floating-point instructions do not have side effects. Results assume the round-to-nearest rounding mode. No floating-point exception state is maintained in this environment. Therefore, there is no attempt to create or preserve invalid operation (SNaN) or division-by-zero exceptions.

The benefit of this exception-free assumption is that floating-point operations may be speculated freely without any other fast-math relaxations to the floating-point model.

Code that requires different behavior than this should use the Constrained Floating-Point Intrinsics. " -- https://llvm.org/docs/LangRef.html#floatenv

In WASM, " Floating-point arithmetic follows the IEEE 754-2008 standard, with the following qualifications:

    All operators use round-to-nearest ties-to-even, except where otherwise specified. Non-default directed rounding attributes are not supported.
    Following the recommendation that operators propagate NaN payloads from their operands is permitted but not required.
    All operators use “non-stop” mode, and floating-point exceptions are not otherwise observable. In particular, neither alternate floating-point exception handling attributes nor operators on status flags are supported. There is no observable difference between quiet and signalling NaNs.

...

Rounding

Rounding always is round-to-nearest ties-to-even, in correspondence with IEEE 754-2008 (Section 4.3.1).

...

NaN Propagation

When the result of a floating-point operator other than fneg, fabs, or fcopysign is a NaN, then its sign is non-deterministic and the payload is computed as follows:

    If the payload of all NaN inputs to the operator is canonical (including the case that there are no NaN inputs), then the payload of the output is canonical as well.
    Otherwise the payload is picked non-deterministically among all arithmetic NaNs; that is, its most significant bit is 1 and all others are unspecified. " -- https://webassembly.github.io/spec/core/exec/numerics.html (there are more details available at that link).

In WASM, furthermore, specific to addition,

" faddN(z1,z2)

If either z1 or z2 is a NaN, then return an element of nansN{z1,z2}.
Else if both z1 and z2 are infinities of opposite signs, then return an element of nansN{}.
Else if both z1 and z2 are infinities of equal sign, then return that infinity.
Else if one of z1 or z2 is an infinity, then return that infinity.
Else if both z1 and z2 are zeroes of opposite sign, then return positive zero.
Else if both z1 and z2 are zeroes of equal sign, then return that zero.
Else if one of z1 or z2 is a zero, then return the other operand.
Else if both z1 and z2 are values with the same magnitude but opposite signs, then return positive zero.
Else return the result of adding z1 and z2, rounded to the nearest representable value. "

ARM Cortex M0 does not support floating-point arithmetic.

In JVM, " The result of an fadd instruction is governed by the rules of IEEE arithmetic:

If either value is NaN, the result is NaN.
The sum of two infinities of opposite sign is NaN.
The sum of two infinities of the same sign is the infinity of that sign.
The sum of an infinity and any finite value is equal to the infinity.
The sum of two zeroes of opposite sign is positive zero.
The sum of two zeroes of the same sign is the zero of that sign.
The sum of a zero and a nonzero finite value is equal to the nonzero value.
The sum of two nonzero finite values of the same magnitude and opposite sign is positive zero.
In the remaining cases, where neither an infinity, nor a zero, nor NaN is involved, and the values have the same sign or have different magnitudes, the sum is computed and rounded to the nearest representable value using IEEE 754 round-to-nearest mode. If the magnitude is too large to represent as a float, we say the operation overflows; the result is then an infinity of appropriate sign. If the magnitude is too small to represent as a float, we say the operation underflows; the result is then a zero of appropriate sign.

The Java Virtual Machine requires support of gradual underflow as defined by IEEE 754. Despite the fact that overflow, underflow, or loss of precision may occur, execution of an fadd instruction never throws a runtime exception. " -- http://www.cs.vsb.cz/benes/vyuka/pre/lab/jvm/fadd.htm

I can't find documentation on what guarantees/semantics LuaJIT2 or Lua provides for floating-point addition.

In CIL, floating-point overflow returns +inf or -inf. 0 * infinity = NaN.

Sub:

FSUB.S, FSUB.D (f32, f64) (RISC-V F)
f32.sub, f64.sub (WASM)
fsub (LLVM)
dsub fsub (JVM)
subvn subnv subvv (LuaJIT2)
sub (polymorphic) (CIL)

Regarding RISC-V, i have no further comments specific to subtraction (beyond the general floating-point arithmetic comments in the previous section on 'add').

In WASM, " fsubN(z1,z2)

If either z1 or z2 is a NaN, then return an element of nansN{z1,z2}.
Else if both z1 and z2 are infinities of equal signs, then return an element of nansN{}.
Else if both z1 and z2 are infinities of opposite sign, then return z1.
Else if z1 is an infinity, then return that infinity.
Else if z2 is an infinity, then return that infinity negated.
Else if both z1 and z2 are zeroes of equal sign, then return positive zero.
Else if both z1 and z2 are zeroes of opposite sign, then return z1.
Else if z2 is a zero, then return z1.
Else if z1 is a zero, then return z2 negated.
Else if both z1 and z2 are the same value, then return positive zero.
Else return the result of subtracting z2 from z1, rounded to the nearest representable value.

...

Note

Up to the non-determinism regarding NaNs, it always holds that fsubN(z1,z2)=faddN(z1,fnegN(z2)) " -- https://webassembly.github.io/spec/core/exec/numerics.html

Regarding LLVM, i have no further comments specific to subtraction (beyond the general floating-point arithmetic comments in the previous section on 'add').

For JVM, " For float subtraction, it is always the case that a-b produces the same result as a+(-b). However, for the fsub instruction, subtraction from zero is not the same as negation, because if x is +0.0, then 0.0-x equals +0.0, but -x equals -0.0.

The Java Virtual Machine requires support of gradual underflow as defined by IEEE 754. Despite the fact that overflow, underflow, or loss of precision may occur, execution of an fsub instruction never throws a runtime exception. " -- [14]

How does the JVM's "0.0-(+0.0) == +0.0 != -(+0.0) == -0.0" relate to WASM's note that "Up to the non-determinism regarding NaNs, it always holds that fsubN(z1,z2)=faddN(z1,fnegN(z2))"? The two notes are compatible, although i find the JVM note clearer. The JVM note makes two separate claims: that a-b equals a+(-b) (which is the same identity as in the WASM note), and that subtracting from zero is not the same as negating. WASM's specification of fsub contains "...Else if both z1 and z2 are zeroes of equal sign, then return positive zero. Else if both z1 and z2 are zeroes of opposite sign, then return z1...", which makes 0.0-(+0.0) == +0.0 in WASM also, like the JVM; and faddN(0.0, fnegN(+0.0)) = faddN(0.0, -0.0) is also +0.0, by the 'zeroes of opposite sign' rule for fadd, so the identity holds. And also like the JVM, WASM's fneg specification contains "...Else if z is a zero, then return that zero negated...", which makes -(+0.0) == -0.0. On the other hand, although +0.0 is different from -0.0, they are equivalent in the sense that they compare as equal under the IEEE floating-point equality relation.
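The signed-zero behaviour discussed in this paragraph can be checked directly. This is a small sketch in Python, again assuming that Python floats are IEEE 754 doubles; sign_of and same_float are helper names made up for the illustration.

```python
import math

def sign_of(x):
    """Return '+' or '-' for the sign bit of x (this also works for zeros)."""
    return '-' if math.copysign(1.0, x) < 0 else '+'

def same_float(a, b):
    """Equal as values AND equal in sign bit, so +0.0 and -0.0 are told apart."""
    return a == b and sign_of(a) == sign_of(b)

x = 0.0
# a - b agrees with a + (-b), even for zeros:
# 0.0 - (+0.0) and 0.0 + (-0.0) are both +0.0.
assert same_float(0.0 - x, 0.0 + (-x))
assert sign_of(0.0 - x) == '+'
# But subtraction from zero is not negation: -(+0.0) is -0.0.
assert sign_of(-x) == '-'
assert not same_float(0.0 - x, -x)
# Under IEEE equality the two zeros still compare equal.
assert 0.0 == -0.0
```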

Regarding LuaJIT2, as with addition, I have no data on the semantics of LuaJIT2's subtraction.

In CIL, on sub, floating-point underflow returns 0.

Mul:

FMUL.S, FMUL.D (f32, f64) (RISC-V F)
f32.mul, f64.mul (WASM)
fmul (LLVM)
dmul fmul (JVM)
mulvn mulnv mulvv (LuaJIT2)
mul (polymorphic) (CIL)

Regarding RISC-V, i have no further comments specific to multiplication (beyond the general floating-point arithmetic comments in the previous section on 'add').

In WASM, " fmulN(z1,z2)

If either z1 or z2 is a NaN, then return an element of nansN{z1,z2}.
Else if one of z1 and z2 is a zero and the other an infinity, then return an element of nansN{}.
Else if both z1 and z2 are infinities of equal sign, then return positive infinity.
Else if both z1 and z2 are infinities of opposite sign, then return negative infinity.
Else if one of z1 or z2 is an infinity and the other a value with equal sign, then return positive infinity.
Else if one of z1 or z2 is an infinity and the other a value with opposite sign, then return negative infinity.
Else if both z1 and z2 are zeroes of equal sign, then return positive zero.
Else if both z1 and z2 are zeroes of opposite sign, then return negative zero.
Else return the result of multiplying z1 and z2, rounded to the nearest representable value. " -- https://webassembly.github.io/spec/core/exec/numerics.html

Regarding LLVM, i have no further comments specific to multiplication (beyond the general floating-point arithmetic comments in the previous section on 'add').

For JVM,

" The result of an fmul instruction is governed by the rules of IEEE arithmetic:

    If either value is NaN, the result is NaN.
    If neither value is NaN, the sign of the result is positive if both values have the same sign, and negative if the values have different signs.
    Multiplication of an infinity by a zero results in NaN.
    Multiplication of an infinity by a finite value results in a signed infinity, with the sign-producing rule just given.
    In the remaining cases, where neither an infinity nor NaN is involved, the product is computed and rounded to the nearest representable value using IEEE 754 round-to-nearest mode. If the magnitude is too large to represent as a float, we say the operation overflows; the result is then an infinity of appropriate sign. If the magnitude is too small to represent as a float, we say the operation underflows; the result is then a zero of appropriate sign.

The Java Virtual Machine requires support of gradual underflow as defined by IEEE 754. Despite the fact that overflow, underflow, or loss of precision may occur, execution of an fmul instruction never throws a runtime exception. " -- http://www.cs.vsb.cz/benes/vyuka/pre/lab/jvm/fmul.htm

In CIL, floating-point overflow returns +inf or -inf. 0 * infinity = NaN.

Div:

FDIV.S, FDIV.D (f32, f64) (RISC-V F)
f32.div, f64.div (WASM)
fdiv (LLVM)
ddiv fdiv (JVM)
divvn divnv divvv (LuaJIT2)
div (CIL)

Regarding RISC-V, i have no further comments specific to division (beyond the general floating-point arithmetic comments in the previous section on 'add').

In WASM, " fdivN(z1,z2)

If either z1 or z2 is a NaN, then return an element of nansN{z1,z2}.
Else if both z1 and z2 are infinities, then return an element of nansN{}.
Else if both z1 and z2 are zeroes, then return an element of nansN{z1,z2}.
Else if z1 is an infinity and z2 a value with equal sign, then return positive infinity.
Else if z1 is an infinity and z2 a value with opposite sign, then return negative infinity.
Else if z2 is an infinity and z1 a value with equal sign, then return positive zero.
Else if z2 is an infinity and z1 a value with opposite sign, then return negative zero.
Else if z1 is a zero and z2 a value with equal sign, then return positive zero.
Else if z1 is a zero and z2 a value with opposite sign, then return negative zero.
Else if z2 is a zero and z1 a value with equal sign, then return positive infinity.
Else if z2 is a zero and z1 a value with opposite sign, then return negative infinity.
Else return the result of dividing z1 by z2, rounded to the nearest representable value. " -- https://webassembly.github.io/spec/core/exec/numerics.html

Regarding LLVM, i have no further comments specific to division (beyond the general floating-point arithmetic comments in the previous section on 'add').

In JVM, " The result of an fdiv instruction is governed by the rules of IEEE arithmetic:

    If either value is NaN, the result is NaN.
    If neither value is NaN, the sign of the result is positive if both values have the same sign, negative if the values have different signs.
    Division of an infinity by an infinity results in NaN.
    Division of an infinity by a finite value results in a signed infinity, with the sign-producing rule just given.
    Division of a finite value by an infinity results in a signed zero, with the sign-producing rule just given.
    Division of a zero by a zero results in NaN; division of zero by any other finite value results in a signed zero, with the sign-producing rule just given.
    Division of a nonzero finite value by a zero results in a signed infinity, with the sign-producing rule just given.
    In the remaining cases, where neither an infinity, nor a zero, nor NaN is involved, the quotient is computed and rounded to the nearest float using IEEE 754 round-to-nearest mode. If the magnitude is too large to represent as a float, we say the operation overflows; the result is then an infinity of appropriate sign. If the magnitude is too small to represent as a float, we say the operation underflows; the result is then a zero of appropriate sign.

The Java Virtual Machine requires support of gradual underflow as defined by IEEE 754. Despite the fact that overflow, underflow, division by zero, or loss of precision may occur, execution of an fdiv instruction never throws a runtime exception. " -- http://www.cs.vsb.cz/benes/vyuka/pre/lab/jvm/fdiv.htm

In CIL, " Floating-point division is per IEC 60559:1989. Division of a finite number by 0 produces the correctly signed infinite value and 0 / 0 = NaN, infinity / infinity = NaN, anything / infinity = 0. " -- [15]
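The division rules quoted above all agree that dividing by zero produces a signed infinity or a NaN rather than an error. Python's own / operator raises ZeroDivisionError for a zero divisor, so the sketch below models the rule list explicitly instead of relying on the host. The function name fdiv here is just a label for this model; NaN payloads and signs are not modelled, and Python floats are assumed to be IEEE 754 doubles.

```python
import math

def fdiv(z1, z2):
    """Model of non-trapping IEEE-style division; dividing by zero never raises."""
    if math.isnan(z1) or math.isnan(z2):
        return math.nan
    if math.isinf(z1) and math.isinf(z2):
        return math.nan                      # infinity / infinity
    if z1 == 0.0 and z2 == 0.0:
        return math.nan                      # 0 / 0
    # Sign rule: positive if both signs agree, negative otherwise.
    negative = (math.copysign(1.0, z1) < 0) != (math.copysign(1.0, z2) < 0)
    if z2 == 0.0 or math.isinf(z1):
        return -math.inf if negative else math.inf
    # Remaining cases: z2 is nonzero, so the host division will not raise.
    return z1 / z2

assert fdiv(1.0, 0.0) == math.inf
assert fdiv(1.0, -0.0) == -math.inf
assert math.isnan(fdiv(0.0, 0.0))
```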

Shifts

All of RISC-V, WASM, LLVM, ARM, JVM, CIL (everything in this study except for LuaJIT2) provide 32-bit left shift, logical/unsigned right shift, and arithmetic/signed right shift. RISC-V, WASM, LLVM, JVM, CIL also provide 64-bit variants.

Shift left:

SLL, SLLI, SLLW, SLLIW (i64, i64 immediate, i32, i32 immediate) (RISC-V RV32I/RV64I)
i32.shl, i64.shl (WASM)
shl (LLVM)
lsls (ARM)
ishl lshl (JVM)
shl (CIL)

Shift right logical/shift right unsigned:

SRL, SRLI, SRLW, SRLIW (i64, i64 immediate, i32, i32 immediate) (RISC-V RV32I/RV64I)
i32.shr_u, i64.shr_u (WASM)
lshr (LLVM)
lsrs (ARM)
iushr lushr (JVM)
shr.un (CIL)

Shift right arithmetic/shift right signed:

SRA, SRAI, SRAW, SRAIW (i64, i64 immediate, i32, i32 immediate) (RISC-V RV32I/RV64I)
i32.shr_s, i64.shr_s (WASM)
ashr (LLVM)
asrs (ARM)
ishr lshr (JVM)
shr (CIL)

RISC-V (64-bit version) provides both immediate shifts (where the shift amount is a constant but the thing to be shifted is still taken from a register) and register shifts (where both the shift amount and the thing to be shifted are in registers), and 32-bit and 64-bit variants of shifts. The shift amount is limited to the range 0-63 (inclusive) for the 64-bit variants, and 0-31 for the 32-bit variants. The 32-bit variants produce 32-bit results that are sign-extended to 64 bits.

WASM provides 32-bit and 64-bit shifts, where both the thing to be shifted and the shift amount are taken off the stack. The shift amount is limited to 0-31 and 0-63 (inclusive) by taking (shift amount mod 32) or mod 64 as appropriate.

LLVM provides (polymorphic) shifts, where both the thing to be shifted and the shift amount are SSA variables. The shift amount must be less than the number of bits in the thing to be shifted; otherwise the result is a poison value.

ARM provides both register and immediate 32-bit shifts. If the shift length is 32 or more, all bits are cleared. The carry flag is given the last bit shifted out.

JVM provides 32-bit and 64-bit shifts similarly to WASM.

LuaJIT2 doesn't offer shifts.

CIL provides 32-bit and 64-bit (and also native integer) shifts similarly to WASM, except that the shift amount must be less than the bitwidth of the thing being shifted (otherwise the result is unspecified).
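To make the difference between the three shift kinds concrete, here is a minimal sketch in Python of 32-bit shifts with the shift amount reduced mod 32, as described above for WASM (and, per the text, similarly for JVM). The function names are invented for this illustration, and results are returned as unsigned 32-bit bit patterns. The other platforms treat out-of-range shift amounts differently, as the preceding paragraphs note, so this is a model of one choice only.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Reinterpret the low 32 bits of x as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def shl32(value, amount):
    """Shift left; bits shifted past bit 31 are discarded."""
    return (value << (amount % 32)) & MASK32

def shr_u32(value, amount):
    """Logical right shift: zeros are shifted in from the left."""
    return (value & MASK32) >> (amount % 32)

def shr_s32(value, amount):
    """Arithmetic right shift: copies of the sign bit are shifted in."""
    return (to_signed32(value) >> (amount % 32)) & MASK32

assert shl32(1, 33) == 2                       # 33 mod 32 == 1
assert shr_u32(0x80000000, 31) == 1
assert shr_s32(0x80000000, 31) == 0xFFFFFFFF   # sign bit replicated
```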

Logical

All of RISC-V, WASM, LLVM, ARM, JVM, CIL (everything in this study except for LuaJIT2) provide bitwise AND, OR, and XOR. RISC-V provides one instruction which is 32- or 64- bits depending on the register size, whereas WASM and JVM provide separate instructions for each size. ARM only provides 32-bit.

Andreas Olofsson of Adapteva said on a blog that he felt that the immediate variants provided by RISC-V (ANDI, ORI, XORI) could have been left out (and were left out in his architecture, Epiphany) [16].

AND:

AND, ANDI (i64) (RISC-V RV32I/64I)
i32.and, i64.and (WASM)
and (LLVM)
ands (ARM)
tst (ARM) (same as ANDS but only updates flags; discards the result)
iand land (JVM)
and (CIL)

OR:

OR, ORI (i64) (RISC-V RV32I/64I)
i32.or, i64.or (WASM)
or (LLVM)
orrs (ARM)
ior lor (JVM)
or (CIL)

XOR:

XOR, XORI (i64) (RISC-V RV32I/64I)
i32.xor, i64.xor (WASM)
xor (LLVM)
eors (ARM)
ixor lxor (JVM)
xor (CIL)

RISC-V has register and 12-bit sign-extended immediate variants of logical AND, OR, XOR -- unlike shifts, these are only offered in the register size (so 64-bit for the 64-bit variant of RISC-V). Note that XOR with an immediate constant of -1 is the NOT operation.
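The remark that XOR with an immediate of -1 acts as NOT follows from sign extension: the 12-bit encoding of -1 is all ones, and sign-extending it to the register width keeps it all ones. A minimal sketch in Python, modelling a 64-bit register as an unsigned integer; sign_extend and xori are names chosen for this illustration, not an API of any RISC-V tool.

```python
MASK64 = (1 << 64) - 1

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value

def xori(rs1, imm12):
    """Model of a 64-bit XORI: XOR rs1 with the sign-extended 12-bit immediate."""
    return (rs1 ^ sign_extend(imm12, 12)) & MASK64

# 0xFFF is the 12-bit encoding of -1; after sign extension it is all ones,
# so XORI with it flips every bit of the register.
x = 0x0123456789ABCDEF
assert sign_extend(0xFFF, 12) == -1
assert xori(x, 0xFFF) == (~x) & MASK64
```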

WASM has 32-bit and 64-bit variants of AND, OR, XOR.

LLVM has polymorphic AND, OR, XOR.

ARM has 32-bit AND, OR, XOR. On ARM, these operations mutate one of the two input registers, and also they may only be performed on the first 8 registers (r0 thru r7).

JVM has 32-bit and 64-bit AND, OR, XOR.

LuaJIT2 lacks AND, OR, XOR (as instructions).

CIL has 32-bit and 64-bit (and 'native int') AND, OR, XOR.

Control flow

Jumps / unconditional branch

Branch to register / indirect branch

Every platform in this study except for LuaJIT2 provides some form of indirect branch (where indirect means that the branch target is chosen at runtime rather than hardcoded as an immediate constant in the instruction stream).

Although LuaJIT2 doesn't have an indirect branch instruction, it does have a higher-level form of indirect branch in its CALL instruction, which is indirect in that the function to be called is taken from a register, rather than specified as an immediate in the bytecode.

The two hardware processor ISAs in this study, RISC-V and ARM, provide what i call 'unconstrained' indirect branches, meaning that any valid location in the instruction stream (which can be expressed in the number of bits available in the branch instruction encoding) can be a branch target at runtime. By contrast, the other platforms with low-level indirect branches (WASM, LLVM, JVM, CIL) 'constrain' the branch target to be one of an enumerated set of potential targets determined statically.

JVM used to provide a higher-level unconstrained indirect branch in the form of 'ret' paired with 'jsr', but these JVM instructions are now deprecated, probably due to the difficulty they added to verification.

LLVM and JVM provide switch-like instructions (switch, indirectbr, lookupswitch); switch and lookupswitch are like a C switch statement, and indirectbr jumps to an address in a variable; JVM's tableswitch takes an index into a list of enumerated labels to be branched to.

RISC-V and ARM provide link register variants of their indirect branch instructions.

JALR (Jump & Link Register) (32b) (RISC-V RV32I): indirect branch (unconstrained)
BR_TABLE (WASM): indirect branch to one of an enumerated set of labels
switch (LLVM)
indirectbr (LLVM): indirect branch to one of an enumerated set of labels
BX BLX (ARM): indirect branch (unconstrained)
ret (JVM) (deprecated)
switch: lookupswitch tableswitch (JVM)
switch (CIL)

RISC-V has JALR. JALR adds a 12-bit signed immediate to the value in a user-specified register to get the destination address, writes the address of the next instruction after the JALR instruction to another user-specified register, and then jumps to the destination address. A two-instruction sequence of LUI+JALR can jump to any 32-bit absolute address. A two-instruction sequence of AUIPC+JALR can jump anywhere in a relative signed 32-bit range. When used with the zero register as base, JALR can jump to any absolute address in the lowest 2KiB or the highest 2KiB.
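The address arithmetic described for JALR can be sketched as follows. This is a rough model in Python for a 64-bit machine, assuming a 4-byte (uncompressed) JALR encoding; the function names are invented for the illustration. One detail not mentioned above is included because the RISC-V specification requires it: the lowest bit of the computed target is cleared.

```python
MASK64 = (1 << 64) - 1

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value

def jalr(pc, rs1_value, imm12):
    """Return (target, link) for a 4-byte JALR located at address pc."""
    target = (rs1_value + sign_extend(imm12, 12)) & MASK64
    target &= ~1 & MASK64      # the lowest bit of the target is cleared
    link = (pc + 4) & MASK64   # address of the instruction after the JALR
    return target, link

# With the zero register as base, positive immediates reach the lowest 2KiB
# and negative immediates wrap around to the highest 2KiB.
assert jalr(0x1000, 0, 0x7FE) == (0x7FE, 0x1004)
assert jalr(0x1000, 0, 0x800)[0] == (1 << 64) - 2048
```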

WASM has BR_TABLE. BR_TABLE is very unusual, and i'm not sure that i understand it. The i32 value at the top of stack is consumed and used as an index into the table. If the index is within the bounds of the table, then the labelidx found at that index is chosen; otherwise a default labelidx is chosen. The labelidx is a 32-bit quantity. Then we do a "WASM BR" (see description of WASM's unconditional jump instruction BR, above) to that labelidx. The BR_TABLE also passes arguments in the jump; these arguments are given once for the entire BR_TABLE (the different entries within the table do not pass different arguments [17]). So far, so good; so why do i say it's unusual (beyond it already being a little unusual that values are passed, but cannot vary between the table entries)? Because WASM's BR cannot jump anywhere; it can only jump to the end of an enclosing block (or, for an enclosing loop, back to the start of the loop). The value of the labelidx refers to block nesting depth, NOT to positions in the instruction stream; see slide 'Table branch instruction' in [18]. Apparently this decision was made for weird historical reasons (see [19]). You actually can use this to choose to execute one of many branches of code, see [20].
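The selection step of BR_TABLE, as described in this paragraph, is small enough to sketch. The Python below only models how the label index is chosen, with the popped i32 treated as unsigned; it does not model the branch itself or the passed arguments. The function name br_table and the example table are illustrative assumptions.

```python
def br_table(labels, default, index):
    """Pick the labelidx (a block nesting depth) that BR_TABLE branches to.

    labels:  the table of label indices
    default: the label index used when index is out of bounds
    index:   the i32 popped from the stack, treated as unsigned
    """
    index &= 0xFFFFFFFF
    if index < len(labels):
        return labels[index]
    return default

# A table with three entries: depth 0 is the innermost enclosing block.
labels = [0, 1, 2]
assert br_table(labels, 3, 1) == 1    # in bounds: use the table entry
assert br_table(labels, 3, 7) == 3    # out of bounds: use the default
assert br_table(labels, 3, -1) == 3   # -1 is a huge unsigned value
```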

LLVM has switch and indirectbr.

LLVM switch "specifies a table of values and destinations....switch.... uses three parameters: an integer comparison value 'value', a default 'label' destination, and an array of pairs of comparison value constants and ‘label’s. The table is not allowed to contain duplicate constant entries. When the ‘switch’ instruction is executed, this table is searched for the given value. If the value is found, control flow is transferred to the corresponding destination; otherwise, control flow is transferred to the default destination." -- https://llvm.org/docs/LangRef.html#switch-instruction

LLVM indirectbr "implements an indirect branch to a label within the current function, whose address is specified by “address”. Address must be derived from a blockaddress constant. The rest of the arguments indicate the full set of possible destinations that the address may point to...This destination list is required so that dataflow analysis has an accurate understanding of the CFG....Control transfers to the block specified in the address argument. All possible destination blocks must be listed in the label list, otherwise this instruction has undefined behavior. This implies that jumps to labels defined in other functions have undefined behavior as well....This is typically implemented with a jump through a register." -- https://llvm.org/docs/LangRef.html#indirectbr-instruction

(LLVM also supports higher-level indirection with the CALL instruction, which can call a function pointer, and which supports tail calls (at least, on some platforms))

ARM's BX and BLX instructions jump to the address in a user-specified register. The BLX instruction additionally writes the address of the next instruction after the BLX instruction to the link register, which is a fixed/distinguished register (R14).

JVM has lookupswitch and tableswitch (it used to also have ret, but that is effectively deprecated).

JVM lookupswitch is followed by zero to three bytes of alignment padding (zeros). Next comes a 32-bit value 'default', which is a PC-relative signed offset. Next is a 32-bit signed integer 'npairs', which is the number of entries in the lookup table to follow. The table entries are sorted. The table entries are each a pair of two 32-bit signed integers (so each entry is 64-bits) of the form (key, target). A key to be searched is popped from the (operand) stack, and is compared against each of the keys in the table. If it matches one of them, a jump is made to the corresponding target (signed PC-relative offset); otherwise, a jump is made to the default offset. These jumps may not exit the current method.

JVM tableswitch is the same as lookupswitch except: The table consists only of jump targets, not of (key, target) pairs. In place of 'npairs' to give the length of the table, there are two 32-bit signed values, 'low' and 'high'. The table length is high - low + 1. If the key on the stack is < low or > high (note: the range of valid keys is low through high, inclusive), then a jump is made to 'default'. Otherwise, a jump is made to the (key - low)th entry in the table. Note that this is more efficient than lookupswitch as the table doesn't have to be searched for the key. Like lookupswitch, these jumps may not exit the current method.
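The two JVM switch instructions described above differ mainly in how the key finds its target. A minimal sketch in Python of just the selection logic follows; the bytecode layout, padding, and PC-relative arithmetic are left out. The function and parameter names are chosen for the illustration, and the linear scan in lookupswitch is only the simplest possible strategy (since the table is sorted, an implementation could also use binary search).

```python
def lookupswitch(pairs, default, key):
    """pairs: list of (match, offset) sorted by match. Returns the chosen offset."""
    for match, offset in pairs:
        if match == key:
            return offset
    return default

def tableswitch(low, high, offsets, default, key):
    """offsets has high - low + 1 entries; entry i is the target for key low + i."""
    assert len(offsets) == high - low + 1
    if key < low or key > high:
        return default
    return offsets[key - low]   # direct index, no search needed

# Sparse keys suit lookupswitch; dense keys suit tableswitch.
assert lookupswitch([(-5, 40), (10, 60), (1000, 80)], 20, 10) == 60
assert lookupswitch([(-5, 40), (10, 60), (1000, 80)], 20, 11) == 20
assert tableswitch(3, 5, [40, 60, 80], 20, 4) == 60
assert tableswitch(3, 5, [40, 60, 80], 20, 6) == 20
```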

(JVM also supports higher-level indirection with OOP method calls (using the suite of 'invoke*' instructions)).

(LuaJIT2 does not have a low-level indirect jump instruction, however its CALL instructions are indirect; a reference to the function being called is in a register. Furthermore, the CALLT instruction is a tailcall. So LuaJIT2 supports indirect control flow at a higher level.)

CIL has switch. CIL's switch is similar to JVM's tableswitch, but simpler (possibly at the expense of runtime performance, because it means the VM does less, so in some cases the user program would have to do more -- not sure how often that would occur though). There is a table length, which is an unsigned 32-bit integer N. Then there is a list of N 32-bit signed offsets (jump targets). The key is popped off of the stack and, if the key is less than N, a jump is made to the key-th signed offset in the table; otherwise, execution continues on without a jump (to the next instruction after the switch).
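CIL's switch, as described here, amounts to a bounds check followed by an index. The Python sketch below makes two assumptions that go beyond the paragraph above and should be checked against the CIL specification: that the key is compared as an unsigned value, and that the offsets are relative to the start of the instruction following the switch. The names are made up for the illustration.

```python
def cil_switch(targets, key, next_instruction):
    """Return the address at which execution continues after a CIL switch.

    targets:          list of N signed offsets (the jump table)
    key:              the value popped from the stack
    next_instruction: address of the instruction following the switch
                      (assumed here to be the base for the offsets)
    """
    key &= 0xFFFFFFFF              # assumption: key is compared as unsigned
    if key < len(targets):
        return next_instruction + targets[key]
    return next_instruction        # no match: fall through

assert cil_switch([16, -8, 32], 1, 100) == 92
assert cil_switch([16, -8, 32], 3, 100) == 100   # out of range: fall through
```

Compared with JVM's tableswitch there is no 'low' bound and no explicit default target, so a program whose keys do not start at 0 has to subtract the lowest key itself before the switch.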

Conditional branches

LLVM, WASM, ARM, JVM, LuaJIT2, CIL provide unary compare not-equal-to-zero or non-null or is-not-false.

Note that ARM's "compare and branches" really require two instructions, one to 'compare' and one to 'branch' based on the result of the compare, as seen by the state of the processor's flags.

WASM and LLVM instead provide separate comparison ops, and boolean conditional branch. These are similar to ARM in that there are two steps needed; but differ in that in ARM, the branch condition is expressed in the branch step, rather than in the compare step; and in ARM, the extra state of the flags is needed.

unary compare !=0 or nonnull or true (also, boolean conditional branch):

BR (conditional form) (LLVM)
BR_IF (WASM): "Executing the if instruction pops an i32 condition off the stack and either falls through to the next instruction or sets the program counter to after the else or end of the if."
(bne when used as unary compare) (ARM)
ifne ifnonnull (JVM)
ist istc (is truth-y) (LuaJIT2)
brtrue brtrue.s brinst brinst.s (CIL)

RISC-V provides a similar facility without a dedicated instruction. The RISC-V branch instructions are BEQ, BNE, BLT, BGE, BLTU, BGEU. Since RISC-V has a zero register, BNE can be used to compare a second register to zero, and branch if it is not zero. BNE has a 12-bit immediate signed PC-relative offset, which is in units of 2 bytes, so it can jump to +-4KiB.

LLVM's BR has both an unconditional and a conditional form. The unconditional form was treated above. The conditional form takes a boolean value and has two label immediate constants. If the boolean is TRUE, one label is jumped to, and otherwise the other label is jumped to.

WASM's BR_IF instruction has a label immediate constant. It pops an i32 off the top of the stack. If the value is non-zero, a 'BR LABEL' is executed to the label constant. Otherwise, nothing happens (execution continues with the next instruction).

ARM has only one conditional instruction, B (conditional variant, sometimes denoted Bcond) [21]. This instruction can have one of 15 conditions (one of which is 'always', so effectively there are 14 conditions) [22]. One of the conditions is NE, which is true when the Z (zero) flag is clear (that is, when the preceding comparison found its operands to be unequal). The branch range is -256 bytes to +254 bytes.
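The branch ranges quoted above for RISC-V's BNE and ARM's conditional B both follow from a signed immediate counted in 2-byte units. A short worked check in Python; branch_range is a name made up for this sketch, and the 8-bit immediate width for ARM is an assumption taken from the Thumb encoding rather than from the text above (it does reproduce the quoted range).

```python
def branch_range(imm_bits, unit_bytes):
    """Reach, in bytes, of a signed immediate of imm_bits bits in unit_bytes units."""
    lowest = -(1 << (imm_bits - 1)) * unit_bytes
    highest = ((1 << (imm_bits - 1)) - 1) * unit_bytes
    return lowest, highest

# RISC-V BNE: 12-bit signed immediate, units of 2 bytes (roughly +-4KiB).
assert branch_range(12, 2) == (-4096, 4094)
# ARM conditional B: assumed 8-bit signed immediate, units of 2 bytes.
assert branch_range(8, 2) == (-256, 254)
```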

JVM ifne and ifnonnull pop the value (which must be an int or a reference, respectively) off the top of the stack and branch if it is not 0 or not null, respectively. The branch target is a signed 16-bit offset immediate constant. The branch must stay within the currently executing method. If the value is 0 or is null, execution continues with the next instruction after the branch instruction.

LuaJIT2's IST and ISTC do not themselves contain a jump destination address, but they are always immediately followed by a JMP instruction (if the comparison test succeeds, the JMP is taken, and if the comparison test does not succeed, execution continues at the instruction after that JMP). IST jumps if the indicated variable's value is truthy (that is, anything except NIL or FALSE). ISTC first copies the value of the indicated variable to another variable, and then jumps if the indicated variable's value is truthy. The jump target is a signed 16-bit immediate (using a bias of 0x8000). The purpose of the copy in ISTC is to support code generation for Lua's 'and' and 'or' operators (which, in Lua, return the original value of one of their operands).
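The reason a copying test is useful becomes clearer when Lua's truthiness and its operand-returning 'and'/'or' are written out. The following Python sketch models only those semantics plus the removal of the 0x8000 bias; it is not LuaJIT's implementation, Lua's nil is represented by Python's None, and all function names are invented for the illustration.

```python
def lua_truthy(value):
    """Lua truthiness: everything except nil (None here) and false is truthy."""
    return value is not None and value is not False

def lua_or(a, b):
    """Lua's 'a or b': yields a itself if it is truthy, otherwise b."""
    return a if lua_truthy(a) else b

def lua_and(a, b):
    """Lua's 'a and b': yields a itself if it is falsy, otherwise b."""
    return b if lua_truthy(a) else a

def decode_jump(d):
    """Remove the 0x8000 bias from a 16-bit operand to get a signed offset."""
    return d - 0x8000

assert lua_or(0, 5) == 0          # unlike in Python, 0 is truthy in Lua
assert lua_or(None, 5) == 5
assert lua_and(False, 5) is False
assert decode_jump(0x8000) == 0
```

In both lua_or and lua_and one operand is both tested and, on one path, passed through as the result, which is the test-and-copy pattern the ISTC description above refers to.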

CIL's BRTRUE pops a value off the stack of type 'native int'. If the value is nonzero then it jumps to a 32-bit signed immediate PC-relative offset (relative to the beginning of the instruction following the current instruction, that is). Otherwise, execution continues with the next instruction after BRTRUE. BRTRUE cannot be used to branch into or out of exception handled code (try, catch, filter, or finally blocks). BRTRUE has a short form, BRTRUE.S, with an 8-bit signed immediate offset. BRTRUE and BRTRUE.S have aliases, BRINST and BRINST.S, which are meant to operate on references (usually native pointers, i think?) and which jump if the reference is not null (that is, if the reference currently represents the instance of some object, hence the mnemonic 'BRINST').

Misc control flow

Every platform in this study except for LuaJIT2 provides NOP (an instruction whose only purpose is to do nothing), although even in LuaJIT2 other instructions could be synthesized which have no effect.

NOP:

NOP (RISC-V)
NOP (WASM)
donothing (LLVM Intrinsic)
nop (ARM)
nop (JVM)
nop (CIL)

Concordance of instructions supported by five platforms

In addition to the above, instructions for each of the following are provided by five platforms in this study:

Arithmetic:

integer division and remainder
conversions between signed integer and floating-point
conversions between floating-point quantities of different bitwidths

Memory access:

integer loads and stores

Atomics and sync:

FENCE or sync barrier or monitor or volatile instruction (or prefix)

Control flow:

branch on equality, inequality, less-than, greater-than-or-equal
subroutine support with some form of CALL, RETURN, and some way to do indirect CALLs

Misc:

breakpoints

Arithmetic

Constant loads

None.

Add, subtract, multiply, divide

RISC-V, WASM, LLVM, JVM, CIL provide integer division and remainder in both 32-bit and 64-bit.

RISC-V integer multiplication, division, and remainder are only included in the M extension, not the base integer instruction set.

Integer div:

DIV (i64) (RISC-V M)
DIVW (i32) (RISC-V M)
i32.div_s, i64.div_s (WASM)
sdiv (LLVM)
idiv ldiv (JVM)
div (CIL)

Integer rem:

REM (i64) (RISC-V M)
REMW (i32) (RISC-V M)
i32.rem_s, i64.rem_s (WASM)
srem (LLVM)
irem lrem (JVM)
rem (CIL)

Shifts

None.

Logical

None.

Conversions

RISC-V, WASM, LLVM, JVM, CIL provide conversions between signed integer and floating-point, and also between floating-point bitwidths, all in both 32- and 64- bits.

RISC-V provides conversions from floats to integers (both signed and unsigned) with various rounding modes. LLVM provides 'constrained' intrinsic variants of floating-point truncation and extension which provide choice of rounding mode and exception handling mode. WASM only provides truncation from float to integer (both signed and unsigned); but WASM also provides ceiling, floor, and nearest as separate unary operations on floats. LLVM provides conversion from float to integer (both signed and unsigned), rounding towards zero. JVM, CIL provide conversion from float to integer, rounding towards zero.
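The rounding choices named in this paragraph differ only for non-integral inputs, and mostly for negative ones and for ties. A small Python sketch showing the four side by side for finite, in-range values; the function name roundings is made up here. Note that Python returns integers from all four, whereas WASM's ceiling, floor, and nearest yield floats, and that behaviour for NaN or out-of-range inputs (which differs between the platforms) is not modelled.

```python
import math

def roundings(x):
    """Return the four float-to-integer roundings named in the text for x."""
    return {
        'trunc': math.trunc(x),   # towards zero
        'floor': math.floor(x),   # towards negative infinity
        'ceil': math.ceil(x),     # towards positive infinity
        'nearest': round(x),      # to nearest, ties to even
    }

assert roundings(-3.7) == {'trunc': -3, 'floor': -4, 'ceil': -3, 'nearest': -4}
assert roundings(2.5)['nearest'] == 2     # tie goes to the even neighbour
assert roundings(3.5)['nearest'] == 4
```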

Conversions from signed integer to floating-point

FCVT.S.W (convert i32 to f32) (RISC-V F)
FCVT.S.L (convert i64 to f32) (not present in RV32I) (RISC-V F)
FCVT.D.W (convert i32 to f64) (RISC-V D)
FCVT.D.L (convert i64 to f64) (RISC-V D)
f{32,64}.convert_i{32,64}_s (WASM)
sitofp (LLVM)
i2f i2d l2f l2d (JVM)
conv.r4 conv.r8 (polymorphic) (CIL)

Conversions from floating point to signed integer

FCVT.W.S (convert f32 to i32) (RISC-V F)
FCVT.L.S (convert f32 to i64) (not present in RV32I) (RISC-V F)
FCVT.W.D (convert f64 to i32) (RISC-V D)
FCVT.L.D (convert f64 to i64) (RISC-V D)
i{32,64}.trunc_f{32,64}_s (WASM)
fptosi (LLVM)
f2i d2i f2l d2l (JVM)
convert to numerical type: conv.i1 conv.i2 conv.i4 conv.i8 conv.r4 conv.r8 conv.u1 conv.u2 conv.u4 conv.u8 (CIL)
conv.i1 conv.i2 conv.i4 conv.i8 (CIL)

Floating-point conversions between different bitwidths

FCVT.D.S FCVT.S.D (RISC-V D)
f64.promote_f32 f32.demote_f64 (WASM)
fptrunc (LLVM)
fpext (LLVM)
experimental.constrained.fptrunc experimental.constrained.fpext (LLVM intrinsics)
f2d d2f (JVM)
convert to numerical type: conv.i1 conv.i2 conv.i4 conv.i8 conv.r4 conv.r8 conv.u1 conv.u2 conv.u4 conv.u8 (CIL)
conv.r4 conv.r8 (polymorphic) (CIL)

Memory access

Loads and stores

RISC-V, WASM, ARM, CIL support loads and stores from/to 8-bit, 16-bit, and 32-bit integers; RISC-V (in its 64-bit variant), WASM, and CIL also support 64-bit integers.

RISC-V, WASM, ARM, CIL provide integer loads from 8-bit, 16-bit, and 32-bit memory locations, and when loading a quantity smaller than the destination, both signed and unsigned are provided. RISC-V supports loading into whatever size the registers are (32- or 64-bits), and ARM supports loading into 32-bit registers, in contrast to WASM and CIL which support loading both 32- or 64-bit values onto the stack.

RISC-V, WASM, ARM, CIL support stores to 8-bit, 16-bit, 32-bit quantities. RISC-V supports storing to 64-bits if the registers are 64-bits, whereas WASM, CIL always support stores of 64-bits. RISC-V supports storing from whatever size the registers are (obviously), whereas WASM, CIL support stores from 32 or 64 bit values.
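The signed and unsigned variants of narrow loads differ only in how the loaded value is widened: sign extension copies the top bit upward, zero extension fills with zeros. A minimal sketch in Python, assuming little-endian memory (an assumption of this example, though it matches WASM and common RISC-V and ARM configurations); load and store are hypothetical helper names.

```python
def load(memory, address, size, signed):
    """Read `size` bytes (little-endian) and widen the value to a Python int."""
    raw = bytes(memory[address:address + size])
    return int.from_bytes(raw, 'little', signed=signed)

def store(memory, address, size, value):
    """Write the low `size` bytes of value; higher bits are simply dropped."""
    mask = (1 << (8 * size)) - 1
    memory[address:address + size] = (value & mask).to_bytes(size, 'little')

memory = bytearray([0xFF, 0x80, 0x00, 0x00])
assert load(memory, 0, 1, signed=True) == -1       # like LB or i32.load8_s
assert load(memory, 0, 1, signed=False) == 255     # like LBU or i32.load8_u
assert load(memory, 0, 2, signed=True) == -32513   # 0x80FF as signed 16-bit
store(memory, 2, 1, 0x1234)                        # like SB: only 0x34 is kept
assert memory[2] == 0x34
```

Stores need no signed and unsigned variants, since narrowing just discards the upper bits; this is consistent with the lists below having one store per width.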

LLVM provides loads and stores of all of these data types, but as far as i can tell, the type of the value being loaded must match the type of the variable being loaded to/stored from (as opposed to e.g. RISC-V and WASM, which provide operations like LB and i64.load8_s to load an 8-bit value into a register/variable of 64-bit type) (todo is this correct?).

Andreas Olofsson of Adapteva indicated on a blog that he left out instructions equivalent to RISC-V's LB and LH from his Epiphany ISA, and that he regretted doing so [23].

Polymorphic loads and stores

load (LLVM)
store (LLVM)

Integer loads

Load 32-bit and 64-bit integers:

LD (i64) (RISC-V RV64I), LW (i32) (RISC-V RV32I), LWU (u32) (RISC-V RV64I)
i32.load, i64.load, i64.load32_s (WASM)
ldr (ARM)
ldind.i4 ldind.u4 ldind.i8 ldind.u8 (CIL)

Load 8-bit and 16-bit integers, unsigned:

LHU (u16) (RISC-V RV32I), LBU (u8) (RISC-V RV32I)
i64.load32_u, i64.load16_u, i32.load16_u, i64.load8_u, i32.load8_u (WASM)
ldrh ldrb (ARM)
ldind.u1 ldind.u2 (CIL)

Load 8-bit and 16-bit integers, signed:

LB (i8), LH (i16) (RISC-V RV32I)
i64.load8_s, i32.load8_s, i64.load16_s, i32.load16_s (WASM)
ldrsh ldrsb (ARM)
ldind.i1 ldind.i2 (CIL)

Integer stores

i64, i32:

SD (i64) (RISC-V RV64I), SW (i32) (RISC-V RV32I)
i64.store, i32.store, i64.store32 (WASM)
str (ARM)
stind.i4 stind.i8 (CIL)

i16:

SH (i16) (RISC-V RV32I)
i64.store16, i32.store16 (WASM)
strh (ARM)
stind.i2 (CIL)

i8:

SB (i8) (RISC-V RV32I)
i64.store8, i32.store8 (WASM)
strb (ARM)
stind.i1 (CIL)

Atomics and Sync

RISC-V, LLVM, ARM, JVM, CIL (everything in this study except for WASM and LuaJIT2) provide various FENCE/sync barrier/monitor/volatile instructions/prefixes.

RISC-V alone provides a fence that synchronizes the instruction and data streams (FENCE.I).

RISC-V atomics are only included in the A extension, not the base integer instruction set. FENCE and FENCE.I are in the base instruction set.

Andreas Olofsson of Adapteva said on a blog that he felt that the FENCE instruction(s) provided by RISC-V could have been left out (and were left out in his architecture, Epiphany), commenting "Benefit minimal in good SW imho" [24].

Fences

FENCE (Synch threads) (RISC-V RV32I)
FENCE.I (Synch Instr & Data) (RISC-V RV32I)
fence (LLVM)
isb dmb dsb (ARM)
sync: monitorenter monitorexit (JVM)
volatile. (CIL)

Control flow

Jumps (unconditional branch)

None.

Conditional branches

RISC-V, ARM, JVM, LuaJIT2, CIL provide conditional branches on equality, inequality, less-than, and greater-than-or-equal.

LuaJIT2 also provides equality/inequality compares against constant (both immediate and constant table) numbers and 'primitives' (null/false/true).

branch: eq:

BEQ (Branch =) (i32) (RISC-V RV32I)
beq (ARM)
if_acmpeq if_icmpeq (JVM)
iseqv (LuaJIT2)
iseqn iseqp (LuaJIT2)
beq beq.s (CIL)

ne:

BNE (Branch !=) (i32) (RISC-V RV32I)
bne (ARM)
if_acmpne if_icmpne (JVM)
isnev (LuaJIT2)
isnen isnep (LuaJIT2)
bne.un bne.un.s (CIL)

lt:

BLT (Branch <) (i32) (RISC-V RV32I)
blt (ARM)
if_icmplt (JVM)
islt (LuaJIT2)
blt blt.s blt.un blt.un.s (CIL)

ge:

BGE (Branch >=) (i32) (RISC-V RV32I)
bge (ARM)
if_icmpge (JVM)
isge (LuaJIT2)
bge bge.s bge.un bge.un.s (CIL)

Subroutines

WASM, LLVM, JVM, LuaJIT2, CIL provide CALL or INVOKE, and RETURN, and (where the basic CALL/INVOKE is not already indirect) some form of indirect CALL/INVOKE. Note that this is all of the instruction sets in this study except for the two hardware processor ISAs (RISC-V and ARM Cortex M0), neither of which has CALL.

JVM specializes return by type.

call:

CALL (WASM)
call (LLVM)
call (LuaJIT2) (is there exception handling here?)

call with exception handling or other multiple return possibilities:

invoke callbr (LLVM)
invokeinterface invokespecial invokestatic invokevirtual (JVM)
calls: call callvirt (CIL)

return:

RETURN (WASM)
ret (LLVM)
areturn dreturn freturn ireturn lreturn return (JVM)
retm ret ret0 ret1 (LuaJIT2)
ret (CIL)

indirect branch form of call:

CALL_INDIRECT (WASM): switch and branch to result
invokedynamic (JVM)
calli (CIL)
(CALL (LuaJIT2) (already listed above))
(invoke) (already listed above) (LLVM)

Misc control flow

None.

Misc

RISC-V, LLVM, ARM, JVM, CIL provide breakpoint.

breakpoint:

EBREAK (RISC-V RV32I)
debugtrap (LLVM)
bkpt (ARM)
breakpoint (JVM)
break (CIL)

Concordance of instructions supported by four platforms

In addition to the above, instructions for each of the following are provided by four platforms in this study:

Arithmetic:

unsigned integer variants of division and remainder
floating point remainder/mod
compare instructions: integer less-than, floating-point equality and less-than
floating-point negation
conversions between unsigned integer and floating-point
integer conversions from 64-bit to 32-bit, and signed integer conversions from 32-bit to 64-bit

Memory access:

floating-point loads and stores

Control-flow:

'switch'-statement like indirect branching (already included in the counts above)
branch instructions for less-than-or-equals, greater-than, false or equals-zero or is-null

Data structures:

vectors/arrays, and also aggregates

Arithmetic

Constant loads

None.

Add, subtract, multiply, divide

RISC-V, WASM, LLVM, CIL have unsigned variants of integer division and remainder, in both 32-bit and 64-bit.

LLVM, JVM, LuaJIT2, CIL have floating point remainder/mod.
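Separate unsigned variants are needed because the same bit pattern gives different quotients depending on how it is interpreted. The following is a minimal Python sketch of 32-bit division under both interpretations. The helper names (as_signed32, div_u32, div_s32) are made up for this illustration, the divisor is assumed to be nonzero, and the overflow case (most negative value divided by -1) simply wraps here, which is an assumption of the sketch rather than a statement about any one platform.

```python
MASK32 = 0xFFFFFFFF

def as_signed32(bits: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    bits &= MASK32
    return bits - (1 << 32) if bits & 0x80000000 else bits

def div_u32(a_bits: int, b_bits: int) -> int:
    """Unsigned division on 32-bit patterns (divisor assumed nonzero)."""
    return ((a_bits & MASK32) // (b_bits & MASK32)) & MASK32

def div_s32(a_bits: int, b_bits: int) -> int:
    """Signed division, truncating toward zero (divisor assumed nonzero)."""
    a, b = as_signed32(a_bits), as_signed32(b_bits)
    quotient = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        quotient = -quotient
    return quotient & MASK32

# 0xFFFFFFF8 is -8 when read as signed, 4294967288 when read as unsigned.
unsigned_result = div_u32(0xFFFFFFF8, 2)  # 0x7FFFFFFC
signed_result = div_s32(0xFFFFFFF8, 2)    # 0xFFFFFFFC, i.e. -4
```

Addition, subtraction, and the low bits of multiplication do not need this split, since two's-complement results are bit-identical for both interpretations; division and remainder do.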

Integer unsigned div:

DIVU (unsigned; u32 on RV32, u64 on RV64) (RISC-V M)
DIVUW (unsigned 32-bit operands; RV64 only) (RISC-V M)
i32.div_u, i64.div_u (WASM)
udiv (LLVM)
div.un (CIL)

Integer unsigned rem:

REMU (unsigned; u32 on RV32, u64 on RV64) (RISC-V M)
REMUW (unsigned 32-bit operands; RV64 only) (RISC-V M)
i32.rem_u, i64.rem_u (WASM)
urem (LLVM)
rem.un (CIL)

Floating-point remainder/mod:

frem (LLVM)
drem frem (JVM)
modvn modnv modvv (LuaJIT2)
rem (polymorphic) (CIL)

Shifts

None.

Logical

None.

Compares

RISC-V, WASM, LLVM, CIL provide less-than, in integer signed and unsigned, 32-bit and 64-bit, and also in 32-bit and 64-bit floating point.

RISC-V, WASM, LLVM, CIL provide float equality in 32 and 64-bit.

RISC-V, LLVM, CIL provide one instruction which is 32- or 64- bits depending on the register/value size, whereas WASM provides separate instructions for each size.

CIL also has separate pointer types and provides various comparisons over pointers. JVM provides compare-and-branch instructions for integers and pointers rather than separate compare instructions.

Andreas Olofsson of Adapteva said on a blog that he felt that the SLT* instructions provided by RISC-V could have been left out (and were left out in his architecture, Epiphany) [25].

Integer comparisons

Less than:

SLT (Set <) (i32) (RISC-V RV32I)
SLTI (Set < Immediate) (i32) (RISC-V RV32I)
SLTU (Set < Unsigned) (u32) (RISC-V RV32I)
SLTIU (Set < Imm Unsigned) (u32) (RISC-V RV32I)
i32.lt_s, i32.lt_u, i64.lt_s, i64.lt_u (WASM)
icmp slt, icmp ult (LLVM)
clt (polymorphic; also used for floating-point) clt.un (CIL)

Floating-point comparisons

Equality:

FEQ.S, FEQ.D (f32, f64) (RISC-V F): dest = (f32 == f32)
f32.eq, f64.eq (WASM)
fcmp oeq (equal and ordered) (LLVM)
fcmp ueq (equal or unordered) (LLVM)
ceq (polymorphic, also used for integer) (CIL)

Less than:

FLT.S, FLT.D (f32, f64) (RISC-V F): dest = (f32 < f32)
f32.lt, f64.lt (WASM)
fcmp olt (less-than and ordered) (LLVM)
fcmp ult (less-than or unordered) (LLVM)
clt (polymorphic; also used for integer) clt.un (CIL)

Note: "(Big parentheses: Until Lua 4.0, all order operators were translated to a single one, by translating a <= b to not (b < a). However, this translation is incorrect when we have a partial order, that is, when not all elements in our type are properly ordered. For instance, floating-point numbers are not totally ordered in most machines, because of the value Not a Number (NaN). According to the IEEE 754 standard, currently adopted by virtually every hardware, NaN represents undefined values, such as the result of 0/0. The standard specifies that any comparison that involves NaN should result in false. That means that NaN <= x is always false, but x < NaN is also false. That implies that the translation from a <= b to not (b < a) is not valid in this case.)" -- https://www.lua.org/pil/13.2.html
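The point made in the quoted passage can be checked directly with Python floats, which follow IEEE 754 comparison rules. This is only a small demonstration; the variable names are illustrative.

```python
import math

nan = math.nan
x = 1.0

# Every ordered comparison involving NaN is false.
direct = nan <= x             # False
reverse = x < nan             # False

# Rewriting a <= b as not (b < a) therefore changes the answer.
translated = not (x < nan)    # True, although nan <= x is False

# NaN is not even equal to itself.
self_equal = nan == nan       # False
```

This is why an instruction set that offers only one floating-point ordering test cannot always derive the others by negation, and why LLVM distinguishes 'ordered' from 'unordered' comparison predicates.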

Floating-point-specific

WASM, LLVM, JVM, LuaJIT2 provide floating point negation.

f32.neg, f64.neg (WASM)
fneg (LLVM)
dneg fneg (JVM)
unm (LuaJIT2)

Conversions

RISC-V, WASM, LLVM, CIL provide conversions between unsigned integers and floating-point, in both 32- and 64- bits.

WASM, LLVM, JVM, CIL provide integer conversions from 64-bit to 32-bit. WASM, LLVM, JVM, CIL provide integer conversions from signed 32-bit to 64-bit.

By contrast, RISC-V appears to take the approach that all numbers in integer registers are treated as signed integers of the bitwidth of the register, although operations for smaller bitwidths are provided when loading and storing integers. CIL also stores things in 32-bit or 64-bit slots on the stack, even though it has instructions to convert to bitwidths smaller than that.
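The two integer width conversions discussed above can be sketched on raw bit patterns as follows. This is an illustration only; the function names are hypothetical and merely echo the WASM mnemonics i32.wrap_i64 and i64.extend_i32_s.

```python
def wrap_i64_to_i32(bits64: int) -> int:
    """64-bit to 32-bit: keep the low 32 bits, discard the rest."""
    return bits64 & 0xFFFFFFFF

def sign_extend_32_to_64(bits32: int) -> int:
    """Signed 32-bit to 64-bit: copy bit 31 into the upper 32 bits."""
    bits32 &= 0xFFFFFFFF
    if bits32 & 0x80000000:
        return bits32 | 0xFFFFFFFF00000000
    return bits32

narrowed = wrap_i64_to_i32(0x1FFFFFFFE)               # 0xFFFFFFFE
widened_negative = sign_extend_32_to_64(0xFFFFFFFE)   # 0xFFFFFFFFFFFFFFFE (-2)
widened_positive = sign_extend_32_to_64(0x7FFFFFFF)   # 0x000000007FFFFFFF
```

Narrowing loses information whenever the upper 32 bits are not a pure sign extension of the lower 32, which is why it is a distinct, explicit instruction on the platforms listed.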

Conversions from unsigned integer to floating-point

FCVT.S.WU (convert u32 to f32) (RISC-V F)
FCVT.S.LU (convert u64 to f32) (not present in RV32I) (RISC-V F)
FCVT.D.WU (convert u32 to f64) (RISC-V D)
FCVT.D.LU (convert u64 to f64) (RISC-V D)
f{32,64}.convert_i{32,64}_u (WASM)
uitofp (LLVM)
conv.r.un (treats the integer on the stack as unsigned); conv.r4 conv.r8 (polymorphic, but treat integers as signed) (CIL)

Conversions from floating point to unsigned integer

FCVT.WU.S (convert f32 to u32) (RISC-V F)
FCVT.LU.S (convert f32 to u64) (not present in RV32I) (RISC-V F)
FCVT.WU.D (convert f64 to u32) (RISC-V D)
FCVT.LU.D (convert f64 to u64) (RISC-V D)
i{32,64}.trunc_f{32,64}_u (WASM)
fptoui (LLVM)
conv.u1 conv.u2 conv.u4 conv.u8 (polymorphic) (CIL)

Integer conversions from 64-bit to 32-bit

i32.wrap_i64 (WASM)
trunc (LLVM)
l2i (JVM)
conv.i4 conv.u4 (polymorphic) (CIL)

Integer conversions from 32-bit to 64-bit, signed

sign extend (signed source):

i64.extend_i32_s (WASM)
sext (LLVM)
i2l (JVM)
conv.i8 conv.u8 (CIL)

Memory access

Loads and stores

RISC-V, WASM, LLVM, CIL support floating point loads and stores from the heap of either 32 or 64 bits.

LLVM provides loads and stores of all of these data types, but as far as I can tell, the type of the value being loaded must match the type of the variable being loaded to/stored from (todo is this correct?).

Floating point loads:

FLD (f64) (RISC-V D), FLW (f32) (RISC-V F)
f32.load, f64.load (WASM)
ldind.r4 ldind.r8 (CIL)
load (LLVM)

Floating point stores:

FSD (f64) (RISC-V D), FSW (f32) (RISC-V F)
f64.store, f32.store (WASM)
stind.r4 stind.r8 (CIL)
store (LLVM)

Atomics and Sync

None.

Control flow

Jumps (unconditional branch)

WASM, LLVM, JVM, CIL provide indirect branching in the form of 'switch'-statement-like constructs of various kinds. WASM's br_table, JVM's tableswitch, and CIL's switch index into a jump table using the item on the top of the stack. LLVM's switch and JVM's lookupswitch contain a lookup table that maps integer keys to potential jump destinations; at runtime the table is searched for the entry corresponding to an integer passed in an operand (or on the stack, in JVM's case).
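The two styles of switch described above can be contrasted with a small Python sketch. The function names, the string labels, and the use of a binary search are assumptions made for illustration; real implementations are free to search the key table however they like, and the 'low' parameter only applies to constructs whose table starts at a value other than zero.

```python
import bisect

def table_switch(index, low, targets, default):
    """Jump-table dispatch: the operand indexes directly into a dense table."""
    offset = index - low
    if 0 <= offset < len(targets):
        return targets[offset]
    return default

def lookup_switch(key, pairs, default):
    """Key-lookup dispatch: search sorted (key, target) pairs for a match."""
    keys = [k for k, _ in pairs]
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return pairs[i][1]
    return default

# Dense case values 1..3 suit a jump table.
dense_hit = table_switch(2, 1, ["L1", "L2", "L3"], "Ldefault")
dense_miss = table_switch(9, 1, ["L1", "L2", "L3"], "Ldefault")

# Sparse case values suit a key lookup.
sparse = [(1, "La"), (100, "Lb"), (1000, "Lc")]
sparse_hit = lookup_switch(100, sparse, "Ldefault")
sparse_miss = lookup_switch(5, sparse, "Ldefault")
```

The jump table costs constant time per dispatch but its size grows with the range of case values, whereas the key lookup stays compact for sparse cases at the cost of a search.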

Jump to register / indirect branch

JALR (Jump & Link Register) (32b) (RISC-V RV32I): indirect branch (unconstrained)
BR_TABLE (WASM): indirect branch to one of an enumerated set of labels
switch (LLVM)
indirectbr (LLVM): indirect branch to one of an enumerated set of labels
BX BLX (ARM): indirect branch (unconstrained)
ret (JVM) (deprecated)
switch: lookupswitch tableswitch (JVM)
switch (CIL)

Conditional branches

ARM, JVM, LuaJIT2, CIL provide branches on less-than-or-equal (le), greater-than (gt), and a unary test against zero, null, or false.

branch:

le:

ble (ARM)
if_icmple (JVM)
isle (LuaJIT2)
ble ble.s ble.un ble.un.s (CIL)

gt:

bgt (ARM)
if_icmpgt (JVM)
isgt (LuaJIT2)
bgt bgt.s bgt.un bgt.un.s (CIL)

unary compare ==0 or null or false:

(beq when used as unary compare) (ARM)
ifeq ifnull (JVM)
isf isfc (is false-y) (LuaJIT2)
brzero brzero.s brnull brnull.s brfalse brfalse.s (CIL)

Subroutines

None.

Misc control flow

None.

Data Structures

LLVM, JVM, CIL, LuaJIT2 provide vectors/arrays and aggregates (JVM and CIL provide aggregates via OOP, which is listed separately below because fewer than four platforms provide OOP) (LuaJIT2 tables serve as both arrays and aggregates, I think).

Vectors and arrays

creation:

anewarray newarray multianewarray (JVM)
newarr (CIL)

accessors:

extractelement insertelement (LLVM)
aaload aastore baload bastore caload castore daload dastore faload fastore iaload iastore laload lastore saload sastore (JVM)
ldelem ldelem.i ldelem.i1 ldelem.i2 ldelem.i4 ldelem.i8 ldelem.r4 ldelem.r8 ldelem.ref ldelem.u1 ldelem.u2 ldelem.u4 ldelem.u8 stelem stelem.i stelem.i1 stelem.i2 stelem.i4 stelem.i8 stelem.r4 stelem.r8 stelem.ref (CIL)

Tables and aggregates

Tables:

tnew tdup (LuaJIT2)
tgetv tgets tgetb tsetv tsets tsetb tsetm (LuaJIT2)

Aggregates:

Aggregate Operations: extractvalue insertvalue getelementptr (LLVM)

Misc

None.

Concordance of instructions supported by three platforms

In addition to the above, instructions or intrinsics for each of the following are provided by three platforms in this study:

Arithmetic:

moves/copies (all register machines in this study provide MOVs)
addition and subtraction with overflow or carry, signed 32-bit (note: but only as intrinsics in LLVM)
integer negation or similar
logical NOT (either bitwise or boolean)
integer compares: equality, greater-than, unsigned greater-than
floating-point compares: less-than-or-equal-to, greater-than
floating-point specific: sqrt, copysign, min, max (note: but only as intrinsics in LLVM)
conversions:
- from larger integers to 8-bit and 16-bit integers
- from unsigned 32-bit integers to 64-bit integers
- coercive casting between integers and floating-points

Memory access:

variable loads and stores

Stack ops:

drop

Control flow:

exception handling
variable-length argument lists (variadic functions)
illegal/unreachable instruction marker

Allocation:

various allocation instructions

Data structures:

length

Misc:

supervisor call

Arithmetic

Constant loads

None.

Moves (copies)

Moves are provided by RISC-V, ARM, LuaJIT2. RISC-V provides MOVs as pseudoinstructions. So, all register machine platforms in this study provide MOVs. Of the platforms that do not provide MOVs, WASM, JVM, and CIL have a stack, and LLVM has SSA variable assignment instead.

(pseudoinstruction using ADDI or ORI) (RISC-V RV32I/RV64I)
mov movs (ARM)
mov (LuaJIT2)

Add, subtract, multiply, divide

Integer addition and subtraction with overflow or carry, signed 32-bit, is provided by LLVM intrinsics, ARM, CIL.

JVM and CIL have integer negation and ARM has reverse subtraction.

Add with overflow/carry:

sadd.with.overflow.* uadd.with.overflow.* (LLVM arithmetic with overflow intrinsics)
adcs (ARM)
add.ovf (CIL)

Subtract with overflow/carry:

ssub.with.overflow.* usub.with.overflow.* (LLVM arithmetic with overflow intrinsics)
sbcs (ARM)
sub.ovf (CIL)
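As a rough model of what 'addition with overflow' reports, the following Python sketch returns a wrapped 32-bit result together with an overflow flag, similar in shape to the (result, overflow bit) pair returned by the LLVM intrinsics. The function names are hypothetical; on other platforms the overflow is surfaced differently (for example as a processor flag or as an exception), which this sketch does not model.

```python
def sadd_with_overflow_32(a: int, b: int):
    """Signed 32-bit add: return (wrapped result, overflow flag)."""
    total = a + b
    wrapped = (total + 2**31) % 2**32 - 2**31
    return wrapped, wrapped != total

def uadd_with_overflow_32(a: int, b: int):
    """Unsigned 32-bit add: return (wrapped result, carry-out flag)."""
    total = a + b
    return total & 0xFFFFFFFF, total > 0xFFFFFFFF

no_overflow = sadd_with_overflow_32(1, 2)               # (3, False)
signed_overflow = sadd_with_overflow_32(2**31 - 1, 1)   # (-2147483648, True)
unsigned_carry = uadd_with_overflow_32(0xFFFFFFFF, 1)   # (0, True)
```

Note that signed overflow and unsigned carry are different conditions: adding 1 to 0xFFFFFFFF carries out as unsigned but does not overflow as signed (it is -1 + 1 = 0).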

Negation and Reverse subtraction:

ineg lneg (JVM)
neg (also used for floating-point) (CIL)
rsbs (ARM)

Shifts

None.

Logical

ARM, CIL provide bitwise NOT. LuaJIT2 provides boolean NOT.

NOT:

mvns (bitwise NOT) (ARM)
not (boolean NOT) (LuaJIT2)
not (bitwise NOT) (CIL)

Compares

WASM, LLVM, CIL provide integer equality.

WASM, LLVM, CIL provide integer greater-than, both signed and unsigned, and floating-point greater-than.

RISC-V, WASM, LLVM provide floating-point less-than-or-equal-to.

Integer equality:

i32.eq i64.eq (WASM)
icmp eq (LLVM)
ceq (polymorphic; also used for floating-point) (CIL)

Integer greater-than:

i32.gt_s, i32.gt_u, i64.gt_u, i64.gt_s (WASM)
icmp sgt, icmp ugt (LLVM)
cgt cgt.un (polymorphic; also used for floating-point) (CIL)

Floating point greater-than:

f32.gt f64.gt (WASM)
fcmp ogt, fcmp ugt (among other fcmp predicates) (LLVM)
cgt (polymorphic; also used for integer) (CIL)

Floating point less-than-or-equal-to:

FLE.S, FLE.D (f32, f64) (RISC-V F): dest = (f32 <= f32)
f32.le, f64.le (WASM)
fcmp ole (less-than-or-equal and ordered) (LLVM)
fcmp ule (less-than-or-equal or unordered) (LLVM)

Floating-point-specific

SQRT, COPYSIGN, MIN, MAX are provided by RISC-V, WASM, LLVM (but only as LLVM intrinsics).

Andreas Olofsson of Adapteva noted in a blog post that RISC-V's FSQRT instruction is "expensive" and that it was a "tough call" whether to include such an instruction in his Epiphany ISA [26].

In the same blog post, Andreas Olofsson of Adapteva questioned whether the FMIN and FMAX instructions provided by RISC-V are needed [27].

Andreas Olofsson of Adapteva said in a blog post that he felt that RISC-V's floating point sign instructions (including FSGNJ) were "not essential" [28].

Sqrt

FSQRT.S, FSQRT.D (f32, f64) (RISC-V F)
f32.sqrt, f64.sqrt (WASM)
sqrt.* experimental.constrained.sqrt (LLVM intrinsics)

Signs

copysign:

FSGNJ.S, FSGNJ.D (f32, f64) (RISC-V F): dest = float, but with its sign overwritten by the sign of another float
f32.copysign, f64.copysign (WASM)
copysign.* (LLVM intrinsic)

Min, max

FMIN.S, FMIN.D (f32, f64) (RISC-V F): dest = min(f32, f32)
FMAX.S, FMAX.D (f32, f64) (RISC-V F): dest = max(f32, f32)
f32.min, f64.min, f32.max, f64.max (WASM)
minnum.* maxnum.* experimental.constrained.maxnum experimental.constrained.minnum minimum.* maximum.* (LLVM intrinsics)

Conversions

LLVM, JVM, CIL provide conversions from larger integers to 8-bit and 16-bit integers.

WASM, LLVM, CIL provide conversions from unsigned 32-bit integers to 64-bit integers.

LLVM, RISC-V, WASM provide coercive casting between integer and floating-point values (that is, instructions which take a value thought to be of one type, and reinterpret its bit-pattern representation as if it were another type).
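To make the distinction between a coercive cast and a numeric conversion concrete, here is a minimal Python sketch using the standard struct module to reinterpret bit patterns. The function names are hypothetical and merely echo the WASM mnemonics.

```python
import struct

def i32_reinterpret_f32(value: float) -> int:
    """Return the 32-bit pattern of a float32, as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

def f32_reinterpret_i32(bits: int) -> float:
    """Return the float32 whose bit pattern is the given 32-bit integer."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]

bits_of_one = i32_reinterpret_f32(1.0)          # 0x3F800000
float_from_bits = f32_reinterpret_i32(0x3F800000)  # 1.0
bits_of_neg_zero = i32_reinterpret_f32(-0.0)    # 0x80000000

# A numeric conversion, by contrast, preserves the value, not the bits.
converted = int(1.0)                            # 1
```

The coercion is lossless in both directions because no bits change; only the type attached to them does.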

Integer conversions from larger bitwidth to 8-bit or 16-bit

trunc (LLVM)
i2s i2b (JVM)
conv.i1 conv.i2 conv.u1 conv.u2 (polymorphic) (note: the CIL evaluation stack actually holds nothing smaller than 32-bit integers; these merely truncate the integer) (CIL)

Integer conversions from unsigned 32-bit to 64-bit

zero extend (unsigned source):

i64.extend_i32_u (WASM)
zext (LLVM)
conv.i8 conv.u8 (CIL)

Coercions between integers and floating point of the same size

bitcast (polymorphic) (LLVM)

i32 to f32:

FMV.W.X (coerce i32 to f32) (RISC-V F)
f32.reinterpret_i32 (WASM)

f32 to i32:

FMV.X.W (coerce f32 to i32) (RISC-V F)
i32.reinterpret_f32 (WASM)

f64 to i64:

FMV.X.D (coerce f64 to i64) (RISC-V RV64D)
i64.reinterpret_f64 (WASM)

i64 to f64:

FMV.D.X (coerce i64 to f64) (RISC-V RV64D)
f64.reinterpret_i64 (WASM)

Memory access

Loads and stores (to/from memory)

Variable loads and stores

WASM, JVM, CIL support loads and stores to/from local variables. JVM specializes load/store instructions by type.

locals:

GET_LOCAL (WASM)
SET_LOCAL (WASM)
aload astore (JVM)
dload dstore (JVM)
fload fstore (JVM)
iload istore (JVM)
lload lstore (JVM)
ldloc ldloc.s stloc stloc.s (CIL)

Stack ops

WASM, JVM, CIL provide drop (called 'pop' in JVM and CIL).

drop:

DROP (WASM)
pop pop2 (JVM)
pop (CIL)

Atomics and Sync

None.

Control flow

Jumps (unconditional branch)

None.

Conditional branches

RISC-V, ARM, and CIL provide unsigned versions of the < and >= branches.

unsigned lt:

BLTU (Branch < Unsigned) (i32) (RISC-V RV32I)
blo (ARM)
blt.un blt.un.s (CIL)

unsigned ge:

BGEU (Branch >= Unsigned) (i32) (RISC-V RV32I)
bhs (ARM)
bge.un bge.un.s (CIL)

Subroutines

WASM, LLVM, LuaJIT2(?) provide CALL instructions without exception handling. LLVM, JVM, CIL provide CALL instructions with exception handling (sometimes called 'invoke'), as well as other exception handling instructions. Note that LLVM offers both CALL and INVOKE as separate instructions.

LLVM, LuaJIT2, CIL provide instructions to support variable-length argument lists (variadic functions), as well as other argument handling.

JVM provides various forms of invoke for object-oriented purposes. JVM specializes return by type.

call:

CALL (WASM)
call (LLVM)
call (LuaJIT2) (is there exception handling here?)

call with exception handling or other multiple return possibilities:

invoke callbr (LLVM)
invokeinterface invokespecial invokestatic invokevirtual (JVM)
calls: call callvirt (CIL)

variadic argument handling and other argument handling:

va_arg (LLVM)
Variable Argument Handling Intrinsics: va_start va_end va_copy (LLVM Intrinsics)
callm (and callmt) (LuaJIT2)
varg (LuaJIT2)
variadic argument: arglist (CIL)
argument handling: ldarg ldarg.0 ldarg.1 ldarg.2 ldarg.3 ldarg.s ldarga ldarga.s starg starg.s (CIL)

exception handling:

resume (LLVM)
catchswitch (LLVM)
catchret (LLVM)
cleanupret (LLVM)
landingpad catchpad cleanuppad (LLVM)
llvm.eh.typeid.for llvm.eh.begincatch llvm.eh.endcatch llvm.eh.exceptionpointer llvm.eh.sjlj.setjmp llvm.eh.sjlj.longjmp llvm.eh.sjlj.lsda llvm.eh.sjlj.callsite (LLVM exception handling intrinsics)
athrow (JVM)
endfault endfilter endfinally leave leave.s rethrow throw (CIL)

Misc control flow

RISC-V, WASM, LLVM (and possibly others) provide ILLEGAL/UNREACHABLE instructions.

Illegal or unreachable instruction:

ILLEGAL (unnamed all-zero instruction) (RISC-V)
UNREACHABLE (WASM)
unreachable (LLVM)

Allocation

WASM, CIL, and LLVM provide memory allocation instructions.

WASM alone provides a linear memory which is growable. CIL's localloc allocates from the method's local memory pool, which is reclaimed when the method exits, so it is closer to stack allocation than to heap allocation. LLVM provides stack allocation via alloca. LLVM alone provides various garbage collection, memory use markers, and ARC intrinsics.

See also the creation operations in the Data Structures sections, as these usually also cause allocation.

Linear memory sizing:

memory.size memory.grow (WASM)

local memory pool (stack-like) allocation:

localloc (CIL)

stack frame:

alloca (LLVM)

gc and memory usage and ARC:

Accurate Garbage Collection Intrinsics: gcroot gcread gcwrite llvm.experimental.gc.statepoint llvm.experimental.gc.result llvm.experimental.gc.relocate (LLVM Intrinsics)
Memory Use Markers: lifetime.start lifetime.end invariant.start invariant.end launder.invariant.group strip.invariant.group (LLVM Intrinsics)
Objective-C ARC Runtime Intrinsics: objc.autorelease objc.autoreleasePoolPop objc.autoreleasePoolPush objc.autoreleaseReturnValue objc.copyWeak objc.destroyWeak objc.initWeak objc.loadWeak objc.loadWeakRetained objc.moveWeak objc.release objc.retain objc.retainAutorelease objc.retainAutoreleaseReturnValue objc.retainAutoreleasedReturnValue objc.retainBlock objc.storeStrong objc.storeWeak (LLVM Intrinsics)

Data Structures

JVM, LuaJIT2, CIL provide a length operation.

length:

arraylength (JVM)
len (LuaJIT2)
ldlen (CIL)

Misc

RISC-V, LLVM, ARM provide supervisor call. Note that this includes all of the hardware processor ISAs in this study.

supervisor call:

ECALL (RISC-V RV32I)
trap (LLVM)
SVC (ARM)

Continued at Target Languages Concordance part II
