Continued from Target Languages Concordance part I

Concordance of instructions supported by two platforms

In addition to the above, instructions or intrinsics for each of the following are provided by two platforms in this study:

Arithmetic:

instructions to load various higher-level data structure constants such as strings
constant tables/constant pools
PC-relative arithmetic
multiplication with overflow (signed and unsigned, 32- and 64-bit)
unsigned, 64-bit variants of addition, subtraction, multiplication with overflow
right rotate
integer compares: inequality, greater-than-or-equal-to, less-than-or-equal-to
floating point compares: inequality, greater-than-or-equal-to
clz ctz popcnt
byteswaps
integer conversions from 8-bit or 16-bit to larger
floating point:
- abs
- ceil, floor, trunc, nearest
- rounding modes and exception modes
- classify
- fused multiply-add
- pow

Memory access:

global variable loads/stores
short instructions to load/store the first 4 variables

Stack ops:

dup

Atomics and Sync:

AMOs: SWAP, ADD, AND, OR, XOR, MIN, MAX, MINU, MAXU
either load-reserved/store-conditional or compare-and-swap

Control flow:

jump with link register
unconstrained indirect branch
branches: <0, >=0, unsigned <, unsigned >=
select
tail calls
structured control flow loops
invoke instructions for object-oriented calling.

Data structures:

OOP
strings

Misc:

cycle counters
special registers
memory ops such as memcpy

Arithmetic

Constant loads

LuaJIT 2 and CIL have instructions to load various higher-level data structure constants such as strings.

JVM and LuaJIT 2 use constant tables/constant pools. (i think that LuaJIT 2's constant load instructions can be used for both immediate constants and constant tables, depending on whether their argument is negative or not.) Note: CIL does not have a runtime-accessible constant table; it has a constant table for use at compile time, but "Compilers inspect this information, at compile time, when importing metadata, but the value of the constant itself, if used, becomes embedded into the CIL stream the compiler emits. There are no CIL instructions to access the Constant table at runtime" ([1] section II.22.9, page 216).

JVM provides instructions to push constants of null, or 0,1 sometimes 2, or -1...5, depending on type. CIL provides instructions to load -1..8 to i32 as well as null.

aconst_null dconst_0 dconst_1 fconst_0 fconst_1 fconst_2 iconst_m1 iconst_0 iconst_1 iconst_2 iconst_3 iconst_4 iconst_5 lconst_0 lconst_1 (JVM)
ldc ldc_w ldc2_w (JVM)
kstr (LuaJIT 2)
ldstr (CIL)
kdata kshort knum kpri knil (LuaJIT 2)
ldc.i4.0 ldc.i4.1 ldc.i4.2 ldc.i4.3 ldc.i4.4 ldc.i4.5 ldc.i4.6 ldc.i4.7 ldc.i4.8 ldc.i4.m1 ldc.i4.M1 ldnull (CIL)

PC-relative instructions

RISC-V has an instruction to load a PC-relative constant (AUIPC, "add upper immediate to PC"), and ARM has PC-relative addition.

AUIPC (RISC-V RV32I)
adr (ARM)
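A common use of AUIPC is to form a full 32-bit PC-relative address together with a following ADDI (or the offset field of a load/store). Because ADDI sign-extends its 12-bit immediate, the upper part has to be rounded up whenever bit 11 of the offset is set. Here's a minimal Python sketch of that split on RV32; the helper names split_pcrel and auipc_addi are hypothetical, and the sketch assumes the offset is within the range that the AUIPC/ADDI pair can reach.

```python
def split_pcrel(offset: int) -> tuple[int, int]:
    """Split a signed offset into (hi20, lo12) with (hi20 << 12) + lo12 == offset.

    lo12 must lie in [-2048, 2047] because ADDI sign-extends its 12-bit
    immediate, so hi20 is rounded by adding 0x800 before the shift.
    """
    hi20 = (offset + 0x800) >> 12
    lo12 = offset - (hi20 << 12)
    return hi20, lo12


def auipc_addi(pc: int, hi20: int, lo12: int) -> int:
    """Model `AUIPC rd, hi20` followed by `ADDI rd, rd, lo12` on RV32 (32-bit wraparound)."""
    rd = (pc + (hi20 << 12)) & 0xFFFFFFFF  # AUIPC: add the upper immediate to the PC
    rd = (rd + lo12) & 0xFFFFFFFF          # ADDI: add the sign-extended low 12 bits
    return rd


pc = 0x0001_0000
target = 0x0001_2FFC
hi, lo = split_pcrel(target - pc)
print(hi, lo, hex(auipc_addi(pc, hi, lo)))  # 3 -4 0x12ffc
```

Note that the low part here is negative (-4), which is exactly why the upper part is rounded up from 2 to 3.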

Andreas Olofsson of Adapteva said on a blog that he is "not convinced that ((AUIPC)) is essential" [2]; however, a commenter, Chris, from the RISC-V project explained that it was useful [3].

Add, subtract, multiply, divide

LLVM, CIL, ARM have 32-bit signed integer addition and subtraction with overflow. Only LLVM, CIL have multiplication with overflow, as well as unsigned and 64-bit variants of addition, subtraction, multiplication with overflow.

LLVM offers 'constrained' floating point operations, which means that the rounding and exception modes are respected when those instructions are used. RISC-V provides a similar facility through its rounding-mode and exception-flag special registers, except that RISC-V does not provide a floating-point mod/remainder instruction.

integer addition with overflow (64-bit and unsigned variants):

sadd.with.overflow.* uadd.with.overflow.* (LLVM arithmetic with overflow intrinsics)
add.ovf add.ovf.un (CIL)

integer subtraction with overflow (64-bit and unsigned variants):

ssub.with.overflow.* usub.with.overflow.* (LLVM arithmetic with overflow intrinsics)
sub.ovf sub.ovf.un (CIL)

integer multiplication, lower bits, with overflow:

smul.with.overflow.* umul.with.overflow.* (LLVM arithmetic with overflow intrinsics)
mul.ovf mul.ovf.un (CIL)
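The LLVM intrinsics above return a pair: the wrapped result plus an overflow bit. CIL's add.ovf/sub.ovf/mul.ovf (and their .un forms) instead throw System.OverflowException when the result doesn't fit. A minimal Python model of the LLVM-style pair, assuming inputs are already in range for the chosen signedness (wrap and with_overflow are illustrative names, not a real API):

```python
import operator


def wrap(x: int, bits: int, signed: bool) -> int:
    """Reduce x to an N-bit integer, interpreted as signed or unsigned."""
    x &= (1 << bits) - 1
    if signed and x >> (bits - 1):
        x -= 1 << bits
    return x


def with_overflow(op, a: int, b: int, bits: int, signed: bool) -> tuple[int, bool]:
    """Model LLVM's {s,u}{add,sub,mul}.with.overflow.iN intrinsics.

    Returns the wrapped N-bit result plus a flag that is True when the
    mathematically exact result does not fit in N bits.
    """
    exact = op(a, b)
    result = wrap(exact, bits, signed)
    return result, result != exact


# sadd.with.overflow.i32: INT32_MAX + 1 wraps to INT32_MIN and sets the flag
print(with_overflow(operator.add, 2**31 - 1, 1, 32, signed=True))
# usub.with.overflow.i32: 0 - 1 wraps to 0xFFFFFFFF and sets the flag
print(with_overflow(operator.sub, 0, 1, 32, signed=False))
```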

floating point constrained operations (similar to corresponding RISC-V operations, not shown):

experimental.constrained.fadd (LLVM intrinsic)
experimental.constrained.fsub (LLVM intrinsic)
experimental.constrained.fmul (LLVM intrinsic)
experimental.constrained.fdiv (LLVM intrinsic)
experimental.constrained.frem (LLVM intrinsic) (no corresponding RISC-V operation)

Moves (copies)

None.

Shifts

WASM and ARM provide right rotate.

Rotate right:

i32.rotr, i64.rotr (WASM)
rotate: rors (ARM)
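A right rotate moves the bits shifted out at the bottom back in at the top. A platform with only one rotate direction can synthesize the other, since rotating left by n is the same as rotating right by (width - n). A small Python sketch, assuming (as WASM specifies) that the rotate count is taken modulo the bit width:

```python
def rotr(x: int, n: int, bits: int = 32) -> int:
    """Rotate right; as in WASM i32.rotr/i64.rotr, the count is taken modulo the width."""
    mask = (1 << bits) - 1
    x &= mask
    n %= bits
    return ((x >> n) | (x << (bits - n))) & mask


def rotl(x: int, n: int, bits: int = 32) -> int:
    """Rotate left, expressed as a right rotate by the complementary amount."""
    return rotr(x, -n, bits)  # Python's % makes -n % bits == bits - (n % bits)


print(hex(rotr(0x12345678, 8)))  # 0x78123456
```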

Logical

None.

Compares

WASM and LLVM provide inequality (integer and floating point), integer greater-than-or-equal-to and less-than-or-equal-to, and floating point greater-than-or-equal-to. (They also provide other compares, but since other platforms provide those too, they are listed above rather than here.)

Note that some other platforms provide these operations, but only as branches rather than vanilla compares.

Integer inequality:

i32.ne i64.ne (WASM)
icmp ne (LLVM)

Integer greater-than-or-equal-to, less-than-or-equal-to:

i32.ge_s, i32.ge_u, i64.ge_u, i64.ge_s (WASM)
i32.le_s, i32.le_u, i64.le_u, i64.le_s (WASM)
icmp sge, icmp uge, icmp sle, icmp ule (LLVM)

Floating point:

f32.ne f64.ne f32.ge f64.ge (WASM)
fcmp (various predicates, including one/une for inequality and oge/uge for greater-than-or-equal-to) (LLVM)

Misc integer arith

CLZ, CTZ, POPCNT are provided by WASM and LLVM (but only as LLVM intrinsics).

LLVM and ARM provide various byteswaps.

i32.clz, i64.clz (WASM): count leading zeros
ctlz.* (LLVM intrinsic)
i32.ctz, i64.ctz (WASM): count trailing zeros
cttz.* (LLVM intrinsic)
i32.popcnt, i64.popcnt (WASM): number of bits set to 1
ctpop.* (LLVM intrinsic)
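A Python sketch of the three bit-counting operations on 32-bit values. It follows the WASM convention that clz(0) and ctz(0) return the bit width; LLVM's ctlz/cttz take an extra i1 argument, and when that is true a zero input yields poison instead.

```python
def clz(x: int, bits: int = 32) -> int:
    """Count leading zeros; like WASM i32.clz, clz(0) is the bit width."""
    x &= (1 << bits) - 1
    return bits - x.bit_length()


def ctz(x: int, bits: int = 32) -> int:
    """Count trailing zeros; like WASM i32.ctz, ctz(0) is the bit width."""
    x &= (1 << bits) - 1
    if x == 0:
        return bits
    return (x & -x).bit_length() - 1  # x & -x isolates the lowest set bit


def popcnt(x: int, bits: int = 32) -> int:
    """Count the bits set to 1 in the low `bits` bits of x."""
    return bin(x & ((1 << bits) - 1)).count("1")
```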

byte swap:

bswap.* (LLVM intrinsics)
rev rev16 revsh (ARM)
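The ARM variants differ in which bytes they swap: REV reverses all four bytes (like LLVM bswap.i32), REV16 swaps bytes within each halfword, and REVSH swaps the two bytes of the low halfword and sign-extends. A Python sketch returning 32-bit bit patterns:

```python
def bswap32(x: int) -> int:
    """Reverse the byte order of a 32-bit word (LLVM bswap.i32, ARM REV)."""
    return int.from_bytes((x & 0xFFFFFFFF).to_bytes(4, "little"), "big")


def rev16(x: int) -> int:
    """Swap the bytes within each 16-bit halfword (ARM REV16)."""
    x &= 0xFFFFFFFF
    return ((x & 0x00FF00FF) << 8) | ((x >> 8) & 0x00FF00FF)


def revsh(x: int) -> int:
    """Swap the bytes of the low halfword, then sign-extend to 32 bits (ARM REVSH).

    The result is returned as an unsigned 32-bit bit pattern.
    """
    h = ((x & 0xFF) << 8) | ((x >> 8) & 0xFF)
    if h & 0x8000:
        h -= 0x10000
    return h & 0xFFFFFFFF
```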

Floating-point-specific

WASM and LLVM provide floating-point absolute value (abs), although in LLVM it is an intrinsic.

RISC-V and LLVM provide rounding in the form of conversion operations and rounding modes (LLVM requires the 'constrained' intrinsics to use these). WASM and LLVM provide rounding in the form of ceil, floor, trunc, nearest instructions (but WASM does not provide rounding mode control for other instructions; see [4] and [5]).

RISC-V and LLVM support IEEE exception flags and rounding modes (but LLVM only supports these with 'constrained' intrinsics). Because two platforms support this functionality it is included here; note that some of the LLVM 'constrained' intrinsics are listed above, in the section 'Add, subtract, multiply, divide'.

Andreas Olofsson of Adapteva noted in a blog post that the Epiphany ISA does not include operations like RISC-V's exception flags, coercion operations, and rounding modes, because they were not needed for Epiphany's use case [6].

RISC-V provides an FCLASS instruction to report the attributes of a floating-point number. Andreas Olofsson of Adapteva said in a blog post that he felt that this instruction was "not essential" [7]. CIL provides ckfinite to check if a floating-point number is finite.

Fused multiply-add is provided by RISC-V and LLVM (but only as an LLVM intrinsic).

pow is provided by LLVM as an intrinsic and by LuaJIT 2.

Some notes on the default rounding mode of other platforms: "WebAssembly uses “non-stop” mode, and floating point exceptions are not otherwise observable. In particular, neither alternate floating point exception handling attributes nor the non-computational operators on status flags are supported. There is no observable difference between quiet and signalling NaN. However, positive infinity, negative infinity, and NaN are still always produced as result values to indicate overflow, invalid, and divide-by-zero conditions, as specified by IEEE 754-2008.

WebAssembly uses the round-to-nearest ties-to-even rounding attribute, except where otherwise specified. Non-default directed rounding attributes are not supported." -- [8]

" By default, LLVM optimization passes assume that the rounding mode is round-to-nearest and that floating-point exceptions will not be monitored. Constrained FP intrinsics are used to support non-default rounding modes and accurately preserve exception behavior " -- [9]

Fused multiply-add

FMADD.S, FMADD.D (f32, f64) (RISC-V F): rs1*rs2+rs3
fma.*, fmuladd.*, experimental.constrained.fma (LLVM intrinsics)
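What "fused" buys is a single rounding: a*b+c is computed exactly and rounded once, instead of rounding the product first. (LLVM's fmuladd.* lets the code generator decide whether to fuse; fma.* always computes the fused result.) A Python sketch, assuming finite inputs, that emulates a fused multiply-add with exact rational arithmetic (newer Python versions, 3.13+, also provide math.fma directly):

```python
from fractions import Fraction


def fma(a: float, b: float, c: float) -> float:
    """a*b + c with a single rounding (finite inputs only).

    The product and sum are computed exactly as rationals, and float()
    rounds the exact result once to the nearest double.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


x = 1.0 + 2.0**-30
c = -(1.0 + 2.0**-29)

unfused = x * x + c   # x*x is rounded first, losing its 2**-60 term
fused = fma(x, x, c)  # the 2**-60 term survives

print(unfused, fused == 2.0**-60)  # 0.0 True
```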

Signs

absolute value:

f32.abs, f64.abs (WASM)
fabs.* (LLVM Intrinsics)

Rounding

floor, ceiling, trunc:

f32.ceil, f32.floor, f64.ceil, f64.floor (WASM)
f32.trunc, f64.trunc (WASM)
floor.* ceil.* trunc.* experimental.constrained.ceil experimental.constrained.floor experimental.constrained.trunc (LLVM Intrinsics)

nearest int or rounding-mode determined rounding:

f32.nearest, f64.nearest (WASM)
rint.* nearbyint.* round.* lround.* llround.* experimental.constrained.rint experimental.constrained.nearbyint experimental.constrained.round (LLVM Intrinsics)
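floor, ceil and trunc agree everywhere, but "nearest" comes in two flavors: WASM's nearest (and LLVM rint/nearbyint under the default rounding mode) breaks ties to even, while LLVM's round.* breaks ties away from zero, like C's round(). A Python sketch of both, assuming finite inputs (the function names are illustrative):

```python
import math


def nearest_even(x: float) -> float:
    """Round to nearest, ties to even (WASM f32/f64.nearest; LLVM rint/nearbyint
    under the default rounding mode). Finite inputs only."""
    # Python 3's round() rounds half to even; copysign keeps -0.0 for inputs like -0.4
    return math.copysign(float(round(x)), x)


def round_half_away(x: float) -> float:
    """Round to nearest, ties away from zero (LLVM round.*, like C's round()).
    Finite inputs only."""
    t = float(math.trunc(x))
    if abs(x - t) >= 0.5:  # x - t is the exact fractional part of x
        t += math.copysign(1.0, x)
    return math.copysign(t, x)


for v in (2.5, 3.5, -2.5, 2.4):
    print(v, math.floor(v), math.ceil(v), math.trunc(v), nearest_even(v), round_half_away(v))
```

The round_half_away sketch avoids the tempting floor(abs(x) + 0.5), which gives the wrong answer for 0.49999999999999994 because the addition itself rounds up to 1.0.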

FRM (3-bit floating point rounding mode):

FRRM (RISC-V F): read FRM: FRM = FRRM()
FSRM (RISC-V F): swap FRM: old_value = FSRM(new_value)
FSRMI (RISC-V F): swap FRM, immediate: old_value = FSRMI(immediate new_value)

Classify

FCLASS.S, FCLASS.D (f32, f64) (RISC-V F): 10-bit mask with properties of input float = fclass(f32)
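FCLASS sets exactly one of ten bits according to the class of its input, whereas CIL's ckfinite only asks whether a value is neither infinite nor NaN (throwing ArithmeticException otherwise). A Python sketch of FCLASS.D on a double, decoding the IEEE 754 fields with struct; ckfinite_ok is a hypothetical helper showing ckfinite's test expressed as a mask over the FCLASS bits:

```python
import struct


def fclass_d(x: float) -> int:
    """Return the 10-bit one-hot mask that RISC-V FCLASS.D would produce.

    Bit: 0 -inf, 1 negative normal, 2 negative subnormal, 3 -0, 4 +0,
    5 positive subnormal, 6 positive normal, 7 +inf, 8 signaling NaN, 9 quiet NaN.
    """
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    negative = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if exponent == 0x7FF:
        if fraction == 0:
            return 1 << (0 if negative else 7)
        return 1 << (9 if fraction >> 51 else 8)  # top fraction bit set = quiet NaN
    if exponent == 0:
        if fraction == 0:
            return 1 << (3 if negative else 4)
        return 1 << (2 if negative else 5)
    return 1 << (1 if negative else 6)


def ckfinite_ok(x: float) -> bool:
    """True if CIL's ckfinite would accept x, i.e. none of the inf/NaN bits (0, 7, 8, 9) is set."""
    return (fclass_d(x) & 0b11_1000_0001) == 0
```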
ckfinite (CIL)

Floating-point exceptions

FFLAGS (5-bit Accrued Exception Flags):

FRFLAGS (RISC-V F): read FFLAGS
FSFLAGS (RISC-V F): swap FFLAGS
FSFLAGSI (RISC-V F): swap FFLAGS, immediate

Pow

pow.* powi.* (LLVM Standard C Library Intrinsics)
pow (LuaJIT 2)

Misc floating-point

FCSR (32-bit floating point status and control register):

FRCSR (RISC-V F): 32-bit FCSR = FCSR()
FSCSR (RISC-V F): swap FCSR: old_value = FSCSR(new_value)
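FRM and FFLAGS are fields of the 32-bit FCSR: the rounding mode occupies bits 7:5 and the accrued exception flags occupy bits 4:0, so FRRM/FSRM and FRFLAGS/FSFLAGS are views onto parts of FCSR. A Python sketch of the layout (pack_fcsr and unpack_fcsr are illustrative names; the upper FCSR bits are left zero):

```python
# Accrued exception flag bits in FFLAGS / FCSR[4:0]
NV = 0x10  # invalid operation
DZ = 0x08  # divide by zero
OF = 0x04  # overflow
UF = 0x02  # underflow
NX = 0x01  # inexact

# Valid FRM (FCSR[7:5]) encodings; 0b111 (DYN) is only meaningful in an instruction's rm field
ROUNDING_MODES = {
    0b000: "RNE",  # round to nearest, ties to even
    0b001: "RTZ",  # round towards zero
    0b010: "RDN",  # round down (towards -infinity)
    0b011: "RUP",  # round up (towards +infinity)
    0b100: "RMM",  # round to nearest, ties to max magnitude
}


def pack_fcsr(frm: int, fflags: int) -> int:
    """Build an FCSR value from its FRM and FFLAGS fields."""
    return ((frm & 0b111) << 5) | (fflags & 0b11111)


def unpack_fcsr(fcsr: int) -> tuple[int, int]:
    """Split an FCSR value into (frm, fflags)."""
    return (fcsr >> 5) & 0b111, fcsr & 0b11111


fcsr = pack_fcsr(0b001, OF | NX)
frm, fflags = unpack_fcsr(fcsr)
print(hex(fcsr), ROUNDING_MODES[frm], bool(fflags & OF))  # 0x25 RTZ True
```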

Conversions

LLVM and ARM provide instructions to convert from 8-bit or 16-bit quantities to larger ones.

Integer conversions from 8-bit or 16-bit to larger

sign extend:

sext (LLVM)
sxth sxtb (ARM)

zero extend:

zext (LLVM)
uxth uxtb (ARM)
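Sign extension copies the top bit of the narrow value into all the new upper bits; zero extension fills them with zeros. LLVM expresses this on typed values (e.g. sext i8 to i32), while ARM's SXTB/SXTH/UXTB/UXTH act on the low byte or halfword of a register. A Python sketch that returns register-style unsigned bit patterns:

```python
def sext(x: int, from_bits: int, to_bits: int = 32) -> int:
    """Sign-extend the low `from_bits` bits of x to `to_bits` bits.

    Returns the result as an unsigned bit pattern, the way it would sit in a register.
    """
    x &= (1 << from_bits) - 1
    if x >> (from_bits - 1):  # sign bit set: fill the upper bits with ones
        x -= 1 << from_bits
    return x & ((1 << to_bits) - 1)


def zext(x: int, from_bits: int) -> int:
    """Zero-extend the low `from_bits` bits of x (the upper bits become zero)."""
    return x & ((1 << from_bits) - 1)
```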

Memory access

Loads and stores

Variable loads and stores

WASM and LuaJIT 2 support loads and stores to/from global variables. JVM and CIL provide short instructions to load/store the first 4 local variables.

locals:

aload_0 aload_1 aload_2 aload_3 astore_0 astore_1 astore_2 astore_3 (JVM)
dload_0 dload_1 dload_2 dload_3 dstore_0 dstore_1 dstore_2 dstore_3 (JVM)
fload_0 fload_1 fload_2 fload_3 fstore_0 fstore_1 fstore_2 fstore_3 (JVM)
iload_0 iload_1 iload_2 iload_3 istore_0 istore_1 istore_2 istore_3 (JVM)
lload_0 lload_1 lload_2 lload_3 lstore_0 lstore_1 lstore_2 lstore_3 (JVM)
ldloc.0 ldloc.1 ldloc.2 ldloc.3 stloc.0 stloc.1 stloc.2 stloc.3 (CIL)

globals:

GET_GLOBAL SET_GLOBAL (WASM)
gget gset (LuaJIT 2)

Stack ops

JVM, CIL provide dup.

dup:

dup dup_x1 dup_x2 dup2 dup2_x1 dup2_x2 (JVM)
dup (CIL)

Atomics and Sync

RISC-V and LLVM provide the AMOs (Atomic Memory Operations): SWAP, ADD, AND, OR, XOR, MIN, MAX, MINU, MAXU.

RISC-V provides load-reserved/store-conditional. LLVM provides compare-and-swap. These are different operations, but in some sense they are similar in that each can be used as a primitive upon which to build synchronization/consensus/atomicity.

C/C++ are not included in this concordance, but it's worth noting that (as of 2019) C/C++ atomics [10] [11] offer the following operations (this is a rough summary): (compare and) exchange, load, store, test-and-set flag, clear flag, add, sub, or, xor, and, fence.

Non-Cortex ARM instructions are not included in this concordance, but ARMv8.1-A provides the AMOs: SWP, CAS, LDADD, LDCLR, LDEOR, LDSET, LDSMAX, LDSMIN, LDUMAX, LDUMIN ([12] slide 14). [13] also mentions MemAtomicOp_BIC (result = data AND NOT(value)) and MemAtomicOp_ORR (result = data OR value); these appear to be the operations behind LDCLR (atomic bit clear) and LDSET (atomic bit set) rather than additional instructions.

Note that the intersection of the C/C++ AMOs, the ARMv8.1-A AMOs, and the set of AMOs provided by RISC-V and LLVM is (if you assume that ARM's BIC corresponds to an AND): swap, add, and, or, xor.

RISC-V Genealogy lists only AMOSWAP and AMOADD as appearing "in at least three" of the instruction set architectures under consideration in that paper.

load-reserved and store-conditional (LR/SC)

LR.W, LR.D (RISC-V RV32A/RV64A)
SC.W, SC.D (RISC-V RV32A/RV64A)

Compare-and-swap (CAS)

cmpxchg (LLVM)
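To illustrate the claim that compare-and-swap is enough to build other atomic operations, here's a Python sketch of an atomic fetch-and-add built from a CAS retry loop. AtomicCell and fetch_add are hypothetical names, and a threading.Lock stands in for the atomicity that hardware would provide. An LR/SC version has the same shape: load-reserve the old value, compute, store-conditional, and retry if the store fails.

```python
import threading


class AtomicCell:
    """A memory word whose only atomic read-modify-write primitive is compare-and-swap."""

    def __init__(self, value: int = 0):
        self._value = value
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def load(self) -> int:
        return self._value

    def compare_and_swap(self, expected: int, new: int) -> tuple[int, bool]:
        """Like LLVM cmpxchg: returns (old value, whether the swap happened)."""
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new
            return old, old == expected


def fetch_add(cell: AtomicCell, delta: int) -> int:
    """Atomic fetch-and-add built from CAS with a retry loop; returns the old value."""
    old = cell.load()
    while True:
        seen, ok = cell.compare_and_swap(old, old + delta)
        if ok:
            return old
        old = seen  # another thread changed the cell; retry with the fresh value


def worker(cell: AtomicCell, n: int) -> None:
    for _ in range(n):
        fetch_add(cell, 1)


counter = AtomicCell()
threads = [threading.Thread(target=worker, args=(counter, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.load())  # 4000
```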

AMOs

swap:

AMOSWAP.W, AMOSWAP.D (RISC-V RV32A/RV64A)
atomicrmw xchg (LLVM)

add:

AMOADD.W, AMOADD.D (RISC-V RV32A/RV64A)
atomicrmw add (LLVM)

xor:

AMOXOR.W, AMOXOR.D (RISC-V RV32A/RV64A)
atomicrmw xor (LLVM)

and:

AMOAND.W, AMOAND.D (RISC-V RV32A/RV64A)
atomicrmw and (LLVM)

or:

AMOOR.W, AMOOR.D (RISC-V RV32A/RV64A)
atomicrmw or (LLVM)

min:

AMOMIN.W, AMOMIN.D (RISC-V RV32A/RV64A)
atomicrmw min (LLVM)

unsigned min:

AMOMINU.W, AMOMINU.D (RISC-V RV32A/RV64A)
atomicrmw umin (LLVM)

max:

AMOMAX.W, AMOMAX.D (RISC-V RV32A/RV64A)
atomicrmw max (LLVM)

unsigned max:

AMOMAXU.W, AMOMAXU.D (RISC-V RV32A/RV64A)
atomicrmw umax (LLVM)
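Every AMO above has the same shape: atomically read the old value, store op(old, value), and return the old value (LLVM's atomicrmw likewise returns the original value). The only subtlety is MIN/MAX versus MINU/MAXU, which compare the same bit patterns as signed or unsigned. A sequential Python model of the 32-bit (.W) forms, where memory is a hypothetical dict of unsigned 32-bit words:

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x


# op(old, value) -> new value; operands are unsigned 32-bit bit patterns
AMO_OPS = {
    "swap": lambda old, v: v,
    "add": lambda old, v: old + v,
    "and": lambda old, v: old & v,
    "or": lambda old, v: old | v,
    "xor": lambda old, v: old ^ v,
    "min": lambda old, v: old if to_signed32(old) <= to_signed32(v) else v,
    "max": lambda old, v: old if to_signed32(old) >= to_signed32(v) else v,
    "minu": min,
    "maxu": max,
}


def amo_w(memory: dict, addr: int, op: str, value: int) -> int:
    """Model an AMO*.W: store op(old, value) at addr and return the old value.

    In hardware the read-modify-write is one indivisible step; this model
    simply does it sequentially.
    """
    old = memory[addr]
    memory[addr] = AMO_OPS[op](old, value & MASK32) & MASK32
    return old
```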

Control flow

RISC-V and ARM (all of the hardware processor ISAs included in this study) both provide unconditional jump instructions which also place the return address in a 'link register'. In fact, all of RISC-V's unconditional jumps do this (using x0 as the destination register discards the return address, giving a plain jump). JVM used to provide a similar instruction with JSR, but this has been effectively deprecated.

RISC-V and ARM (all of the hardware processor ISAs included in this study) both provide unconditional indirect branch instructions to an arbitrary location in code. The other platforms do provide indirect jumps, but either require an enumerated set of all potential destinations, or provide indirect jumps for higher-level constructs only (such as LuaJIT 2, which provides the CALL instruction to call a function).

Jumps (unconditional branch)

Jump to immediate / direct branch

JAL (Jump and Link) (32b) (RISC-V RV32I)
BL (ARM)

Jump to register / indirect branch

JALR (Jump & Link Register) (32b) (RISC-V RV32I): indirect branch (unconstrained)
BX BLX (ARM): indirect branch

Conditional branches

ARM and JVM provide <0, >=0.

Note: Instead of many conditional branches, WASM and LLVM provide separate comparison ops and a boolean conditional branch (listed above rather than here, because boolean conditional branch is grouped with branch-if-not-zero, which is offered by more than two platforms).

unary compare <0:

bmi (ARM)
iflt (JVM)

unary compare >=0:

bpl (ARM)
ifge (JVM)

Conditional non-branches

WASM and LLVM provide SELECT. Note that these are the same two platforms that separate compares and boolean-conditional-branch.

SELECT (WASM): "selects one of its first two operands based on whether its third operand is zero or not"
select (LLVM)

Subroutines

LuaJIT 2 and CIL provide tail calls.

WASM and LuaJIT 2 provide structured control flow loops.

JVM and CIL provide invoke instructions for object-oriented calling.

tailcall:

callt callmt (LuaJIT 2)
tail. (CIL)

Loops:

LOOP (WASM)
fori jfori forl iforl jforl iterl iiterl jiterl loop iloop jloop (LuaJIT 2)

OOP calls:

invokeinterface invokespecial invokestatic invokevirtual (JVM)
calls: callvirt (CIL)

Allocation

None.

Data Structures

JVM, CIL provide OOP data structures.

LuaJIT 2 and CIL provide strings (but not many operations specifically for them, as far as i can tell; operations not shown).

OOP

creation, memory:

new (JVM)
cpobj initobj newobj sizeof (CIL)

types:

checkcast instanceof (JVM)
castclass constrained isinst mkrefany (CIL)

accessors:

getfield getstatic putfield putstatic (JVM)
ldfld ldftn ldobj ldsfld ldtoken ldvirtftn refanytype refanyval stfld stobj stsfld (CIL)

Misc

Cycle counters are provided by RISC-V and LLVM (as LLVM intrinsics).

RISC-V and ARM provide Control-and-Status-Registers. Note that these are all of the hardware processor ISAs in our dataset.

Memory operations such as memcpy are provided by LLVM intrinsics and CIL.

cycle counters:

RISC-V RV32I Counters pseudo-instructions (note: these are in RV32I but not in RV32E) (RDCYCLE[H], RDTIME[H], RDINSTRET[H])
readcyclecounter (LLVM Intrinsics)
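On RV32 the 64-bit cycle counter has to be read in two halves (RDCYCLE for the low half, RDCYCLEH for the high half), and the low half can wrap between the two reads. The usual fix, which the RISC-V unprivileged spec gives as sample code, is to read high, low, high and retry if the two high reads differ. A Python sketch of that loop; FakeCycleCounter is a hypothetical stand-in for the hardware counter that advances on every read:

```python
class FakeCycleCounter:
    """Hypothetical 64-bit counter that advances by `step` on every read."""

    def __init__(self, start: int, step: int):
        self.value = start
        self.step = step

    def _tick(self) -> int:
        v = self.value
        self.value = (self.value + self.step) & (2**64 - 1)
        return v

    def rdcycle(self) -> int:   # low 32 bits
        return self._tick() & 0xFFFFFFFF

    def rdcycleh(self) -> int:  # high 32 bits
        return self._tick() >> 32


def read_cycle64(counter: FakeCycleCounter) -> int:
    """Read a consistent 64-bit value from two 32-bit halves."""
    while True:
        hi = counter.rdcycleh()
        lo = counter.rdcycle()
        if counter.rdcycleh() == hi:  # no carry into the high half in between
            return (hi << 32) | lo
```

Starting the fake counter just below a 32-bit boundary forces one retry: the first attempt sees the high half change from 0 to 1 and is discarded.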

special registers:

RISC-V RV32I Control and Status Register: CSRRW (Atomic Read/Write CSR), CSRRS (Atomic Read and Set Bits in CSR), CSRRC (Atomic Read and Clear Bits in CSR), CSRRWI (CSRRW immediate), CSRRSI (CSRRS immediate), CSRRCI (CSRRC immediate)
mrs msr (ARM)

memory:

Standard C Library Intrinsics: memcpy memmove memset.* (LLVM Intrinsics)
cpblk initblk (CIL)
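The difference between memcpy and memmove is overlap: LLVM's memcpy requires that the source and destination not overlap, while memmove must give the right answer even when they do. The classic approach is to copy backwards when the destination starts after the source. A Python sketch on a bytearray (forward_copy is a hypothetical naive copy, shown to illustrate the failure mode):

```python
def forward_copy(buf: bytearray, dst: int, src: int, n: int) -> None:
    """Naive byte-by-byte forward copy: fine when the ranges don't overlap,
    but it overwrites not-yet-copied source bytes when dst overlaps the tail of src."""
    for i in range(n):
        buf[dst + i] = buf[src + i]


def memmove(buf: bytearray, dst: int, src: int, n: int) -> None:
    """Overlap-safe copy: go backwards when the destination is above the source."""
    if dst > src:
        for i in reversed(range(n)):
            buf[dst + i] = buf[src + i]
    else:
        for i in range(n):
            buf[dst + i] = buf[src + i]


a = bytearray(b"abcdef")
memmove(a, 2, 0, 4)
b = bytearray(b"abcdef")
forward_copy(b, 2, 0, 4)
print(a, b)
```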

Concordance of instructions supported by only one platform

Unlike the sections above for instruction classes supported by n platforms (where n > 1), this section does not begin with a summary list of the instruction classes it covers.

Arithmetic

Constant loads

RISC-V alone provides LUI and AUIPC.

Andreas Olofsson of Adapteva said on a blog that he is "not convinced that ((AUIPC)) is essential" [14]; however, a commenter, Chris, from the RISC-V project explained that it was useful [15].

Load upper bits:

LUI (i64, i32) (RISC-V RV32I)

Load PC-relative constant:

AUIPC (RISC-V RV32I)

Add, subtract, multiply, divide

RISC-V alone also provides integer multiplication instructions returning the high-order bits. Andreas Olofsson of Adapteva said on a blog that these operations had a "high expense/benefit ratio" [16].

LLVM alone has saturated addition and subtraction (in both signed and unsigned forms) and fixed point multiplication (in signed, unsigned, and signed saturated forms), all as intrinsics.

ARM alone has PC-relative addition.

JVM alone has increment (which operates directly on variables, not on the stack).

Integer addition

sadd.sat.* uadd.sat.* (LLVM Saturation Arithmetic Intrinsics)
ssub.sat.* usub.sat.* (LLVM Saturation Arithmetic Intrinsics)
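Saturating arithmetic clamps the exact result to the representable range instead of wrapping around. A Python sketch of the four LLVM saturation intrinsics for an N-bit integer type (sat_add and sat_sub are illustrative names):

```python
def _limits(bits: int, signed: bool) -> tuple[int, int]:
    """Smallest and largest values of an N-bit signed or unsigned integer."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def sat_add(a: int, b: int, bits: int, signed: bool) -> int:
    """Model sadd.sat/uadd.sat: clamp the exact sum to the N-bit range."""
    lo, hi = _limits(bits, signed)
    return max(lo, min(hi, a + b))


def sat_sub(a: int, b: int, bits: int, signed: bool) -> int:
    """Model ssub.sat/usub.sat: clamp the exact difference to the N-bit range."""
    lo, hi = _limits(bits, signed)
    return max(lo, min(hi, a - b))
```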

Increment:

iinc (JVM)

PC-relative add:

adr (ARM)

Integer multiplication

Fixed point multiply (undefined upon overflow, except for the saturating variant):

smul.fix.* umul.fix.* smul.fix.sat.* (LLVM fixed point arithmetic intrinsics)

Multiply, upper bits:

MULH (i64, i32) (RISC-V M): the same multiplication as MUL but returns the upper XLEN bits of the full 2×XLEN-bit product, for signed x signed
MULHU (i64, i32) (RISC-V M): the same multiplication as MUL but returns the upper XLEN bits of the full 2×XLEN-bit product, for unsigned x unsigned
MULHSU (i64, i32) (RISC-V M): the same multiplication as MUL but returns the upper XLEN bits of the full 2×XLEN-bit product, for signed x unsigned
MULW (i64) (RISC-V M): "MULW is only valid for RV64, and multiplies the lower 32 bits of the source registers, placing the sign-extension of the lower 32 bits of the result into the destination register. MUL can be used to obtain the upper 32 bits of the 64-bit product, but signed arguments must be proper 32-bit signed values, whereas unsigned arguments must have their upper 32 bits clear."
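The MULH variants differ only in how the two operands are interpreted (signed or unsigned) before the full 2×XLEN-bit product is formed; MUL gives the low half, which is the same either way. Pairing MULHU with MUL therefore recovers the full unsigned product (the RISC-V spec suggests issuing MULH[[S]U] followed by MUL on the same operands so that implementations can fuse them). A Python sketch for XLEN = 32, with operands given as unsigned register bit patterns:

```python
XLEN = 32
MASK = (1 << XLEN) - 1


def signed(x: int) -> int:
    """Interpret an XLEN-bit register value as two's complement."""
    x &= MASK
    return x - (1 << XLEN) if x >> (XLEN - 1) else x


def mul(a: int, b: int) -> int:
    """MUL: low XLEN bits of the product (same for signed and unsigned)."""
    return (a * b) & MASK


# Python's >> on a negative int is an arithmetic (floor) shift, which matches
# taking the upper half of a two's-complement 2*XLEN-bit product.
def mulh(a: int, b: int) -> int:
    """MULH: upper XLEN bits of signed x signed."""
    return ((signed(a) * signed(b)) >> XLEN) & MASK


def mulhu(a: int, b: int) -> int:
    """MULHU: upper XLEN bits of unsigned x unsigned."""
    return ((a & MASK) * (b & MASK)) >> XLEN


def mulhsu(a: int, b: int) -> int:
    """MULHSU: upper XLEN bits of signed (a) x unsigned (b)."""
    return ((signed(a) * (b & MASK)) >> XLEN) & MASK
```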

Shifts

WASM alone provides left rotate.

Rotate left:

i32.rotl, i64.rotl (WASM)

Logical

ARM alone also provides bit-clear (bitwise-AND(x, bitwise-NOT(y))).

bics (ARM): bitwise-AND(x, bitwise-NOT(y))

Compares

WASM alone provides equals-zero compares.

LLVM alone breaks out each floating point comparison into 'ordered' and 'unordered' variants, which provide two options for what to do with NaNs.

JVM alone provides trinary comparisons for floats, doubles, and longs.

Equals zero:

i32.eqz i64.eqz (WASM)

Trinary compare:

lcmp (JVM)

Floating-point comparisons

Trinary compare:

dcmpg dcmpl fcmpg fcmpl (JVM)
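The trinary compares push -1, 0, or 1, and the float/double versions come in two flavors that differ only in what they push when an operand is NaN: the 'l' forms push -1 and the 'g' forms push 1. Having both lets a compiler pick, for each source-level comparison, the variant that sends NaN down the "false" path. A Python sketch of the semantics:

```python
import math


def lcmp(a: int, b: int) -> int:
    """JVM lcmp: push 1, 0, or -1 for a > b, a == b, a < b."""
    return (a > b) - (a < b)


def fcmpl(a: float, b: float) -> int:
    """JVM fcmpl/dcmpl: like lcmp, but push -1 if either operand is NaN."""
    if math.isnan(a) or math.isnan(b):
        return -1
    return (a > b) - (a < b)


def fcmpg(a: float, b: float) -> int:
    """JVM fcmpg/dcmpg: like lcmp, but push 1 if either operand is NaN."""
    if math.isnan(a) or math.isnan(b):
        return 1
    return (a > b) - (a < b)
```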

Other:

fcmp false (LLVM)
fcmp true (LLVM)
fcmp ord: neither operand is a QNAN (LLVM)
fcmp uno: either operand is a QNAN (LLVM)
fcmp (various) and ordered (LLVM)
fcmp (various) or unordered (LLVM)
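Each LLVM fcmp relation comes in an 'o' (ordered) form, which is false if either operand is NaN, and a 'u' (unordered) form, which is true if either operand is NaN; ord and uno test only for NaN-ness. A Python sketch of the predicate semantics (fcmp here is an illustrative function, not an LLVM API):

```python
import math
import operator

RELATIONS = {
    "eq": operator.eq, "ne": operator.ne,
    "gt": operator.gt, "ge": operator.ge,
    "lt": operator.lt, "le": operator.le,
}


def fcmp(pred: str, a: float, b: float) -> bool:
    """Model LLVM fcmp predicates such as "oge", "une", "ord", "uno", "true", "false"."""
    unordered = math.isnan(a) or math.isnan(b)
    if pred in ("false", "true"):
        return pred == "true"
    if pred == "ord":
        return not unordered
    if pred == "uno":
        return unordered
    kind, rel = pred[0], pred[1:]  # 'o' or 'u', then eq/ne/gt/ge/lt/le
    if unordered:
        return kind == "u"
    return RELATIONS[rel](a, b)
```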

Misc integer arith

LLVM alone provides bitreverse, and funnel shifts.

Other bit manipulation:

Bit Manipulation Intrinsics: bitreverse.* fshl.* fshr.* (LLVM intrinsics)
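A funnel shift concatenates two words, shifts the double-width value, and extracts one word: fshl keeps the most significant half after a left shift, fshr the least significant half after a right shift, with the shift amount taken modulo the width. Passing the same value as both inputs gives a rotate, which is how LLVM typically expresses rotates. A Python sketch, along with bitreverse:

```python
def fshl(a: int, b: int, c: int, bits: int = 32) -> int:
    """llvm.fshl: concatenate a:b (a most significant), shift left by
    c mod bits, and return the most significant `bits` bits."""
    mask = (1 << bits) - 1
    c %= bits
    concat = ((a & mask) << bits) | (b & mask)
    return ((concat << c) >> bits) & mask


def fshr(a: int, b: int, c: int, bits: int = 32) -> int:
    """llvm.fshr: concatenate a:b, shift right by c mod bits, and
    return the least significant `bits` bits."""
    mask = (1 << bits) - 1
    c %= bits
    concat = ((a & mask) << bits) | (b & mask)
    return (concat >> c) & mask


def bitreverse(x: int, bits: int = 32) -> int:
    """llvm.bitreverse: reverse the order of the bits."""
    return int(format(x & ((1 << bits) - 1), f"0{bits}b")[::-1], 2)
```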

Floating-point-specific

RISC-V alone provides copy-negated-sign (FSGNJN) and XOR-sign (FSGNJX), although these could be thought of as binary generalizations of (or replacements for) the unary sign operations negate and abs, which are provided by other systems.

Andreas Olofsson of Adapteva said in a blog post that he felt that RISC-V's floating point sign instructions (FSGNJ, FSGNJN, FSGNJX) were "not essential" [17].

RISC-V alone provides three fused multiply-add variants.

In a blog post, Andreas Olofsson of Adapteva questioned whether the FNMSUB fused multiply-add instruction variant provided by RISC-V is needed [18].

LLVM alone provides, as intrinsics only, certain other variants of min, max, and round, canonicalize, fma, and the standard C library math functions sin, cos, pow, exp, exp2, log, log10, log2.

Fused multiply-add

FMSUB.S, FMSUB.D (f32, f64) (RISC-V F): rs1*rs2-rs3
FNMADD.S, FNMADD.D (f32, f64) (RISC-V F): -rs1*rs2-rs3
FNMSUB.S, FNMSUB.D (f32, f64) (RISC-V F): -rs1*rs2+rs3
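All four RISC-V variants are one fused operation with negations applied to the exact product and/or the addend. Since negating a float is exact, each can be expressed as a single fused multiply-add with sign-flipped inputs. A Python sketch, assuming finite inputs; the fma helper emulates single rounding with exact rationals and does not model the sign of a zero result or NaN handling:

```python
from fractions import Fraction


def fma(a: float, b: float, c: float) -> float:
    """a*b + c rounded once (finite inputs only; signed zeros not modeled)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def fmadd(a: float, b: float, c: float) -> float:   # FMADD:   a*b + c
    return fma(a, b, c)


def fmsub(a: float, b: float, c: float) -> float:   # FMSUB:   a*b - c
    return fma(a, b, -c)


def fnmsub(a: float, b: float, c: float) -> float:  # FNMSUB: -(a*b) + c
    return fma(-a, b, c)


def fnmadd(a: float, b: float, c: float) -> float:  # FNMADD: -(a*b) - c
    return fma(-a, b, -c)
```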

Signs

other binary sign ops:

FSGNJN.S, FSGNJN.D (f32, f64) (RISC-V F): result = magnitude of rs1 with the negation of the sign of rs2
FSGNJX.S, FSGNJX.D (f32, f64) (RISC-V F): result = rs1 with its sign XORed with the sign of rs2

Other floating-point functions:

Standard C Library Intrinsics: powi.* sin.* cos.* pow.* exp.* exp2.* log.* log10.* log2.* experimental.constrained.pow experimental.constrained.powi experimental.constrained.sin experimental.constrained.cos experimental.constrained.exp experimental.constrained.exp2 experimental.constrained.log experimental.constrained.log10 experimental.constrained.log2 (LLVM Intrinsics)
canonicalize.* (LLVM intrinsic)
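The RISC-V sign-injection instructions (FSGNJ, FSGNJN, FSGNJX) only touch the sign bit, which is why the unary operations fall out as special cases: RISC-V defines FNEG as FSGNJN x, x and FABS as FSGNJX x, x. A Python sketch on doubles, manipulating bit 63 directly via struct:

```python
import struct

SIGN = 1 << 63  # sign bit of an IEEE 754 double


def _bits(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def _from_bits(b: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def fsgnj(a: float, b: float) -> float:
    """FSGNJ: magnitude of a, sign of b."""
    return _from_bits((_bits(a) & ~SIGN) | (_bits(b) & SIGN))


def fsgnjn(a: float, b: float) -> float:
    """FSGNJN: magnitude of a, opposite of b's sign."""
    return _from_bits((_bits(a) & ~SIGN) | (~_bits(b) & SIGN))


def fsgnjx(a: float, b: float) -> float:
    """FSGNJX: a with its sign XORed with b's sign."""
    return _from_bits(_bits(a) ^ (_bits(b) & SIGN))


def fneg(x: float) -> float:  # negate: inject the opposite of x's own sign
    return fsgnjn(x, x)


def fabs(x: float) -> float:  # abs: sign XOR itself is always 0 (positive)
    return fsgnjx(x, x)
```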

Conversions

CIL alone provides 'native' types (native int ('i'), native float ('r')).

CIL alone provides conversion variants with overflow detection.

LLVM alone provides intrinsics to convert to and from 16-bit floating point.

LLVM alone provides conversions between pointers and ints; however such conversions are unneeded in some other systems, where pointers are untyped or the same type as ints.

LLVM alone provides 'addrspacecast' to convert pointers between different 'address spaces'.

JVM alone provides conversions to chars, but not vice versa, as far as i can tell.

convert to native numerical type: conv.i conv.r.un conv.u (CIL)
convert to numerical type, with overflow: conv.ovf.i conv.ovf.i.un conv.ovf.i1 conv.ovf.i1.un conv.ovf.i2 conv.ovf.i2.un conv.ovf.i4 conv.ovf.i4.un conv.ovf.i8 conv.ovf.i8.un conv.ovf.u conv.ovf.u.un conv.ovf.u1 conv.ovf.u1.un conv.ovf.u2 conv.ovf.u2.un conv.ovf.u4 conv.ovf.u4.un conv.ovf.u8 conv.ovf.u8.un (CIL)
Half Precision Floating-Point: convert.to.fp16 convert.from.fp16 (LLVM intrinsics)
ptrtoint (LLVM)
inttoptr (LLVM)
addrspacecast (LLVM)
i2c (JVM)
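The difference between CIL's plain conversions and the .ovf conversions is what happens when the value doesn't fit: conv.<type> keeps the low bits, while conv.ovf.<type> throws System.OverflowException (and the .un forms first reinterpret the source as unsigned). A Python sketch for integer inputs only; the OverflowException class and the function names are hypothetical stand-ins:

```python
class OverflowException(Exception):
    """Stand-in for System.OverflowException."""


RANGES = {
    "i1": (-2**7, 2**7 - 1), "u1": (0, 2**8 - 1),
    "i2": (-2**15, 2**15 - 1), "u2": (0, 2**16 - 1),
    "i4": (-2**31, 2**31 - 1), "u4": (0, 2**32 - 1),
    "i8": (-2**63, 2**63 - 1), "u8": (0, 2**64 - 1),
}


def conv_ovf(value: int, target: str, source_unsigned: bool = False, source_bits: int = 32) -> int:
    """Model conv.ovf.<target> (or conv.ovf.<target>.un if source_unsigned) for integers."""
    if source_unsigned:
        value &= (1 << source_bits) - 1  # .un: reinterpret the source bits as unsigned
    lo, hi = RANGES[target]
    if not lo <= value <= hi:
        raise OverflowException(f"{value} does not fit in {target}")
    return value


def conv_wrap(value: int, target: str) -> int:
    """Model the plain conv.<target> for integers: keep the low bits, no check."""
    lo, hi = RANGES[target]
    width = hi - lo + 1   # 2**bits
    value %= width        # Python's % gives a result in [0, width)
    return value - width if value > hi else value
```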

Memory access

Loads and stores

ARM alone provides load multiple and store multiple.

CIL alone provides load/store 'native' integers (type 'i'), and load/store opaque references (type 'O').

Load/store multiple:

ldm stm (ARM)

Load/store native:

ldind.i stind.i (CIL)

Reference (pointer) loads and stores:

ldind.ref stind.ref (CIL)

Variable loads and stores

CIL alone provides loads of the addresses of local variables. WASM provides an instruction, TEE_LOCAL, which sets a local but then also returns its argument. LuaJIT 2 also provides loads and stores from/to upvalues.

ldloca ldloca.s (CIL)
TEE_LOCAL (like set_local but also returns its argument) (WASM)
uget usetv usets usetn usetp uclo (LuaJIT 2)

Stack ops

ARM alone provides push, pop. JVM alone provides swap.

push, pop:

push, pop (ARM)

swap:

swap (JVM)

Atomics and Sync

LLVM alone provides some rarely seen AMOs: sub nand fadd fsub.

atomicrmw sub (LLVM)
atomicrmw nand (LLVM)
atomicrmw fadd (LLVM)
atomicrmw fsub (LLVM)

Control flow

Conditional branches

ARM alone also provides branch-if-overflow, branch-if-not-overflow, unsigned >, unsigned <=.

JVM alone also provides >0, <=0.

LuaJIT 2 alone provides equality/inequality compares against strings.

bvs bvc bhi bls (ARM)
ifgt ifle (compare vs. 0) (JVM)
iseqs isnes (LuaJIT 2)

Subroutines

LuaJIT 2 alone provides special instructions for iterators and for closures.

iterators:

iterc itern isnext (LuaJIT 2)

closures:

fnew (LuaJIT 2)

Other structured control flow

WASM alone provides blocks and if/else.

BLOCK (WASM)
IF (WASM)
ELSE (WASM)
END (WASM)

Data Structures

LLVM alone provides a shufflevector operation.

CIL alone provides address-of-array-element and address-of-object-field accessors.

CIL alone provides polymorphic boxing and unboxing instructions.

LuaJIT 2 alone provides a cat (concatenate) operation (i think that LuaJIT 2's len applies to strings and tables. I don't know if cat is only for strings or if it applies to tables also).

LLVM alone provides vector reduction ops, and masked vector ops.

array misc:

shufflevector (LLVM)

array accessors:

ldelema (CIL)

OOP accessors:

ldflda ldsflda (CIL)

misc:

box unbox unbox.any (CIL)
cat (concat) (LuaJIT 2)

vector reduction:

Experimental Vector Reduction Intrinsics: experimental.vector.reduce.add.* experimental.vector.reduce.fadd.* experimental.vector.reduce.mul.* experimental.vector.reduce.fmul.* experimental.vector.reduce.and.* experimental.vector.reduce.or.* experimental.vector.reduce.xor.* experimental.vector.reduce.smax.* experimental.vector.reduce.smin.* experimental.vector.reduce.umax.* experimental.vector.reduce.umin.* experimental.vector.reduce.fmax.* experimental.vector.reduce.fmin.* (LLVM)

masked vectors (LLVM Intrinsics):

Masked Vector Load and Store Intrinsics: masked.load.* masked.store.* (LLVM Intrinsics)
Masked Vector Gather and Scatter Intrinsics: masked.gather.* masked.scatter.* (LLVM Intrinsics)
Masked Vector Expanding Load and Compressing Store Intrinsics: masked.expandload.* masked.compressstore.* (LLVM Intrinsics)

Misc

LLVM alone provides phi, and various other intrinsics.

ARM alone provides interrupt handling, and 'event' hints for power saving.

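JVM alone provides an operand-widening prefix instruction (wide), which extends the local-variable index (and, for iinc, the increment) of the following instruction to 16 bits, and reserved implementation-dependent instructions (impdep1, impdep2).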

LuaJIT 2 alone provides function header instructions.

CIL provides various misc. prefixes such as readonly, no.rangecheck.

misc misc:

phi (LLVM)
Code Generator Intrinsics: returnaddress addressofreturnaddress sponentry frameaddress localescape localrecover read_register write_register stacksave stackrestore get.dynamic.area.offset prefetch pcmarker clear_cache instrprof.increment instrprof.increment.step instrprof.value.profile thread.pointer (LLVM Intrinsics)
Debugger Intrinsics: llvm.dbg.addr llvm.dbg.declare llvm.dbg.value (LLVM Intrinsics)
Trampoline Intrinsics: init.trampoline adjust.trampoline (LLVM Intrinsics)
Stack Map Intrinsics: llvm.experimental.stackmap llvm.experimental.patchpoint.* (LLVM Intrinsics)
Element Wise Atomic Memory Intrinsics: memcpy.element.unordered.atomic memmove.element.unordered.atomic memset.element.unordered.atomic (LLVM Intrinsics)
General Intrinsics: var.annotation ptr.annotation.* annotation.* codeview.annotation stackprotector stackguard objectsize expect assume ssa_copy type.test type.checked.load experimental.deoptimize experimental.guard experimental.widenable.condition load.relative sideeffect is.constant.* (LLVM Intrinsics)
interrupts: cpsid cpsie (ARM)
sev wfe wfi (ARM)
wide (JVM)
impdep1 impdep2 (JVM)
Function headers: funcf ifuncf jfuncf funcv ifuncv jfuncv funcc funccw func (LuaJIT 2)
no.typecheck/rangecheck/nullcheck (CIL)
readonly. unaligned. (CIL)

Full concordance

Arithmetic

Constant loads

WASM, JVM, CIL have instructions to directly load i32, i64, f32, f64 constants (but not unsigned?). RISC-V has various instructions that can be specialized to directly load i64 (or i32, if the chip is RV32I instead of RV64I); everything else must be synthesized/coerced.

Lua has f64 only, but LuaJIT 2 does have an instruction to load 16-bit immediate constants. LuaJIT 2 uses constant tables. LuaJIT 2 and CIL have instructions to load various higher-level data structures such as strings.

JVM uses a constant pool rather than immediate constants for most values. JVM does provide immediate constants of 8 and 16 bits (sign-extended to 32-bit ints) via bipush and sipush.

JVM provides instructions to push constants of null, or 0,1 sometimes 2, or -1...5, depending on type. CIL provides instructions to load -1..8 to i32 as well as null.

RISC-V alone provides LUI.

LLVM doesn't need constant loads because constants can be used directly as instruction operands in LLVM IR, without a separate instruction.

(pseudoinstruction using ADDI or ORI) (i64, i32) (RISC-V RV32I/RV64I)
i32.const, i64.const, f32.const, f64.const (WASM)
bipush sipush (JVM)
aconst_null dconst_0 dconst_1 fconst_0 fconst_1 fconst_2 iconst_m1 iconst_0 iconst_1 iconst_2 iconst_3 iconst_4 iconst_5 lconst_0 lconst_1 (JVM)
ldc ldc_w ldc2_w (JVM)
kstr kdata kshort knum kpri knil (LuaJIT 2)
ldstr (CIL)
ldc.i4 ldc.i4.0 ldc.i4.1 ldc.i4.2 ldc.i4.3 ldc.i4.4 ldc.i4.5 ldc.i4.6 ldc.i4.7 ldc.i4.8 ldc.i4.m1 ldc.i4.M1 ldc.i4.s ldc.i8 ldc.r4 ldc.r8 ldnull (CIL)
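The many constant-push variants exist so that small, common constants get short encodings. A compiler typically picks the shortest form that fits, roughly as in this Python sketch (the function names and the returned assembly-like strings are illustrative; a real JVM ldc operand is a constant-pool index, and ldc_w is used when that index doesn't fit in one byte):

```python
def jvm_push_int(v: int) -> str:
    """Pick the shortest JVM instruction that pushes the int constant v."""
    if -1 <= v <= 5:
        return "iconst_m1" if v == -1 else f"iconst_{v}"
    if -128 <= v <= 127:
        return f"bipush {v}"   # 8-bit immediate, sign-extended to int
    if -32768 <= v <= 32767:
        return f"sipush {v}"   # 16-bit immediate, sign-extended to int
    return f"ldc <constant pool entry holding {v}>"


def cil_push_int32(v: int) -> str:
    """Pick the shortest CIL instruction that pushes the int32 constant v."""
    if -1 <= v <= 8:
        return "ldc.i4.m1" if v == -1 else f"ldc.i4.{v}"
    if -128 <= v <= 127:
        return f"ldc.i4.s {v}"  # 8-bit immediate
    return f"ldc.i4 {v}"        # full 32-bit immediate
```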

Load upper bits:

LUI (i64, i32) (RISC-V RV32I)

Moves (copies)

Moves are provided by RISC-V, ARM, and LuaJIT 2. RISC-V provides MOVs as pseudoinstructions. WASM, JVM, and CIL have a stack, and LLVM has SSA assignment instead of MOVs.

(pseudoinstruction using ADDI or ORI) (RISC-V RV32I/RV64I)
mov movs (ARM)
mov (LuaJIT 2)

Add, subtract, multiply, divide

All of RISC-V, WASM, LLVM, ARM, JVM, CIL have 32-bit signed ADD and SUB. LuaJIT 2 has 64-bit signed ADD and SUB.

RISC-V, WASM, LLVM, JVM, CIL have all 4 combinations of integer, float, 64-bit, 32-bit variants of ADD and SUB.

All of RISC-V, WASM, LLVM, ARM, JVM, CIL provide an integer multiplication which returns the lower half of the resulting bits (the "low order bits"; equivalently, the result mod 2^bitwidth).

RISC-V, WASM, LLVM, JVM, CIL have 64-bit and 32-bit signed integer variants of DIV and REM, and 64-bit and 32-bit float variants of MUL and DIV. LuaJIT 2 provides only 64-bit variants of MUL, DIV, MOD.

RISC-V, WASM, LLVM, CIL have all 4 combinations of 64-bit, 32-bit, signed, unsigned integer variants, of DIV and REM, and 64-bit and 32-bit float variants of MUL and DIV.

Andreas Olofsson of Adapteva noted in a blog post that RISC-V's FDIV (floating point division) instruction is "expensive" and that it was a "tough call" whether to include such an instruction in his Epiphany ISA [20]. In the same blog post, he noted that Epiphany did not have integer division or remainder because they didn't fit Epiphany's intended use cases.

RISC-V floating point operations are only included in the floating point extensions, not the base integer instruction set. RISC-V integer multiplication, division, and remainder are only included in the M extension, not the base integer instruction set.

LLVM, JVM, LuaJIT 2, CIL have floating point remainder/mod.

LLVM alone has saturated addition and subtraction (in both signed and unsigned forms) and fixed point multiplication (in signed, unsigned, and signed saturated forms), all as intrinsics.

Integer addition and subtraction with overflow or carry, signed 32-bit, is provided by LLVM intrinsics, ARM, CIL. LLVM, CIL also have multiplication with overflow, as well as unsigned and 64-bit variants of addition, subtraction, multiplication with overflow.

LLVM offers 'constrained' floating point operations, which means that the rounding and exception modes are respected when those instructions are used.

JVM and CIL have integer negation and ARM has reverse subtraction.

JVM alone has increment (which operates directly on variables, not on the stack).

Polymorphic arithmetic

Used for both integers and floats

add sub mul div neg rem (CIL)

Integer division truncates towards zero.

Integer division throws DivideByZeroException if division by zero is attempted. Integer division throws System.ArithmeticException if the result cannot be represented in the result type, for example, for (the smallest representable integer value) / -1.

Integer negation is twos-complement negation. Neg(the most negative number) = the most negative number (note that in twos-complement, the true negation of the most negative number cannot be represented).

For integer remainder, "result = value1 rem value2 satisfies the following conditions:

result = value1 - value2*(value1 div value2)
0 <= |result| < |value2|
sign(result) = sign(value1)

where div is the division instruction, which truncates towards zero." -- [21] page 381

Floating-point division is per IEC 60559:1989. Division of a finite nonzero number by 0 produces the correctly signed infinite value; 0 / 0 = NaN, infinity / infinity = NaN, and any finite number / infinity = 0.

Negation of a floating point number cannot overflow. neg(NaN) = NaN.

Floating-point overflow returns +inf or -inf. For floating-point types, 0 * infinity = NaN.

For floating-point remainder,

"rem is defined similarly as for integer operands, except that, if value2 is zero or value1 is infinity, result is NaN. If value2 is infinity, result is value1. This definition is different from the one for floating-point remainder in the IEC 60559:1989 Standard. That Standard specifies that value1 div value2 is the nearest integer instead of truncating towards zero. System.Math.IEEERemainder (see Partition IV) provides the IEC 60559:1989 behavior.

...

Example:

+10 rem +6 is 4 (+10 div +6 = 1)
+10 rem -6 is 4 (+10 div -6 = -1)
-10 rem +6 is -4 (-10 div +6 = -1)
-10 rem -6 is -4 (-10 div -6 = 1)

For the various floating-point values of 10.0 and 6.0, rem gives the same values; System.Math.IEEERemainder, however, gives the following values.

System.Math.IEEERemainder(+10.0,+6.0) is -2 (+10.0 div +6.0 = 1.666...7)
System.Math.IEEERemainder(+10.0,-6.0) is -2 (+10.0 div -6.0 =-1.666...7)
System.Math.IEEERemainder(-10.0,+6.0) is 2 (-10.0 div +6.0 =-1.666...7)
System.Math.IEEERemainder(-10.0,-6.0) is 2 (-10.0 div -6.0 = 1.666...7) " -- [22] page 381
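The two remainder definitions can be compared directly in Python, whose standard library happens to offer both: `math.fmod` truncates the quotient towards zero (matching the CIL rem rule quoted above), while `math.remainder` rounds the quotient to the nearest integer, as IEEE 754 specifies. The integer helper `rem_trunc` is a hypothetical sketch of the integer rule.

```python
import math


def rem_trunc(a, b):
    """Integer remainder with the quotient truncated towards zero,
    so the result takes the sign of the dividend a."""
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return a - b * q


for a, b in [(10, 6), (10, -6), (-10, 6), (-10, -6)]:
    print(a, b, rem_trunc(a, b), math.fmod(a, b), math.remainder(a, b))
# 10 6 4 4.0 -2.0
# 10 -6 4 4.0 -2.0
# -10 6 -4 -4.0 2.0
# -10 -6 -4 -4.0 2.0
```

The output reproduces the quoted table: rem and fmod agree, while the IEEE remainder differs because 10/6 rounds to 2 rather than truncating to 1.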

Integer addition

ADD, ADDI, ADDW, ADDIW (i64, i64 immediate, i32, i32 immediate) (RISC-V RV32I/RV64I)
i32.add, i64.add (WASM)
add (LLVM)
sadd.with.overflow.* uadd.with.overflow.* (LLVM arithmetic with overflow intrinsics)
sadd.sat.* uadd.sat.* (LLVM Saturation Arithmetic Intrinsics)
add adds adcs cmn (ARM)
iadd ladd (JVM)
add.ovf add.ovf.un (CIL)

Increment:

iinc (JVM)

Integer subtraction

Subtract:

SUB, SUBW (i64, i32) (RISC-V RV32I/RV64I)
i32.sub, i64.sub (WASM)
sub (LLVM)
ssub.with.overflow.* usub.with.overflow.* (LLVM arithmetic with overflow intrinsics)
ssub.sat.* usub.sat.* (LLVM Saturation Arithmetic Intrinsics)
sub subs sbcs cmp (ARM)
isub lsub (JVM)
sub.ovf sub.ovf.un (CIL)

Negation

ineg lneg (JVM)

Reverse subtract:

rsbs (ARM)

Integer multiplication

Multiply, lower bits:

MUL (i64, i32) (RISC-V M): "MUL performs an XLEN-bit×XLEN-bit multiplication and places the lower XLEN bits in the destination register."
i32.mul, i64.mul (WASM)
mul (LLVM)
smul.with.overflow.* umul.with.overflow.* (LLVM arithmetic with overflow intrinsics)
muls (ARM)
imul lmul (JVM)
mul.ovf mul.ovf.un (CIL)

Multiply, fixed-point (undefined upon overflow, except for the saturating smul.fix.sat):

smul.fix.* umul.fix.* smul.fix.sat.* (LLVM fixed point arithmetic intrinsics)

Multiply, upper bits:

MULH (i64, i32) (RISC-V M): the same multiplication as MUL but returns the upper XLEN bits of the full 2×XLEN-bit product, for signed x signed
MULHU (i64, i32) (RISC-V M): the same multiplication as MUL but returns the upper XLEN bits of the full 2×XLEN-bit product, for unsigned x unsigned
MULHSU (i64, i32) (RISC-V M): the same multiplication as MUL but returns the upper XLEN bits of the full 2×XLEN-bit product, for signed x unsigned
MULW (i64) (RISC-V M): "MULW is only valid for RV64, and multiplies the lower 32 bits of the source registers, placing the sign-extension of the lower 32 bits of the result into the destination register. MUL can be used to obtain the upper 32 bits of the 64-bit product, but signed arguments must be proper 32-bit signed values, whereas unsigned arguments must have their upper 32 bits clear."
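The upper-half multiplies can be modelled by forming the full 2×XLEN-bit product and shifting it down. A sketch, assuming XLEN = 64 and register contents passed as unsigned bit patterns; the function names just mirror the mnemonics. One property worth checking is that MULHU and MUL together recover the full unsigned product.

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def to_signed(x):
    """Interpret an XLEN-bit pattern as a two's-complement integer."""
    x &= MASK
    return x - (1 << XLEN) if x >> (XLEN - 1) else x


def mul(rs1, rs2):
    """Lower XLEN bits of the product (identical for signed and unsigned)."""
    return (rs1 * rs2) & MASK


def mulh(rs1, rs2):
    """Upper XLEN bits of signed x signed."""
    return ((to_signed(rs1) * to_signed(rs2)) >> XLEN) & MASK


def mulhu(rs1, rs2):
    """Upper XLEN bits of unsigned x unsigned."""
    return (((rs1 & MASK) * (rs2 & MASK)) >> XLEN) & MASK


def mulhsu(rs1, rs2):
    """Upper XLEN bits of signed rs1 x unsigned rs2."""
    return ((to_signed(rs1) * (rs2 & MASK)) >> XLEN) & MASK
```

Python's `>>` on a negative int is an arithmetic (flooring) shift, which is exactly what extracting the high half of a two's-complement product needs.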

Integer division and remainder

Div:

DIV (i64) (RISC-V M)
DIVU (u64) (RISC-V M)
DIVW (i32) (RISC-V M)
DIVUW (u32) (RISC-V M)
i32.div_s, i32.div_u, i64.div_s, i64.div_u (WASM)
udiv, sdiv (LLVM)
idiv ldiv (JVM)
div.un (CIL)

Rem:

REM (i64) (RISC-V M)
REMU (u64) (RISC-V M)
REMW (i32) (RISC-V M)
REMUW (u32) (RISC-V M)
i32.rem_s, i32.rem_u, i64.rem_s, i64.rem_u (WASM)
urem, srem (LLVM)
irem lrem (JVM)
rem.un (CIL)

Floating point add, sub, mul, div

Add:

FADD.S, FADD.D (f32, f64) (RISC-V F)
f32.add, f64.add (WASM)
fadd (LLVM)
experimental.constrained.fadd (LLVM intrinsic)
dadd fadd (JVM)
addvn addnv addvv (LuaJIT 2)

Sub:

FSUB.S, FSUB.D (f32, f64) (RISC-V F)
f32.sub, f64.sub (WASM)
fsub (LLVM)
experimental.constrained.fsub (LLVM intrinsic)
dsub fsub (JVM)
subvn subnv subvv (LuaJIT 2)

Mul:

FMUL.S, FMUL.D (f32, f64) (RISC-V F)
f32.mul, f64.mul (WASM)
fmul (LLVM)
experimental.constrained.fmul (LLVM intrinsic)
dmul fmul (JVM)
mulvn mulnv mulvv (LuaJIT 2)

Div:

FDIV.S, FDIV.D (f32, f64) (RISC-V F)
f32.div, f64.div (WASM)
fdiv (LLVM)
experimental.constrained.fdiv (LLVM intrinsic)
ddiv fdiv (JVM)
divvn divnv divvv (LuaJIT 2)

Remainder/mod:

frem (LLVM)
experimental.constrained.frem (LLVM intrinsic)
drem frem (JVM)
modvn modnv modvv (LuaJIT 2)

Shifts

All of RISC-V, WASM, LLVM, ARM, JVM, CIL provide 32-bit left shift, logical/unsigned right shift, and arithmetic/signed right shift. RISC-V, WASM, LLVM, JVM, CIL also provide 64-bit variants.

WASM and ARM provide 32-bit right rotate. WASM alone provides left rotates (and 64-bit variant of right rotate).

Shift left:

SLL, SLLI, SLLW, SLLIW (i64, i64 immediate, i32, i32 immediate) (RISC-V RV32I/RV64I)
i32.shl, i64.shl (WASM)
shl (LLVM)
lsls (ARM)
ishl lshl (JVM)
shl (CIL)

Shift right logical/shift right unsigned:

SRL, SRLI, SRLW, SRLIW (i64, i64 immediate, i32, i32 immediate) (RISC-V RV32I/RV64I)
i32.shr_u, i64.shr_u (WASM)
lshr (LLVM)
lsrs (ARM)
iushr lushr (JVM)
shr.un (CIL)

Shift right arithmetic/shift right signed:

SRA, SRAI, SRAW, SRAIW (i64, i64 immediate, i32, i32 immediate) (RISC-V RV32I/RV64I)
i32.shr_s, i64.shr_s (WASM)
ashr (LLVM)
asrs (ARM)
ishr lshr (JVM)
shr (CIL)
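The three kinds of shift listed above can be sketched on raw 32-bit patterns. One assumption is worth flagging: the shift count is reduced modulo 32 here, which is what WASM and the JVM do for 32-bit shifts (RISC-V likewise uses only the low 5 bits of the count on RV32); LLVM instead treats an over-wide shift as producing poison, and ARM behaves differently again, so the masking is not universal.

```python
MASK32 = 0xFFFFFFFF


def shl32(x, n):
    """Shift left; bits shifted past bit 31 are discarded."""
    return (x << (n & 31)) & MASK32


def shr_u32(x, n):
    """Logical shift right: zeros enter from the left."""
    return (x & MASK32) >> (n & 31)


def shr_s32(x, n):
    """Arithmetic shift right: copies of the sign bit enter from the left."""
    x &= MASK32
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> (n & 31)) & MASK32


print(hex(shr_u32(0x80000000, 4)))  # 0x8000000
print(hex(shr_s32(0x80000000, 4)))  # 0xf8000000
print(hex(shl32(1, 33)))            # 0x2 (count taken modulo 32)
```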

Rotate left:

i32.rotl, i64.rotl (WASM)

Rotate right:

i32.rotr, i64.rotr (WASM)
rotate: rors (ARM)
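A rotate in one direction can always stand in for the other, since rotating right by n is the same as rotating left by 32 - n; that is one reason a single direction (as on ARM) suffices. Without a rotate instruction, it costs two shifts and an OR. A sketch with hypothetical helper names:

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate left: bits shifted out of the top re-enter at the bottom."""
    n &= 31
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def rotr32(x, n):
    """Rotate right by n, expressed as rotate left by 32 - n."""
    return rotl32(x, (32 - (n & 31)) & 31)


print(hex(rotr32(0x12345678, 8)))  # 0x78123456
```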

Logical

All of RISC-V, WASM, LLVM, ARM, JVM, CIL provide AND, OR, and XOR. RISC-V provides one instruction which is 32 or 64 bits depending on the register size, whereas WASM and JVM provide separate instructions for each size. ARM only provides 32-bit.

ARM, CIL provide bitwise NOT.

ARM alone also provides bit-clear: bitwise-AND(x, bitwise-NOT(y)).

LuaJIT 2 provides boolean NOT.

Andreas Olofsson of Adapteva said on a blog that he felt that the immediate variants provided by RISC-V (ANDI, ORI, XORI) could have been left out (and were left out in his architecture, Epiphany) [23].

AND:

AND, ANDI (i64) (RISC-V RV32I/64I)
i32.and, i64.and (WASM)
and (LLVM)
ands (ARM)
tst (ARM) (same as ANDS but only updates flags; discards the result)
iand land (JVM)
and (CIL)

OR:

OR, ORI (i64) (RISC-V RV32I/64I)
i32.or, i64.or (WASM)
or (LLVM)
orrs (ARM)
ior lor (JVM)
or (CIL)

XOR:

XOR, XORI (i64) (RISC-V RV32I/64I)
i32.xor, i64.xor (WASM)
xor (LLVM)
eor (ARM)
ixor lxor (JVM)
xor (CIL)

NOT:

mvns (bitwise NOT) (ARM)
not (boolean NOT) (LuaJIT 2)
not (bitwise NOT) (CIL)
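For reference, bit-clear is just AND with a complemented mask, and an ISA without a NOT instruction can synthesize bitwise NOT as XOR with all ones (RISC-V's standard `not` pseudo-instruction expands to `xori rd, rs, -1`). A 32-bit sketch with hypothetical helper names:

```python
MASK32 = 0xFFFFFFFF


def not32(x):
    """Bitwise NOT as XOR with all ones (RISC-V's `not` is `xori rd, rs, -1`)."""
    return (x ^ MASK32) & MASK32


def bic32(x, y):
    """ARM-style bit-clear: x AND NOT y clears in x every bit that is set in y."""
    return x & not32(y)


print(hex(not32(0x0000FFFF)))          # 0xffff0000
print(hex(bic32(0x12345678, 0xFF00)))  # 0x12340078
```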

misc:

bics (ARM) (bitwise-AND(x, bitwise-NOT(y)))

Compares

In this section we are only talking about compares which produce a result value, not about compare-and-branch instructions, which are treated in the control flow section.

RISC-V, WASM, LLVM provide < (less-than) in integer (signed and unsigned) and floating point. All of RISC-V and WASM and LLVM provide floating point == (equality), < (less-than) and <= (less-than-or-equal-to) in both 32- and 64-bit. RISC-V and LLVM provide one instruction which is 32- or 64- bits depending on the register/value size, whereas WASM provides separate instructions for each size.

WASM and LLVM also provide integer <=, >, >= (less-than-or-equal-to, greater-than, and greater-than-or-equal-to) in all of 32- and 64-bit and unsigned and signed. WASM and LLVM also provide integer ==, != (equality, inequality), and floating point !=, >, >= (inequality, greater-than, greater-than-or-equal-to), all in both 32- and 64-bit, although i believe these would be trivial to synthesize on RISC-V. WASM also provides ==0 (equality with zero), which would also be trivial to synthesize on RISC-V or LLVM.
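The claim that the missing compares are easy to synthesize on RISC-V can be checked with a small model. The sketch below treats SLT, SLTU and SLTIU as Python functions on 32-bit patterns and derives the other compares from them. The `==` and `!=` derivations follow RISC-V's standard `seqz`/`snez` pseudo-instructions; the others (swapping operands, XOR with 1) are one possible choice, not taken from the source.

```python
MASK32 = 0xFFFFFFFF


def s32(x):
    """Interpret a 32-bit pattern as a signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


# The base compares; each yields 0 or 1.
def slt(a, b):
    return int(s32(a) < s32(b))


def sltu(a, b):
    return int((a & MASK32) < (b & MASK32))


def sltiu(a, imm):
    # The immediate is sign-extended, then compared as unsigned.
    return sltu(a, imm & MASK32)


# Derived compares.
def sgt(a, b):
    return slt(b, a)          # swap the operands


def sle(a, b):
    return slt(b, a) ^ 1      # not (b < a)


def sge(a, b):
    return slt(a, b) ^ 1      # not (a < b)


def seq(a, b):
    return sltiu(a ^ b, 1)    # seqz(a ^ b): the XOR is 0 iff a == b


def sne(a, b):
    return sltu(0, a ^ b)     # snez(a ^ b): 0 <u (a ^ b)
```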

CIL provides ==, <, and > in both integer and floating point, in both 32- and 64-bit, with < and > available in both signed and unsigned forms.

WASM alone provides equals-zero compares.

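LLVM also breaks out each floating point comparison into 'ordered' and 'unordered' variants, which give two options for handling NaNs: an ordered comparison is false if either operand is a NaN, and an unordered comparison is true in that case.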

JVM alone provides trinary comparisons for floats, doubles, and longs. JVM provides compare-and-branch instructions for integers and pointers rather than separate compare instructions.
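Both LLVM's ordered/unordered split and the JVM's paired float compares exist to decide what a comparison involving NaN should yield. In the JVM, `fcmpl`/`dcmpl` push -1 and `fcmpg`/`dcmpg` push 1 when either operand is NaN, so a compiler can choose whichever makes the following branch fail on NaN. A sketch assuming Python floats as IEEE doubles; the function names mirror the LLVM condition codes and JVM mnemonics but are illustrative only.

```python
import math


def unordered(a, b):
    """True if either operand is a NaN."""
    return math.isnan(a) or math.isnan(b)


# LLVM-style predicates: the 'o' form is false on NaN, the 'u' form is true.
def fcmp_olt(a, b):
    return not unordered(a, b) and a < b


def fcmp_ult(a, b):
    return unordered(a, b) or a < b


# JVM-style three-way compares, returning the int the instruction pushes.
def lcmp(a, b):
    return (a > b) - (a < b)


def fcmpl(a, b):
    return -1 if unordered(a, b) else (a > b) - (a < b)


def fcmpg(a, b):
    return 1 if unordered(a, b) else (a > b) - (a < b)
```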

CIL also has separate pointer types and provides various comparisons over pointers through the same polymorphic compare instructions used for other types.

Andreas Olofsson of Adapteva said on a blog that he felt that the SLT* instructions provided by RISC-V could have been left out (and were left out in his architecture, Epiphany) [24].

Polymorphic

ceq (CIL)
cgt cgt.un (CIL)
clt clt.un (CIL)

Integer comparisons

Less than:

SLT (Set <) (i32) (RISC-V RV32I)
SLTI (Set < Immediate) (i32) (RISC-V RV32I)
SLTU (Set < Unsigned) (u32) (RISC-V RV32I)
SLTIU (Set < Imm Unsigned) (u32) (RISC-V RV32I)
i32.lt_s, i32.lt_u, i64.lt_s, i64.lt_u (WASM)
icmp slt, icmp ult (LLVM)

Greater than, greater than or equal to, less than or equal to:

i32.gt_s, i32.gt_u, i64.gt_u, i64.gt_s (WASM)
i32.ge_s, i32.ge_u, i64.ge_u, i64.ge_s (WASM)
i32.le_s, i32.le_u, i64.le_u, i64.le_s (WASM)
icmp sgt, icmp ugt, icmp sge, icmp uge, icmp sle, icmp ule (LLVM)

Equality and inequality:

i32.eq i64.eq i32.ne i64.ne (WASM)
icmp eq, icmp ne (LLVM)

Equals zero:

i32.eqz i64.eqz (WASM)

Trinary compare:

lcmp (JVM)

Floating-point comparisons

Equality:

FEQ.S, FEQ.D (f32, f64) (RISC-V F): dest = (f32 == f32)
f32.eq, f64.eq (WASM)
fcmp oeq (equal and ordered) (LLVM)
fcmp ueq (equal or unordered) (LLVM)

Less than:

FLT.S, FLT.D (f32, f64) (RISC-V F): dest = (f32 < f32)
f32.lt, f64.lt (WASM)
fcmp olt (less than and ordered) (LLVM)
fcmp ult (less than or unordered) (LLVM)

Less than or equal to:

FLE.S, FLE.D (f32, f64) (RISC-V F): dest = (f32 <= f32)
f32.le, f64.le (WASM)
fcmp ole (less than or equal and ordered) (LLVM)
fcmp ule (less than or equal or unordered) (LLVM)

Trinary compare:

dcmpg dcmpl fcmpg fcmpl (JVM)

Other:

f32.ne f64.ne f32.gt f64.gt f32.ge f64.ge (WASM)
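fcmp false (LLVM)
fcmp true (LLVM)
fcmp ord (ordered: neither operand is a NaN) (LLVM)
fcmp uno (unordered: either operand is a NaN) (LLVM)
fcmp one (not equal and ordered) (LLVM)
fcmp une (not equal or unordered) (LLVM)
fcmp oge (greater than or equal and ordered) (LLVM)
fcmp uge (greater than or equal or unordered) (LLVM)
fcmp ogt (greater than and ordered) (LLVM)
fcmp ugt (greater than or unordered) (LLVM)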

Misc integer arith

CLZ, CTZ, POPCNT are provided by WASM and LLVM (but only as LLVM intrinsics).

LLVM and ARM provide various byteswaps. LLVM alone provides bitreverse and funnel shifts.

The pattern so far seems to be that, with a few exceptions (multiply upper bits, rotates, equals-zero), LLVM provides all of the integer and floating-point arithmetic that either RISC-V or WASM provides, and then some, although for some functionality LLVM requires intrinsics rather than ordinary instructions (floating-point rounding and exception modes; CLZ/CTZ/POPCNT).

i32.clz, i64.clz (WASM): count leading zeros
ctlz.* (LLVM intrinsic)
i32.ctz, i64.ctz (WASM): count trailing zeros
cttz.* (LLVM intrinsic)
i32.popcnt, i64.popcnt (WASM): number of bits set to 1
ctpop.* (LLVM intrinsic)

Other bit manipulation:

Bit Manipulation Intrinsics: bitreverse.* fshl.* fshr.* (LLVM intrinsics)

byte swap:

bswap.* (LLVM intrinsics)
rev rev16 revsh (ARM)
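A byte swap reverses the order of the bytes in a word. LLVM's funnel shifts (fshl/fshr) treat two values as one double-width value and shift it; passing the same value as both halves yields a rotate, which is one way LLVM can express the rotates that it otherwise lacks. A 32-bit sketch with hypothetical helper names:

```python
MASK32 = 0xFFFFFFFF


def bswap32(x):
    """Reverse the byte order of a 32-bit value (like bswap.i32 or ARM rev)."""
    return int.from_bytes((x & MASK32).to_bytes(4, "little"), "big")


def fshl32(a, b, c):
    """Funnel shift left: shift the 64-bit concatenation a:b left by
    c mod 32 and keep the upper 32 bits."""
    c &= 31
    concat = ((a & MASK32) << 32) | (b & MASK32)
    return ((concat << c) >> 32) & MASK32


def fshr32(a, b, c):
    """Funnel shift right: shift a:b right by c mod 32, keep the lower 32 bits."""
    c &= 31
    concat = ((a & MASK32) << 32) | (b & MASK32)
    return (concat >> c) & MASK32


print(hex(fshl32(0x12345678, 0x12345678, 8)))  # 0x34567812, a rotate left
```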

Floating-point-specific

SQRT, COPYSIGN, MIN, MAX are provided by RISC-V, WASM, LLVM (but only as LLVM intrinsics).

Andreas Olofsson of Adapteva noted in a blog post that RISC-V's FSQRT instruction is "expensive" and that it was a "tough call" whether to include such an instruction in his Epiphany ISA [25].

In the same blog post, Andreas Olofsson of Adapteva questioned whether the FMIN and FMAX instructions provided by RISC-V are needed [26].

RISC-V, WASM, LLVM each provide additional sign operations. RISC-V alone provides copy-negated-sign (FSGNJN) and XOR-sign (FSGNJX). WASM and LLVM provide abs(-olute-value), although LLVM abs is an intrinsic. WASM, LLVM, JVM, LuaJIT 2 provide neg(-ate).

Andreas Olofsson of Adapteva said in a blog post that he felt that RISC-V's floating point sign instructions (FSGNJ, FSGNJN, FSGNJX) were "not essential" [27].
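The three RISC-V sign-injection instructions only touch the sign bit, so they can be modelled on the raw bits of a float. When both operands are the same register they become move, negate and absolute value, which is how RISC-V's `fmv`, `fneg` and `fabs` pseudo-instructions are defined. A sketch on binary64 values; the helper names are hypothetical.

```python
import struct

SIGN = 1 << 63


def bits(x):
    """Raw IEEE 754 binary64 bit pattern of a Python float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def from_bits(b):
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def fsgnj(x, y):
    """Magnitude of x with the sign of y (copysign)."""
    return from_bits((bits(x) & ~SIGN) | (bits(y) & SIGN))


def fsgnjn(x, y):
    """Magnitude of x with the opposite of y's sign."""
    return from_bits((bits(x) & ~SIGN) | (~bits(y) & SIGN))


def fsgnjx(x, y):
    """Sign of x XOR sign of y."""
    return from_bits(bits(x) ^ (bits(y) & SIGN))


def fneg(x):  # fneg rd, rs == fsgnjn rd, rs, rs
    return fsgnjn(x, x)


def fabs(x):  # fabs rd, rs == fsgnjx rd, rs, rs
    return fsgnjx(x, x)
```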

RISC-V and LLVM support IEEE exception flags and rounding modes (but LLVM only supports these with 'constrained' intrinsics).

RISC-V provides an FCLASS instruction to report the attributes of a floating-point number. Andreas Olofsson of Adapteva said in a blog post that he felt that this instruction was "not essential" [31]. CIL provides ckfinite, which checks whether a floating-point number is finite and throws System.ArithmeticException if it is NaN or infinite.

Fused multiply-add is provided by RISC-V and LLVM (but only as an LLVM intrinsic). RISC-V alone provides three further fused multiply-add variants (FMSUB, FNMADD, FNMSUB) in addition to FMADD.

In a blog post, Andreas Olofsson of Adapteva questioned whether the FNMSUB fused multiply-add instruction variant provided by RISC-V is needed [32].

Both RISC-V and WASM provide all floating-point operations in both 32- and 64-bits.

pow is provided by LLVM as an intrinsic and by LuaJIT 2.

LLVM alone provides, as intrinsics only, certain other variants of min, max, round, and fma, as well as canonicalize and the standard C library math functions sin, cos, pow, exp, exp2, log, log10, log2.

"WebAssembly uses “non-stop” mode, and floating point exceptions are not otherwise observable. In particular, neither alternate floating point exception handling attributes nor the non-computational operators on status flags are supported. There is no observable difference between quiet and signalling NaN. However, positive infinity, negative infinity, and NaN are still always produced as result values to indicate overflow, invalid, and divide-by-zero conditions, as specified by IEEE 754-2008.

WebAssembly? uses the round-to-nearest ties-to-even rounding attribute, except where otherwise specified. Non-default directed rounding attributes are not supported. " -- [33]

Sqrt

FSQRT.S, FSQRT.D (f32, f64) (RISC-V F)
f32.sqrt, f64.sqrt (WASM)
sqrt.* experimental.constrained.sqrt (LLVM intrinsics)

Fused multiply-add

FMADD.S, FMADD.D (f32, f64) (RISC-V F): rs1*rs2+rs3
FMSUB.S, FMSUB.D (f32, f64) (RISC-V F): rs1*rs2-rs3
FNMADD.S, FNMADD.D (f32, f64) (RISC-V F): -(rs1*rs2)-rs3
FNMSUB.S, FNMSUB.D (f32, f64) (RISC-V F): -(rs1*rs2)+rs3
fma.*, fmuladd.*, experimental.constrained.fma (LLVM intrinsics)
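What makes these instructions "fused" is that the product is not rounded before the addition; only the final result is rounded once. The sketch below computes a*b+c exactly with `fractions.Fraction` and rounds once on conversion back to float, then defines the four RISC-V variants in terms of it (FNMADD is -(rs1*rs2)-rs3 and FNMSUB is -(rs1*rs2)+rs3 in the RISC-V specification). It assumes finite inputs only; NaN, infinities and signed zeros are not modelled.

```python
from fractions import Fraction


def fma(a, b, c):
    """a*b + c with a single rounding (finite inputs only)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def fmadd(rs1, rs2, rs3):
    return fma(rs1, rs2, rs3)    # (rs1*rs2)+rs3


def fmsub(rs1, rs2, rs3):
    return fma(rs1, rs2, -rs3)   # (rs1*rs2)-rs3


def fnmsub(rs1, rs2, rs3):
    return fma(-rs1, rs2, rs3)   # -(rs1*rs2)+rs3


def fnmadd(rs1, rs2, rs3):
    return fma(-rs1, rs2, -rs3)  # -(rs1*rs2)-rs3


a = b = 1.0 + 2.0**-27
c = -(1.0 + 2.0**-26)
print(a * b + c)     # 0.0: the 2**-54 term of the product was rounded away
print(fma(a, b, c))  # 2**-54, kept because nothing is rounded until the end
```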

Signs

copysign:

FSGNJ.S, FSGNJ.D (f32, f64) (RISC-V F): dest = float but overwrite sign with sign from another float
f32.copysign, f64.copysign (WASM)
copysign.* (LLVM intrinsic)

other binary sign ops:

FSGNJN.S, FSGNJN.D (f32, f64) (RISC-V F): dest = f32 but overwrite sign with negation of sign from another f32
FSGNJX.S, FSGNJX.D (f32, f64) (RISC-V F): dest = f32 but xor sign with sign from another f32

absolute value:

f32.abs, f64.abs (WASM)
fabs.* (LLVM Intrinsics)

negation:

f32.neg, f64.neg (WASM)
fneg (LLVM)
dneg fneg (JVM)
unm (LuaJIT 2)

Rounding

floor, ceiling, trunc:

f32.ceil, f32.floor, f64.ceil, f64.floor (WASM)
f32.trunc, f64.trunc (WASM)
floor.* ceil.* trunc.* experimental.constrained.ceil experimental.constrained.floor experimental.constrained.trunc (LLVM Intrinsics)

nearest int or rounding-mode determined rounding:

f32.nearest, f64.nearest (WASM)
rint.* nearbyint.* round.* lround.* llround.* experimental.constrained.rint experimental.constrained.nearbyint experimental.constrained.round (LLVM Intrinsics)
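The rounding operations differ mainly in how halfway cases are handled: WASM's nearest rounds ties to even, LLVM's round.* rounds ties away from zero (like C's round()), and rint/nearbyint follow the current rounding mode, which defaults to ties-to-even. A sketch using Python's built-ins; the helper `round_half_away` is hypothetical. Note that math.floor, math.ceil and math.trunc return ints, so unlike the float-to-float WASM and LLVM operations they cannot return -0.0, infinities or NaN.

```python
import math

# Python's built-in round() on floats uses ties-to-even, like WASM's nearest.


def round_half_away(x):
    """Round to nearest with ties away from zero, as C round() does."""
    t = math.trunc(x)
    if abs(x - t) >= 0.5:
        t += 1 if x > 0 else -1
    return t


for x in (2.5, 3.5, -2.5, -2.7):
    print(x, math.floor(x), math.ceil(x), math.trunc(x), round(x), round_half_away(x))
# 2.5 2 3 2 2 3
# 3.5 3 4 3 4 4
# -2.5 -3 -2 -2 -2 -3
# -2.7 -3 -2 -2 -3 -3
```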

FRM (3-bit floating point rounding mode):

FRRM (RISC-V F): read FRM: FRM = FRRM()
FSRM (RISC-V F): swap FRM: old_value = FSRM(new_value)
FSRMI (RISC-V F): swap FRM, immediate: old_value = FSRMI(immediate new_value)

Min, max

FMIN.S, FMIN.D (f32, f64) (RISC-V F): dest = min(f32, f32)
FMAX.S, FMAX.D (f32, f64) (RISC-V F): dest = max(f32, f32)
f32.min, f64.min, f32.max, f64.max (WASM)
minnum.* maxnum.* experimental.constrained.maxnum experimental.constrained.minnum minimum.* maximum.* (LLVM Intrinsics)

Classify

FCLASS.S, FCLASS.D (f32, f64) (RISC-V F): 10-bit mask with properties of input float = fclass(f32)
ckfinite (CIL)
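FCLASS returns a one-hot 10-bit mask, one bit per category of value, whereas ckfinite only distinguishes finite from non-finite. A sketch of FCLASS.D on Python floats (binary64), decoding the exponent and fraction fields directly; the function name is hypothetical, and the bit assignments follow the RISC-V specification's table.

```python
import struct


def fclass_d(x):
    """One-hot mask as produced by RISC-V FCLASS.D.

    Bit 0: -inf        Bit 5: +subnormal
    Bit 1: -normal     Bit 6: +normal
    Bit 2: -subnormal  Bit 7: +inf
    Bit 3: -0          Bit 8: signaling NaN
    Bit 4: +0          Bit 9: quiet NaN
    """
    b = struct.unpack("<Q", struct.pack("<d", x))[0]
    neg = b >> 63
    exp = (b >> 52) & 0x7FF
    frac = b & ((1 << 52) - 1)
    if exp == 0x7FF and frac:                 # NaN
        return 1 << (9 if frac >> 51 else 8)  # top fraction bit set means quiet
    if exp == 0x7FF:
        bit = 0 if neg else 7                 # infinity
    elif exp == 0 and frac == 0:
        bit = 3 if neg else 4                 # zero
    elif exp == 0:
        bit = 2 if neg else 5                 # subnormal
    else:
        bit = 1 if neg else 6                 # normal
    return 1 << bit
```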

Floating-point exceptions

FFLAGS (5-bit Accrued Exception Flags):

FRFLAGS (RISC-V F): read FFLAGS
FSFLAGS (RISC-V F): swap FFLAGS
FSFLAGSI (RISC-V F): swap FFLAGS, immediate

Pow

pow.* powi.* (LLVM Standard C Library Intrinsics)
pow (LuaJIT 2)

Misc floating-point

FCSR (32-bit floating point status and control register):

FRCSR (RISC-V F): read FCSR: 32-bit FCSR = FRCSR()
FSCSR (RISC-V F): swap FCSR: old_value = FSCSR(new_value)
Standard C Library Intrinsics: powi.* sin.* cos.* pow.* exp.* exp2.* log.* log10.* log2.* experimental.constrained.pow experimental.constrained.powi experimental.constrained.sin experimental.constrained.cos experimental.constrained.exp experimental.constrained.exp2 experimental.constrained.log experimental.constrained.log10 experimental.constrained.log2 (LLVM Intrinsics)
canonicalize.* (LLVM intrinsic)

Conversions

RISC-V, WASM, LLVM, JVM, CIL provide conversions from signed integer to floating-point, in both 32- and 64- bits. RISC-V, WASM, LLVM provide conversions from unsigned integer to floating point, in both 32- and 64- bits. RISC-V, WASM, LLVM, CIL provide coercions between integer and floating point in both 32- and 64-bit.

RISC-V provides conversions from floats to integers (both signed and unsigned) with various rounding modes. LLVM provides 'constrained' intrinsic variants of floating-point truncation and extension which offer a choice of rounding mode and exception handling mode. WASM only provides truncation from float to integer (both signed and unsigned), but WASM also provides ceiling, floor, and nearest as separate unary operations on floats. LLVM provides conversion from float to integer (both signed and unsigned), rounding towards zero. JVM and CIL provide conversion from float to integer, rounding towards zero.

WASM, LLVM, JVM, CIL provide conversions from 32-bit integers to 64-bit integers and vice versa, and also from 32-bit floats to 64-bit floats and vice versa. In RISC-V, the registers are either 32 bits (RV32I) or 64 bits (RV64I) wide; if they are only 32 bits, then 64-bit quantities cannot be held in a single register, and if they are 64 bits, then all quantities are always stored in registers in 64-bit format (i think?). CIL silently truncates high-order bits upon overflow when converting one integer type to another.

ARM, LLVM, CIL provide conversions from 8-bit and 16-bit integers to 32-bit, in both signed and unsigned variants. LLVM, CIL also provide conversions from other smaller bitwidths to other larger bitwidths. CIL also provides conversions to 8-bit and 16-bit integers from larger integers or from floating-point.
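All of these integer width changes come down to three primitive operations: keep the low bits (wrap/truncate), or widen by zero extension or sign extension. A sketch with hypothetical helper names, returning Python ints:

```python
def wrap(x, bits):
    """Keep the low `bits` bits as an unsigned pattern
    (the bits kept by LLVM trunc or WASM i32.wrap_i64)."""
    return x & ((1 << bits) - 1)


def zext(x, from_bits):
    """Zero extension: read the low from_bits bits as an unsigned number."""
    return wrap(x, from_bits)


def sext(x, from_bits):
    """Sign extension: read the low from_bits bits as two's complement."""
    x = wrap(x, from_bits)
    return x - (1 << from_bits) if x >> (from_bits - 1) else x


print(sext(0xFF, 8), zext(0xFF, 8))  # -1 255
print(sext(wrap(200, 8), 8))         # -56, the value JVM's i2b gives for 200
```

A narrowing conversion that yields a signed result, such as JVM's i2b, is then a wrap followed by a sign extension back to the wider type.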

JVM provides conversions from 32-bit integers to 16-bit and 8-bit integers, and to chars, but not vice versa, as far as i can tell.

LLVM alone provides intrinsics to convert to and from 16-bit floating point.

LLVM alone provides conversions between pointers and ints; however such conversions are unneeded in RISC-V, which is untyped, and in WASM, where addresses are plain i32 values (the 'memarg' immediate on loads and stores is just a pair of u32s, {offset u32, align u32}).

LLVM alone provides 'addrspacecast' to convert pointers between different 'address spaces'.

CIL alone provides conversion variants with overflow detection.

CIL alone provides 'native' types (native int ('i'), native float ('r')).

Polymorphic (on the source) conversions

convert to native numerical type: conv.i conv.r.un conv.u (CIL)
convert to numerical type: conv.i1 conv.i2 conv.i4 conv.i8 conv.r4 conv.r8 conv.u1 conv.u2 conv.u4 conv.u8 (CIL)
convert to numerical type, with overflow: conv.ovf.i conv.ovf.i.un conv.ovf.i1 conv.ovf.i1.un conv.ovf.i2 conv.ovf.i2.un conv.ovf.i4 conv.ovf.i4.un conv.ovf.i8 conv.ovf.i8.un conv.ovf.u conv.ovf.u.un conv.ovf.u1 conv.ovf.u1.un conv.ovf.u2 conv.ovf.u2.un conv.ovf.u4 conv.ovf.u4.un conv.ovf.u8 conv.ovf.u8.un (CIL)

Coercions between values of the same size

bitcast (LLVM)
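The coercions in this part of the list (LLVM's bitcast, RISC-V's FMV.*, WASM's *.reinterpret_*) keep the bit pattern and change only the type, unlike the FCVT/convert/sitofp family, which changes the bits in order to preserve the numeric value. A sketch using Python's `struct` module; the function names only mirror the WASM mnemonics.

```python
import struct


def f32_reinterpret_i32(i):
    """Read a 32-bit integer pattern as an IEEE binary32 float."""
    return struct.unpack("<f", struct.pack("<I", i & 0xFFFFFFFF))[0]


def i32_reinterpret_f32(f):
    """Read the bits of a binary32 float as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", f))[0]


def f64_reinterpret_i64(i):
    return struct.unpack("<d", struct.pack("<Q", i & 0xFFFFFFFFFFFFFFFF))[0]


def i64_reinterpret_f64(f):
    return struct.unpack("<Q", struct.pack("<d", f))[0]


print(hex(i32_reinterpret_f32(1.0)))  # 0x3f800000, whereas converting 1.0 gives 1
```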

Coercions between integers and floating point of the same size: i32 to f32:

FMV.W.X (coerce i32 to f32) (RISC-V F)
f32.reinterpret_i32 (WASM)

f32 to i32:

FMV.X.W (coerce f32 to i32) (RISC-V F)
i32.reinterpret_f32 (WASM)

f64 to i64:

FMV.X.D (coerce f64 to i64) (RISC-V RV64D)
i64.reinterpret_f64 (WASM)

i64 to f64:

FMV.D.X (coerce i64 to f64) (RISC-V RV64D)
f64.reinterpret_i64 (WASM)

Conversions from signed integer to floating-point

FCVT.S.W (convert i32 to f32) (RISC-V F)
FCVT.S.L (convert i64 to f32) (not present in RV32I) (RISC-V F)
FCVT.D.W (convert i32 to f64) (RISC-V D)
FCVT.D.L (convert i64 to f64) (RISC-V D)
f{32,64}.convert_i{32,64}_s (WASM)
sitofp (LLVM)
i2f i2d l2f l2d (JVM)

Conversions from unsigned integer to floating-point

FCVT.S.WU (convert u32 to f32) (RISC-V F)
FCVT.S.LU (convert u64 to f32) (not present in RV32I) (RISC-V F)
FCVT.D.WU (convert u32 to f64) (RISC-V D)
FCVT.D.LU (convert u64 to f64) (RISC-V D)
f{32,64}.convert_i{32,64}_u (WASM)
uitofp (LLVM)

Conversions from floating point to signed integer

FCVT.W.S (convert f32 to i32) (RISC-V F)
FCVT.L.S (convert f32 to i64) (not present in RV32I) (RISC-V F)
FCVT.W.D (convert f64 to i32) (RISC-V D)
FCVT.L.D (convert f64 to i64) (RISC-V D)
i{32,64}.trunc_f{32,64}_s (WASM)
fptosi (LLVM)
f2i d2i f2l d2l (JVM)

Conversions from floating point to unsigned integer

FCVT.WU.S (convert f32 to u32) (RISC-V F)
FCVT.LU.S (convert f32 to u64) (not present in RV32I) (RISC-V F)
FCVT.WU.D (convert f64 to u32) (RISC-V D)
FCVT.LU.D (convert f64 to u64) (RISC-V D)
i{32,64}.trunc_f{32,64}_u (WASM)
fptoui (LLVM)
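The float-to-integer conversions disagree mostly on what happens when the value does not fit. WASM's basic trunc instructions trap on NaN or out-of-range input, the JVM's f2i/d2i map NaN to 0 and saturate out-of-range values, and LLVM's fptosi/fptoui produce a poison value. A sketch of the first two for f64 to i32; the `Trap` class is an illustrative stand-in, not a real API.

```python
import math

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


class Trap(Exception):
    """Illustrative stand-in for a WebAssembly trap."""


def wasm_i32_trunc_f64_s(x):
    """Truncate towards zero; trap on NaN, infinity or out-of-range values."""
    if math.isnan(x) or math.isinf(x):
        raise Trap("invalid conversion to integer")
    t = math.trunc(x)
    if not INT32_MIN <= t <= INT32_MAX:
        raise Trap("integer overflow")
    return t


def jvm_d2i(x):
    """Truncate towards zero; NaN becomes 0 and out-of-range values saturate."""
    if math.isnan(x):
        return 0
    if math.isinf(x):
        return INT32_MAX if x > 0 else INT32_MIN
    return max(INT32_MIN, min(INT32_MAX, math.trunc(x)))
```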

Integer conversions from larger bitwidth to smaller

i32.wrap_i64 (WASM)
trunc (LLVM)
l2i i2s i2b (JVM)

Integer conversions from smaller bitwidth to larger

sign extend:

i64.extend_i32_s (WASM)
sext (LLVM)
sxth sxtb (ARM)
i2l (JVM)

zero extend:

i64.extend_i32_u (WASM)
zext (LLVM)
uxth uxtb (ARM)

Floating-point conversions between different bitwidths

FCVT.D.S FCVT.S.D (RISC-V D)
f64.promote_f32 f32.demote_f64 (WASM)
fptrunc (LLVM)
fpext (LLVM)
Half Precision Floating-Point: convert.to.fp16 convert.from.fp16 (LLVM intrinsics)
experimental.constrained.fptrunc experimental.constrained.fpext (LLVM intrinsics)
f2d d2f (JVM)

Ptr/int conversions

ptrtoint (LLVM)
inttoptr (LLVM)

Misc conversions

addrspacecast (LLVM)
i2c (JVM)

Memory access

Loads and stores

RISC-V, WASM, ARM, CIL provide integer loads from 8-bit, 16-bit, and 32-bit memory locations, and when loading a quantity smaller than the destination, both signed and unsigned variants are provided. RISC-V supports loading into whatever size the registers are (32 or 64 bits), and ARM and CIL support loading into 32-bit registers (CIL also supports loading 64-bit integers as 64 bits), in contrast to WASM, which supports explicitly loading any quantity as 32 or 64 bits. RISC-V, WASM, CIL support floating point loads and stores of either 32 or 64 bits.

RISC-V, WASM, ARM, CIL support stores to 8-bit, 16-bit, and 32-bit quantities. RISC-V supports storing to 64 bits if the registers are 64 bits, whereas WASM and CIL always support stores of 64 bits. RISC-V supports storing from whatever size the registers are (obviously), whereas WASM and CIL support stores from 32- or 64-bit values.

LLVM provides loads and stores of all of these data types, but as far as i can tell, the type of the value being loaded or stored must match the type of the variable being loaded into or stored from (as opposed to e.g. RISC-V and WASM, which provide operations like LB and i64.load8_s to load an 8-bit value into a register/variable of 64-bit type) (todo: is this correct?).

ARM alone provides load multiple and store multiple.

CIL alone provides load/store 'native' integers (type 'i'), and load/store opaque references (type 'O').

Andreas Olofsson of Adapteva indicated on a blog that he left out instructions equivalent to RISC-V's LB and LH from his Epiphany ISA, and that he regretted doing so [35].
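The difference between the sign- and zero-extending loads discussed above (LB versus LBU, i64.load8_s versus i64.load8_u) shows up as soon as the top bit of the loaded byte or halfword is set. A sketch over a hypothetical little-endian byte array (WASM memory is always little-endian, as is standard RISC-V); in LLVM the same effect takes two instructions, a `load i8` followed by `sext` or `zext`.

```python
# A hypothetical little-endian, byte-addressed memory.
memory = bytearray([0x80, 0xFF, 0x34, 0x12])


def load(addr, size, signed):
    """Read `size` bytes at addr and widen them to a full integer value."""
    return int.from_bytes(memory[addr:addr + size], "little", signed=signed)


def lb(addr):   # RISC-V LB, WASM i32/i64.load8_s
    return load(addr, 1, True)


def lbu(addr):  # RISC-V LBU, WASM i32/i64.load8_u
    return load(addr, 1, False)


def lh(addr):   # RISC-V LH, WASM i32/i64.load16_s
    return load(addr, 2, True)


def lhu(addr):  # RISC-V LHU, WASM i32/i64.load16_u
    return load(addr, 2, False)


print(lb(0), lbu(0))  # -128 128
```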

Polymorphic loads and stores

load (LLVM)
store (LLVM)

Integer loads

Load 32-bit and 64-bit integers:

LD (i64) (RISC-V RV64I), LW (i32) (RISC-V RV32I), LWU (u32) (RISC-V RV64I)
i32.load, i64.load, i64.load32_s, i64.load32_u (WASM)
ldr (ARM)
ldind.i4 ldind.u4 ldind.i8 ldind.u8 (CIL)

Load 8-bit and 16-bit integers, unsigned:

LHU (u16) (RISC-V RV32I), LBU (u8) (RISC-V RV32I)
i64.load16_u, i32.load16_u, i64.load8_u, i32.load8_u (WASM)
ldrh ldrb (ARM)
ldind.u1 ldind.u2 (CIL)

Load 8-bit and 16-bit integers, signed:

LB (i8), LH (i16) (RISC-V RV32I)
i64.load8_s, i32.load8_s, i64.load16_s, i32.load16_s (WASM)
ldrsh ldrsb (ARM)
ldind.i1 ldind.i2 (CIL)

Load multiple:

ldm (ARM)

Load native:

ldind.i (CIL)

Floating point loads

FLD (f64) (RISC-V D), FLW (f32) (RISC-V F)
f32.load, f64.load (WASM)
ldind.r4 ldind.r8 (CIL)

Reference (pointer) loads and stores

ldind.ref (CIL)
stind.ref (CIL)

Integer stores

i64, i32:

SD (i64) (RISC-V RV64I), SW (i32) (RISC-V RV32I)
i64.store, i32.store, i64.store32 (WASM)
str (ARM)
stind.i4 stind.i8 (CIL)

i16:

SH (i16) (RISC-V RV32I)
i64.store16, i32.store16 (WASM)
strh (ARM)
stind.i2 (CIL)

i8:

SB (i8) (RISC-V RV32I)
i64.store8, i32.store8 (WASM)
strb (ARM)
stind.i1 (CIL)

Store multiple:

stm (ARM)

native:

stind.i (CIL)

Floating point stores

FSD (f64) (RISC-V D), FSW (f32) (RISC-V F)
f64.store, f32.store (WASM)
stind.r4 stind.r8 (CIL)

Variable loads and stores

WASM, JVM, CIL support loads and stores to/from local variables. WASM and LuaJIT 2 also support loads and stores to/from global variables. WASM alone also provides TEE_LOCAL, an instruction which sets a local but then also returns its argument. JVM specializes load/store instructions by type. JVM and CIL provide short instructions to load/store the first 4 variables. LuaJIT 2 alone also provides loads and stores from/to upvalues. CIL alone provides loads of the addresses of local variables.

locals:

GET_LOCAL (WASM)
SET_LOCAL (WASM)
TEE_LOCAL (like set_local but also returns its argument) (WASM)
aload aload_0 aload_1 aload_2 aload_3 astore astore_0 astore_1 astore_2 astore_3 (JVM)
dload dload_0 dload_1 dload_2 dload_3 dstore dstore_0 dstore_1 dstore_2 dstore_3 (JVM)
fload fload_0 fload_1 fload_2 fload_3 fstore fstore_0 fstore_1 fstore_2 fstore_3 (JVM)
iload iload_0 iload_1 iload_2 iload_3 istore istore_0 istore_1 istore_2 istore_3 (JVM)
lload lload_0 lload_1 lload_2 lload_3 lstore lstore_0 lstore_1 lstore_2 lstore_3 (JVM)
ldloc ldloc.0 ldloc.1 ldloc.2 ldloc.3 ldloc.s stloc stloc.0 stloc.1 stloc.2 stloc.3 stloc.s (CIL)
ldloca ldloca.s (CIL)

globals:

GET_GLOBAL (WASM)
SET_GLOBAL (WASM)
gget gset (LuaJIT 2)

upvalues:

uget usetv usets usetn usetp uclo (LuaJIT 2)

Stack ops

WASM, JVM, CIL provide drop (called 'pop' in JVM and CIL). ARM alone provides push and pop. JVM and CIL provide dup. JVM alone provides swap.

push, pop:

push, pop (ARM)

dup:

dup dup_x1 dup_x2 dup2 dup2_x1 dup2_x2 (JVM)
dup (CIL)

drop:

DROP (WASM)
pop pop2 (JVM)
pop (CIL)

swap:

swap (JVM)
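The operand-stack instructions above are easiest to read as list manipulations. A sketch with the top of the stack at the end of a Python list; the function names follow the JVM mnemonics, and the JVM's two-slot (long/double) forms such as dup2 are not modelled.

```python
# Operand stack as a Python list; the top of the stack is the end of the list.


def dup(s):     # ..., v -> ..., v, v
    s.append(s[-1])


def swap(s):    # ..., v2, v1 -> ..., v1, v2
    s[-1], s[-2] = s[-2], s[-1]


def pop(s):     # ..., v -> ...  (WASM calls this drop)
    s.pop()


def dup_x1(s):  # ..., v2, v1 -> ..., v1, v2, v1
    s.insert(-2, s[-1])


stack = [1, 2, 3]
dup_x1(stack)
print(stack)  # [1, 3, 2, 3]
```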

Atomics and Sync

RISC-V, LLVM, ARM, JVM, CIL provide various FENCE/sync barrier/monitor/volatile instructions/prefixes. RISC-V and LLVM provide the AMOs (Atomic Memory Operations): SWAP, ADD, AND, OR, XOR, MIN, MAX, MINU, MAXU.

RISC-V alone provides load-reserved/store-conditional (LR/SC), as well as FENCE.I, which synchronizes the instruction and data streams.

RISC-V atomics are only included in the A extension, not the base integer instruction set. FENCE and FENCE.I are in the base instruction set.

LLVM alone provides compare-and-swap, and more AMOs: sub nand fadd fsub.
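Compare-and-swap is general enough to build any of the AMOs: read the old value, compute the new one, and retry if another thread changed memory in between. LLVM's cmpxchg returns an {old value, success flag} pair, which is the shape assumed here. In this sketch `AtomicCell` is a hypothetical memory word whose lock merely stands in for hardware atomicity, and `fetch_max` builds an AMOMAX-style operation from a CAS retry loop; on RISC-V the same loop would typically be written with LR/SC.

```python
import threading


class AtomicCell:
    """Hypothetical memory word; the lock only stands in for hardware atomicity."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        """Store new only if the current value equals expected.
        Returns (old value, success flag), like LLVM cmpxchg."""
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new
            return old, old == expected


def fetch_max(cell, operand):
    """AMOMAX-style read-modify-write; returns the value before the update."""
    old = cell.load()
    while True:
        seen, ok = cell.compare_and_swap(old, max(old, operand))
        if ok:
            return seen
        old = seen  # another thread got in first; retry with its value
```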

Andreas Olofsson of Adapteva said on a blog that he felt that the FENCE instruction(s) provided by RISC-V could have been left out (and were left out in his architecture, Epiphany), commenting "Benefit minimal in good SW imho" [36].

load-reserved and store-conditional (LR/SC)

LR.W, LR.D (RISC-V RV32A/RV64A)
SC.W, SC.D (RISC-V RV32A/RV64A)

Compare-and-swap (CAS)

cmpxchg (LLVM)

AMOs

swap:

AMOSWAP.W, AMOSWAP.D (RISC-V RV32A/RV64A)
atomicrmw xchg (LLVM)

add:

AMOADD.W, AMOADD.D (RISC-V RV32A/RV64A)
atomicrmw add (LLVM)

xor:

AMOXOR.W, AMOXOR.D (RISC-V RV32A/RV64A)
atomicrmw xor (LLVM)

and:

AMOAND.W, AMOAND.D (RISC-V RV32A/RV64A)
atomicrmw and (LLVM)

or:

AMOOR.W, AMOOR.D (RISC-V RV32A/RV64A)
atomicrmw or (LLVM)

min:

AMOMIN.W, AMOMIN.D (RISC-V RV32A/RV64A)
atomicrmw min (LLVM)

unsigned min:

AMOMINU.W, AMOMINU.D (RISC-V RV32A/RV64A)
atomicrmw umin (LLVM)

max:

AMOMAX.W, AMOMAX.D (RISC-V RV32A/RV64A)
atomicrmw max (LLVM)

unsigned max:

AMOMAXU.W, AMOMAXU.D (RISC-V RV32A/RV64A)
atomicrmw umax (LLVM)

other:

atomicrmw sub (LLVM)
atomicrmw nand (LLVM)
atomicrmw fadd (LLVM)
atomicrmw fsub (LLVM)

Fences

FENCE (Synch threads) (RISC-V RV32I)
FENCE.I (Synch Instr & Data) (RISC-V RV32I)
fence (LLVM)
isb dmb dsb (ARM)
sync: monitorenter monitorexit (JVM)
volatile. (CIL)

Control flow

Jumps (unconditional branch)

All of RISC-V, WASM, LLVM, ARM, JVM, LuaJIT 2, CIL provide unconditional jumps to an immediate/a label. RISC-V and ARM provide an unconstrained indirect branch (JALR and BX/BLX, respectively), CIL provides an unconstrained indirect branch to a method (jmp), and WASM, LLVM, JVM, CIL provide an indirect branch whose target is constrained to one of an enumerated set of possible targets (WASM's BR_TABLE, LLVM's switch and indirectbr, JVM's switches, and CIL's switch). JVM used to provide another indirect branch (ret), but it is now deprecated, probably due to the difficulty it added to verification. Among the switch-like instructions of LLVM and JVM, switch and lookupswitch are like a C switch statement, indirectbr jumps to an address held in a variable, and JVM's tableswitch takes an index into a list of enumerated labels; however, all of them must list an enumerated set of labels representing all possible jump targets. RISC-V and ARM provide link-register variants of these branch instructions.

Although LuaJIT 2 doesn't have an indirect branch instruction, it does have a higher-level CALL instruction, which is indirect (that is, the function to be called is taken from a register, rather than specified as an immediate in the bytecode).

Jump to immediate / direct branch

JAL (Jump and Link) (32b) (RISC-V RV32I)
BR (WASM)
BR (unconditional form) (LLVM)
B BL (ARM)
goto goto_w (JVM)
jsr jsr_w (JVM) (deprecated)
jmp (LuaJIT 2)
br br.s (CIL)

Jump to register / indirect branch

JALR (Jump & Link Register) (32b) (RISC-V RV32I): indirect branch (unconstrained)
BR_TABLE (WASM): indirect branch to one of an enumerated set of labels
switch (LLVM)
indirectbr (LLVM): indirect branch to one of an enumerated set of labels
BX BLX (ARM): indirect branch (unconstrained)
ret (JVM) (deprecated)
switch: lookupswitch tableswitch (JVM)
jmp (CIL) (unconstrained, but jumps to method)
switch (CIL)
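The constrained indirect branches differ mainly in how they map an operand to one of their listed targets. A sketch of three of them, with labels represented as strings; the helper names follow the mnemonics but are illustrative only. WASM's br_table treats its operand as an unsigned i32 and falls back to the default label past the end of the table, JVM's tableswitch covers a dense key range low..high, and lookupswitch matches against sorted (key, target) pairs.

```python
def br_table(index, targets, default):
    """WASM-style: an index past the end of the table selects the default."""
    index &= 0xFFFFFFFF  # the operand is interpreted as an unsigned i32
    return targets[index] if index < len(targets) else default


def tableswitch(key, low, high, targets, default):
    """JVM-style dense switch: targets cover keys low..high inclusive."""
    if low <= key <= high:
        return targets[key - low]
    return default


def lookupswitch(key, pairs, default):
    """JVM-style sparse switch over (match, target) pairs sorted by match."""
    for match, target in pairs:
        if match == key:
            return target
    return default
```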

Conditional branches

RISC-V provides ==, !=, < and >=, but not <= or >. LuaJIT 2, ARM, JVM, CIL provide these and also <=, >, ==0, !=0.

ARM and JVM also provide <0, >=0.

ARM and RISC-V provide unsigned versions of < and >=.

JVM also provides >0, <=0, and for references (pointers), ==null, !=null.

ARM alone also provides branch-if-overflow, branch-if-not-overflow, unsigned >, unsigned <=. Note that ARM's "compare and branches" really require two instructions, one to 'compare' and one to 'branch' based on the result of the compare, as seen by the state of the processor's flags.

LuaJIT 2 provides equality/inequality compares against constant (both immediate and constant table) strings, numbers, and 'primitives' (nil/false/true).

WASM and LLVM instead provide separate comparison operations and a boolean conditional branch. Like ARM, this takes two steps, but unlike ARM, the condition is expressed in the compare step rather than in the branch step, and no extra flags state is needed.

branch: eq:

BEQ (Branch =) (i32) (RISC-V RV32I)
beq (ARM)
if_acmpeq if_icmpeq (JVM)
iseqv (LuaJIT 2)
iseqs iseqn iseqp (LuaJIT 2)
beq beq.s (CIL)

ne:

BNE (Branch !=) (i32) (RISC-V RV32I)
bne (ARM)
if_acmpne if_icmpne (JVM)
isnev (LuaJIT 2)
isnes isnen isnep (LuaJIT 2)
bne.un bne.un.s (CIL)

lt:

BLT (Branch <) (i32) (RISC-V RV32I)
blt (ARM)
if_icmplt (JVM)
islt (LuaJIT 2)
blt blt.s (CIL)

ge:

BGE (Branch >=) (i32) (RISC-V RV32I)
bge (ARM)
if_icmpge (JVM)
isge (LuaJIT 2)
bge bge.s (CIL)

le:

ble (ARM)
if_icmple (JVM)
isle (LuaJIT 2)
ble ble.s (CIL)

gt:

bgt (ARM)
if_icmpgt (JVM)
isgt (LuaJIT 2)
bgt bgt.s (CIL)

unsigned lt:

BLTU (Branch < Unsigned) (i32) (RISC-V RV32I)
blo (ARM)
blt.un blt.un.s (CIL)

unsigned ge:

BGEU (Branch >= Unsigned) (i32) (RISC-V RV32I)
bhs (ARM)
bge.un bge.un.s (CIL)

unsigned le:

ble.un ble.un.s (CIL)

unsigned gt:

bgt.un bgt.un.s (CIL)

unary compare ==0 or null or false:

(beq when used as unary compare) (ARM)
ifeq ifnull (JVM)
isf isfc (is false-y) (LuaJIT 2)
brzero brzero.s brnull brnull.s brfalse brfalse.s (CIL)

unary compare !=0 or nonnull or true (also, boolean conditional branch):

(bne when used as unary compare) (ARM)
ifne ifnonnull (JVM)
ist istc (is truth-y) (LuaJIT 2)
brtrue brtrue.s brinst brinst.s (CIL)
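BR_IF (WASM): pops an i32 condition off the stack and branches to the given label if it is nonzero; otherwise falls through. (The related IF (WASM) works similarly: "Executing the if instruction pops an i32 condition off the stack and either falls through to the next instruction or sets the program counter to after the else or end of the if.")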
BR (conditional form) (LLVM)

unary compare <0:

bmi (ARM)
iflt (JVM)

unary compare >=0:

bpl (ARM)
ifge (JVM)

other:

bvs bvc bhi bls (ARM)
ifgt ifle (compare vs. 0) (JVM)

Conditional non-branches

WASM and LLVM provide SELECT.

SELECT (WASM): "selects one of its first two operands based on whether its third operand is zero or not"
select (LLVM)
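A select chooses between two already-computed values without a branch. Here is a minimal Python sketch of the WASM semantics (LLVM's select takes the condition first: select i1 %c, %a, %b), together with one classic branch-free masking lowering for 32-bit integers; the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def select(val1, val2, cond):
    """WASM select: val1 if cond is nonzero, else val2 (both already evaluated)."""
    return val1 if cond != 0 else val2


def select_branchless(val1, val2, cond):
    """Same result for unsigned 32-bit operands, computed with a bit mask
    instead of a branch."""
    mask = -int(cond != 0) & MASK32   # 0xFFFFFFFF if cond is nonzero, else 0
    return ((val1 & mask) | (val2 & ~mask)) & MASK32
```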

Subroutines

WASM, LLVM, JVM, LuaJIT 2, and CIL provide CALL or INVOKE, and RETURN. WASM, JVM, and CIL provide various forms of CALL_INDIRECT. LLVM, JVM, and CIL provide exception handling.

LuaJIT 2 and CIL provide tail calls.
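A tail call can reuse the caller's frame because the caller has nothing left to do after the callee returns. A minimal Python sketch, modelling frames as an explicit list (an assumption of this illustration, not how either LuaJIT 2 or CIL actually lays out frames), that compares how deep the frame stack gets with ordinary calls versus tail calls.

```python
def count_down(n, use_tail_call):
    """Simulate a function that calls itself with n - 1 until n == 0.

    Returns the maximum number of frames that were live at once.
    """
    frames = [{"n": n}]
    max_depth = 1
    while True:
        frame = frames[-1]
        if frame["n"] == 0:
            break                        # base case reached
        callee = {"n": frame["n"] - 1}
        if use_tail_call:
            frames[-1] = callee          # tail call: replace the caller's frame
        else:
            frames.append(callee)        # ordinary call: push a new frame
        max_depth = max(max_depth, len(frames))
    return max_depth
```

With n = 1000, the ordinary-call version reaches 1001 live frames while the tail-call version never exceeds one.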

LuaJIT 2 alone provides special instructions for iterators, and for closures.

JVM provides various forms of invoke for object-oriented purposes. JVM specializes return by type.

LLVM, LuaJIT 2, and CIL provide argument handling, sometimes variadic.

call:

CALL (WASM)
call (LLVM)
call (LuaJIT 2) (is there exception handling here?)

call with exception handling or other multiple return possibilities:

invoke, callbr (LLVM)
invokeinterface invokespecial invokestatic invokevirtual (JVM)
calls: call callvirt (CIL)

variadic argument handling and argument handling:

va_arg (LLVM)
Variable Argument Handling Intrinsics: va_start va_end va_copy (LLVM Intrinsics)
callm (and callmt) (LuaJIT 2)
varg (LuaJIT 2)
argument handling: arglist ldarg ldarg.0 ldarg.1 ldarg.2 ldarg.3 ldarg.s ldarga ldarga.s starg starg.s (CIL)

tailcall:

callt callmt (LuaJIT 2)
tail. (CIL)

iterators:

iterc itern isnext (LuaJIT 2)

return:

RETURN (WASM)
ret (LLVM)
areturn dreturn freturn ireturn lreturn return (JVM)
retm ret ret0 ret1 (LuaJIT 2)
ret (CIL)

exception handling:

resume (LLVM)
catchswitch (LLVM)
catchret (LLVM)
cleanupret (LLVM)
landingpad catchpad cleanuppad (LLVM)
llvm.eh.typeid.for llvm.eh.begincatch llvm.eh.endcatch llvm.eh.exceptionpointer llvm.eh.sjlj.setjmp llvm.eh.sjlj.longjmp llvm.eh.sjlj.lsda llvm.eh.sjlj.callsite (LLVM exception handling intrinsics)
athrow (JVM)
endfault endfilter endfinally leave leave.s rethrow throw (CIL)

closures:

fnew (LuaJIT 2)

indirect branch form of call:

CALL_INDIRECT (WASM): calls the function stored at a runtime index in a table, trapping if its signature does not match the expected type
invokedynamic (JVM)
calli (CIL)
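The indirect-call forms take the callee from runtime data rather than from the instruction. A minimal Python sketch of WASM's call_indirect, assuming a table of (signature, function) pairs and a hypothetical tuple encoding of function types; it traps when the index is outside the table or names an empty slot, or when the signature does not match.

```python
class Trap(Exception):
    """Stands in for a WASM trap."""


def call_indirect(table, index, expected_type, *args):
    """Fetch the callee from a table using a runtime index, check its
    signature, then call it."""
    if not 0 <= index < len(table) or table[index] is None:
        raise Trap("no function at this table index")
    func_type, func = table[index]
    if func_type != expected_type:
        raise Trap("function signature does not match the expected type")
    return func(*args)


# Hypothetical encoding of function types as (params, results) tuples.
I32_BINOP = (("i32", "i32"), ("i32",))
I32_UNOP = (("i32",), ("i32",))

TABLE = [
    (I32_BINOP, lambda a, b: a + b),
    (I32_BINOP, lambda a, b: a * b),
    (I32_UNOP, lambda a: -a),
    None,  # an empty table slot
]
```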

Other structured control flow

WASM and LuaJIT 2 provide structured control flow. WASM alone provides blocks and if/else; both provide loops.

BLOCK (WASM)
IF (WASM)
ELSE (WASM)
END (WASM)

Loops:

LOOP (WASM)
fori jfori forl iforl jforl iterl iiterl jiterl loop iloop jloop (LuaJIT 2)
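In WASM every branch targets an enclosing construct: a branch to a block label exits forward past the block's end, while a branch to a loop label goes back to the loop's start. A minimal Python sketch mapping the common block/loop idiom onto break and continue (the WASM text in the docstring is an illustrative fragment, not taken from any particular compiler's output).

```python
def sum_down(n):
    """Python rendering of this WASM-style counted loop:

        (block $exit
          (loop $top
            (br_if $exit (i32.eqz (local.get $n)))
            ;; total += n; n -= 1
            (br $top)))

    br to a block label acts like break; br to a loop label acts like continue.
    """
    total = 0
    while True:          # loop $top, nested inside block $exit
        if n == 0:
            break        # br_if $exit
        total += n
        n -= 1
        continue         # br $top
    return total
```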

Misc control flow

RISC-V, WASM, LLVM (and possibly others) provide ILLEGAL/UNREACHABLE instructions. RISC-V, WASM, LLVM, ARM, JVM, and CIL provide NOP.

Illegal or unreachable instruction:

ILLEGAL (unnamed all-zero instruction) (RISC-V)
UNREACHABLE (WASM)
unreachable (LLVM)

NOP:

NOP (RISC-V)
NOP (WASM)
donothing (LLVM Intrinsic)
nop (ARM)
nop (JVM)
nop (CIL)

Allocation

WASM alone provides a linear memory which is growable. LLVM (alloca) and CIL (localloc) provide allocation in the current stack frame; CIL's localloc allocates from the method's local dynamic memory pool, which is released when the method returns, so it is closer to alloca than to heap allocation. LLVM alone provides various garbage collection, memory use markers, and ARC intrinsics.

See also the creation operations in the Data Structures section, below, as these usually cause (heap) allocation.

Linear memory sizing:

memory.size memory.grow (WASM)

stack frame:

alloca (LLVM)
localloc (CIL)

gc and memory usage and ARC:

Accurate Garbage Collection Intrinsics: gcroot gcread gcwrite llvm.experimental.gc.statepoint llvm.experimental.gc.result llvm.experimental.gc.relocate (LLVM Intrinsics)
Memory Use Markers: lifetime.start lifetime.end invariant.start invariant.end launder.invariant.group strip.invariant.group (LLVM Intrinsics)
Objective-C ARC Runtime Intrinsics: objc.autorelease objc.autoreleasePoolPop objc.autoreleasePoolPush objc.autoreleaseReturnValue objc.copyWeak objc.destroyWeak objc.initWeak objc.loadWeak objc.loadWeakRetained objc.moveWeak objc.release objc.retain objc.retainAutorelease objc.retainAutoreleaseReturnValue objc.retainAutoreleasedReturnValue objc.retainBlock objc.storeStrong objc.storeWeak (LLVM Intrinsics)
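Of the allocation mechanisms above, WASM's growable linear memory is easy to model. A minimal Python sketch of memory.size and memory.grow, assuming a 32-bit memory (64 KiB pages, at most 65536 pages) and an optional declared maximum; memory.grow returns the old size in pages, or -1 if the memory cannot grow. The class name is illustrative.

```python
PAGE_SIZE = 65536      # WASM page size in bytes (64 KiB)
MAX_PAGES_32 = 65536   # 4 GiB limit for a 32-bit memory


class LinearMemory:
    """Sketch of WASM linear memory sizing."""

    def __init__(self, initial_pages, max_pages=None):
        self.data = bytearray(initial_pages * PAGE_SIZE)
        self.max_pages = max_pages

    def size(self):
        """memory.size: current size in pages."""
        return len(self.data) // PAGE_SIZE

    def grow(self, delta_pages):
        """memory.grow: returns the previous size in pages, or -1 on failure.

        delta_pages is assumed non-negative. New pages are zero-filled and
        existing contents are preserved.
        """
        old = self.size()
        limit = self.max_pages if self.max_pages is not None else MAX_PAGES_32
        if old + delta_pages > limit:
            return -1
        self.data.extend(bytes(delta_pages * PAGE_SIZE))
        return old
```

Real engines may also refuse to grow for resource reasons even below the declared maximum; this sketch only models the limit check.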

Data Structures

LLVM, JVM, CIL, and LuaJIT 2 provide vectors/arrays and aggregates (JVM and CIL provide aggregates via OOP; LuaJIT 2 tables serve as both arrays and aggregates).

JVM, LuaJIT 2, and CIL provide a length operation.

CIL alone provides address-of-array-element and address-of-object-field accessors. LLVM alone provides a shufflevector operation.

JVM, CIL provide OOP data structures.

LLVM alone provides vector reduction ops, and masked vector ops.

LuaJIT 2 alone provides a cat (concatenate) operation (i think that LuaJIT 2 len is on strings and tables. I don't know if cat is only for strings or if it applies to tables also).

LuaJIT 2 and CIL provide strings (but not many operations specifically for them, as far as i can tell).

CIL alone provides polymorphic boxing and unboxing instructions.

Vectors and arrays

creation:

anewarray newarray multianewarray (JVM)
newarr (CIL)

length:

arraylength (JVM)
(len) (LuaJIT 2) (also below in section 'Lua len and concat', because it is polymorphic)
ldlen (CIL)

accessors:

extractelement insertelement (LLVM)
aaload aastore baload bastore caload castore daload dastore faload fastore iaload iastore laload lastore saload sastore (JVM)
ldelem ldelem.i ldelem.i1 ldelem.i2 ldelem.i4 ldelem.i8 ldelem.r4 ldelem.r8 ldelem.ref ldelem.u1 ldelem.u2 ldelem.u4 ldelem.u8 stelem stelem.i stelem.i1 stelem.i2 stelem.i4 stelem.i8 stelem.r4 stelem.r8 stelem.ref (CIL)
ldelema (CIL)

misc:

shufflevector (LLVM)
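LLVM's shufflevector builds a new vector by picking lanes out of two input vectors of the same type, using a constant mask whose entries index into their concatenation. A minimal Python sketch, using lists for vectors and None to stand for an undefined (undef/poison) mask entry.

```python
def shufflevector(v1, v2, mask):
    """Result lane i is element mask[i] of the concatenation v1 ++ v2;
    a None mask entry models an undefined lane."""
    if len(v1) != len(v2):
        raise ValueError("both inputs must have the same vector type")
    combined = list(v1) + list(v2)
    return [None if i is None else combined[i] for i in mask]
```

For instance, the mask [0, 4, 1, 5] interleaves the low halves of the two inputs, and [3, 2, 1, 0] reverses the first input; the result length follows the mask, not the inputs.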

OOP

creation, memory:

new (JVM)
cpobj initobj newobj sizeof (CIL)

types:

checkcast instanceof (JVM)
castclass constrained isinst mkrefany (CIL)

accessors:

getfield getstatic putfield putstatic (JVM)
ldfld ldftn ldobj ldsfld ldtoken ldvirtftn refanytype refanyval stfld stobj stsfld (CIL)
ldflda ldsflda (CIL)

misc:

box unbox unbox.any (CIL)

Lua len and concat

Lua len and concat (in LuaJIT 2, is concat only for strings, or is it generic?):

len (LuaJIT 2)
cat (concat) (LuaJIT 2)

Tables and aggregates

Tables:

tnew tdup (LuaJIT 2)
tgetv tgets tgetb tsetv tsets tsetb tsetm (LuaJIT 2)
Aggregate Operations: extractvalue insertvalue getelementptr (LLVM)

Misc data structures

Experimental Vector Reduction Intrinsics: experimental.vector.reduce.add.* experimental.vector.reduce.fadd.* experimental.vector.reduce.mul.* experimental.vector.reduce.fmul.* experimental.vector.reduce.and.* experimental.vector.reduce.or.* experimental.vector.reduce.xor.* experimental.vector.reduce.smax.* experimental.vector.reduce.smin.* experimental.vector.reduce.umax.* experimental.vector.reduce.umin.* experimental.vector.reduce.fmax.* experimental.vector.reduce.fmin.* (LLVM)

masked vectors (LLVM Intrinsics):

Masked Vector Load and Store Intrinsics: masked.load.* masked.store.* (LLVM Intrinsics)
Masked Vector Gather and Scatter Intrinsics: masked.gather.* masked.scatter.* (LLVM Intrinsics)
Masked Vector Expanding Load and Compressing Store Intrinsics: masked.expandload.* masked.compressstore.* (LLVM Intrinsics)
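The masked memory intrinsics differ mainly in how lanes map to memory. A minimal Python sketch of masked.load, masked.expandload and masked.compressstore, modelling memory as a Python list indexed from a base position (the real intrinsics take pointers, and masked.load also takes an alignment).

```python
def masked_load(memory, base, mask, passthru):
    """Lane i reads memory[base + i] only where mask[i] is true;
    disabled lanes take the passthru value and touch no memory."""
    return [memory[base + i] if m else p
            for i, (m, p) in enumerate(zip(mask, passthru))]


def masked_expandload(memory, base, mask, passthru):
    """Consecutive elements starting at base fill the enabled lanes in order."""
    out, j = [], base
    for m, p in zip(mask, passthru):
        if m:
            out.append(memory[j])
            j += 1
        else:
            out.append(p)
    return out


def masked_compressstore(value, memory, base, mask):
    """The enabled lanes are written to consecutive locations starting at base."""
    j = base
    for v, m in zip(value, mask):
        if m:
            memory[j] = v
            j += 1
```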

Misc

Cycle counters are provided by RISC-V and LLVM (as LLVM intrinsics).

RISC-V, LLVM, ARM provide supervisor call.

RISC-V, LLVM, ARM, JVM, CIL provide breakpoint.

RISC-V and ARM provide instructions to access special registers (called Control and Status Registers, or CSRs, on RISC-V).

Memory operations such as memcpy are provided by LLVM intrinsics and CIL.

LLVM alone provides phi, and various other intrinsics.
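A phi node selects a value according to which predecessor block control arrived from; this is how SSA form merges values at join points. A minimal Python sketch that interprets a small control-flow graph for the maximum of two integers (the LLVM IR in the docstring is an illustrative example, not from the source).

```python
def run_max(a, b):
    """Interpret this small LLVM IR control-flow graph:

        entry:
          %c = icmp sgt i32 %a, %b
          br i1 %c, label %then, label %else
        then:
          br label %join
        else:
          br label %join
        join:
          %m = phi i32 [ %a, %then ], [ %b, %else ]
          ret i32 %m
    """
    values = {"%a": a, "%b": b}
    # The phi's incoming list: predecessor block -> value to use.
    phi_incoming = {"then": "%a", "else": "%b"}
    block, prev = "entry", None
    while True:
        if block == "entry":
            values["%c"] = a > b
            prev, block = block, ("then" if values["%c"] else "else")
        elif block in ("then", "else"):
            prev, block = block, "join"
        else:  # join
            # phi: pick the value associated with the block we came from
            values["%m"] = values[phi_incoming[prev]]
            return values["%m"]
```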

ARM alone provides interrupt handling, and 'event' hints for power saving.

JVM alone provides a prefix instruction (wide) that widens the operands of the following instruction, and implementation-dependent instructions.

LuaJIT 2 alone provides function header instructions.

CIL provides various misc. prefixes such as readonly, no.rangecheck.

cycle counters:

RISC-V RV32I Counters pseudo-instructions (note: these are in RV32I but not in RV32E) (RDCYCLE[H], RDTIME[H], RDINSTRET[H])
readcyclecounter (LLVM Intrinsics)
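On RV32 the 64-bit counters are read in two 32-bit halves (RDCYCLE and RDCYCLEH), so a carry out of the low half between the two reads could produce a torn value. The usual fix, which is the sequence the RISC-V specification suggests, is to read the high half, then the low half, then the high half again, and retry if the two high reads differ. A minimal Python sketch, with a hypothetical simulated counter standing in for the CSRs.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


class FakeCycleCounter:
    """Hypothetical 64-bit counter that advances by `step` on every read,
    standing in for the cycle/cycleh CSRs."""

    def __init__(self, start, step):
        self.value, self.step = start, step

    def _tick(self):
        v = self.value
        self.value = (self.value + self.step) & MASK64
        return v

    def rdcycle(self):
        return self._tick() & 0xFFFFFFFF   # low 32 bits

    def rdcycleh(self):
        return self._tick() >> 32          # high 32 bits


def read_cycle64(counter):
    """Read the full 64-bit count without tearing."""
    while True:
        hi1 = counter.rdcycleh()
        lo = counter.rdcycle()
        hi2 = counter.rdcycleh()
        if hi1 == hi2:                     # no carry into the high half in between
            return (hi1 << 32) | lo
```

Starting the fake counter at 0x1_FFFF_FFFF, the first attempt sees high halves 1 and 2 (the low half wrapped), so it retries instead of returning the badly wrong 0x1_0000_0000.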

supervisor call:

ECALL (RISC-V RV32I)
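trap (LLVM) (note: llvm.trap emits a trapping instruction that aborts execution, rather than making a system call)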
SVC (ARM)

breakpoint:

EBREAK (RISC-V RV32I)
debugtrap (LLVM)
bkpt (ARM)
breakpoint (JVM)
break (CIL)

special registers:

RISC-V RV32I Control and Status Register: CSRRW (Atomic Read/Write CSR), CSRRS (Atomic Read and Set Bits in CSR), CSRRC (Atomic Read and Clear Bits in CSR), CSRRWI (CSRRW immediate), CSRRSI (CSRRS immediate), CSRRCI (CSRRC immediate)
mrs msr (ARM)
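The CSR instructions are atomic read-modify-write operations: each returns the CSR's old value (destined for rd) and writes a new one. A minimal Python sketch of the value semantics only, for RV32, with CSRs modelled as a dict under hypothetical names; it does not model side effects such as CSRRS/CSRRC with rs1 = x0 skipping the write entirely (which is how the csrr read pseudo-instruction works).

```python
MASK32 = 0xFFFFFFFF


def csr_access(csrs, name, op, src):
    """`src` is the value of rs1, or the 5-bit zero-extended immediate for
    the *I forms. Returns the old CSR value, i.e. what is written to rd."""
    if op.endswith("i") and not 0 <= src < 32:
        raise ValueError("immediate forms take a 5-bit unsigned immediate")
    old = csrs[name]
    if op in ("csrrw", "csrrwi"):
        new = src              # swap in the new value
    elif op in ("csrrs", "csrrsi"):
        new = old | src        # set the bits that are 1 in src
    elif op in ("csrrc", "csrrci"):
        new = old & ~src       # clear the bits that are 1 in src
    else:
        raise ValueError(f"unknown CSR instruction: {op}")
    csrs[name] = new & MASK32
    return old
```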

memory:

Standard C Library Intrinsics: memcpy memmove memset.* (LLVM Intrinsics)
cpblk initblk (CIL)
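memcpy and memmove differ only when source and destination overlap: llvm.memcpy requires that they do not, while llvm.memmove handles overlap by choosing the copy direction. A minimal Python sketch on a bytearray, with a naive forward copy for contrast; the function names are illustrative.

```python
def memmove(buf, dst, src, n):
    """Overlap-safe copy within one buffer. Copies backwards when the
    destination starts inside the source range, so no source byte is
    overwritten before it has been read."""
    if src < dst < src + n:
        for i in reversed(range(n)):
            buf[dst + i] = buf[src + i]
    else:
        for i in range(n):
            buf[dst + i] = buf[src + i]


def naive_forward_copy(buf, dst, src, n):
    """Fine under memcpy's no-overlap rule, but corrupts data when the
    destination overlaps the source range from above."""
    for i in range(n):
        buf[dst + i] = buf[src + i]
```

Copying "abcd" two bytes to the right inside "abcdef" gives "ababcd" with memmove, but "ababab" with the naive forward copy, because the forward loop reads bytes it has already overwritten.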

misc misc:

phi (LLVM)
Code Generator Intrinsics: returnaddress addressofreturnaddress sponentry frameaddress localescape localrecover read_register write_register stacksave stackrestore get.dynamic.area.offset prefetch pcmarker clear_cache instrprof.increment instrprof.increment.step instrprof.value.profile thread.pointer (LLVM Intrinsics)
Debugger Intrinsics: llvm.dbg.addr llvm.dbg.declare llvm.dbg.value (LLVM Intrinsics)
Trampoline Intrinsics: init.trampoline adjust.trampoline (LLVM Intrinsics)
Stack Map Intrinsics: llvm.experimental.stackmap llvm.experimental.patchpoint.* (LLVM Intrinsics)
Element Wise Atomic Memory Intrinsics: memcpy.element.unordered.atomic memmove.element.unordered.atomic memset.element.unordered.atomic (LLVM Intrinsics)
General Intrinsics: var.annotation ptr.annotation.* annotation.* codeview.annotation stackprotector stackguard objectsize expect assume ssa_copy type.test type.checked.load experimental.deoptimize experimental.guard experimental.widenable.condition load.relative sideeffect is.constant.* (LLVM Intrinsics)
interrupts: cpsid cpsie (ARM)
sev wfe wfi (ARM)
wide (JVM)
impdep1 impdep2 (JVM)
Function headers: funcf ifuncf jfuncf funcv ifuncv jfuncv funcc funccw func (LuaJIT 2)
no.typecheck/rangecheck/nullcheck (CIL)
readonly. unaligned. (CIL)
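The JVM's wide prefix exists because iload, iinc and the other local-variable instructions normally encode the local index in one unsigned byte; after wide, the index (and iinc's signed increment) takes two bytes. A minimal Python decoder sketch for just iload and iinc, with and without the prefix.

```python
WIDE, ILOAD, IINC = 0xC4, 0x15, 0x84   # JVM opcode values


def decode_local_access(code, pc):
    """Decode an iload or iinc at code[pc], with or without a wide prefix.

    Returns (mnemonic, local_index, increment_or_None, next_pc).
    """
    wide = code[pc] == WIDE
    if wide:
        pc += 1
    op = code[pc]
    width = 2 if wide else 1
    # The local index is unsigned, big-endian.
    index = int.from_bytes(code[pc + 1:pc + 1 + width], "big")
    pc += 1 + width
    if op == ILOAD:
        return ("iload", index, None, pc)
    if op == IINC:
        # iinc's increment is signed: one byte normally, two after wide.
        inc = int.from_bytes(code[pc:pc + width], "big", signed=True)
        return ("iinc", index, inc, pc + width)
    raise ValueError(f"opcode {op:#x} not modelled")
```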

Continued at Target Languages Concordance part III
