Continued from Target Languages Concordance part II

Instruction lists from each platform

When instruction counts are given, we count mnemonics. Sometimes similar mnemonics are grouped together and counted as one.

RISC-V instruction list

RV32I (base 32-bit integer set; 47 instructions):

constant loads: (pseudoinstruction for loading constants using addi or ori) lui auipc
add, subtract: add addi sub
shifts: sll slli srl srli sra srai
logical: and andi or ori xor xori
compares: slt slti sltu sltiu
loads and stores: lw lh lhu lb lbu sw sh sb
atomics and sync: fence fence.i
jumps: jal jalr
conditional branches: beq bne blt bge bltu bgeu
misc control flow: (unnamed all-zero illegal pseudoinstruction) (nop pseudoinstruction)
misc: ecall ebreak
misc Control and Status Register: CSRRW (Atomic Read/Write CSR), CSRRS (Atomic Read and Set Bits in CSR), CSRRC (Atomic Read and Clear Bits in CSR), CSRRWI (CSRRW immediate), CSRRSI (CSRRS immediate), CSRRCI (CSRRC immediate)
misc RV32I counter pseudo-instructions (note: these are in RV32I but not in RV32E): RDCYCLE RDCYCLEH, RDTIME RDTIMEH, RDINSTRET RDINSTRETH

RV64I (base 64-bit integer set; 59 instructions) (note: RV64I includes everything in RV32I, but adapted for 64-bit, plus these) (12 new instructions and 3 new encodings of old instructions) (note: the instructions with 'W' at the end of their name are 32-bit versions of the instructions, since the un-suffixed instructions inherited from RV32I change to 64-bit in RV64I; the way i think about this is that un-suffixed instructions operate with whatever bitwidth the registers are, which is 32-bits in RV32I and 64-bits in RV64I, unless they are specifically made to operate on a certain bitwidth, in which case this is indicated with a suffix to the instruction name):

add, subtract: addw addiw subw
shifts: sllw slliw srlw srliw sraw sraiw
loads and stores: ld sd lwu
(there are also new encodings of SLLI SRLI SRAI)
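
To make the 'W' suffix concrete, here is a minimal Python sketch of how add and addw differ on RV64I. The helper names (sext32, add, addw) are invented for this illustration and registers are modeled as plain integers; the behavior modeled is that addw adds the low 32 bits and sign-extends the 32-bit result to 64 bits.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1

def sext32(value):
    """Sign-extend the low 32 bits of value to a 64-bit register pattern."""
    value &= MASK32
    if value & (1 << 31):
        value |= MASK64 ^ MASK32  # set the upper 32 bits
    return value

def add(rs1, rs2):
    """RV64I add: full register-width (64-bit) addition, wrapping on overflow."""
    return (rs1 + rs2) & MASK64

def addw(rs1, rs2):
    """RV64I addw: 32-bit addition whose result is sign-extended to 64 bits."""
    return sext32(rs1 + rs2)

a = 0x7FFF_FFFF
print(hex(add(a, 1)))   # 0x80000000
print(hex(addw(a, 1)))  # 0xffffffff80000000
```

The same pattern (compute on 32 bits, then sign-extend) applies to the other W-suffixed instructions listed above.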

RV32M (multiply extension; 8 instructions):

mul mulh mulhu mulhsu div divu rem remu

RV64M (multiply extension; 13 instructions) (note: RV64M includes everything in RV32M, but adapted for 64-bit, plus these) (5 new instructions):

mulw divw divuw remw remuw

RV32A (32-bit atomics extension; 11 instructions):

lr.w sc.w amoswap.w amoadd.w amoxor.w amoand.w amoor.w amomin.w amomax.w amominu.w amomaxu.w

RV64A (64-bit atomics extension; 22 instructions) (note: RV64A includes everything in RV32A, plus these) (11 new instructions):

new instructions with names the same as RV32A instruction names but with .d suffixes instead of .w, and with similar functionality but 64-bit

RV32F (32-bit/single-precision floating point extension for RV32I: 26 instructions):

Add, subtract, multiply, divide: fadd.s fsub.s fmul.s fdiv.s
compares: feq.s flt.s fle.s
floating-point specific:
- sqrt: fsqrt.s
- fused multiply-add: fmadd.s fmsub.s fnmadd.s fnmsub.s
- signs: fsgnj.s fsgnjn.s fsgnjx.s
- min, max: fmin.s fmax.s
- classify: fclass.s
- rounding: (pseudoinstructions: frrm fsrm fsrmi)
- exceptions: (pseudoinstructions: frflags fsflags fsflagsi)
- misc fp: (pseudoinstructions: frcsr fscsr)
conversions:
- fmv.w.x fmv.x.w
- fcvt.s.w fcvt.s.wu fcvt.w.s fcvt.wu.s
loads and stores: flw fsw

RV64F (32-bit/single-precision floating point extension for RV64I: 30 instructions) (note: RV64F includes everything in RV32F, plus these) (4 new instructions):

conversions: fcvt.s.l fcvt.s.lu fcvt.l.s fcvt.lu.s

RV32D (64-bit/double-precision floating point extension for RV32I: 26 instructions):

new instructions with names the same as most of the RV32F instruction names but with .d suffixes instead of .s, and with similar functionality but 64-bit; EXCEPT:
there are no new fmv* instructions here (because the 64-bit double-precision floating point won't fit in 32-bit integer registers)
there are two new instructions for converting between 32-bit/single-precision floating point and 64-bit/double-precision floating point: fcvt.s.d fcvt.d.s

RV64D (64-bit/double-precision floating point extension for RV64I: 32 instructions) (note: RV64D includes everything in RV32D, plus these) (6 new instructions):

fmv.x.d fmv.d.x fcvt.d.l fcvt.d.lu fcvt.l.d fcvt.lu.d

(so, RV64IMAFD, otherwise known as RV64G, contains 156 instructions in total)
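
The total can be checked by summing the per-extension counts given in the headings above. This is only an arithmetic sketch; the dictionary name is invented for illustration, and the counts are the ones stated in this section.

```python
# Instruction counts for the 64-bit variant of each part of RV64G,
# as stated in the section headings above.
rv64g_counts = {
    "RV64I": 59,
    "RV64M": 13,
    "RV64A": 22,
    "RV64F": 30,
    "RV64D": 32,
}

total = sum(rv64g_counts.values())
print(total)  # 156
```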

WASM instruction list (172 instructions)

control:

branches: br br_if br_table
subroutines: return call call_indirect
other structured control: block loop if else end
control misc: unreachable nop

parametric:

drop select

constant loads:

i32.const i64.const f32.const f64.const

loads and stores:

loads: i32.load i64.load f32.load f64.load i32.load8_s i32.load8_u i32.load16_s i32.load16_u i64.load8_s i64.load8_u i64.load16_s i64.load16_u i64.load32_s i64.load32_u
stores: i32.store i64.store f32.store f64.store i32.store8 i32.store16 i64.store8 i64.store16 i64.store32
variables: local.get local.set local.tee global.get global.set

comparisons:

integer comparisons: i32.eqz i32.eq i32.ne i32.lt_s i32.lt_u i32.le_s i32.le_u i32.gt_s i32.gt_u i32.ge_s i32.ge_u, and corresponding instructions for i64
floating point comparisons: f32.eq f32.ne f32.lt f32.le f32.gt f32.ge, and corresponding instructions for f64

arithmetic:

add, subtract, multiply, divide:
- i32.add i32.sub i32.mul i32.div_s i32.div_u i32.rem_s i32.rem_u, and corresponding instructions for i64
- f32.add f32.sub f32.mul f32.div, and corresponding instructions for f64
logical: i32.and i32.or i32.xor, and corresponding instructions for i64
shifts: i32.shl i32.shr_s i32.shr_u i32.rotl i32.rotr, and corresponding instructions for i64
misc bitwise arithmetic: i32.clz i32.ctz i32.popcnt, and corresponding instructions for i64
floating-point specific: f32.abs f32.neg f32.ceil f32.floor f32.trunc f32.nearest f32.sqrt f32.min f32.max f32.copysign, and corresponding instructions for f64

conversions:

f{32,64}.convert_i{32,64}_{s,u}, i{32,64}.trunc_f{32,64}_{s,u}
i32.reinterpret_f32 i64.reinterpret_f64 f32.reinterpret_i32 f64.reinterpret_i64
i32.wrap_i64 i64.extend_i32_s i64.extend_i32_u
f32.demote_f64 f64.promote_f32

allocation: memory.size memory.grow

LLVM instruction list

LLVM instructions (64 instructions):

Terminator Instructions: ret br switch indirectbr invoke callbr resume catchswitch catchret cleanupret unreachable
Unary Operations: fneg
Binary Operations: add fadd sub fsub mul fmul udiv sdiv fdiv urem srem frem
Bitwise Binary Operations: shl lshr ashr and or xor
Vector Operations: extractelement insertelement shufflevector
Aggregate Operations: extractvalue insertvalue
Memory Access and Addressing Operations: alloca load store fence cmpxchg atomicrmw getelementptr
Conversion Operations: trunc zext sext fptrunc fpext fptoui fptosi uitofp sitofp ptrtoint inttoptr bitcast addrspacecast
Other: icmp fcmp phi select call va_arg landingpad catchpad cleanuppad

LLVM intrinsics (185 intrinsics, if the families denoted by the '*'s below are each grouped together and counted as one):

Variable Argument Handling Intrinsics: va_start va_end va_copy
Accurate Garbage Collection Intrinsics: gcroot gcread gcwrite llvm.experimental.gc.statepoint llvm.experimental.gc.result llvm.experimental.gc.relocate
Code Generator Intrinsics: returnaddress addressofreturnaddress sponentry frameaddress localescape localrecover read_register write_register stacksave stackrestore get.dynamic.area.offset prefetch pcmarker readcyclecounter clear_cache instrprof.increment instrprof.increment.step instrprof.value.profile thread.pointer
Standard C Library Intrinsics: memcpy memmove memset.* sqrt.* powi.* sin.* cos.* pow.* exp.* exp2.* log.* log10.* log2.* fma.* fabs.* minnum.* maxnum.* minimum.* maximum.* copysign.* floor.* ceil.* trunc.* rint.* nearbyint.* round.*
Bit Manipulation Intrinsics: bitreverse.* bswap.* ctpop.* ctlz.* cttz.* fshl.* fshr.*
Arithmetic with Overflow Intrinsics: sadd.with.overflow.* uadd.with.overflow.* ssub.with.overflow.* usub.with.overflow.* smul.with.overflow.* umul.with.overflow.*
Saturation Arithmetic Intrinsics: sadd.sat.* uadd.sat.* ssub.sat.* usub.sat.*
Fixed Point Arithmetic Intrinsics: smul.fix.* umul.fix.*
Specialised Arithmetic Intrinsics: canonicalize.* fmuladd.*
Experimental Vector Reduction Intrinsics: experimental.vector.reduce.add.* experimental.vector.reduce.fadd.* experimental.vector.reduce.mul.* experimental.vector.reduce.fmul.* experimental.vector.reduce.and.* experimental.vector.reduce.or.* experimental.vector.reduce.xor.* experimental.vector.reduce.smax.* experimental.vector.reduce.smin.* experimental.vector.reduce.umax.* experimental.vector.reduce.umin.* experimental.vector.reduce.fmax.* experimental.vector.reduce.fmin.*
Half Precision Floating-Point Intrinsics: convert.to.fp16 convert.from.fp16
Debugger Intrinsics: llvm.dbg.addr llvm.dbg.declare llvm.dbg.value
Exception Handling Intrinsics: llvm.eh.typeid.for llvm.eh.begincatch llvm.eh.endcatch llvm.eh.exceptionpointer llvm.eh.sjlj.setjmp llvm.eh.sjlj.longjmp llvm.eh.sjlj.lsda llvm.eh.sjlj.callsite
Trampoline Intrinsics: init.trampoline adjust.trampoline
Masked Vector Load and Store Intrinsics: masked.load.* masked.store.*
Masked Vector Gather and Scatter Intrinsics: masked.gather.* masked.scatter.*
Masked Vector Expanding Load and Compressing Store Intrinsics: masked.expandload.* masked.compressstore.*
Memory Use Markers: lifetime.start lifetime.end invariant.start invariant.end launder.invariant.group strip.invariant.group
Constrained Floating-Point Intrinsics: experimental.constrained.fadd experimental.constrained.fsub experimental.constrained.fmul experimental.constrained.fdiv experimental.constrained.frem experimental.constrained.fma experimental.constrained.fptrunc experimental.constrained.fpext
Constrained libm-equivalent Intrinsics: experimental.constrained.sqrt experimental.constrained.pow experimental.constrained.powi experimental.constrained.sin experimental.constrained.cos experimental.constrained.exp experimental.constrained.exp2 experimental.constrained.log experimental.constrained.log10 experimental.constrained.log2 experimental.constrained.rint experimental.constrained.nearbyint experimental.constrained.maxnum experimental.constrained.minnum experimental.constrained.ceil experimental.constrained.floor experimental.constrained.round experimental.constrained.trunc
General Intrinsics: var.annotation ptr.annotation.* annotation.* codeview.annotation trap debugtrap stackprotector stackguard objectsize expect assume ssa_copy type.test type.checked.load donothing experimental.deoptimize experimental.guard experimental.widenable.condition load.relative sideeffect is.constant.*
Stack Map Intrinsics: llvm.experimental.stackmap llvm.experimental.patchpoint.*
Element Wise Atomic Memory Intrinsics: memcpy.element.unordered.atomic memmove.element.unordered.atomic memset.element.unordered.atomic
Objective-C ARC Runtime Intrinsics: objc.autorelease objc.autoreleasePoolPop objc.autoreleasePoolPush objc.autoreleaseReturnValue objc.copyWeak objc.destroyWeak objc.initWeak objc.loadWeak objc.loadWeakRetained objc.moveWeak objc.release objc.retain objc.retainAutorelease objc.retainAutoreleaseReturnValue objc.retainAutoreleasedReturnValue objc.retainBlock objc.storeStrong objc.storeWeak

ARM Cortex M0 instruction list (59 instructions)

Moves: mov movs
add, subtract, multiply: add adds adcs adr sub subs sbcs rsbs muls
compare: cmp cmn
logical: ands eors orrs bics mvns tst
shift: lsls lsrs asrs
rotate: rors
load, store: ldr ldrh ldrb ldrsh ldrsb ldm str strh strb stm
push, pop: push pop
branch: b bx blx
branch (32-bit Thumb instruction): bl
extend: sxth sxtb uxth uxtb
reverse: rev rev16 revsh
state change: svc cpsid cpsie bkpt
state change: mrs msr (32-bit Thumb instructions)
hint: sev wfe wfi nop
barriers (32-bit Thumb instructions): isb dmb dsb

All instructions are 16-bit Thumb (Thumb-1) instructions except for the 32-bit Thumb instructions (Thumb-2) indicated.

These instructions are: "all of the 16-bit Thumb instructions from ARMv7-M excluding CBZ, CBNZ and IT" plus "the 32-bit Thumb instructions BL, DMB, DSB, ISB, MRS and MSR" [1].

Note: the ARM instruction mnemonics listed in [2] often have the letter 'S' at the end of them; for instance, 'ANDS' is the mnemonic for logical AND. In addition, sometimes there are two variants of an instruction, one with an 'S' and one without, for instance, MOVS and MOV. In these cases, the 'S' suffix means that the flags are updated. This 'S' suffix causes the instruction listing that we use here (from [3]) to differ slightly from (a) the instruction listing in https://en.wikipedia.org/wiki/ARM_Cortex-M#Instruction_sets and (b) the instruction names in the headings (but not the bodies) of sections within [4], both of which use mnemonics without this 'S' suffix (and combine mnemonics that differ only in the inclusion of this 'S').
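
As a rough illustration of what 'the flags are updated' means, the following Python sketch models a flag-setting add next to a non-flag-setting one. The function names, the dictionary representation of the N, Z, C, V flags, and the simplified 32-bit model are all assumptions made for this example; it is not a description of any particular encoding.

```python
MASK32 = 0xFFFF_FFFF

def sign_bit(value):
    """Return bit 31 of a 32-bit value."""
    return (value >> 31) & 1

def add(rn, rm, flags):
    """Non-flag-setting add: return the 32-bit sum, flags unchanged."""
    return (rn + rm) & MASK32, flags

def adds(rn, rm, flags):
    """Flag-setting add: return the 32-bit sum and freshly computed flags."""
    full = rn + rm
    result = full & MASK32
    new_flags = {
        "N": sign_bit(result) == 1,   # result is negative
        "Z": result == 0,             # result is zero
        "C": full > MASK32,           # unsigned carry out
        "V": sign_bit(rn) == sign_bit(rm)
             and sign_bit(result) != sign_bit(rn),  # signed overflow
    }
    return result, new_flags

old_flags = {"N": False, "Z": False, "C": False, "V": False}
print(add(0xFFFF_FFFF, 1, old_flags))   # flags stay as they were
print(adds(0xFFFF_FFFF, 1, old_flags))  # Z and C become True
```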

Note: the set of mnemonics found at [5] is identical to that found at [6] except that the latter includes YIELD. [7] also includes YIELD. We do not include YIELD here because it is not in [8], and because [9] indicates in a footnote that it executes as NOP.

Note: as of this writing, the set of mnemonics found at [10] includes just one 'CPS' whereas the other sources have both CPSID and CPSIE. Here we include both CPSID and CPSIE.

JVM instruction list (206 instructions)

loading constants:
- aconst_null
- bipush
- dconst_0 dconst_1
- fconst_0 fconst_1 fconst_2
- iconst_m1 iconst_0 iconst_1 iconst_2 iconst_3 iconst_4 iconst_5
- lconst_0 lconst_1
- ldc ldc_w ldc2_w
- sipush
addition, subtraction, multiplication, division:
- dadd fadd iadd ladd
- dsub fsub isub lsub
- dmul fmul imul lmul
- ddiv fdiv idiv ldiv
- drem frem irem lrem
- dneg fneg ineg lneg
comparisons:
- dcmpg dcmpl
- fcmpg fcmpl
- lcmp
arrays:
- anewarray newarray multianewarray arraylength
- aaload aastore
- baload bastore
- caload castore
- daload dastore
- faload fastore
- iaload iastore
- laload lastore
- saload sastore
OOP:
- new checkcast instanceof
- getfield getstatic putfield putstatic
- invokedynamic invokeinterface invokespecial invokestatic invokevirtual
- areturn dreturn freturn ireturn lreturn return
variable loads/stores:
- aload aload_0 aload_1 aload_2 aload_3 astore astore_0 astore_1 astore_2 astore_3
- dload dload_0 dload_1 dload_2 dload_3 dstore dstore_0 dstore_1 dstore_2 dstore_3
- fload fload_0 fload_1 fload_2 fload_3 fstore fstore_0 fstore_1 fstore_2 fstore_3
- iload iload_0 iload_1 iload_2 iload_3 istore istore_0 istore_1 istore_2 istore_3
- lload lload_0 lload_1 lload_2 lload_3 lstore lstore_0 lstore_1 lstore_2 lstore_3
variable indexed operations: iinc
exception handling: athrow
stack ops:
- dup dup_x1 dup_x2 dup2 dup2_x1 dup2_x2
- pop pop2
- swap
conversions:
- d2f d2i d2l
- f2d f2i f2l
- i2b i2c i2d i2f i2l i2s
- l2d l2f l2i
jumps: goto goto_w jsr jsr_w ret
switch: lookupswitch tableswitch
compare-and-branch:
- if_acmpeq if_acmpne if_icmpeq if_icmpge if_icmpgt if_icmple if_icmplt if_icmpne
- ifeq ifge ifgt ifle iflt ifne
- ifnonnull ifnull
bitwise logical:
- iand ior ixor
- land lor lxor
shifts:
- ishl ishr iushr
- lshl lshr lushr
sync: monitorenter monitorexit
misc: breakpoint impdep1 impdep2 nop wide

note: jsr, jsr_w, ret have effectively been deprecated; see [11]

JVM instruction list discussion:

Many of the JVM's instructions are organized around the following types: reference (also called address ('a'), or objectref), array, byte, char, double, float, integer, long, short.

The array type is a data structure. There are instructions to create new arrays and to get their length. For each of the other types, there are instructions to load items of that type from arrays and to store them into arrays.

There are a few types for which few type-specific instructions are provided. These are byte, char, and short. Each of these has instructions to load and store the type from/into an array. Each of these has an instruction to convert an integer into a value of the type. Byte and short have instructions to push an immediate constant of that type onto the stack.

This leaves what i'll refer to as the 'main types': reference, double, float, integer, long.

The main numerical types can be grouped into floating-point (double, float), and integral (integer, long). Out of the main numerical types, integer is the primary workhorse.

The reference type is a pointer. It has a distinguished element 'null'. This is the only type that can be thrown. Like the main numerical types, there are instructions to load references from/store them to variables, and return them from methods. There are compare-and-branch instructions that branch based on the equality, or lack of equality, of two references, and that branch based on whether or not a reference is null.

Each of the main numerical types has instructions to push constants 0 and 1 in that type onto the stack. Floats also have an instruction to push 2, and integers also have instructions to push -1, 2, 3, 4, and 5. Each of the main numerical types has instructions for addition, subtraction, multiplication, division, remainder, and negation. Except for integers, they each have compare instructions, which push an integer (-1, 0, or 1) to indicate the result of the comparison (presumably integers don't need this because the main use of these comparison results is as inputs to compare-and-branch instructions, which take integer arguments directly). The floating-point (double and float) types have two variants of these compare instructions that differ only in their behavior on NaNs. Each of the main numerical types has instructions to load from/store to local variables. Each of the main numerical types has instructions to convert to each of the other main numerical types, and integers also can be converted to bytes, chars, and shorts. The integral types have bitwise logical and bitshift operators.
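
The two floating-point compare variants can be sketched as follows. This is a simplified Python model with function names borrowed from the JVM mnemonics; it assumes the convention that the 'l' variant pushes -1 and the 'g' variant pushes 1 when either operand is NaN, and is not a substitute for the JVM specification.

```python
import math

def _three_way(a, b):
    """Return 1 if a > b, 0 if a == b, -1 if a < b (no NaN operands)."""
    return (a > b) - (a < b)

def fcmpl(a, b):
    """Model of fcmpl: a NaN operand yields -1."""
    if math.isnan(a) or math.isnan(b):
        return -1
    return _three_way(a, b)

def fcmpg(a, b):
    """Model of fcmpg: a NaN operand yields 1."""
    if math.isnan(a) or math.isnan(b):
        return 1
    return _three_way(a, b)

nan = float("nan")
print(fcmpl(1.0, 2.0), fcmpg(1.0, 2.0))  # -1 -1
print(fcmpl(nan, 2.0), fcmpg(nan, 2.0))  # -1 1
```

Having both variants lets a compiler pick whichever one makes a following branch fall on the 'false' side when a NaN is involved.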

Integers have compare-and-branch instructions that branch based on a comparison of two integers (==, !=, <=, <, >, >=). There are also compare-and-branch instructions that compare one integer against zero (again, ==, !=, <=, <, >, >=). There is an increment operator that operates directly on integer variables.

There are instructions for loading constants from the constant table. There are a number of polymorphic stack instructions; various 'dup's, pops, and swap. Instructions for control flow include jumps and switches. There are many OOP instructions for working with objects, and for invoking and returning from methods. Finally, there are also synchronization and miscellaneous operations.

LuaJIT2 instruction list (94 instructions)

Comparison: islt isge isle isgt iseqv isnev iseqs isnes iseqn isnen iseqp isnep
Unary Test and Copy: istc isfc ist isf
Unary: mov not unm len
binary: addvn subvn mulvn divvn modvn addnv subnv mulnv divnv modnv addvv subvv mulvv divvv modvv pow cat
constant: kstr kcdata kshort knum kpri knil
Upvalue and Function: uget usetv usets usetn usetp uclo fnew
table: tnew tdup gget gset tgetv tgets tgetb tsetv tsets tsetb tsetm
Calls and Vararg Handling: callm call callmt callt iterc itern varg isnext
Returns: retm ret ret0 ret1
Loops and branches: fori jfori forl iforl jforl iterl iiterl jiterl loop iloop jloop jmp
Function headers: funcf ifuncf jfuncf funcv ifuncv jfuncv funcc funccw func

Discussion:

As an optimized instruction set, LuaJIT2 includes various 'immediate' instructions, such as TGETB and TSETB, which index into a table data structure with an 8-bit immediate constant integer index.

The LuaJIT2 instruction set is based upon the Lua instruction set, version 5.1 of which is documented at http://luaforge.net/docman/83/98/ANoFrillsIntroToLua51VMInstructions.pdf .

CIL instruction list (229 instructions)

addition, subtraction, multiplication, division:

add add.ovf add.ovf.un div div.un mul mul.ovf mul.ovf.un neg rem rem.un sub sub.ovf sub.ovf.un

logical:

and not or xor

function calling:

calls: call calli callvirt
argument handling: arglist ldarg ldarg.0 ldarg.1 ldarg.2 ldarg.3 ldarg.s ldarga ldarga.s starg starg.s
returns: ret
tailcall: tail.

conditional branches:

beq beq.s bge bge.s bge.un bge.un.s bgt bgt.s bgt.un bgt.un.s ble ble.s ble.un ble.un.s blt blt.s blt.un blt.un.s bne.un bne.un.s brfalse brfalse.s brinst brinst.s brnull brnull.s brtrue brtrue.s brzero brzero.s

misc:

break no.typecheck/rangecheck/nullcheck nop readonly. unaligned. volatile.

oop:

box castclass constrained cpobj initobj isinst ldfld ldflda ldftn ldobj ldsfld ldsflda ldstr ldtoken ldvirtftn mkrefany newobj refanytype refanyval sizeof stfld stobj stsfld unbox unbox.any

arrays:

ldelem ldelem.i ldelem.i1 ldelem.i2 ldelem.i4 ldelem.i8 ldelem.r4 ldelem.r8 ldelem.ref ldelem.u1 ldelem.u2 ldelem.u4 ldelem.u8 ldelema ldlen newarr stelem stelem.i stelem.i1 stelem.i2 stelem.i4 stelem.i8 stelem.r4 stelem.r8 stelem.ref

jump:

br br.s jmp switch

comparisons:

ceq cgt cgt.un clt clt.un

floating-point specific:

ckfinite

conversions:

conv.i conv.i1 conv.i2 conv.i4 conv.i8 conv.ovf.i conv.ovf.i.un conv.ovf.i1 conv.ovf.i1.un conv.ovf.i2 conv.ovf.i2.un conv.ovf.i4 conv.ovf.i4.un conv.ovf.i8 conv.ovf.i8.un conv.ovf.u conv.ovf.u.un conv.ovf.u1 conv.ovf.u1.un conv.ovf.u2 conv.ovf.u2.un conv.ovf.u4 conv.ovf.u4.un conv.ovf.u8 conv.ovf.u8.un conv.r.un conv.r4 conv.r8 conv.u conv.u1 conv.u2 conv.u4 conv.u8

misc memory ops:

cpblk initblk

stack ops:

dup pop

exception handling:

endfault endfilter endfinally leave leave.s rethrow throw

constant loads:

ldc.i4 ldc.i4.0 ldc.i4.1 ldc.i4.2 ldc.i4.3 ldc.i4.4 ldc.i4.5 ldc.i4.6 ldc.i4.7 ldc.i4.8 ldc.i4.m1 ldc.i4.M1 ldc.i4.s ldc.i8 ldc.r4 ldc.r8 ldnull

loads and stores:

loads: ldind.i ldind.i1 ldind.i2 ldind.i4 ldind.i8 ldind.r4 ldind.r8 ldind.ref ldind.u1 ldind.u2 ldind.u4 ldind.u8
stores: stind.i stind.i1 stind.i2 stind.i4 stind.i8 stind.r4 stind.r8 stind.ref
variables: ldloc ldloc.0 ldloc.1 ldloc.2 ldloc.3 ldloc.s ldloca ldloca.s stloc stloc.0 stloc.1 stloc.2 stloc.3 stloc.s

allocation:

localloc

shifts:

shl shr shr.un

Misc tables

List of conditionals by type, including both compares and compare-and-branches

In most of the previous sections, we separated comparison operations (with no control flow) from conditional compare-and-branch operations. Since some platforms have a greater variety of compare instructions and fewer compare-and-branch instructions, and others have fewer compare instructions and a greater variety of compare-and-branch instructions, this makes it more difficult to see the popularity of different comparison relations.

So, in this section we group instructions by which relation they test, regardless of whether they are comparison operations or compare-and-branch operations.

For each grouping, we give a count of the platforms which offer an instruction in that group. Since ARM doesn't have floating-point instructions and LuaJIT2 doesn't have integer instructions, these counts are never more than 6. If the count is 5, we indicate which platforms are missing (which will always be one of ARM, LuaJIT2, and then one other platform).
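
The grouping and counting described above can be sketched in Python as follows. The tuple layout and variable names are invented for this illustration, and only a few of the signed integer less-than entries from this section are included as sample data.

```python
from collections import defaultdict

# (relation, operand type, platform, mnemonic, kind); sample entries only.
entries = [
    ("less-than", "int", "RISC-V", "blt", "branch"),
    ("less-than", "int", "RISC-V", "slt", "compare"),
    ("less-than", "int", "WASM", "i32.lt_s", "compare"),
    ("less-than", "int", "LLVM", "icmp slt", "compare"),
    ("less-than", "int", "ARM", "blt", "branch"),
    ("less-than", "int", "JVM", "if_icmplt", "branch"),
    ("less-than", "int", "CIL", "blt", "branch"),
    ("less-than", "int", "CIL", "clt", "compare"),
]

# Group by (relation, operand type), ignoring whether the instruction
# is a plain compare or a compare-and-branch.
platforms_by_group = defaultdict(set)
for relation, operand_type, platform, _mnemonic, _kind in entries:
    platforms_by_group[(relation, operand_type)].add(platform)

counts = {group: len(p) for group, p in platforms_by_group.items()}
print(counts)  # {('less-than', 'int'): 6}
```

A platform is counted once per group no matter how many instructions it contributes, which is why RISC-V and CIL each add only 1 here.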

Equality

Equals

integer equals (6):

branch int: BEQ (Branch =) (i32) (RISC-V RV32I)
compare int: i32.eq i64.eq (WASM)
compare int: icmp eq (LLVM)
branch int: beq (ARM)
branch int: if_acmpeq if_icmpeq (JVM)
branch int: beq beq.s (polymorphic) (CIL)
compare int: ceq (polymorphic) (CIL)

float equals (6):

compare float: FEQ.S, FEQ.D (f32, f64) (RISC-V F): dest = (f32 == f32)
compare float: f32.eq, f64.eq (WASM)
compare float: fcmp eq and ordered (LLVM)
compare float: fcmp eq or unordered (LLVM)
compare float: trinary: dcmpg dcmpl fcmpg fcmpl (JVM)
branch float: iseqv iseqn (LuaJIT2)
branch float: beq beq.s (CIL)
compare float: ceq (polymorphic) (CIL)

note: RISC-V's and JVM's branches don't work on floats. LuaJIT2's branches work only on floats (and pris and strings).

Not-equals

integer not-equals (6):

branch int: BNE (Branch !=) (i32) (RISC-V RV32I)
cmp int: i32.ne i64.ne (WASM)
cmp int: icmp ne (LLVM)
branch int: bne (ARM)
branch int: if_acmpne if_icmpne (JVM)
branch int: bne.un bne.un.s (CIL)

float not-equals (5) (missing RISC-V, ARM):

cmp float: f32.ne f64.ne (WASM)
cmp float: fcmp neq and ordered, fcmp neq or unordered (LLVM)
compare float: trinary: dcmpg dcmpl fcmpg fcmpl (JVM)
branch float: isnev (LuaJIT2)
branch float: isnes isnen isnep (LuaJIT2)
branch float: bne.un bne.un.s (CIL)

Comparison Inequalities

Less-than

integer less-than (6):

branch: BLT (Branch <) (i32) (RISC-V RV32I)
compare: SLT (Set <) (i32) (RISC-V RV32I)
compare: SLTI (Set < Immediate) (i32) (RISC-V RV32I)
compare: i32.lt_s, i64.lt_s (WASM)
compare: icmp slt (LLVM)
branch: blt (ARM)
branch: if_icmplt (JVM)
branch: blt blt.s (CIL)
compare: clt (CIL)

float less-than (6):

compare: FLT.S, FLT.D (f32, f64) (RISC-V F): dest = (f32 < f32)
compare: f32.lt, f64.lt (WASM)
compare: fcmp lt and ordered (LLVM)
compare: fcmp lt or unordered (LLVM)
compare: trinary: dcmpg dcmpl fcmpg fcmpl (JVM)
branch: islt (LuaJIT2)
branch: blt blt.s (CIL)
compare: clt (CIL)

unsigned integer less-than (5) (missing JVM, LuaJIT2):

branch: BLTU (Branch < Unsigned) (i32) (RISC-V RV32I)
compare: SLTU (Set < Unsigned) (u32) (RISC-V RV32I)
compare: SLTIU (Set < Imm Unsigned) (u32) (RISC-V RV32I)
compare: i32.lt_u, i64.lt_u (WASM)
compare: icmp ult (LLVM)
branch: blo (ARM)
branch: blt.un blt.un.s (CIL)
compare: clt.un (CIL)

Less-than-or-equal-to

integer less-than-or-equal-to (5) (missing RISC-V, LuaJIT2):

compare: i32.le_s, i64.le_s (WASM)
compare: icmp sle (LLVM)
branch: ble (ARM)
branch: if_icmple (JVM)
branch: ble ble.s (CIL)

float less-than-or-equal-to (5) (missing ARM, JVM):

compare: FLE.S, FLE.D (f32, f64) (RISC-V F): dest = (f32 <= f32)
compare: f32.le, f64.le (WASM)
compare: fcmp le and ordered (LLVM)
compare: fcmp le or unordered (LLVM)
branch: isle (LuaJIT2)
branch: ble ble.s (CIL)

unsigned integer less-than-or-equal-to (4):

compare: i32.le_u, i64.le_u (WASM)
compare: icmp ule (LLVM)
branch: bls (ARM)
branch: ble.un ble.un.s (CIL)

Greater-than-or-equal-to

integer greater-than-or-equal-to (6):

branch: BGE (Branch >=) (i32) (RISC-V RV32I)
compare: i32.ge_s, i64.ge_s (WASM)
compare: icmp sge (LLVM)
branch: bge (ARM)
branch: if_icmpge (JVM)
branch: bge bge.s (CIL)

float greater-than-or-equal-to (4):

compare: f32.ge f64.ge (WASM)
compare: fcmp ge and ordered, fcmp ge or unordered (LLVM)
branch: isge (LuaJIT2)
branch: bge bge.s (CIL)

unsigned integer greater-than-or-equal-to (5) (missing JVM, LuaJIT2):

branch: BGEU (Branch >= Unsigned) (i32) (RISC-V RV32I)
compare: i32.ge_u, i64.ge_u (WASM)
compare: icmp uge (LLVM)
branch: bhs (ARM)
branch: bge.un bge.un.s (CIL)

Greater-than

integer greater-than (5) (missing RISC-V, LuaJIT2):

compare: i32.gt_s, i64.gt_s (WASM)
compare: icmp sgt (LLVM)
branch: bgt (ARM)
branch: if_icmpgt (JVM)
branch: bgt bgt.s (CIL)
compare: cgt (CIL)

float greater-than (5) (missing RISC-V, ARM):

compare: f32.gt f64.gt (WASM)
compare: fcmp gt and ordered, fcmp gt or unordered (LLVM)
compare: trinary: dcmpg dcmpl fcmpg fcmpl (JVM)
branch: isgt (LuaJIT2)
branch: bgt bgt.s (CIL)
compare: cgt (CIL)

unsigned integer greater-than (4):

compare: i32.gt_u, i64.gt_u (WASM)
compare: icmp ugt (LLVM)
branch: bhi (ARM)
branch: bgt.un bgt.un.s (CIL)
compare: cgt.un (CIL)

Compares against zero

Equals-zero

unary equals-zero, or null, or false:

integer (4) (or, 6 if RISC-V and LLVM are counted):

compare: i32.eqz i64.eqz (WASM)
branch: (beq when used as unary compare) (ARM)
branch: ifeq ifnull (JVM)
branch: brzero brzero.s brnull brnull.s brfalse brfalse.s (CIL)

notes:

This comparison is not needed in RISC-V because RISC-V can always compare against the zero register x0.
This comparison is not needed in LLVM because LLVM can always just compare to a constant.

float equals-zero, or null, or false (1) (or, 2 if LLVM is counted):

branch: isf isfc (is false-y) (LuaJIT2)

note: This comparison is not needed in LLVM because LLVM can always just compare to a constant.

Not-equals-zero

unary not-equals-zero or non-null or true (also, boolean conditional branch) (5) (or, 6 if RISC-V is counted): integer:

branch: BR (conditional form) (LLVM)
branch: BR_IF (WASM): "Executing the if instruction pops an i32 condition off the stack and either falls through to the next instruction or sets the program counter to after the else or end of the if."
branch: (bne when used as unary compare) (ARM)
branch: ifne ifnonnull (JVM)
branch: brtrue brtrue.s brinst brinst.s (CIL)

notes:

This comparison is not needed in RISC-V because RISC-V can always compare against the zero register x0.
Although LLVM can always compare to a constant, this is present in LLVM presumably because it is LLVM's primary branching construct.

float not-equals-zero (1) (or, 2 if LLVM is counted):

branch: ist istc (is truth-y) (LuaJIT2)

note: This comparison is not needed in LLVM because LLVM can always just compare to a constant.

Other comparison operations

unary integer compare <0:

bmi (ARM)
iflt (JVM)

unary integer compare >=0:

bpl (ARM)
ifge (JVM)

other integer:

bvs bvc (ARM)
ifgt ifle (compare vs. 0) (JVM)

other float compare:

fcmp false (LLVM)
fcmp true (LLVM)
fcmp neither QNAN (LLVM)
fcmp either QNAN (LLVM)

Trinary integer compare:

lcmp (JVM)

Summary of the platform counts in the above table

Integer

6 equals
6 not-equals
6 less-than
5 less-than-or-equal-to (missing RISC-V, LuaJIT2)
6 greater-than-or-equal-to
5 greater-than (missing RISC-V, LuaJIT2)
4 (or 6) equals-zero
5 (or 6) not-equals-zero

Floating-point

6 equals
5 not-equals (missing RISC-V, ARM)
6 less-than
5 less-than-or-equal-to (missing ARM, JVM)
4 greater-than-or-equal-to
5 greater-than (missing RISC-V, ARM)
1 (or 2) equals-zero
1 (or 2) not-equals-zero

Sum over both of integer, floating-point

12 equals
11 not-equals
12 less-than
10 less-than-or-equal-to
10 greater-than-or-equal-to
10 greater-than
5 (or 8) equals-zero
6 (or 8) not-equals-zero

Count of platforms with either integer or float of operation

7 equals
7 not-equals
7 less-than
7 less-than-or-equal-to
7 greater-than-or-equal-to
6 greater-than
5 (or 7) equals-zero
6 (or 7) not-equals-zero
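
The difference between the 'sum' and the 'either integer or float' counts can be illustrated with sets, using greater-than as the example. The set names are invented for this sketch; the missing platforms are the ones listed in the integer and floating-point summaries above.

```python
ALL_PLATFORMS = {"RISC-V", "WASM", "LLVM", "ARM", "JVM", "LuaJIT2", "CIL"}

# Platforms offering greater-than, per the summaries above.
int_gt = ALL_PLATFORMS - {"RISC-V", "LuaJIT2"}
float_gt = ALL_PLATFORMS - {"RISC-V", "ARM"}

print(len(int_gt), len(float_gt))   # 5 5
print(len(int_gt) + len(float_gt))  # 10 (sum over integer and float)
print(len(int_gt | float_gt))       # 6 (platforms with either)
print(ALL_PLATFORMS - (int_gt | float_gt))  # {'RISC-V'}
```

RISC-V is the only platform missing from both sets, which is why the 'either' count for greater-than is 6 rather than 7.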

Unsigned integer

5 less-than (missing JVM, LuaJIT2)
4 less-than-or-equal-to
5 greater-than-or-equal-to (missing JVM, LuaJIT2)
4 greater-than

'Other comparison operators' omitted because they are not very popular.

Calling conventions

Here we only look at the two processor ISAs; the higher-level platforms are at a higher level of abstraction that does not require caller/callee-saving of registers.

Floating-point and vector registers will not be discussed; therefore references below to 'registers' may be read as 'integer registers'.

Registers listed with a special purpose (for example, a link/return address register or stack pointer), or argument-passing/return-value-passing registers, are not counted as either caller-saved or callee-saved (other sources might list the link register and argument-passing and return-value-passing as caller-saved, and the stack pointer as callee-saved, for example).

RISC-V

32 registers total.

Arguments are passed in 8 registers, x10-x17. Return values are passed in 2 registers, x10-x11.

7 registers are caller-saved: x5, x6, x7 and x28-x31. 11 registers are callee-saved: x9, x18-x27. The return address is in register x1. The stack pointer is in register x2. The remaining 4 registers are: zero register x0, global pointer x3, thread pointer x4, frame pointer x8.
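
As a sanity check, the register classes named above should partition x0-x31 with no overlap. The following Python sketch verifies that; the helper xregs and the class labels are invented for this illustration.

```python
def xregs(*ranges):
    """Build a set of register names from inclusive (low, high) ranges."""
    regs = set()
    for low, high in ranges:
        regs.update(f"x{i}" for i in range(low, high + 1))
    return regs

riscv_classes = {
    "argument/return": xregs((10, 17)),
    "caller-saved": xregs((5, 7), (28, 31)),
    "callee-saved": xregs((9, 9), (18, 27)),
    "return address": xregs((1, 1)),
    "stack pointer": xregs((2, 2)),
    "other (zero, gp, tp, fp)": xregs((0, 0), (3, 4), (8, 8)),
}

covered = set().union(*riscv_classes.values())
sizes = {name: len(regs) for name, regs in riscv_classes.items()}

print(sizes)
print(len(covered) == 32)            # every register is classified
print(sum(sizes.values()) == 32)     # and no register is classified twice
```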

References:

https://content.riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf Chapter 20, table 20.1

ARM Cortex M0

16 registers total.

Arguments are passed in 4 registers, r0 thru r3. Return values are passed in 2 registers, r0-r1.

1 register is caller-saved: r12. 7 registers are callee-saved: r4-r8 and r10-r11. r13 is the stack pointer, r14 is the link register, r15 is the program counter, r9 is the platform-specific 'platform register'.

References:

Others

For comparison, some other potentially relevant calling conventions include 64-bit ARM (AArch64), Microsoft x64, System V AMD64, and x86 (32-bit) cdecl:

ARM 64-bit (32 registers total) passes arguments and return values in 8 registers, 0-7. 7 registers are caller-saved (9-15). 10 registers are callee-saved (19-28). The remaining registers are: the "Indirect result location register" (8), the "Intra-Procedure-call scratch register" (16 and 17), the platform-specific register (18), the frame pointer (29), the link register (30), and the stack pointer (SP).

The Microsoft x64 convention on x86_64 (16 registers total) passes arguments in 4 registers (RCX, RDX, R8, R9), and returns values in 1 (separate) register (RAX). 2 registers are caller-saved (R10, R11). 8 registers are callee-saved (RBX, RBP, RDI, RSI, R12, R13, R14, and R15). RSP is the stack pointer.

The System V AMD64 convention on x86_64 (16 registers total) passes arguments in 6 registers (RDI, RSI, RDX, RCX, R8, R9), and returns values in 2 registers (RAX, RDX; RDX is used for values greater than 64 bits). 2 registers (R10, R11) are caller-saved. 6 registers are callee-saved (RBX, RBP, and R12-R15). RSP is the stack pointer.

The cdecl calling convention on 32-bit x86 (8 registers total) passes no arguments in registers, and returns values in 1 register (EAX). 2 registers are caller-saved (ECX and EDX). 4 registers are callee-saved (EBX, EBP, ESI, EDI). ESP is the stack pointer.

Summary table

==
total registers	argument/return registers	caller-saved	callee-saved	other	name
8	1	2	4	1 (SP)	x86 32-bit cdecl
16	5	2	8	1 (SP)	x86_64 microsoft
16	7	2	6	1 (SP)	x86_64 System V AMD64
16	4	1	7	4 (SP, LR, PC, ..)	ARM 32-bit
32	8	7	10	7 (SP, LR, FP, ..)	ARM 64-bit
32	8	7	11	6 (SP, LR, FP, ..)	RISC-V
==
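As a sanity check on the summary table, the four category columns of each row should add up to the total register count. A minimal sketch, with the numbers copied from the table and a hypothetical helper name `inconsistent_rows`:

```python
# (total, argument/return, caller-saved, callee-saved, other, name),
# copied from the summary table above.
ROWS = [
    (8, 1, 2, 4, 1, "x86 32-bit cdecl"),
    (16, 5, 2, 8, 1, "x86_64 microsoft"),
    (16, 7, 2, 6, 1, "x86_64 System V AMD64"),
    (16, 4, 1, 7, 4, "ARM 32-bit"),
    (32, 8, 7, 10, 7, "ARM 64-bit"),
    (32, 8, 7, 11, 6, "RISC-V"),
]

def inconsistent_rows(rows):
    """Return the names of rows whose category counts do not sum to the total."""
    return [
        name
        for total, args, caller, callee, other, name in rows
        if args + caller + callee + other != total
    ]

print(inconsistent_rows(ROWS))
```

An empty list means every row accounts for all of its registers exactly once.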

References

ARM64: https://en.wikipedia.org/wiki/Calling_convention#ARM_(A64)
Microsoft x64: https://en.wikipedia.org/wiki/X86_calling_conventions#x86-64_calling_conventions
System V AMD64: https://en.wikipedia.org/wiki/X86_calling_conventions#x86-64_calling_conventions, https://wiki.osdev.org/System_V_ABI
cdecl: [12]

Discussion

From most popular to least

In the following, when we use terms like 'every platform' or 'whenever..', we are implicitly referring only to the 7 platforms in this study.

Every platform has facilities for:

specifying constants/literals
addition, subtraction, multiplication
a jump (sometimes called "unconditional branch") to a statically known immediate or label
an indirect jump (to a location determined at runtime)
some form of branch-if-not-equal-to-zero/branch-if-true
the following comparisons, on both integers and floats, if supported: equality, less-than
the following comparisons, on integers if an integer platform, or on floats if not: not-equals, greater-than-or-equal-to, equals-zero, not-equals-zero

The most common form of arithmetic is signed 32-bit integer; however, one high-level platform (LuaJIT2) only supports 64-bit floating point. 5 of the 7 platforms support all four combinations of 32- and 64-bit, integer and floating point. Some of the integer platforms support unsigned integer operations throughout, but others only support unsigned operations in some places.

Whenever integer arithmetic is supported, three bitwise shifts and three bitwise logical operations are supported:

bitwise shifts: left shift, right shift signed (shift right arithmetic), right shift unsigned (shift right logical)
bitwise logical: and, or, xor
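The difference between the two right shifts is easiest to see on a value with the top bit set. Below is a minimal sketch on 32-bit words; the function names follow the RISC-V mnemonics, and masking the shift amount to its low 5 bits is an assumption borrowed from RISC-V (other platforms treat out-of-range shift amounts differently).

```python
MASK32 = 0xFFFFFFFF

def sll32(x, n):
    """Shift left; bits shifted past bit 31 are discarded."""
    return (x << (n & 31)) & MASK32

def srl32(x, n):
    """Shift right logical (unsigned): zeros are shifted in from the left."""
    return (x & MASK32) >> (n & 31)

def sra32(x, n):
    """Shift right arithmetic (signed): copies of the sign bit are shifted in."""
    x &= MASK32
    signed = x - (1 << 32) if x & 0x80000000 else x
    # Python's >> on a negative int is already an arithmetic shift.
    return (signed >> (n & 31)) & MASK32

print(hex(srl32(0x80000000, 4)))  # 0x8000000
print(hex(sra32(0x80000000, 4)))  # 0xf8000000
```

For non-negative inputs the two right shifts agree; they differ only when bit 31 is set.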

Whenever floating-point arithmetic is supported, addition, subtraction, multiplication, division are all found.

All platforms that support both integer and floating point support conversions between signed integer and floating-point, and all platforms that support both 32-bit and 64-bit floating point support conversions between floating-point quantities of different bitwidths.

NOP is in all platforms except for LuaJIT2.

Most of the platforms also support:

conversions between 32- and 64-bit
integer division and remainder
floating point negation, remainder/mod
integer and floating-point loads and stores
FENCE or sync barrier or monitor or volatile instruction (or prefix)
compare instructions: integer less-than, floating-point equality and less-than
branch on equality, inequality, less-than-or-equals, less-than, greater-than-or-equal, greater-than, false/equals-zero/is-null
'switch'-statement like indirect branching
subroutine support with some form of CALL, RETURN, and some way to do indirect CALLs
breakpoints
data structures: vectors/arrays, aggregates
some unsigned operations

All register machines have MOV and all stack machines have DROP (also called POP). Both hardware processor ISAs, and none of the other platforms, have:

unconstrained low-level indirect jumps (although CIL has an unconstrained high-level indirect jump)
link registers
special registers

Both hardware processor ISAs have supervisor call.

Instructions or intrinsics for each of the following are provided by three platforms in this study:

Arithmetic:

addition and subtraction with overflow or carry, signed 32-bit (note: but only as intrinsics in LLVM)
integer negation or similar
logical NOT (either bitwise or boolean)
integer compares: equality, greater-than, unsigned greater-than
floating-point compares: less-than-or-equal-to, greater-than
floating-point specific: sqrt, copysign, min, max (note: but only as intrinsics in LLVM)
conversions:
- from larger integers to 8-bit and 16-bit integers
- from unsigned 32-bit integers to 64-bit integers
- coercive casting between integers and floating-points

Memory access:

variable loads and stores

Control flow:

exception handling
variable-length argument lists (variadic functions)
an illegal/unreachable instruction

Allocation:

various allocation instructions

Data structures:

length

Instructions or intrinsics for each of the following are provided by two platforms in this study:

Arithmetic:

instructions to load various higher-level data structure constants such as strings
constant tables/constant pools
PC-relative arithmetic
multiplication with overflow (signed and unsigned, 32- and 64-bit)
unsigned, 64-bit variants of addition, subtraction, multiplication with overflow
right rotate
integer compares: inequality, greater-than-or-equal-to, less-than-or-equal-to
floating point compares: inequality, greater-than-or-equal-to
clz ctz popcnt
byteswaps
integer conversions from 8-bit or 16-bit to larger
floating point:
- abs
- ceil, floor, trunc, nearest
- rounding modes and exception modes
- classify
- fused multiply-add
- pow

Memory access:

global variable loads/stores
short instructions to load/store the first 4 variables

Stack ops:

Atomics and Sync:

AMOs: SWAP, ADD, AND, OR, XOR, MIN, MAX, MINU, MAXU
either load-reserved/store-conditional, or compare-and-swap

Control flow:

jump with link register
unconstrained indirect branch
branches: <0, >=0, unsigned <, unsigned >=
select
tail calls
structured control flow loops
invoke instructions for object-oriented calling.

Data structures:

OOP
strings

Misc:

cycle counters
memory ops such as memcpy

Arithmetic

Every platform has ways to specify constants/literals. Most platforms specify these only as immediates or directly in the IR, but two of them also have constant pools.

The most common number type is 32-bit signed integers, but 64-bit floats are also popular, and most platforms offer all of 32- and 64-bit, integer and float. All platforms offer some way to convert between the various types that they have. Some platforms offer coercive casting between types.

Every platform has ways to add, subtract, and multiply. Most platforms also offer division.

Every platform with integers offers at least 6 bitwise operations: shift left, shift right unsigned, shift right signed, and, or, xor.

Most platforms offer some unsigned operations and some offer all unsigned operations.

Most platforms offer compares of integer less-than, floating-point equality, and floating-point less-than (WASM and LLVM use these for branching, see below, but RISC-V and CIL also offer them). Some platforms offer more compares.

Some platforms offer additional arithmetic operations:

integer: negation, right rotate, bit count ops (clz ctz popcnt), PC-relative arithmetic
floating-point: negation, sqrt, copysign, max, min, abs, rounding ops (ceil, floor, trunc, nearest), classify, fused multiply-add, pow

Two floating-point platforms offer rounding and exception modes.

Memory access

Most platforms have some form of integer and floating-point loads and stores. Some platforms also support local variables loads/stores.

Some platforms have memory allocation instructions, but they differ widely.

Two platforms have global variable loads/stores.

Register and stack ops

All register machines have copy/MOV and all stack machines have DROP (also called POP). Most stack machines have DUP.

Atomics and sync

Most platforms have some form of atomic or sync functionality, but they differ widely.

A few platforms offer AMOs (SWAP, ADD, AND, OR, XOR, MIN, MAX, MINU, MAXU) and either load-reserved/store-conditional, or compare-and-swap.
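To illustrate how the two kinds of primitive relate, here is a minimal sketch of an AMO-style atomic add built from a compare-and-swap retry loop. The `Cell` class and `amo_add` are hypothetical names for this illustration; the lock inside `Cell` only stands in for the atomicity that hardware would provide, and is not how any of the platforms implement it.

```python
import threading

class Cell:
    """A hypothetical memory word with an atomic compare-and-swap."""

    def __init__(self, value=0):
        self.value = value
        self._lock = threading.Lock()  # models hardware atomicity only

    def compare_and_swap(self, expected, new):
        """Store `new` only if the cell still holds `expected`; report success."""
        with self._lock:
            if self.value == expected:
                self.value = new
                return True
            return False

def amo_add(cell, delta):
    """Atomically add `delta` to the cell and return the old value."""
    while True:
        old = cell.value
        if cell.compare_and_swap(old, old + delta):
            return old
        # Another thread changed the cell in between; retry with a fresh value.

cell = Cell(5)
print(amo_add(cell, 3), cell.value)
```

The same retry-loop shape applies to the other AMOs (AND, OR, XOR, MIN, MAX and so on): only the expression that computes the new value changes.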

Control flow

Every platform has an unconditional jump. Every platform has some form of branch-if-true, or branch-if-not-equal-to-zero, or branch-if-non-null.

Every platform has some form of unconditional indirect jump. Most platforms offer some form of 'switch'-statement-like indirect jump. The two hardware processor ISAs have unconstrained low-level indirect jumps, and the other platforms require all low-level indirect jumps to specify a list of all possible jump targets, although some of them offer higher-level unconstrained indirect jumps (to functions or methods).

Most platforms have branch on equality, inequality, less-than-or-equals, less-than, greater-than-or-equal, greater-than, false/equals-zero/is-null. Some platforms have branch on <0, >=0, unsigned <, unsigned >=.

Many platforms have compare-and-branch, but some platforms require two instructions for this: either a compare instruction which places a boolean in a register, followed by a branch instruction (WASM and LLVM do this), or an arithmetic instruction which sets a flag register, followed by a branch instruction which reads those flags (ARM does this). WASM and LLVM, the same platforms which split compares from boolean-conditional branches, also offer a non-branching SELECT operation.
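A minimal sketch of the fused style versus the split compare-then-select style, using integer minimum as the example. All function names here are hypothetical; `select` models the non-branching operation with a bit mask and is only meant for integers.

```python
def compare_lt(a, b):
    """Split style, step 1: the compare leaves a 0/1 value in a 'register'."""
    return 1 if a < b else 0

def select(cond, if_true, if_false):
    """Non-branching select on integers: both operands are already computed."""
    mask = -int(cond != 0)  # all ones when cond is true, all zeros otherwise
    return (if_true & mask) | (if_false & ~mask)

def min_fused(a, b):
    """Fused style: one compare-and-branch decides which path runs."""
    if a < b:
        return a
    return b

def min_split(a, b):
    """Split style: compare into a boolean, then select on it."""
    return select(compare_lt(a, b), a, b)

print(min_fused(3, 7), min_split(3, 7))
```

Both versions compute the same result; the split version has no control-flow edge, which is the point of offering SELECT alongside separate compares.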

Most platforms have subroutine support, with some form of CALL, RETURN, and some way to do indirect CALLs.

Many platforms have restrictions on jumping across function boundaries.

Some platforms have:

exception handling
variable-length argument lists (variadic functions)
an illegal/unreachable instruction
tail calls
structured control flow loops

Comparisons

This section considers both (non-control-flow) compares and compare-and-branches together, to try and get insight into which types of comparison are the most popular.

Every platform in this study offers both of the following comparisons on integers if they support integers, and on floats if they support floats:

equals
less-than

Every integer platform has facilities for all of the following comparisons on integers, and the only non-integer platform (LuaJIT2) also supports all of these on floats:

not-equals
greater-than-or-equal-to
equals-zero
not-equals-zero (all platforms have branching instructions for this one; see previous section)

You might be wondering what's so special about greater-than-or-equal-to; i actually think it might just be on this list due to chance. You see, RISC-V has integer compare-and-branch operations for beq, bne, blt, bge (equals, not-equals, less-than, greater-than-or-equal-to). As noted above, every platform in this study has less-than on integers if they support integers, and on floating-point if they support floats. Greater-than-or-equal-to is not like that though; only 4 of the 6 floating-point-capable platforms have a floating-point greater-than-or-equal-to primitive. For example, in floating-point, RISC-V has equals, less-than, and less-than-or-equal-to. I don't know why they chose to have greater-than-or-equal-to for integers and less-than-or-equal-to for floats; for integers, it's really just a convention, because by swapping the two integer arguments of the RISC-V instruction you can get greater-than and less-than-or-equal-to in addition to less-than and greater-than-or-equal-to. So my conclusion is that greater-than-or-equal-to is probably not that special.
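The argument-swapping trick mentioned above can be sketched as follows. The function names mirror the RISC-V branch mnemonics but simply return the predicate rather than branching; `bgt` and `ble` are the derived forms.

```python
def blt(a, b):
    """Primitive: less-than."""
    return a < b

def bge(a, b):
    """Primitive: greater-than-or-equal-to."""
    return a >= b

def bgt(a, b):
    """Derived: a > b is the same as b < a."""
    return blt(b, a)

def ble(a, b):
    """Derived: a <= b is the same as b >= a."""
    return bge(b, a)

print(bgt(5, 3), ble(5, 3))
```

Swapping arguments also works for floats, but negating a predicate does not: with a NaN operand, `not (a < b)` is true while `a >= b` is false, which is one reason a floating-point comparison set cannot be pared down quite as freely as an integer one.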

If we had a much larger sample size (many more than 7 platforms) maybe we could get to the bottom of this, but from what we have here all that i can really conclude is that:

equality seems to be especially popular (everyone has it)
not-equals-zero seems to be especially popular (everyone has a branching instruction based on this or something like it)
everyone has at least some of less-than/less-than-or-equal-to/greater-than-or-equal-to/greater-than. Out of these, less-than seems to be more popular than the others (everyone has it), and greater-than MIGHT be slightly less popular than the others.
in addition to equals, which is very popular, not-equals seems to be pretty popular
in addition to not-equal-to-zero, which is very popular, equals-zero seems to be pretty popular, but only for integers

Data structures

Most platforms have vectors/arrays, and aggregates.

Some platforms have a 'length' instruction.

A few platforms have OOP, and strings.

Misc

Most platforms have NOP and breakpoint instructions.

The two hardware processor ISA platforms have link registers, special registers, and supervisor call.

A few platforms have cycle counters, and memory operations such as memcpy.

Related work

RISC-V Geneology

RISC-V Geneology surveys 18 instruction set architectures prior to RISC-V, "chosen primarily from earlier UC Berkeley RISC architectures and major proprietary RISC instruction sets".

They present a matrix of which instructions in each instruction set correspond to which RISC-V instructions, allowing me to present a count of the analogs of each instruction below.

The paper lists 98 instructions which have analogs that "appear in at least three" prior ISAs. These are (with counts of their prior analogs in parentheses, and grouped by function by me):

immediates: LUI(8)
control flow: JAL(14) JALR(16) BEQ(17) BNE(17) BLT(13) BGE(13) BLTU(8) BGEU(8)
loads and stores: LB(11) LH(13) LW(18) LBU(11) LHU(12) SB(14) SH(14) SW(18) FLW(15) FSW(15) FLD(14) FSD(14)
integer add/sub/mul/div: ADD(18) ADDI(17) SUB(18) MUL(12) MULH(7) MULHU(9) DIV(8) DIVU(7) REMU(7)
shifts and set-less-than compares: SLTI(4) SLTIU(3) SLLI(12) SRLI(11) SRAI(12) SLL(17) SLT(4) SLTU(3) SRL(16) SRA(17)
logical: XORI(14) ORI(14) ANDI(15) XOR(18) OR(18) AND(18)
concurrency: FENCE(7) FENCE.I(4) LR.W(8) SC.W(8) AMOSWAP.W(3) AMOADD.W(3)
misc: SCALL(16) SBREAK(9) RDCYCLE(3) RDTIME(3)
floating point add/sub/mul/div: FADD.S(16) FSUB.S(15) FMUL.S(16) FDIV.S(16) FADD.D(16) FSUB.D(16) FMUL.D(16) FDIV.D(16)
floating point fused multiply/add: FMADD.S(5) FMSUB.S(3) FNMSUB.S(4) FNMADD.S(4) FMADD.D(4) FMSUB.D(3) FNMSUB.D(4) FNMADD.D(4)
floating point sqrt: FSQRT.S(10) FSQRT.D(10)
floating point signs: FSGNJ.S(3) FSGNJN.S(3) FSGNJ.D(3) FSGNJN.D(2)
floating point conversions: FCVT.W.S(14) FCVT.S.W(13) FCVT.S.D(11) FCVT.D.S(10) FCVT.W.D(14) FCVT.D.W(13) FMV.X.S(8) FMV.S.X(6)
floating point comparisons: FEQ.S(11) FLT.S(11) FLE.S(9) FEQ.D(11) FLT.D(11) FLE.D(9)
floating point misc: RCSR(8) FRRM(6) FRFLAGS(8) FSRMI(7) FSFLAGSI(8)

(note: the paper lists RDINSTRET as having at least 3 analogs, but in their matrix it has only 1; perhaps they made a mistake with that one; so the list above only has 97 instructions, not 98).

The instructions with >=3 analogs but <8 are:

integer add/sub/mul/div: MULH(7) DIVU(7) REMU(7)
set-less-than compares: SLTI(4) SLTIU(3) SLT(4) SLTU(3)
concurrency: FENCE(7) FENCE.I(4) AMOSWAP.W(3) AMOADD.W(3)
misc: RDCYCLE(3) RDTIME(3)
floating point fused multiply/add: FMADD.S(5) FMSUB.S(3) FNMSUB.S(4) FNMADD.S(4) FMADD.D(4) FMSUB.D(3) FNMSUB.D(4) FNMADD.D(4)
floating point signs: FSGNJ.S(3) FSGNJN.S(3) FSGNJ.D(3) FSGNJN.D(2)
floating point conversions: FMV.S.X(6)
floating point misc: FRRM(6) FSRMI(7)

Here is the subset of the instructions with at least 8 prior analogs found:

immediates: LUI (8)
control flow: JAL(14) JALR(16) BEQ(17) BNE(17) BLT(13) BGE(13) BLTU(8) BGEU(8)
loads and stores: LB(11) LH(13) LW(18) LBU(11) LHU(12) SB(14) SH(14) SW(18) FLW(15) FSW(15) FLD(14) FSD(14)
integer add/sub/mul/div: ADD(18) ADDI(17) SUB(18) MUL(12) MULHU(9) DIV(8)
shifts: SLLI(12) SRLI(11) SRAI(12) SLL(17) SRL(16) SRA(17)
logical: XORI(14) ORI(14) ANDI(15) XOR(18) OR(18) AND(18)
concurrency: LR.W(8) SC.W(8)
misc: SCALL(16) SBREAK(9)
floating point add/sub/mul/div: FADD.S(16) FSUB.S(15) FMUL.S(16) FDIV.S(16) FADD.D(16) FSUB.D(16) FMUL.D(16) FDIV.D(16)
floating point sqrt: FSQRT.S(10) FSQRT.D(10)
floating point conversions: FCVT.W.S(14) FCVT.S.W(13) FCVT.S.D(11) FCVT.D.S(10) FCVT.W.D(14) FCVT.D.W(13) FMV.X.S(8)
floating point comparisons: FEQ.S(11) FLT.S(11) FLE.S(9) FEQ.D(11) FLT.D(11) FLE.D(9)
floating point misc: RCSR(8) FRFLAGS(8) FSFLAGSI(8)

Note that now we have lost the set-less-than compares, MULH, the unsigned div/rem instructions, all concurrency except LR/SC, cycle counts, fused multiply/add, and floating point sign instructions (perhaps some of the prior ISAs had instructions like FABS and FNEG, which are only assembler pseudoinstructions in RISC-V and hence not listed here, to replace the RISC-V sign instructions).

The instructions with >=8 analogs but <11 are:

immediates: LUI (8)
control flow: BLTU(8) BGEU(8)
integer add/sub/mul/div: MULHU(9) DIV(8)
concurrency: LR.W(8) SC.W(8)
misc: SBREAK(9)
floating point sqrt: FSQRT.S(10) FSQRT.D(10)
floating point conversions: FCVT.D.S(10) FMV.X.S(8)
floating point comparisons: FLE.S(9) FLE.D(9)
floating point misc: RCSR(8) FRFLAGS(8) FSFLAGSI(8)

Here is the subset of the instructions with at least 11 prior analogs found:

control flow: JAL(14) JALR(16) BEQ(17) BNE(17) BLT(13) BGE(13)
loads and stores: LB(11) LH(13) LW(18) LBU(11) LHU(12) SB(14) SH(14) SW(18) FLW(15) FSW(15) FLD(14) FSD(14)
integer add/sub/mul/div: ADD(18) ADDI(17) SUB(18) MUL(12)
shifts: SLLI(12) SRLI(11) SRAI(12) SLL(17) SRL(16) SRA(17)
logical: XORI(14) ORI(14) ANDI(15) XOR(18) OR(18) AND(18)
misc: SCALL(16)
floating point add/sub/mul/div: FADD.S(16) FSUB.S(15) FMUL.S(16) FDIV.S(16) FADD.D(16) FSUB.D(16) FMUL.D(16) FDIV.D(16)
floating point conversions: FCVT.W.S(14) FCVT.S.W(13) FCVT.S.D(11) FCVT.W.D(14) FCVT.D.W(13)
floating point comparisons: FEQ.S(11) FLT.S(11) FEQ.D(11) FLT.D(11)

Note that now we have lost LUI, unsigned compare-and-branch instructions, the high-bits-mul instruction, division, all concurrency (perhaps some of the prior ISAs had other concurrency mechanisms though), breakpoints, sqrt, floating point LE comparison, and floating-point exception and rounding mode instructions.

The instructions with >=11 analogs but <16 are:

control flow: JAL(14) BLT(13) BGE(13)
loads and stores: LB(11) LH(13) LBU(11) LHU(12) SB(14) SH(14) FLW(15) FSW(15) FLD(14) FSD(14)
integer add/sub/mul/div: MUL(12)
shifts: SLLI(12) SRLI(11) SRAI(12)
logical: XORI(14) ORI(14) ANDI(15)
floating point add/sub/mul/div: FSUB.S(15)
floating point conversions: FCVT.W.S(14) FCVT.S.W(13) FCVT.S.D(11) FCVT.W.D(14) FCVT.D.W(13)
floating point comparisons: FEQ.S(11) FLT.S(11) FEQ.D(11) FLT.D(11)

Here is the subset of the instructions with at least 16 prior analogs found:

control flow: JALR(16) BEQ(17) BNE(17)
loads and stores: LW(18) SW(18)
integer add/sub/mul/div: ADD(18) ADDI(17) SUB(18)
shifts: SLL(17) SRL(16) SRA(17)
logical: XOR(18) OR(18) AND(18)
misc: SCALL(16)
floating point add/sub/mul/div: FADD.S(16) FMUL.S(16) FDIV.S(16) FADD.D(16) FSUB.D(16) FMUL.D(16) FDIV.D(16)

Note that now we have lost JAL, compare-and-branch instructions except for (not)-equality predicates, loads and stores of all word sizes except for 32-bit, multiplication, immediate shift and logical (in fact, all immediates except for ADDI; presumably all ISAs provide some way to load immediates, however), and all of the floating-point loads/stores, conversions, and comparisons.

Let's take stock of what remains; these appear to be the common core instructions. I've left out the floating point add/sub/mul/div, because these appear to me to be pretty useless without any floating-point loads/stores, conversion, and comparisons; presumably older ISAs had some way to use them however.

no load immediate instructions (presumably every ISA provides some way to load immediates, however; RISC-V could use ADDI in conjunction with its zero register)
jumps (only indirect jump provided, but presumably direct jump can be synthesized on ISAs missing it): JALR
branches on (in)equality: BEQ BNE (presumably all ISAs also provide some way to compare less-than or less-than-or-equal, however)
32-bit loads/stores: LW SW
add/sub: ADD SUB
immediate addition (also used in RISC-V for an assembler MV pseudoinstruction): ADDI
shifts and logical: SLL SRL SRA XOR OR AND
system calls: SCALL

This is 15 instructions.
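The threshold subsets above can be reproduced mechanically from the per-instruction analog counts. A minimal sketch follows; the counts are copied from the full list earlier in this section, while the dictionary layout and the helper names `parse_counts` and `at_least` are just choices made for this illustration.

```python
import re

# Analog counts copied from the full list above, keyed by my grouping.
ANALOG_COUNTS = {
    "immediates": "LUI(8)",
    "control flow": "JAL(14) JALR(16) BEQ(17) BNE(17) BLT(13) BGE(13) BLTU(8) BGEU(8)",
    "loads and stores": "LB(11) LH(13) LW(18) LBU(11) LHU(12) SB(14) SH(14) SW(18) "
                        "FLW(15) FSW(15) FLD(14) FSD(14)",
    "integer add/sub/mul/div": "ADD(18) ADDI(17) SUB(18) MUL(12) MULH(7) MULHU(9) "
                               "DIV(8) DIVU(7) REMU(7)",
    "shifts and set-less-than compares": "SLTI(4) SLTIU(3) SLLI(12) SRLI(11) SRAI(12) "
                                         "SLL(17) SLT(4) SLTU(3) SRL(16) SRA(17)",
    "logical": "XORI(14) ORI(14) ANDI(15) XOR(18) OR(18) AND(18)",
    "concurrency": "FENCE(7) FENCE.I(4) LR.W(8) SC.W(8) AMOSWAP.W(3) AMOADD.W(3)",
    "misc": "SCALL(16) SBREAK(9) RDCYCLE(3) RDTIME(3)",
    "floating point add/sub/mul/div": "FADD.S(16) FSUB.S(15) FMUL.S(16) FDIV.S(16) "
                                      "FADD.D(16) FSUB.D(16) FMUL.D(16) FDIV.D(16)",
    "floating point fused multiply/add": "FMADD.S(5) FMSUB.S(3) FNMSUB.S(4) FNMADD.S(4) "
                                         "FMADD.D(4) FMSUB.D(3) FNMSUB.D(4) FNMADD.D(4)",
    "floating point sqrt": "FSQRT.S(10) FSQRT.D(10)",
    "floating point signs": "FSGNJ.S(3) FSGNJN.S(3) FSGNJ.D(3) FSGNJN.D(2)",
    "floating point conversions": "FCVT.W.S(14) FCVT.S.W(13) FCVT.S.D(11) FCVT.D.S(10) "
                                  "FCVT.W.D(14) FCVT.D.W(13) FMV.X.S(8) FMV.S.X(6)",
    "floating point comparisons": "FEQ.S(11) FLT.S(11) FLE.S(9) FEQ.D(11) FLT.D(11) FLE.D(9)",
    "floating point misc": "RCSR(8) FRRM(6) FRFLAGS(8) FSRMI(7) FSFLAGSI(8)",
}

def parse_counts(groups):
    """Turn 'NAME(count) ...' strings into {name: (group, count)}."""
    table = {}
    for group, text in groups.items():
        for name, count in re.findall(r"([A-Z][A-Z.]*)\((\d+)\)", text):
            table[name] = (group, int(count))
    return table

def at_least(table, threshold, include_float=True):
    """Names with at least `threshold` analogs, optionally dropping FP groups."""
    return [
        name
        for name, (group, count) in table.items()
        if count >= threshold
        and (include_float or not group.startswith("floating point"))
    ]

table = parse_counts(ANALOG_COUNTS)
core = at_least(table, 16, include_float=False)
print(len(table), len(core), " ".join(core))
```

Changing the threshold argument reproduces the other subsets (at least 8, at least 11), and the difference between two thresholds gives the "in between" lists.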

TODO

goals for this document:

Revise the 'concordance' of RISC-V, WASM, ARM Cortex M0, LLVM, JVM, CIL, LuaJIT2 instructions into a readable list of instructions, grouped by type of purpose, referencing their analogs in each of those systems (hence the word 'concordance'), with a description of the semantics of each instruction (and how the various systems differ).

todo:

provide more details for the most common instructions ("provide description of the semantics of each instruction (and how the various systems differ)")
reformat (right now the newlines in my source document aren't showing). Switch to asciidoc.
add TOC
correct the little TODOs throughout
re-read, edit
ask relevant communities (RISC-V, WASM, etc) for (a) resolution of my questions (grep for question marks), (b) errors, (c) other comments, (d) pointers to writeups that are similar, or that express opinions about which instructions could be omitted, and about how each ISA should be/could have been changed. Maybe put on Gitlab. Add CC BY license. Maybe add to Wikipedia.
i'd like to add ARM64, QBE, MIR, ColdFire (representing 68k), and the subset of x86 matching other instructions here
