Continued from Target Languages Concordance part II
Instruction lists from each platform
When instruction counts are given, we count mnemonics. Sometimes similar mnemonics are grouped together and counted as one.
RISC-V instruction list
RV32I (base 32-bit integer set; 47 instructions):
- constant loads: (pseudoinstruction for loading constants using addi or ori) lui auipc
- add, subtract: add addi sub
- shifts: sll slli srl srli sra srai
- logical: and andi or ori xor xori
- compares: slt slti sltu sltiu
- loads and stores: lw lh lhu lb lbu sw sh sb
- atomics and sync: fence fence.i
- jumps: jal jalr
- conditional branches: beq bne blt bge bltu bgeu
- misc control flow: (unnamed all-zero illegal pseudoinstruction) (nop pseudoinstruction)
- misc: ecall ebreak
- misc Control and Status Register: CSRRW (Atomic Read/Write CSR), CSRRS (Atomic Read and Set Bits in CSR), CSRRC (Atomic Read and Clear Bits in CSR), CSRRWI (CSRRW immediate), CSRRSI (CSRRS immediate), CSRRCI (CSRRC immediate)
- misc (RV32I Counters pseudo-instructions (note: these are in RV32I but not in RV32E): RDCYCLE RDCYCLEH, RDTIME RDTIMEH, RDINSTRET RDINSTRETH)
RV64I (base 64-bit integer set; 59 instructions) (note: RV64I includes everything in RV32I, but adapted for 64-bit, plus these) (12 new instructions and 3 new encodings of old instructions) (note: the instructions with 'W' at the end of their name are 32-bit versions of instructions. They are needed because the un-suffixed instructions inherited from RV32I become 64-bit in RV64I. The way i think about this is that un-suffixed instructions operate at whatever bitwidth the registers have, which is 32 bits in RV32I and 64 bits in RV64I, unless they are specifically made to operate on a certain bitwidth, in which case this is indicated by a suffix on the instruction name):
- add, subtract: addw addiw subw
- shifts: sllw slliw srlw srliw sraw sraiw
- loads and stores: ld sd lwu
- (there are also new encodings of SLLI SRLI SRAI)
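To make the 'W' convention concrete, here is a minimal Python sketch of ADD versus ADDW on RV64I. It is an illustrative model, not code from the RISC-V spec: registers are assumed to be unsigned 64-bit Python ints, and ADDW keeps only the low 32 bits of the sum and sign-extends them to 64 bits.

```python
MASK64 = (1 << 64) - 1  # registers are modelled as unsigned 64-bit values


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def add(rs1: int, rs2: int) -> int:
    """ADD on RV64I: 64-bit addition, wrapping on overflow."""
    return (rs1 + rs2) & MASK64


def addw(rs1: int, rs2: int) -> int:
    """ADDW on RV64I: 32-bit addition; the 32-bit result is sign-extended to 64 bits."""
    return sign_extend(rs1 + rs2, 32) & MASK64


print(hex(add(0x7FFFFFFF, 1)))   # 0x80000000
print(hex(addw(0x7FFFFFFF, 1)))  # 0xffffffff80000000
```

The same input gives a positive 64-bit result for ADD but a negative (sign-extended) one for ADDW, because ADDW treats bit 31 as the sign bit.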
RV32M (multiply extension; 8 instructions):
- mul mulh mulhu mulhsu div divu rem remu
RV64M (multiply extension; 13 instructions) (note: RV64M includes everything in RV32M, but adapted for 64-bit, plus these) (5 new instructions):
- mulw divw divuw remw remuw
RV32A (32-bit atomics extension; 11 instructions):
- lr.w sc.w amoswap.w amoadd.w amoxor.w amoand.w amoor.w amomin.w amomax.w amominu.w amomaxu.w
RV64A (64-bit atomics extension; 22 instructions) (note: RV64A includes everything in RV32A, plus these) (11 new instructions):
- new instructions with names the same as RV32A instruction names but with .d suffixes instead of .w, and with similar functionality but 64-bit
RV32F (32-bit/single-precision floating point extension for RV32I: 26 instructions):
- Add, subtract, multiply, divide: fadd.s fsub.s fmul.s fdiv.s
- compares: feq.s flt.s fle.s
- floating-point specific:
- sqrt: fsqrt.s
- fused multiply-add: fmadd.s fmsub.s fnmadd.s fnmsub.s
- signs: fsgnj.s fsgnjn.s fsgnjx.s
- min, max: fmin.s fmax.s
- classify: fclass.s
- rounding: (pseudoinstructions: frrm fsrm fsrmi)
- exceptions: (pseudoinstructions: frflags fsflags fsflagsi)
- misc fp: (pseudoinstructions: frcsr fscsr)
- conversions:
- fmv.w.x fmv.x.w
- fcvt.s.w fcvt.s.wu fcvt.w.s fcvt.wu.s
- loads and stores: flw fsw
RV64F (32-bit/single-precision floating point extension for RV64I: 30 instructions) (note: RV64F includes everything in RV32F, plus these) (4 new instructions):
- conversions: fcvt.s.l fcvt.s.lu fcvt.l.s fcvt.lu.s
RV32D (64-bit/double-precision floating point extension for RV32I: 26 instructions):
- new instructions with names the same as most of the RV32F instruction names but with .d suffixes instead of .s, and with similar functionality but 64-bit; EXCEPT:
- there are no new fmv* instructions here (because the 64-bit double-precision floating point won't fit in 32-bit integer registers)
- there are two new instructions for converting between 32-bit/single-precision floating point and 64-bit/double-precision floating point: fcvt.s.d fcvt.d.s
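A minimal Python sketch of why the fmv* moves depend on register width: fmv.x.w copies the raw 32-bit pattern of a single-precision value into an integer register, but a double's bit pattern is 64 bits wide and so cannot fit in an RV32 integer register. The helper names are hypothetical and model only the bit patterns (on RV64, fmv.x.w additionally sign-extends the 32-bit pattern into the 64-bit register; that is not modelled here).

```python
import struct


def fmv_x_w_bits(value: float) -> int:
    """Raw IEEE 754 single-precision bit pattern, as fmv.x.w moves it (RV32 view)."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def fmv_x_d_bits(value: float) -> int:
    """Raw IEEE 754 double-precision bit pattern, as RV64D's fmv.x.d moves it."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


XLEN_RV32 = 32  # integer register width on RV32
for v in (1.0, -2.5):
    single, double = fmv_x_w_bits(v), fmv_x_d_bits(v)
    print(f"{v}: single=0x{single:08x} double=0x{double:016x} "
          f"double fits in an RV32 register: {double.bit_length() <= XLEN_RV32}")
```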
RV64D (64-bit/double-precision floating point extension for RV64I: 32 instructions) (note: RV64D includes everything in RV32D, plus these) (6 new instructions):
- fmv.x.d fmv.d.x fcvt.d.l fcvt.d.lu fcvt.l.d fcvt.lu.d
(so, RV64IMAFD, otherwise known as RV64G, contains 156 instructions in total)
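The RV64G total can be checked by summing the per-extension counts given above. A trivial Python sketch, with the counts copied from this list:

```python
# Instruction counts per RV64 component, as listed above.
rv64g_counts = {
    "RV64I": 59,
    "RV64M": 13,
    "RV64A": 22,
    "RV64F": 30,
    "RV64D": 32,
}

# Each RV64 count is its RV32 counterpart plus the new RV64-only instructions.
assert rv64g_counts["RV64I"] == 47 + 12
assert rv64g_counts["RV64M"] == 8 + 5
assert rv64g_counts["RV64A"] == 11 + 11
assert rv64g_counts["RV64F"] == 26 + 4
assert rv64g_counts["RV64D"] == 26 + 6

total = sum(rv64g_counts.values())
print(total)  # 156
```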
WASM instruction list (172 instructions)
control:
- branches: br br_if br_table
- subroutines: return call call_indirect
- other structured control: block loop if else end
- control misc: unreachable nop
parametric:
- drop select
constant loads:
- i32.const i64.const f32.const f64.const
loads and stores:
- loads: i32.load i64.load f32.load f64.load i32.load8_s i32.load8_u i32.load16_s i32.load16_u i64.load8_s i64.load8_u i64.load16_s i64.load16_u i64.load32_s i64.load32_u
- stores: i32.store i64.store f32.store f64.store i32.store8 i32.store16 i64.store8 i64.store16 i64.store32
- variables: local.get local.set local.tee global.get global.set
comparisons:
- integer comparisons: i32.eqz i32.eq i32.ne i32.lt_s i32.lt_u i32.le_s i32.le_u i32.gt_s i32.gt_u i32.ge_s i32.ge_u, and corresponding instructions for i64
- floating point comparisons: f32.eq f32.ne f32.lt f32.le f32.gt f32.ge, and corresponding instructions for f64
arithmetic:
- add, subtract, multiply, divide:
- i32.add i32.sub i32.mul i32.div_s i32.div_u i32.rem_s i32.rem_u, and corresponding instructions for i64
- f32.add f32.sub f32.mul f32.div, and corresponding instructions for f64
- logical: i32.and i32.or i32.xor, and corresponding instructions for i64
- shifts: i32.shl i32.shr_s i32.shr_u i32.rotl i32.rotr, and corresponding instructions for i64
- misc bitwise arithmetic: i32.clz i32.ctz i32.popcnt, and corresponding instructions for i64
- floating-point specific: f32.abs f32.neg f32.ceil f32.floor f32.trunc f32.nearest f32.sqrt f32.min f32.max f32.copysign, and corresponding instructions for f64
conversions:
- f{32,64}.convert_i{32,64}_{s,u}, i{32,64}.trunc_f{32,64}_{s,u}
- i32.reinterpret_f32 i64.reinterpret_f64 f32.reinterpret_i32 f64.reinterpret_i64
- i32.wrap_i64 i64.extend_i32_s i64.extend_i32_u
- f32.demote_f64 f64.promote_f32
allocation: memory.size memory.grow
LLVM instruction list
LLVM instructions (64 instructions):
- Terminator Instructions: ret br switch indirectbr invoke callbr resume catchswitch catchret cleanupret unreachable
- Unary Operations: fneg
- Binary Operations: add fadd sub fsub mul fmul udiv sdiv fdiv urem srem frem
- Bitwise Binary Operations: shl lshr ashr and or xor
- Vector Operations: extractelement insertelement shufflevector
- Aggregate Operations: extractvalue insertvalue
- Memory Access and Addressing Operations: alloca load store fence cmpxchg atomicrmw getelementptr
- Conversion Operations: trunc zext sext fptrunc fpext fptoui fptosi uitofp sitofp ptrtoint inttoptr bitcast addrspacecast
- Other: icmp fcmp phi select call va_arg landingpad catchpad cleanuppad
LLVM intrinsics (185 intrinsics, if the families denoted by the '*'s below are each grouped together and counted as one):
- Variable Argument Handling Intrinsics: va_start va_end va_copy
- Accurate Garbage Collection Intrinsics: gcroot gcread gcwrite llvm.experimental.gc.statepoint llvm.experimental.gc.result llvm.experimental.gc.relocate
- Code Generator Intrinsics: returnaddress addressofreturnaddress sponentry frameaddress localescape localrecover read_register write_register stacksave stackrestore get.dynamic.area.offset prefetch pcmarker readcyclecounter clear_cache instrprof.increment instrprof.increment.step instrprof.value.profile thread.pointer
- Standard C Library Intrinsics: memcpy memmove memset.* sqrt.* powi.* sin.* cos.* pow.* exp.* exp2.* log.* log10.* log2.* fma.* fabs.* minnum.* maxnum.* minimum.* maximum.* copysign.* floor.* ceil.* trunc.* rint.* nearbyint.* round.*
- Bit Manipulation Intrinsics: bitreverse.* bswap.* ctpop.* ctlz.* cttz.* fshl.* fshr.*
- Arithmetic with Overflow Intrinsics: sadd.with.overflow.* uadd.with.overflow.* ssub.with.overflow.* usub.with.overflow.* smul.with.overflow.* umul.with.overflow.*
- Saturation Arithmetic Intrinsics: sadd.sat.* uadd.sat.* ssub.sat.* usub.sat.*
- Fixed Point Arithmetic Intrinsics: smul.fix.* umul.fix.*
- Specialised Arithmetic Intrinsics: canonicalize.* fmuladd.*
- Experimental Vector Reduction Intrinsics: experimental.vector.reduce.add.* experimental.vector.reduce.fadd.* experimental.vector.reduce.mul.* experimental.vector.reduce.fmul.* experimental.vector.reduce.and.* experimental.vector.reduce.or.* experimental.vector.reduce.xor.* experimental.vector.reduce.smax.* experimental.vector.reduce.smin.* experimental.vector.reduce.umax.* experimental.vector.reduce.umin.* experimental.vector.reduce.fmax.* experimental.vector.reduce.fmin.*
- Half Precision Floating-Point Intrinsics: convert.to.fp16 convert.from.fp16
- Debugger Intrinsics: llvm.dbg.addr llvm.dbg.declare llvm.dbg.value
- Exception Handling Intrinsics: llvm.eh.typeid.for llvm.eh.begincatch llvm.eh.endcatch llvm.eh.exceptionpointer llvm.eh.sjlj.setjmp llvm.eh.sjlj.longjmp llvm.eh.sjlj.lsda llvm.eh.sjlj.callsite
- Trampoline Intrinsics: init.trampoline adjust.trampoline
- Masked Vector Load and Store Intrinsics: masked.load.* masked.store.*
- Masked Vector Gather and Scatter Intrinsics: masked.gather.* masked.scatter.*
- Masked Vector Expanding Load and Compressing Store Intrinsics: masked.expandload.* masked.compressstore.*
- Memory Use Markers: lifetime.start lifetime.end invariant.start invariant.end launder.invariant.group strip.invariant.group
- Constrained Floating-Point Intrinsics: experimental.constrained.fadd experimental.constrained.fsub experimental.constrained.fmul experimental.constrained.fdiv experimental.constrained.frem experimental.constrained.fma experimental.constrained.fptrunc experimental.constrained.fpext
- Constrained libm-equivalent Intrinsics: experimental.constrained.sqrt experimental.constrained.pow experimental.constrained.powi experimental.constrained.sin experimental.constrained.cos experimental.constrained.exp experimental.constrained.exp2 experimental.constrained.log experimental.constrained.log10 experimental.constrained.log2 experimental.constrained.rint experimental.constrained.nearbyint experimental.constrained.maxnum experimental.constrained.minnum experimental.constrained.ceil experimental.constrained.floor experimental.constrained.round experimental.constrained.trunc
- General Intrinsics: var.annotation ptr.annotation.* annotation.* codeview.annotation trap debugtrap stackprotector stackguard objectsize expect assume ssa_copy type.test type.checked.load donothing experimental.deoptimize experimental.guard experimental.widenable.condition load.relative sideeffect is.constant.*
- Stack Map Intrinsics: llvm.experimental.stackmap llvm.experimental.patchpoint.*
- Element Wise Atomic Memory Intrinsics: memcpy.element.unordered.atomic memmove.element.unordered.atomic memset.element.unordered.atomic
- Objective-C ARC Runtime Intrinsics: objc.autorelease objc.autoreleasePoolPop objc.autoreleasePoolPush objc.autoreleaseReturnValue objc.copyWeak objc.destroyWeak objc.initWeak objc.loadWeak objc.loadWeakRetained objc.moveWeak objc.release objc.retain objc.retainAutorelease objc.retainAutoreleaseReturnValue objc.retainAutoreleasedReturnValue objc.retainBlock objc.storeStrong objc.storeWeak
ARM Cortex M0 instruction list (59 instructions)
- Moves: mov movs
- add, subtract, multiply: add adds adcs adr sub subs sbcs rsbs muls
- compare: cmp cmn
- logical: ands eors orrs bics mvns tst
- shift: lsls lsrs asrs
- rotate: rors
- load, store: ldr ldrh ldrb ldrsh ldrsb ldm str strh strb stm
- push, pop: push pop
- branch: b bx blx
- branch (32-bit Thumb instruction): bl
- extend: sxth sxtb uxth uxtb
- reverse: rev rev16 revsh
- state change: svc cpsid cpsie bkpt
- state change: mrs msr (32-bit Thumb instructions)
- hint: sev wfe wfi nop
- barriers (32-bit Thumb instructions): isb dmb dsb
All instructions are 16-bit Thumb (Thumb-1) instructions except for the 32-bit Thumb instructions (Thumb-2) indicated.
These instructions are: "all of the 16-bit Thumb instructions from ARMv7-M excluding CBZ, CBNZ and IT" plus "the 32-bit Thumb instructions BL, DMB, DSB, ISB, MRS and MSR" [1].
Note: the ARM instruction mnemonics listed in [2] often have the letter 'S' at the end of them; for instance, 'ANDS' is the mnemonic for logical AND. In addition, sometimes there are two variants of an instruction, one with an 'S' and one without, for instance, MOVS and MOV. In these cases, the 'S' suffix means that the flags are updated. This 'S' suffix causes the instruction listing that we use here (from [3]) to differ slightly from (a) the instruction listing in https://en.wikipedia.org/wiki/ARM_Cortex-M#Instruction_sets and (b) the instruction names in the headings (but not the bodies) of sections within [4], both of which use mnemonics without this 'S' suffix (and combine mnemonics that differ only in the inclusion of this 'S').
Note: the set of mnemonics found at [5] is identical to that found at [6] except that the latter includes YIELD. [7] also includes YIELD. We do not include YIELD here because it is not in [8], and because [9] indicates in a footnote that it executes as NOP.
Note: as of this writing, the set of mnemonics found at [10] includes just one 'CPS' whereas the other sources have both CPSID and CPSIE. Here we include both CPSID and CPSIE.
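Returning to the 'S' suffix: to illustrate what "the flags are updated" means for an instruction such as ADDS, here is a minimal Python sketch of how the four condition flags (N, Z, C, V) are computed for a 32-bit addition. The function name is hypothetical; a non-flag-setting ADD would produce the same result value but leave the flags untouched.

```python
MASK32 = 0xFFFFFFFF


def adds(a: int, b: int):
    """Model of ADDS on 32-bit operands: returns (result, flags)."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    flags = {
        "N": bool(result & 0x80000000),  # result is negative as a signed value
        "Z": result == 0,                 # result is zero
        "C": full > MASK32,               # unsigned carry out of bit 31
        # signed overflow: both operands share a sign bit that the result lacks
        "V": bool(~(a ^ b) & (a ^ result) & 0x80000000),
    }
    return result, flags


print(adds(0x7FFFFFFF, 1))  # N and V set (signed overflow)
print(adds(0xFFFFFFFF, 1))  # Z and C set (unsigned carry)
```

Condition codes such as the ones used by blo, bls, bhi, bhs, bvs and bvc are then just tests on these flags.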
JVM instruction list (205 instructions)
- loading constants:
- aconst_null
- bipush
- dconst_0 dconst_1
- fconst_0 fconst_1 fconst_2
- iconst_m1 iconst_0 iconst_1 iconst_2 iconst_3 iconst_4 iconst_5
- lconst_0 lconst_1
- ldc ldc_w ldc2_w
- sipush
- addition, subtraction, multiplication, division:
- dadd fadd iadd ladd
- dsub fsub isub lsub
- dmul fmul imul lmul
- ddiv fdiv idiv ldiv
- drem frem irem lrem
- dneg fneg ineg lneg
- comparisons:
- dcmpg dcmpl
- fcmpg fcmpl
- lcmp
- arrays:
- anewarray newarray multianewarray arraylength
- aaload aastore
- baload bastore
- caload castore
- daload dastore
- faload fastore
- iaload iastore
- laload lastore
- saload sastore
- OOP:
- new checkcast instanceof
- getfield getstatic putfield putstatic
- invokedynamic invokeinterface invokespecial invokestatic invokevirtual
- areturn dreturn freturn ireturn lreturn return
- variable loads/stores:
- aload aload_0 aload_1 aload_2 aload_3 astore astore_0 astore_1 astore_2 astore_3
- dload dload_0 dload_1 dload_2 dload_3 dstore dstore_0 dstore_1 dstore_2 dstore_3
- fload fload_0 fload_1 fload_2 fload_3 fstore fstore_0 fstore_1 fstore_2 fstore_3
- iload iload_0 iload_1 iload_2 iload_3 istore istore_0 istore_1 istore_2 istore_3
- lload lload_0 lload_1 lload_2 lload_3 lstore lstore_0 lstore_1 lstore_2 lstore_3
- variable indexed operations: iinc
- exception handling: athrow
- stack ops:
- dup dup_x1 dup_x2 dup2 dup2_x1 dup2_x2
- pop pop2
- swap
- conversions:
- d2f d2i d2l
- f2d f2i f2l
- i2b i2c i2d i2f i2l i2s
- l2d l2f l2i
- jumps: goto goto_w jsr jsr_w ret
- switch: lookupswitch tableswitch
- compare-and-branch:
- if_acmpeq if_acmpne if_icmpeq if_icmpge if_icmpgt if_icmple if_icmplt if_icmpne
- ifeq ifge ifgt ifle iflt ifne
- ifnonnull ifnull
- bitwise logical:
- iand ior ixor
- land lor lxor
- shifts:
- ishl ishr iushr
- lshl lshr lushr
- sync: monitorenter monitorexit
- misc: breakpoint impdep1 impdep2 nop wide
note: jsr, jsr_w, ret have effectively been deprecated; see [11]
JVM instruction list discussion:
Many of the JVM's instructions are organized around the types reference (also called address ('a'), or objectref), array, byte, char, double, float, integer, long, short.
The array type is a data structure. There are instructions to create new arrays and to get their length. For each of the other types, there are instructions to load items of that type from arrays and to store them into arrays.
There are a few types for which only a few type-specific instructions are provided: byte, char, and short. Each of these has instructions to load and store the type from/into an array. Each of these has an instruction to convert an integer into a value of the type. Byte and short have instructions to push an immediate constant of that type onto the stack (bipush and sipush, which sign-extend the immediate to an integer).
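A minimal Python sketch of the three narrowing conversions i2b, i2c and i2s, following the JVM specification's description: each truncates the int to its low 8 or 16 bits; i2b and i2s then sign-extend back to an int, while i2c zero-extends, since char is unsigned. Values are modelled as plain Python ints.

```python
def _sign_extend(value: int, bits: int) -> int:
    """Keep the low `bits` bits and interpret them as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def i2b(value: int) -> int:
    """int -> byte: truncate to 8 bits, then sign-extend back to int."""
    return _sign_extend(value, 8)


def i2c(value: int) -> int:
    """int -> char: truncate to 16 bits, then zero-extend (char is unsigned)."""
    return value & 0xFFFF


def i2s(value: int) -> int:
    """int -> short: truncate to 16 bits, then sign-extend back to int."""
    return _sign_extend(value, 16)


for v in (200, 70000, -1):
    print(v, i2b(v), i2c(v), i2s(v))
```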
This leaves what i'll refer to as the 'main types': reference, double, float, integer, long.
The main numerical types can be grouped into floating-point (double, float), and integral (integer, long). Out of the main numerical types, integer is the primary workhorse.
The reference type is a pointer. It has a distinguished element 'null'. This is the only type that can be thrown. Like the main numerical types, there are instructions to load references from/store them to variables, and return them from methods. There are compare-and-branch instructions that branch based on the equality, or lack of equality, of two references, and that branch based on whether or not a reference is null.
Each of the main numerical types has instructions to push the constants 0 and 1 in that type onto the stack. Floats also have an instruction to push 2, and integers also have instructions to push -1, 2, 3, 4, and 5. Each of the main numerical types has instructions for addition, subtraction, multiplication, division, and remainder. Except for integers, they each have compare instructions, which push an integer (-1, 0, or 1) to indicate the result of the comparison (presumably integers don't need this because comparison results are mostly consumed by compare-and-branch instructions, which take integer arguments directly). The floating-point (double and float) types have two variants of these compare instructions, which differ only in their behavior on NaNs: dcmpg and fcmpg push 1 if either operand is NaN, whereas dcmpl and fcmpl push -1. Each of the main numerical types has instructions to load from/store to local variables. Each of the main numerical types has instructions to convert to each of the other main numerical types, and integers can also be converted to bytes, chars, and shorts. The integral types have bitwise logical and bitshift operators.
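The NaN behavior of the two compare variants can be sketched in Python as below. This is a model of the JVM specification's description of fcmpl/fcmpg (dcmpl/dcmpg behave the same way on doubles); the helper names are hypothetical. It also suggests why both variants exist: a compiler can pick whichever one makes a comparison involving NaN come out false. For example, javac typically compiles a float `a < b` test as fcmpg followed by ifge, which skips the guarded code when the result is >= 0.

```python
import math


def _fcmp(a: float, b: float, nan_result: int) -> int:
    """Push -1, 0 or 1 as the JVM compare instructions do; NaN gives nan_result."""
    if math.isnan(a) or math.isnan(b):
        return nan_result
    if a > b:
        return 1
    if a == b:
        return 0
    return -1


def fcmpl(a: float, b: float) -> int:
    return _fcmp(a, b, -1)  # 'l' variant: NaN compares as "less"


def fcmpg(a: float, b: float) -> int:
    return _fcmp(a, b, 1)   # 'g' variant: NaN compares as "greater"


def java_less_than(a: float, b: float) -> bool:
    # fcmpg then ifge: the guarded code is skipped when the result is >= 0
    return fcmpg(a, b) < 0


def java_greater_than(a: float, b: float) -> bool:
    # fcmpl then ifle: the guarded code is skipped when the result is <= 0
    return fcmpl(a, b) > 0


nan = float("nan")
print(fcmpl(nan, 1.0), fcmpg(nan, 1.0))                       # -1 1
print(java_less_than(nan, 1.0), java_greater_than(nan, 1.0))  # False False
```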
Integers have compare-and-branch instructions that branch based on a comparison of two integers (==, !=, <=, <, >, >=). There are also compare-and-branch instructions that branch based on a comparison of one integer against zero (again, ==, !=, <=, <, >, >=). There is an increment instruction (iinc) that operates directly on integer local variables.
There are instructions for loading constants from the constant pool. There are a number of polymorphic stack instructions: various 'dup's, pops, and swap. Instructions for control flow include jumps and switches. There are many OOP instructions for working with objects, and for invoking and returning from methods. Finally, there are also synchronization and miscellaneous operations.
LuaJIT2 instruction list (94 instructions)
- Comparison: islt isge isle isgt iseqv isnev iseqs isnes iseqn isnen iseqp isnep
- Unary Test and Copy: istc isfc ist isf
- Unary: mov not unm len
- binary: addvn subvn mulvn divvn modvn addnv subnv mulnv divnv modnv addvv subvv mulvv divvv modvv pow cat
- constant: kstr kcdata kshort knum kpri knil
- Upvalue and Function: uget usetv usets usetn usetp uclo fnew
- table: tnew tdup gget gset tgetv tgets tgetb tsetv tsets tsetb tsetm
- Calls and Vararg Handling: callm call callmt callt iterc itern varg isnext
- Returns: retm ret ret0 ret1
- Loops and branches: fori jfori forl iforl jforl iterl iiterl jiterl loop iloop jloop jmp
- Function headers: funcf ifuncf jfuncf funcv ifuncv jfuncv funcc funccw func
Discussion:
As an optimized instruction set, LuaJIT2 includes various 'immediate' instructions, such as TGETB and TSETB, which index into a table data structure with an 8-bit immediate constant integer index.
The LuaJIT2 instruction set is based upon the Lua instruction set, version 5.1 of which is documented at http://luaforge.net/docman/83/98/ANoFrillsIntroToLua51VMInstructions.pdf .
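The 8-bit limit on the TGETB/TSETB index comes from the instruction encoding. Assuming LuaJIT 2.x's 32-bit instruction layout (an 8-bit opcode in the lowest byte, then an 8-bit A operand, then either two 8-bit operands C and B or one 16-bit operand D in the upper half), a Python sketch of encoding and decoding the ABC form looks like this. The opcode number is a hypothetical placeholder, not LuaJIT's actual numbering.

```python
from typing import NamedTuple

TGETB_OPCODE = 0x3A  # hypothetical placeholder; not LuaJIT's real opcode number


class ABC(NamedTuple):
    """The four 8-bit fields of an ABC-format instruction."""
    op: int
    a: int
    b: int
    c: int


def encode_abc(op: int, a: int, b: int, c: int) -> int:
    """Pack fields as B | C | A | OP, from most to least significant byte."""
    for field in (op, a, b, c):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"operand {field} does not fit in 8 bits")
    return (b << 24) | (c << 16) | (a << 8) | op


def decode_abc(ins: int) -> ABC:
    """Unpack a 32-bit ABC-format instruction word."""
    return ABC(op=ins & 0xFF, a=(ins >> 8) & 0xFF, b=(ins >> 24) & 0xFF, c=(ins >> 16) & 0xFF)


# TGETB A B C: slot A = (table in slot B)[C], where C is a literal 0..255
word = encode_abc(TGETB_OPCODE, a=1, b=2, c=255)
print(hex(word), decode_abc(word))

try:
    encode_abc(TGETB_OPCODE, a=1, b=2, c=256)  # index too large for the C field
except ValueError as err:
    print("not encodable:", err)
```

An index outside 0..255 cannot be expressed this way, so presumably a non-immediate form such as TGETV, which takes the index from a slot, is used instead.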
CIL instruction list (229 instructions)
addition, subtraction, multiplication, division:
- add add.ovf add.ovf.un div div.un mul mul.ovf mul.ovf.un neg rem rem.un sub sub.ovf sub.ovf.un
logical:
function calling:
- calls: call calli callvirt
- argument handling: arglist ldarg ldarg.0 ldarg.1 ldarg.2 ldarg.3 ldarg.s ldarga ldarga.s starg starg.s
- returns: ret
- tailcall: tail.
conditional branches:
- beq beq.s bge bge.s bge.un bge.un.s bgt bgt.s bgt.un bgt.un.s ble ble.s ble.un ble.un.s blt blt.s blt.un blt.un.s bne.un bne.un.s brfalse brfalse.s brinst brinst.s brnull brnull.s brtrue brtrue.s brzero brzero.s
misc:
- break no.typecheck/rangecheck/nullcheck nop readonly. unaligned. volatile.
oop:
- box castclass constrained cpobj initobj isinst ldfld ldflda ldftn ldobj ldsfld ldsflda ldstr ldtoken ldvirtftn mkrefany newobj refanytype refanyval sizeof stfld stobj stsfld unbox unbox.any
arrays:
- ldelem ldelem.i ldelem.i1 ldelem.i2 ldelem.i4 ldelem.i8 ldelem.r4 ldelem.r8 ldelem.ref ldelem.u1 ldelem.u2 ldelem.u4 ldelem.u8 ldelema ldlen newarr stelem stelem.i stelem.i1 stelem.i2 stelem.i4 stelem.i8 stelem.r4 stelem.r8 stelem.ref
jump:
comparisons:
- ceq cgt cgt.un clt clt.un
floating-point specific:
conversions:
- conv.i conv.i1 conv.i2 conv.i4 conv.i8 conv.ovf.i conv.ovf.i.un conv.ovf.i1 conv.ovf.i1.un conv.ovf.i2 conv.ovf.i2.un conv.ovf.i4 conv.ovf.i4.un conv.ovf.i8 conv.ovf.i8.un conv.ovf.u conv.ovf.u.un conv.ovf.u1 conv.ovf.u1.un conv.ovf.u2 conv.ovf.u2.un conv.ovf.u4 conv.ovf.u4.un conv.ovf.u8 conv.ovf.u8.un conv.r.un conv.r4 conv.r8 conv.u conv.u1 conv.u2 conv.u4 conv.u8
misc memory ops:
stack ops:
exception handling:
- endfault endfilter endfinally leave leave.s rethrow throw
constant loads:
- ldc.i4 ldc.i4.0 ldc.i4.1 ldc.i4.2 ldc.i4.3 ldc.i4.4 ldc.i4.5 ldc.i4.6 ldc.i4.7 ldc.i4.8 ldc.i4.m1 ldc.i4.M1 ldc.i4.s ldc.i8 ldc.r4 ldc.r8 ldnull
loads and stores:
- loads: ldind.i ldind.i1 ldind.i2 ldind.i4 ldind.i8 ldind.r4 ldind.r8 ldind.ref ldind.u1 ldind.u2 ldind.u4 ldind.u8
- stores: stind.i stind.i1 stind.i2 stind.i4 stind.i8 stind.r4 stind.r8 stind.ref
- variables: ldloc ldloc.0 ldloc.1 ldloc.2 ldloc.3 ldloc.s ldloca ldloca.s stloc stloc.0 stloc.1 stloc.2 stloc.3 stloc.s
allocation:
shifts:
Misc tables
List of conditionals by type, including both compares and compare-and-branches
In most of the previous sections, we separated comparison operations (with no control flow) from conditional compare-and-branch operations. Since some platforms have a greater variety of compare instructions and fewer compare-and-branch instructions, and others have fewer compare instructions and a greater variety of compare-and-branch instructions, this makes it more difficult to see the popularity of different comparison relations.
So, in this section we group instructions by which relation they test, regardless of whether they are comparison operations or compare-and-branch operations.
For each grouping, we give a count of the platforms which offer an instruction in that group. Since ARM doesn't have floating-point instructions and LuaJIT2 doesn't have integer instructions, these counts are never more than 6. If the count is 5, we indicate which platforms are missing (which will always be one of ARM and LuaJIT2, plus one other platform).
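The difference between the two styles can be sketched with tiny Python models of the same source-level test `a < b`: a fused compare-and-branch (in the style of RISC-V's blt) versus a separate compare that produces 0 or 1 as an ordinary value, followed by a branch on that value (in the style of WASM's i32.lt_s then br_if). The function names are hypothetical models, not a real API.

```python
def blt(a: int, b: int) -> bool:
    """Fused compare-and-branch: returns whether the branch is taken."""
    return a < b


def i32_lt_s(a: int, b: int) -> int:
    """Separate compare: produces 1 or 0 as an ordinary value."""
    return 1 if a < b else 0


def br_if(condition: int) -> bool:
    """Branch on a value: taken when the condition is non-zero."""
    return condition != 0


for a, b in [(1, 2), (2, 1), (-3, -3)]:
    # Both styles make the same control-flow decision.
    assert blt(a, b) == br_if(i32_lt_s(a, b))
print("both styles agree")
```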
Equality
Equals
integer equals (6):
- branch int: BEQ (Branch =) (i32) (RISC-V RV32I)
- compare int: i32.eq i64.eq (WASM)
- compare int: icmp eq (LLVM)
- branch int: beq (ARM)
- branch int: if_acmpeq if_icmpeq (JVM)
- branch int: beq beq.s (polymorphic) (CIL)
- compare int: ceq (polymorphic) (CIL)
float equals (6):
- compare float: FEQ.S, FEQ.D (f32, f64) (RISC-V F): dest = (f32 == f32)
- compare float: f32.eq, f64.eq (WASM)
- compare float: fcmp eq and ordered (LLVM)
- compare float: fcmp eq or unordered (LLVM)
- compare float: trinary: dcmpg dcmpl fcmpg fcmpl (JVM)
- branch float: iseqv iseqn (LuaJIT2)
- branch float: beq beq.s (CIL)
- compare float: ceq (polymorphic) (CIL)
note: RISC-V's and the JVM's branches don't work on floats. LuaJIT2's branches work only on floats (and on primitives, i.e. nil/false/true, and strings)
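The "and ordered" / "or unordered" distinction in the LLVM entries (LLVM's oeq and ueq condition codes) can be sketched in Python as follows, where "ordered" means neither operand is NaN. This is a model of the LangRef semantics, not LLVM code; the ord/uno helpers correspond to the "neither QNAN"/"either QNAN" entries listed under other float compares. For comparison, Python's own == on floats behaves like oeq.

```python
import math


def fcmp_ord(a: float, b: float) -> bool:
    """LLVM 'ord': true if neither operand is NaN."""
    return not (math.isnan(a) or math.isnan(b))


def fcmp_uno(a: float, b: float) -> bool:
    """LLVM 'uno': true if either operand is NaN."""
    return math.isnan(a) or math.isnan(b)


def fcmp_oeq(a: float, b: float) -> bool:
    """LLVM 'oeq' (equal and ordered): a NaN operand makes it false."""
    return fcmp_ord(a, b) and a == b


def fcmp_ueq(a: float, b: float) -> bool:
    """LLVM 'ueq' (equal or unordered): a NaN operand makes it true."""
    return fcmp_uno(a, b) or a == b


nan = float("nan")
print(fcmp_oeq(nan, nan), fcmp_ueq(nan, nan))  # False True
print(fcmp_oeq(1.0, 1.0), fcmp_ueq(1.0, 2.0))  # True False
```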
Not-equals
integer not-equals (6):
- branch int: BNE (Branch !=) (i32) (RISC-V RV32I)
- cmp int: i32.ne i64.ne (WASM)
- cmp int: icmp ne (LLVM)
- branch int: bne (ARM)
- branch int: if_acmpne if_icmpne (JVM)
- branch int: bne.un bne.un.s (CIL)
float not-equals (5) (missing RISC-V, ARM):
- cmp float: f32.ne f64.ne (WASM)
- cmp float: fcmp neq and ordered, fcmp neq or unordered (LLVM)
- compare float: trinary: dcmpg dcmpl fcmpg fcmpl (JVM)
- branch float: isnev (LuaJIT2)
- branch float: isnes isnen isnep (LuaJIT2)
- branch float: bne.un bne.un.s (CIL)
Comparison Inequalities
Less-than
integer less-than (6):
- branch: BLT (Branch <) (i32) (RISC-V RV32I)
- compare: SLT (Set <) (i32) (RISC-V RV32I)
- compare: SLTI (Set < Immediate) (i32) (RISC-V RV32I)
- compare: i32.lt_s, i64.lt_s (WASM)
- compare: icmp slt (LLVM)
- branch: blt (ARM)
- branch: if_icmplt (JVM)
- branch: blt blt.s (CIL)
- compare: clt (CIL)
float less-than (6):
- compare: FLT.S, FLT.D (f32, f64) (RISC-V F): dest = (f32 < f32)
- compare: f32.lt, f64.lt (WASM)
- compare: fcmp lt and ordered (LLVM)
- compare: fcmp lt or unordered (LLVM)
- compare: trinary: dcmpg dcmpl fcmpg fcmpl (JVM)
- branch: islt (LuaJIT2)
- branch: blt blt.s (CIL)
- compare: clt (CIL)
unsigned integer less-than (5) (missing JVM, LuaJIT2):
- branch: BLTU (Branch < Unsigned) (i32) (RISC-V RV32I)
- compare: SLTU (Set < Unsigned) (u32) (RISC-V RV32I)
- compare: SLTIU (Set < Imm Unsigned) (u32) (RISC-V RV32I)
- compare: i32.lt_u, i64.lt_u (WASM)
- compare: icmp ult (LLVM)
- branch: blo (ARM)
- branch: blt.un blt.un.s (CIL)
- compare: clt.un (CIL)
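The signed/unsigned split matters because the same 32-bit pattern orders differently under the two interpretations. A minimal Python sketch modelling RISC-V's SLT and SLTU on 32-bit register values (the function names are just the mnemonics, not a real API):

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def slt(rs1: int, rs2: int) -> int:
    """Set if less than, comparing as signed 32-bit values."""
    return int(to_signed32(rs1) < to_signed32(rs2))


def sltu(rs1: int, rs2: int) -> int:
    """Set if less than, comparing as unsigned 32-bit values."""
    return int((rs1 & MASK32) < (rs2 & MASK32))


# 0xFFFFFFFF is -1 when signed but 4294967295 when unsigned.
print(slt(0xFFFFFFFF, 1), sltu(0xFFFFFFFF, 1))  # 1 0
```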
Less-than-or-equal-to
integer less-than-or-equal-to (5) (missing RISC-V, LuaJIT2):
- compare: i32.le_s, i64.le_s (WASM)
- compare: icmp sle (LLVM)
- branch: ble (ARM)
- branch: if_icmple (JVM)
- branch: ble ble.s (CIL)
float less-than-or-equal-to (5) (missing ARM, JVM):
- compare: FLE.S, FLE.D (f32, f64) (RISC-V F): dest = (f32 <= f32)
- compare: f32.le, f64.le (WASM)
- compare: fcmp le and ordered (LLVM)
- compare: fcmp le or unordered (LLVM)
- branch: isle (LuaJIT2)
- branch: ble ble.s (CIL)
unsigned integer less-than-or-equal-to (4):
- compare: i32.le_u, i64.le_u (WASM)
- compare: icmp ule (LLVM)
- branch: bls (ARM)
- branch: ble.un ble.un.s (CIL)
Greater-than-or-equal-to
integer greater-than-or-equal-to (6):
- branch: BGE (Branch >=) (i32) (RISC-V RV32I)
- compare: i32.ge_s, i64.ge_s (WASM)
- compare: icmp sge (LLVM)
- branch: bge (ARM)
- branch: if_icmpge (JVM)
- branch: bge bge.s (CIL)
float greater-than-or-equal-to (4):
- compare: f32.ge f64.ge (WASM)
- compare: fcmp ge and ordered, fcmp ge or unordered (LLVM)
- branch: isge (LuaJIT2)
- branch: bge bge.s (CIL)
unsigned integer greater-than-or-equal-to (5) (missing JVM, LuaJIT2):
- branch: BGEU (Branch >= Unsigned) (i32) (RISC-V RV32I)
- compare: i32.ge_u, i64.ge_u (WASM)
- compare: icmp uge (LLVM)
- branch: bhs (ARM)
- branch: bge.un bge.un.s (CIL)
Greater-than
integer greater-than (5) (missing RISC-V, LuaJIT2):
- compare: i32.gt_s, i64.gt_s (WASM)
- compare: icmp sgt (LLVM)
- branch: bgt (ARM)
- branch: if_icmpgt (JVM)
- branch: bgt bgt.s (CIL)
- compare: cgt (CIL)
float greater-than (5) (missing RISC-V, ARM):
- compare: f32.gt f64.gt (WASM)
- compare: fcmp gt and ordered, fcmp gt or unordered (LLVM)
- compare: trinary: dcmpg dcmpl fcmpg fcmpl (JVM)
- branch: isgt (LuaJIT2)
- branch: bgt bgt.s (CIL)
- compare: cgt (CIL)
unsigned integer greater-than (4):
- compare: i32.gt_u, i64.gt_u (WASM)
- compare: icmp ugt (LLVM)
- branch: bhi (ARM)
- branch: bgt.un bgt.un.s (CIL)
- compare: cgt.un (CIL)
Compares against zero
Equals-zero
unary equals-zero, or null, or false:
integer (4) (or, 6 if RISC-V and LLVM are counted):
- compare: i32.eqz i64.eqz (WASM)
- branch: (beq when used as unary compare) (ARM)
- branch: ifeq ifnull (JVM)
- branch: brzero brzero.s brnull brnull.s brfalse brfalse.s (CIL)
notes:
- This comparison is not needed in RISC-V because RISC-V can always compare against the hard-wired zero register x0
- This comparison is not needed in LLVM because LLVM can always just compare to a constant.
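A minimal Python sketch of why RISC-V needs no separate compare-with-zero branch: register x0 always reads as zero (writes to it are discarded), so the assembler pseudoinstructions beqz and bnez are just beq and bne with x0 as the second operand. The RegisterFile class is a hypothetical model, not any real simulator's API.

```python
class RegisterFile:
    """32 integer registers; x0 is hard-wired to zero."""

    def __init__(self) -> None:
        self._regs = [0] * 32

    def read(self, index: int) -> int:
        return 0 if index == 0 else self._regs[index]

    def write(self, index: int, value: int) -> None:
        if index != 0:  # writes to x0 are ignored
            self._regs[index] = value


def beq(regs: RegisterFile, rs1: int, rs2: int) -> bool:
    """Return True if the branch would be taken."""
    return regs.read(rs1) == regs.read(rs2)


def bne(regs: RegisterFile, rs1: int, rs2: int) -> bool:
    return regs.read(rs1) != regs.read(rs2)


def beqz(regs: RegisterFile, rs: int) -> bool:
    return beq(regs, rs, 0)  # pseudoinstruction: beq rs, x0


def bnez(regs: RegisterFile, rs: int) -> bool:
    return bne(regs, rs, 0)  # pseudoinstruction: bne rs, x0


regs = RegisterFile()
regs.write(0, 42)  # discarded
regs.write(10, 0)
print(regs.read(0), beqz(regs, 10), bnez(regs, 10))  # 0 True False
```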
float equals-zero, or null, or false (1) (or, 2 if LLVM is counted):
- branch: isf isfc (is false-y) (LuaJIT2)
note: This comparison is not needed in LLVM because LLVM can always just compare to a constant.
Not-equals-zero
unary not-equals-zero, or non-null, or true (also, boolean conditional branch):
integer (5) (or, 6 if RISC-V is counted):
- branch: BR (conditional form) (LLVM)
- branch: br_if, if (WASM): both pop an i32 condition and test it against zero; of if: "Executing the if instruction pops an i32 condition off the stack and either falls through to the next instruction or sets the program counter to after the else or end of the if."
- branch: (bne when used as unary compare) (ARM)
- branch: ifne ifnonnull (JVM)
- branch: brtrue brtrue.s brinst brinst.s (CIL)
notes:
- This comparison is not needed in RISC-V because RISC-V can always compare against the hard-wired zero register x0
- Although LLVM can always compare to a constant, this is present in LLVM presumably because it is LLVM's primary branching construct
float not-equals-zero (1) (or, 2 if LLVM is counted):
- branch: ist istc (is truth-y) (LuaJIT2)
note: This comparison is not needed in LLVM because LLVM can always just compare to a constant.
Other comparison operations
unary integer compare <0:
unary integer compare >=0:
other integer:
- bvs bvc (ARM)
- ifgt ifle (compare vs. 0) (JVM)
other float compare:
- fcmp false (LLVM)
- fcmp true (LLVM)
- fcmp neither QNAN (LLVM)
- fcmp either QNAN (LLVM)
Trinary integer compare:
Summary of the platform counts in the above table
Integer
- 6 equals
- 6 not-equals
- 6 less-than
- 5 less-than-or-equal-to (missing RISC-V, LuaJIT2)
- 6 greater-than-or-equal-to
- 5 greater-than (missing RISC-V, LuaJIT2)
- 4 (or 6) equals-zero
- 5 (or 6) not-equals-zero
Floating-point
- 6 equals
- 5 not-equals (missing RISC-V, ARM)
- 6 less-than
- 5 less-than-or-equal-to (missing ARM, JVM)
- 4 greater-than-or-equal-to
- 5 greater-than (missing RISC-V, ARM)
- 1 (or 2) equals-zero
- 1 (or 2) not-equals-zero
Sum over both of integer, floating-point
- 12 equals
- 11 not-equals
- 12 less-than
- 10 less-than-or-equal-to
- 10 greater-than-or-equal-to
- 10 greater-than
- 5 (or 8) equals-zero
- 6 (or 8) not-equals-zero
Count of platforms with either an integer or a floating-point version of the operation
- 7 equals
- 7 not-equals
- 7 less-than
- 7 less-than-or-equal-to
- 7 greater-than-or-equal-to
- 6 greater-than
- 5 (or 7) equals-zero
- 6 (or 7) not-equals-zero
Unsigned integer
- 5 less-than (missing JVM, LuaJIT2)
- 4 less-than-or-equal-to
- 5 greater-than-or-equal-to (missing JVM, LuaJIT2)
- 4 greater-than
'Other comparison operators' omitted because they are not very popular.
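The 'sum over both' and 'count of platforms with either' lists can be rederived from the per-relation tables: the first adds the integer and floating-point platform counts, the second is the size of the union of the two platform sets. A Python sketch using the platform sets read off those tables for the six binary relations (the zero-comparisons are left out because of their 'or, if counted' alternatives):

```python
INT_SIX = {"RISC-V", "WASM", "LLVM", "ARM", "JVM", "CIL"}
FLOAT_SIX = {"RISC-V", "WASM", "LLVM", "JVM", "LuaJIT2", "CIL"}

INTEGER = {
    "equals": INT_SIX,
    "not-equals": INT_SIX,
    "less-than": INT_SIX,
    "less-than-or-equal-to": INT_SIX - {"RISC-V"},
    "greater-than-or-equal-to": INT_SIX,
    "greater-than": INT_SIX - {"RISC-V"},
}

FLOATING = {
    "equals": FLOAT_SIX,
    "not-equals": FLOAT_SIX - {"RISC-V"},
    "less-than": FLOAT_SIX,
    "less-than-or-equal-to": FLOAT_SIX - {"JVM"},
    "greater-than-or-equal-to": FLOAT_SIX - {"RISC-V", "JVM"},
    "greater-than": FLOAT_SIX - {"RISC-V"},
}


def summarize(integer, floating):
    """Map each relation to (sum of platform counts, platforms offering either)."""
    return {
        relation: (len(integer[relation]) + len(floating[relation]),
                   len(integer[relation] | floating[relation]))
        for relation in integer
    }


for relation, (total, either) in summarize(INTEGER, FLOATING).items():
    print(f"{relation}: sum={total} either={either}")
```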
Calling conventions
Here we only look at the two processor ISAs; the higher-level platforms operate at a level of abstraction that does not require caller/callee-saving of registers.
Floating-point and vector registers will not be discussed; therefore references below to 'registers' may be read as 'integer registers'.
Registers with a special purpose (for example, a link/return address register or stack pointer), and argument-passing/return-value-passing registers, are not counted as either caller-saved or callee-saved (other sources might, for example, list the link register and the argument-passing and return-value-passing registers as caller-saved, and the stack pointer as callee-saved).
RISC-V
32 registers total.
Arguments are passed in 8 registers, x10-x17. Return values are passed in 2 registers, x10-x11.
7 registers are caller-saved: x5-x7 and x28-x31. 11 registers are callee-saved: x9 and x18-x27. The return address is in register x1. The stack pointer is in register x2. The remaining 4 registers are: zero register x0, global pointer x3, thread pointer x4, frame pointer x8.
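As a cross-check of the RISC-V counts, here is a Python sketch that classifies x0-x31 using the register roles listed above and tallies each category. The category labels are this document's classification, not official ABI names; "special" lumps together the return address, the stack pointer, and the remaining 4 registers.

```python
from collections import Counter


def riscv_role(reg: int) -> str:
    """Classify integer register x<reg> using the categories in this section."""
    if 10 <= reg <= 17:
        return "argument/return"
    if reg in (5, 6, 7) or 28 <= reg <= 31:
        return "caller-saved"
    if reg == 9 or 18 <= reg <= 27:
        return "callee-saved"
    # x0 zero, x1 return address, x2 stack pointer, x3 global pointer,
    # x4 thread pointer, x8 frame pointer
    return "special"


counts = Counter(riscv_role(r) for r in range(32))
print(counts)
```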
References:
ARM Cortex M0
16 registers total.
Arguments are passed in 4 registers, r0 thru r3. Return values are passed in 2 registers, r0-r1.
1 register is caller-saved: r12. 7 registers are callee-saved: r4-r8 and r10-r11. r13 is the stack pointer, r14 is the link register, r15 is the program counter, r9 is the platform-specific 'platform register'.
References:
Others
For comparison, some other potentially relevant calling conventions include 64-bit ARM (AArch64), Microsoft x64, System V AMD64, and x86 (32-bit) cdecl:
ARM 64-bit (32 registers total) passes arguments and return values in 8 registers, x0-x7. 7 registers are caller-saved (x9-x15). 10 registers are callee-saved (x19-x28). The remaining registers are: the "Indirect result location register" (x8), the "Intra-Procedure-call scratch registers" (x16 and x17), the platform-specific register (x18), the frame pointer (x29), the link register (x30), and the stack pointer (SP).
The Microsoft x64 convention on x86_64 (16 registers total) passes arguments in 4 registers (RCX, RDX, R8, R9), and returns values in 1 (separate) register (RAX). 2 registers are caller-saved (R10, R11). 8 registers are callee-saved (RBX, RBP, RDI, RSI, R12, R13, R14, and R15). RSP is the stack pointer.
The System V AMD64 convention on x86_64 (16 registers total) passes arguments in 6 registers (RDI, RSI, RDX, RCX, R8, R9), and returns values in 2 registers (RAX, RDX; RDX is used for values wider than 64 bits). 2 registers (R10, R11) are caller-saved. 6 registers are callee-saved (RBX, RBP, and R12–R15). RSP is the stack pointer.
The cdecl calling convention on 32-bit x86 (8 registers total) passes no arguments in registers, and returns values in 1 register (EAX). 2 registers are caller-saved (ECX and EDX). 4 registers are callee-saved (EBX, EBP, ESI, EDI). ESP is the stack pointer.
Summary table
total registers | argument/return registers | caller-saved | callee-saved | other | name
---|---|---|---|---|---
8 | 1 | 2 | 4 | 1 (SP) | x86 32-bit cdecl
16 | 5 | 2 | 8 | 1 (SP) | x86_64 Microsoft x64
16 | 7 | 2 | 6 | 1 (SP) | x86_64 System V AMD64
16 | 4 | 1 | 7 | 4 (SP, LR, PC, r9) | ARM 32-bit
32 | 8 | 7 | 10 | 7 (SP, LR, FP, x8, x16, x17, x18) | ARM 64-bit
32 | 8 | 7 | 11 | 6 (SP, LR, FP, zero, GP, TP) | RISC-V
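As a quick consistency check on the summary table, the columns of each row should add up to the total register count. The short Python sketch below re-enters the rows by hand (the tuple layout and the `unaccounted` helper are just for illustration) and confirms that no register is left over or double-counted.

```python
# Rows of the summary table:
# (name, total, argument/return, caller-saved, callee-saved, other)
ROWS = [
    ("x86 32-bit cdecl", 8, 1, 2, 4, 1),
    ("x86_64 Microsoft x64", 16, 5, 2, 8, 1),
    ("x86_64 System V AMD64", 16, 7, 2, 6, 1),
    ("ARM 32-bit", 16, 4, 1, 7, 4),
    ("ARM 64-bit", 32, 8, 7, 10, 7),
    ("RISC-V", 32, 8, 7, 11, 6),
]


def unaccounted(row):
    """Registers not covered by any category (0 means the row adds up)."""
    _name, total, *categories = row
    return total - sum(categories)


for row in ROWS:
    print(f"{row[0]:<24} unaccounted: {unaccounted(row)}")
```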
References
Discussion
From most popular to least
In the following, when we use terms like 'every platform' or 'whenever..', we are implicitly referring only to the 7 platforms in this study.
Every platform has facilities for:
- specifying constants/literals
- addition, subtraction, multiplication
- a jump (sometimes called "unconditional branch") to a statically known immediate or label
- an indirect jump (to a location determined at runtime)
- some form of branch-if-not-equal-to-zero/branch-if-true
- the following comparisons, on both integers and floats, if supported: equality, less-than
- the following comparisons, on integers if an integer platform, or on floats if not: not-equals, greater-than-or-equal-to, equals-zero, not-equals-zero
The most common form of arithmetic is signed 32-bit integer; however, one high-level platform (LuaJIT 2) supports only 64-bit floating point. 5 of the 7 platforms support all four combinations of 32- and 64-bit, integer and floating point. Some of the integer platforms support unsigned integer operations throughout, but others support unsigned operations only in some places.
Whenever integer arithmetic is supported, three bitwise shifts and three bitwise logical operations are supported:
- bitwise shifts: left shift, right shift signed (shift right arithmetic), right shift unsigned (shift right logical)
- bitwise logical: and, or, xor
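The difference between the two right shifts only shows up when the sign bit is set. The Python sketch below models 32-bit registers as integers masked to 32 bits. It assumes the shift amount is taken modulo 32, which is what RISC-V, WASM and the JVM do for 32-bit shifts; ARM's register-specified shifts handle large shift amounts differently, so treat that detail as platform-specific.

```python
MASK32 = 0xFFFF_FFFF


def sll32(x: int, n: int) -> int:
    """Shift left: vacated low bits are zero-filled."""
    return (x << (n & 31)) & MASK32


def srl32(x: int, n: int) -> int:
    """Shift right logical (unsigned): vacated high bits are zero-filled."""
    return (x & MASK32) >> (n & 31)


def sra32(x: int, n: int) -> int:
    """Shift right arithmetic (signed): vacated high bits copy the sign bit."""
    x &= MASK32
    signed = x - (1 << 32) if x & 0x8000_0000 else x
    return (signed >> (n & 31)) & MASK32  # Python's >> on negatives is arithmetic


x = 0xFFFF_FFF0  # -16 as a 32-bit pattern
print(hex(srl32(x, 4)), hex(sra32(x, 4)))  # 0xfffffff 0xffffffff
```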
Whenever floating-point arithmetic is supported, addition, subtraction, multiplication, and division are all present.
All platforms that support both integer and floating point support conversions between signed integer and floating-point, and all platforms that support both 32-bit and 64-bit floating point support conversions between floating-point quantities of different bitwidths.
NOP is present on all platforms except LuaJIT 2.
Most of the platforms also support:
- conversions between 32- and 64-bit
- integer division and remainder
- floating point negation, remainder/mod
- integer and floating-point loads and stores
- FENCE or sync barrier or monitor or volatile instruction (or prefix)
- compare instructions: integer less-than, floating-point equality and less-than
- branch on equality, inequality, less-than-or-equals, less-than, greater-than-or-equal, greater-than, false/equals-zero/is-null
- 'switch'-statement like indirect branching
- subroutine support with some form of CALL, RETURN, and some way to do indirect CALLs
- breakpoints
- data structures: vectors/arrays, aggregates
- some unsigned operations
All register machines have MOV and all stack machines have DROP (also called POP). Both hardware processor ISAs, and none of the other platforms, have:
- unconstrained low-level indirect jumps (although CIL has an unconstrained high-level indirect jump)
- link registers
- special registers
Both hardware processor ISAs have supervisor call.
Instructions or intrinsics for each of the following are provided by three platforms in this study:
Arithmetic:
- addition and subtraction with overflow or carry, signed 32-bit (note: but only as intrinsics in LLVM)
- integer negation or similar
- logical NOT (either bitwise or boolean)
- integer compares: equality, greater-than, unsigned greater-than
- floating-point compares: less-than-or-equal-to, greater-than
- floating-point specific: sqrt, copysign, min, max (note: but only as intrinsics in LLVM)
- conversions:
  - from larger integers to 8-bit and 16-bit integers
  - from unsigned 32-bit integers to 64-bit integers
  - coercive casting between integers and floating-points
Memory access:
- variable loads and stores
Control flow:
- exception handling
- variable-length argument lists (variadic functions)
- an illegal/unreachable instruction
Allocation:
- various allocation instructions
Data structures:
Instructions or intrinsics for each of the following are provided by two platforms in this study:
Arithmetic:
- instructions to load various higher-level data structure constants such as strings
- constant tables/constant pools
- PC-relative arithmetic
- multiplication with overflow (signed and unsigned, 32- and 64-bit)
- unsigned, 64-bit variants of addition, subtraction, multiplication with overflow
- right rotate
- integer compares: inequality, greater-than-or-equal-to, less-than-or-equal-to
- floating point compares: inequality, greater-than-or-equal-to
- clz ctz popcnt
- byteswaps
- integer conversions from 8-bit or 16-bit to larger
- floating point:
  - abs
  - ceil, floor, trunc, nearest
  - rounding modes and exception modes
  - classify
  - fused multiply-add
  - pow
Memory access:
- global variable loads/stores
- short instructions to load/store the first 4 variables
Stack ops:
Atomics and Sync:
- AMOs: SWAP, ADD, AND, OR, XOR, MIN, MAX, MINU, MAXU
- either load-reserved/store-conditional, or compare-and-swap
Control flow:
- jump with link register
- unconstrained indirect branch
- branches: <0, >=0, unsigned <, unsigned >=
- select
- tail calls
- structured control flow loops
- invoke instructions for object-oriented calling.
Data structures:
Misc:
- cycle counters
- memory ops such as memcpy
Arithmetic
Every platform has ways to specify constants/literals. Most platforms specify these only as immediates or directly in the IR, but two of them also have constant pools.
The most common number type is the 32-bit signed integer, but 64-bit floats are also popular, and most platforms offer all four combinations of 32- and 64-bit, integer and float. All platforms offer some way to convert between the various types they have. Some platforms offer coercive casting between types.
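Here 'coercive casting' is read as reinterpreting the bits of a value as another type, as opposed to converting the numeric value; that reading is an assumption. The Python sketch below contrasts the two using the standard `struct` module; the function names are hypothetical, and the comments name roughly analogous instructions (RISC-V FCVT.S.W versus FMV.X.S, WASM `f32.convert_i32_s` versus `i32.reinterpret_f32`).

```python
import struct


def convert_i32_to_f32(i: int) -> float:
    """Value conversion (like FCVT.S.W or WASM f32.convert_i32_s):
    the number is preserved, rounded to the nearest float32."""
    return struct.unpack("<f", struct.pack("<f", float(i)))[0]


def reinterpret_f32_as_i32(f: float) -> int:
    """Bit reinterpretation (like FMV.X.S or WASM i32.reinterpret_f32):
    the 32 bits are preserved and read back as a signed integer."""
    return struct.unpack("<i", struct.pack("<f", f))[0]


def reinterpret_i32_as_f32(i: int) -> float:
    """The reverse bit reinterpretation (like FMV.S.X or WASM f32.reinterpret_i32)."""
    return struct.unpack("<f", struct.pack("<i", i))[0]


print(convert_i32_to_f32(1))             # 1.0
print(hex(reinterpret_f32_as_i32(1.0)))  # 0x3f800000
```

The value conversion can lose precision (for example, 16777217 becomes 16777216.0 in single precision), while the reinterpretation never changes a bit.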
Every platform has ways to add, subtract, and multiply. Most platforms also offer division.
Every platform with integers offers at least 6 bitwise operations: shift left, shift right unsigned, shift right signed, and, or, xor.
Most platforms offer some unsigned operations and some offer all unsigned operations.
Most platforms offer compares of integer less-than, floating-point equality, and floating-point less-than (WASM and LLVM use these for branching, see below, but RISC-V and CIL also offer them). Some platforms offer more compares.
Some platforms offer additional arithmetic operations:
- integer: negation, right rotate, bit count ops (clz ctz popcnt), PC-relative arithmetic
- floating-point: negation, sqrt, copysign, max, min, abs, rounding ops (ceil, floor, trunc, nearest), classify, fused multiply-add, pow
Two floating-point platforms offer rounding and exception modes.
Memory access
Most platforms have some form of integer and floating-point loads and stores. Some platforms also support local variable loads/stores.
Some platforms have memory allocation instructions, but they differ widely.
Two platforms have global variable loads/stores.
Register and stack ops
All register machines have copy/MOV and all stack machines have DROP (also called POP). Most stack machines have DUP.
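On a stack machine, arithmetic instructions consume their operands, so DUP is how a value gets used twice, and DROP is how an unwanted value (for example, an ignored function result) is discarded. The toy interpreter below is a hypothetical illustration; its opcode names are made up for this sketch and only loosely modeled on JVM-style `dup` and `pop`.

```python
def run_stack(program):
    """Run a list of (opcode, *operands) tuples on an empty operand stack."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op == "dup":  # copy the top of the stack
            stack.append(stack[-1])
        elif op == "drop":  # discard the top of the stack (POP)
            stack.pop()
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack


# x*x + 1 with x = 7: DUP lets x be used twice without reloading it.
square_plus_one = [("push", 7), ("dup",), ("mul",), ("push", 1), ("add",)]
print(run_stack(square_plus_one))  # [50]
```

A register machine would instead name the same register twice, or copy it with MOV first if the original must survive.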
Atomics and sync
Most platforms have some form of atomic or sync functionality, but they differ widely.
A few platforms offer AMOs (SWAP, ADD, AND, OR, XOR, MIN, MAX, MINU, MAXU) and either load-reserved/store-conditional, or compare-and-swap.
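When a platform has compare-and-swap but lacks a particular AMO, the AMO can be emulated with a retry loop. The Python sketch below shows the idea for an atomic MAX (analogous to RISC-V AMOMAX.W). `AtomicCell` is a hypothetical stand-in for a memory word: its lock only models the atomicity that a hardware CAS instruction would provide, and is not meant to suggest how hardware implements it.

```python
import threading


class AtomicCell:
    """Hypothetical memory word with an atomic compare-and-swap."""

    def __init__(self, value: int = 0):
        self._value = value
        self._lock = threading.Lock()  # models the atomicity of one CAS

    def load(self) -> int:
        return self._value

    def compare_and_swap(self, expected: int, new: int) -> int:
        """Atomically: if the value equals expected, store new. Returns the old value."""
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new
            return old


def amo_max(cell: AtomicCell, operand: int) -> int:
    """Atomic max built from a CAS retry loop; returns the previous value, like an AMO."""
    old = cell.load()
    while True:
        seen = cell.compare_and_swap(old, max(old, operand))
        if seen == old:
            return old
        old = seen  # another thread changed the word first; retry with what we saw


# Several threads race to raise the stored maximum.
cell = AtomicCell(0)
threads = [threading.Thread(target=amo_max, args=(cell, v)) for v in (3, 41, 7, 29)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cell.load())  # 41
```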
Control flow
Every platform has an unconditional jump. Every platform has some form of branch-if-true, branch-if-not-equal-to-zero, or branch-if-non-null.
Every platform has some form of unconditional indirect jump. Most platforms offer some form of 'switch'-statement-like indirect jump. The two hardware processor ISAs have unconstrained low-level indirect jumps, and the other platforms require all low-level indirect jumps to specify a list of all possible jump targets, although some of them offer higher-level unconstrained indirect jumps (to functions or methods).
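A constrained indirect jump can be pictured as indexing into a fixed table of targets, with an explicit fallback for out-of-range indices; this is roughly how WASM's `br_table` behaves. The Python sketch below is only a model: the targets are plain functions standing in for labels, and `br_table` here is a hypothetical helper, not a real API. An unconstrained jump such as RISC-V JALR, by contrast, can go to any computed address.

```python
from typing import Callable, Sequence

Target = Callable[[], str]


def br_table(index: int, targets: Sequence[Target], default: Target) -> str:
    """Jump to targets[index]; any out-of-range index goes to the default.
    (Negative indices, which WASM would see as large unsigned numbers,
    also go to the default.) The possible destinations are fixed by the table."""
    if 0 <= index < len(targets):
        return targets[index]()
    return default()


def case_zero() -> str:
    return "case 0"


def case_one() -> str:
    return "case 1"


def fallback() -> str:
    return "default"


table = [case_zero, case_one]
print(br_table(1, table, fallback))  # case 1
print(br_table(7, table, fallback))  # default
```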
Most platforms have branch on equality, inequality, less-than-or-equals, less-than, greater-than-or-equal, greater-than, false/equals-zero/is-null. Some platforms have branch on <0, >=0, unsigned <, unsigned >=.
Many platforms have compare-and-branch instructions, but some platforms require two instructions for this: either a compare instruction that places a boolean in a register followed by a branch on that boolean (WASM and LLVM do this), or an arithmetic instruction that sets a flags register followed by a branch instruction that reads those flags (ARM does this). Some platforms (WASM and LLVM, the same ones that split compares from boolean-conditional branches) offer a non-branching SELECT operation.
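The split between a compare that produces a boolean and a separate use of that boolean can be sketched in Python as below. `lt_s` mirrors a compare like RISC-V SLT or WASM `i32.lt_s` (result 0 or 1), and `select` mirrors a non-branching select: both inputs are already computed, and the condition only picks one of them. The mask-based implementation is one common branch-free technique, chosen here for illustration; it is not a claim about how any particular platform implements SELECT.

```python
MASK32 = 0xFFFF_FFFF


def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def lt_s(a: int, b: int) -> int:
    """Signed less-than that writes 0 or 1 into a register, like SLT or i32.lt_s."""
    return 1 if to_signed32(a) < to_signed32(b) else 0


def select(cond: int, if_true: int, if_false: int) -> int:
    """Branch-free select on 32-bit patterns: any nonzero cond picks if_true."""
    mask = -int(cond != 0) & MASK32  # 0xFFFFFFFF when cond != 0, else 0
    return ((if_true & mask) | (if_false & ~mask)) & MASK32


def min_s(a: int, b: int) -> int:
    """Signed minimum without a conditional branch: compare, then select."""
    return select(lt_s(a, b), a, b)


print(min_s(3, 7), hex(min_s(0xFFFF_FFFF, 1)))  # 3 0xffffffff
```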
Most platforms have subroutine support, with some form of CALL, RETURN, and some way to do indirect CALLs.
Many platforms have restrictions on jumping across function boundaries.
Some platforms have:
- exception handling
- variable-length argument lists (variadic functions)
- an illegal/unreachable instruction
- tail calls
- structured control flow loops
Comparisons
This section considers both (non-control-flow) compares and compare-and-branches together, to try and get insight into which types of comparison are the most popular.
Every platform in this study offers both of the following comparisons on integers if it supports integers, and on floats if it supports floats:
- equals
- less-than
Every integer platform has facilities for all of the following comparisons on integers, and the only non-integer platform (LuaJIT 2) also supports all of these on floats:
- not-equals
- greater-than-or-equal-to
- equals-zero
- not-equals-zero (all platforms have branching instructions for this one; see previous section)
You might be wondering what's so special about greater-than-or-equal-to; i actually think it might just be on this list due to chance. You see, RISC-V has integer compare-and-branch operations for beq, bne, blt, bge (equals, not-equals, less-than, greater-than-or-equal-to). As noted above, every platform in this study has less-than on integers if they support integers, and on floating-point if they support floats. Greater-than-or-equal-to is not like that though; only 4 of the six floating-point-capable platforms have a floating-point greater-than-or-equal-to primitive. For example, in floating-point, RISC-V has equals, less-than, and less-than-or-equal-to. I don't know why they chose to have greater-than-or-equal-to for integers and less-than-or-equal-to for floats; for integers, it's really just a convention, because by reversing the integer arguments in the RISC-V instruction you can get greater-than and less-than-or-equal-to in addition to less-than and greater-than-or-equal-to. But my conclusion is that greater-than-or-equal-to is probably not that special.
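The operand-swapping argument is easy to check mechanically. The Python sketch below derives greater-than from less-than and less-than-or-equal-to from greater-than-or-equal-to by swapping operands, as the RISC-V assembler does for its `bgt`/`ble` pseudoinstructions, and checks the result over a small range of integers. It also shows one caveat for floats: swapping operands stays correct when a NaN is involved, but negating a comparison does not, so a floating-point `>=` cannot simply be replaced by `not <`.

```python
import math


def gt_via_lt(a, b):
    """a > b using only less-than: swap the operands (bgt a,b is blt b,a)."""
    return b < a


def le_via_ge(a, b):
    """a <= b using only greater-or-equal (ble a,b is bge b,a)."""
    return b >= a


def check_integers(lo=-4, hi=4):
    """Confirm the swapped forms agree with the direct comparisons."""
    for a in range(lo, hi + 1):
        for b in range(lo, hi + 1):
            assert gt_via_lt(a, b) == (a > b)
            assert le_via_ge(a, b) == (a <= b)
    return True


nan = math.nan
print(check_integers())                 # True
print(le_via_ge(nan, 1.0), nan <= 1.0)  # False False: swapping is still exact
print(not (nan < 1.0), nan >= 1.0)      # True False: negation is not
```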
If we had a much larger sample size (many more than 7 platforms) maybe we could get to the bottom of this, but from what we have here all that i can really conclude is that:
- equality seems to be especially popular (everyone has it)
- not-equals-zero seems to be especially popular (everyone has a branching instruction based on this or something like it)
- everyone has at least some of less-than/less-than-or-equal-to/greater-than-or-equal-to/greater-than. Out of these, less-than seems to be more popular than the others (everyone has it), and greater-than MIGHT be slightly less popular than the others.
- in addition to equals, which is very popular, not-equals seems to be pretty popular
- in addition to not-equal-to-zero, which is very popular, equals-zero seems to be pretty popular, but only for integers
Data structures
Most platforms have vectors/arrays, and aggregates.
Some platforms have a 'length' instruction.
A few platforms have OOP support and strings.
Misc
Most platforms have NOP and breakpoint instructions.
The two hardware processor ISA platforms have link registers, special registers, and supervisor call.
A few platforms have cycle counters, and memory operations such as memcpy.
Related work
RISC-V Geneology
RISC-V Geneology surveys 18 instruction set architectures prior to RISC-V, "chosen primarily from earlier UC Berkeley RISC architectures and major proprietary RISC instruction sets".
They present a matrix of which instructions in each instruction set correspond to which RISC-V instructions, allowing me to present a count of the analogs of each instruction below.
The paper lists 98 instructions with analogs that "appear in at least three" prior ISAs. Here they are, grouped by function (the grouping is mine), with the number of prior analogs of each in parentheses:
- immediates: LUI(8)
- control flow: JAL(14) JALR(16) BEQ(17) BNE(17) BLT(13) BGE(13) BLTU(8) BGEU(8)
- loads and stores: LB(11) LH(13) LW(18) LBU(11) LHU(12) SB(14) SH(14) SW(18) FLW(15) FSW(15) FLD(14) FSD(14)
- integer add/sub/mul/div: ADD(18) ADDI(17) SUB(18) MUL(12) MULH(7) MULHU(9) DIV(8) DIVU(7) REMU(7)
- shifts and set-less-than compares: SLTI(4) SLTIU(3) SLLI(12) SRLI(11) SRAI(12) SLL(17) SLT(4) SLTU(3) SRL(16) SRA(17)
- logical: XORI(14) ORI(14) ANDI(15) XOR(18) OR(18) AND(18)
- concurrency: FENCE(7) FENCE.I(4) LR.W(8) SC.W(8) AMOSWAP.W(3) AMOADD.W(3)
- misc: SCALL(16) SBREAK(9) RDCYCLE(3) RDTIME(3)
- floating point add/sub/mul/div: FADD.S(16) FSUB.S(15) FMUL.S(16) FDIV.S(16) FADD.D(16) FSUB.D(16) FMUL.D(16) FDIV.D(16)
- floating point fused multiply/add: FMADD.S(5) FMSUB.S(3) FNMSUB.S(4) FNMADD.S(4) FMADD.D(4) FMSUB.D(3) FNMSUB.D(4) FNMADD.D(4)
- floating point sqrt: FSQRT.S(10) FSQRT.D(10)
- floating point signs: FSGNJ.S(3) FSGNJN.S(3) FSGNJ.D(3) FSGNJN.D(2)
- floating point conversions: FCVT.W.S(14) FCVT.S.W(13) FCVT.S.D(11) FCVT.D.S(10) FCVT.W.D(14) FCVT.D.W(13) FMV.X.S(8) FMV.S.X(6)
- floating point comparisons: FEQ.S(11) FLT.S(11) FLE.S(9) FEQ.D(11) FLT.D(11) FLE.D(9)
- floating point misc: RCSR(8) FRRM(6) FRFLAGS(8) FSRMI(7) FSFLAGSI(8)
(note: the paper lists RDINSTRET as having at least 3 analogs, but its matrix shows only 1; perhaps that entry is a mistake. The list above therefore has only 97 instructions, not 98.)
The instructions with >=3 analogs but <8 are:
- integer add/sub/mul/div: MULH(7) DIVU(7) REMU(7)
- set-less-than compares: SLTI(4) SLTIU(3) SLT(4) SLTU(3)
- concurrency: FENCE(7) FENCE.I(4) AMOSWAP.W(3) AMOADD.W(3)
- misc: RDCYCLE(3) RDTIME(3)
- floating point fused multiply/add: FMADD.S(5) FMSUB.S(3) FNMSUB.S(4) FNMADD.S(4) FMADD.D(4) FMSUB.D(3) FNMSUB.D(4) FNMADD.D(4)
- floating point signs: FSGNJ.S(3) FSGNJN.S(3) FSGNJ.D(3) FSGNJN.D(2)
- floating point conversions: FMV.S.X(6)
- floating point misc: FRRM(6) FSRMI(7)
Here is the subset of the instructions with at least 8 prior analogs found:
- immediates: LUI(8)
- control flow: JAL(14) JALR(16) BEQ(17) BNE(17) BLT(13) BGE(13) BLTU(8) BGEU(8)
- loads and stores: LB(11) LH(13) LW(18) LBU(11) LHU(12) SB(14) SH(14) SW(18) FLW(15) FSW(15) FLD(14) FSD(14)
- integer add/sub/mul/div: ADD(18) ADDI(17) SUB(18) MUL(12) MULHU(9) DIV(8)
- shifts: SLLI(12) SRLI(11) SRAI(12) SLL(17) SRL(16) SRA(17)
- logical: XORI(14) ORI(14) ANDI(15) XOR(18) OR(18) AND(18)
- concurrency: LR.W(8) SC.W(8)
- misc: SCALL(16) SBREAK(9)
- floating point add/sub/mul/div: FADD.S(16) FSUB.S(15) FMUL.S(16) FDIV.S(16) FADD.D(16) FSUB.D(16) FMUL.D(16) FDIV.D(16)
- floating point sqrt: FSQRT.S(10) FSQRT.D(10)
- floating point conversions: FCVT.W.S(14) FCVT.S.W(13) FCVT.S.D(11) FCVT.D.S(10) FCVT.W.D(14) FCVT.D.W(13) FMV.X.S(8)
- floating point comparisons: FEQ.S(11) FLT.S(11) FLE.S(9) FEQ.D(11) FLT.D(11) FLE.D(9)
- floating point misc: RCSR(8) FRFLAGS(8) FSFLAGSI(8)
Note that now we have lost the signed high-bits multiply (MULH), the unsigned div/rem instructions, the set-less-than compares, all concurrency except LR/SC, the cycle and time counters, fused multiply/add, the floating-point sign instructions, FMV.S.X, and the rounding-mode instructions FRRM and FSRMI. (Perhaps some of the prior ISAs had instructions like FABS and FNEG, which are only assembler pseudoinstructions in RISC-V and hence not listed here, in place of the RISC-V sign instructions.)
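The RISC-V sign-injection instructions take the magnitude bits from the first operand and build the sign bit from the second, and the RISC-V specification defines the FMV, FNEG and FABS pseudoinstructions as the special case where both source operands are the same register. The Python sketch below models this on single-precision bit patterns using the standard `struct` module; the helper names are hypothetical and simply mirror the mnemonics.

```python
import struct

SIGN = 0x8000_0000
MAGNITUDE = 0x7FFF_FFFF


def bits(x: float) -> int:
    """Bit pattern of x rounded to single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def from_bits(b: int) -> float:
    """Single-precision value with the given 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def fsgnj(a: float, b: float) -> float:
    """Magnitude of a, sign of b."""
    return from_bits((bits(a) & MAGNITUDE) | (bits(b) & SIGN))


def fsgnjn(a: float, b: float) -> float:
    """Magnitude of a, opposite of b's sign."""
    return from_bits((bits(a) & MAGNITUDE) | (~bits(b) & SIGN))


def fsgnjx(a: float, b: float) -> float:
    """Magnitude of a, sign = sign(a) XOR sign(b)."""
    return from_bits(bits(a) ^ (bits(b) & SIGN))


# Pseudoinstructions: the same register used as both sources.
def fmv(a: float) -> float:
    return fsgnj(a, a)


def fneg(a: float) -> float:
    return fsgnjn(a, a)


def fabs(a: float) -> float:
    return fsgnjx(a, a)


print(fabs(-2.5), fneg(2.5), fmv(-1.0))  # 2.5 -2.5 -1.0
```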
The instructions with >=8 analogs but <11 are:
- immediates: LUI(8)
- control flow: BLTU(8) BGEU(8)
- integer add/sub/mul/div: MULHU(9) DIV(8)
- concurrency: LR.W(8) SC.W(8)
- misc: SBREAK(9)
- floating point sqrt: FSQRT.S(10) FSQRT.D(10)
- floating point conversions: FCVT.D.S(10) FMV.X.S(8)
- floating point comparisons: FLE.S(9) FLE.D(9)
- floating point misc: RCSR(8) FRFLAGS(8) FSFLAGSI(8)
Here is the subset of the instructions with at least 11 prior analogs found:
- control flow: JAL(14) JALR(16) BEQ(17) BNE(17) BLT(13) BGE(13)
- loads and stores: LB(11) LH(13) LW(18) LBU(11) LHU(12) SB(14) SH(14) SW(18) FLW(15) FSW(15) FLD(14) FSD(14)
- integer add/sub/mul/div: ADD(18) ADDI(17) SUB(18) MUL(12)
- shifts: SLLI(12) SRLI(11) SRAI(12) SLL(17) SRL(16) SRA(17)
- logical: XORI(14) ORI(14) ANDI(15) XOR(18) OR(18) AND(18)
- misc: SCALL(16)
- floating point add/sub/mul/div: FADD.S(16) FSUB.S(15) FMUL.S(16) FDIV.S(16) FADD.D(16) FSUB.D(16) FMUL.D(16) FDIV.D(16)
- floating point conversions: FCVT.W.S(14) FCVT.S.W(13) FCVT.S.D(11) FCVT.W.D(14) FCVT.D.W(13)
- floating point comparisons: FEQ.S(11) FLT.S(11) FEQ.D(11) FLT.D(11)
Note that now we have lost LUI, the unsigned compare-and-branch instructions, the high-bits multiply instruction (MULHU), division, all concurrency (perhaps some of the prior ISAs had other concurrency mechanisms, though), breakpoints, sqrt, the floating-point LE comparisons, two floating-point conversions (FCVT.D.S and FMV.X.S), and the floating-point exception and rounding mode instructions.
The instructions with >=11 analogs but <16 are:
- control flow: JAL(14) BLT(13) BGE(13)
- loads and stores: LB(11) LH(13) LBU(11) LHU(12) SB(14) SH(14) FLW(15) FSW(15) FLD(14) FSD(14)
- integer add/sub/mul/div: MUL(12)
- shifts: SLLI(12) SRLI(11) SRAI(12)
- logical: XORI(14) ORI(14) ANDI(15)
- floating point add/sub/mul/div: FSUB.S(15)
- floating point conversions: FCVT.W.S(14) FCVT.S.W(13) FCVT.S.D(11) FCVT.W.D(14) FCVT.D.W(13)
- floating point comparisons: FEQ.S(11) FLT.S(11) FEQ.D(11) FLT.D(11)
Here is the subset of the instructions with at least 16 prior analogs found:
- control flow: JALR(16) BEQ(17) BNE(17)
- loads and stores: LW(18) SW(18)
- integer add/sub/mul/div: ADD(18) ADDI(17) SUB(18)
- shifts: SLL(17) SRL(16) SRA(17)
- logical: XOR(18) OR(18) AND(18)
- misc: SCALL(16)
- floating point add/sub/mul/div: FADD.S(16) FMUL.S(16) FDIV.S(16) FADD.D(16) FSUB.D(16) FMUL.D(16) FDIV.D(16)
Note that now we have lost JAL, all compare-and-branch instructions except those testing (in)equality, loads and stores of all word sizes except 32-bit, multiplication, immediate shifts and logical operations (in fact, all immediates except for ADDI; presumably all ISAs provide some way to load immediates, however), single-precision subtraction (FSUB.S), and all of the floating-point loads/stores, conversions, and comparisons.
Let's take stock of what remains; these appear to be the common core instructions. I've left out the floating point add/sub/mul/div, because these appear to me to be pretty useless without any floating-point loads/stores, conversion, and comparisons; presumably older ISAs had some way to use them however.
- no load immediate instructions (presumably every ISA provides some way to load immediates, however; RISC-V could use ADDI in conjunction with its zero register)
- jumps (only indirect jump provided, but presumably direct jump can be synthesized on ISAs missing it): JALR
- branches on (in)equality: BEQ BNE (presumably all ISAs also provide some way to compare less-than or less-than-or-equal, however)
- 32-bit loads/stores: LW SW
- add/sub: ADD SUB
- immediate addition (also used in RISC-V for an assembler MV pseudoinstruction): ADDI
- shifts and logical: SLL SRL SRA XOR OR AND
- system calls: SCALL
This is 15 instructions.
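The remark that every ISA presumably has some way to load immediates can be made concrete for this 15-instruction core. The Python sketch below builds an arbitrary 32-bit constant using only ADDI, SLL and OR, splitting it into chunks small enough to fit ADDI's 12-bit signed immediate, and then runs the sequence in a tiny interpreter to check it. This is an illustration under those assumptions, not how a real assembler does it: full RISC-V would use LUI followed by ADDI, two instructions instead of eight. Register names (x0, t0, t1, t2) follow the RISC-V ABI; the helper functions are hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def load_const_core(value: int) -> list[str]:
    """Instruction sequence (assembly text) that leaves the 32-bit pattern
    `value` in t0, using only ADDI, SLL and OR from the core subset."""
    value &= MASK32
    # 10 + 11 + 11 bits; each chunk is in 0..2047, so it fits ADDI's
    # 12-bit signed immediate without any sign-extension surprises.
    chunks = [(value >> 22) & 0x3FF, (value >> 11) & 0x7FF, value & 0x7FF]
    program = [f"addi t0, x0, {chunks[0]}", "addi t1, x0, 11"]  # t1 = shift amount
    for chunk in chunks[1:]:
        program += ["sll t0, t0, t1", f"addi t2, x0, {chunk}", "or t0, t0, t2"]
    return program


def run(program: list[str]) -> dict[str, int]:
    """Minimal interpreter for the three instructions used above."""
    regs = {"x0": 0}
    for line in program:
        op, operands = line.split(" ", 1)
        rd, rs1, src = (s.strip() for s in operands.split(","))
        if op == "addi":
            result = regs[rs1] + int(src)
        elif op == "sll":
            result = regs[rs1] << (regs[src] & 31)
        elif op == "or":
            result = regs[rs1] | regs[src]
        else:
            raise ValueError(f"unsupported instruction: {op}")
        if rd != "x0":  # writes to the zero register are discarded
            regs[rd] = result & MASK32
    return regs


prog = load_const_core(0xDEADBEEF)
print(len(prog), hex(run(prog)["t0"]))  # 8 0xdeadbeef
```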
TODO
goals for this document:
Revise the 'concordance' of RISC-V, WASM, ARM Cortex M0, LLVM, JVM, LuaJIT 2 instructions into a readable list of instructions, grouped by type of purpose, referencing their analogs in each of those systems (hence the word 'concordance'), with a description of the semantics of each instruction (and how the various systems differ).
todo:
- provide more details for the most common instructions ("provide description of the semantics of each instruction (and how the various systems differ)")
- reformat (right now the newlines in my source document aren't showing). Switch to asciidoc.
- add TOC
- correct the little TODOs throughout
- re-read, edit
- ask relevant communities (RISC-V, WASM, etc) for (a) resolution of my questions (grep for question marks), (b) errors, (c) other comments, (d) pointers to writeups that are similar, or that express opinions about which instructions could be omitted, and about how each ISA should be/could have been changed. Maybe put on Gitlab. Add CC BY license. Maybe add to Wikipedia.
- i'd like to add ARM64, QBE, MIR, ColdFire (representing 68k), and the subset of x86 matching other instructions here