This document surveys the instruction sets of seven intermediate-language, virtual machine, or assembly-language platforms: RISC-V, WASM, ARM Cortex M0, LLVM, JVM, LuaJIT 2, and CIL. It groups instructions by purpose and lists their analogs in each of those systems (hence the word 'concordance'). This shows which instructions in one platform correspond to which instructions in the other platforms.
The aim is to provide insight into which sorts of instruction are 'popular' (common to many platforms), and what the most 'popular' choices for the semantics of those instructions are.
The organization of this document is:
RISC-V is an instruction set architecture (ISA) for hardware microprocessors [1].
As a general-purpose hardware microprocessor ISA, RISC-V is intended to efficiently execute any high-level programming language.
RISC-V is a register machine with a set of integer registers of fixed bitwidth (in this document we'll look at the 64-bit variant of RISC-V), plus a separate set of floating-point registers (if a floating-point extension is being used).
WASM is a virtual machine designed to enable high performance applications on the Web [2]. It is designed to support many different high-level-languages.
WASM is considered to be higher-level than RISC-V and ARM, but lower-level than JVM, CIL, and LuaJIT 2. WASM is a sandbox that allows untrusted code to be executed without giving that code arbitrary access to the host machine.
WASM is a stack machine.
WASM has loads and stores for working with memory, but it also has local and global variables.
Unusually for a virtual machine that seeks to be a low-level target platform for many different high-level languages, WASM's control flow is block-structured, with corresponding restrictions on jump and branch targets.
LLVM is considered to be higher-level than RISC-V and ARM, but lower-level than JVM, CIL, and LuaJIT 2.
Although LLVM was always intended to be a low-level target platform for many different high-level-languages, it was first used to support the compilation of C and C++.
LLVM code is structured into basic blocks. A basic block is "a straight-line code sequence with no branches in except to the entry and no branches out except at the exit." [3]. Some of the LLVM instructions (for example, branch or return) are called 'terminator instructions', and each LLVM basic block must end with one of these 'terminator instructions'.
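To make the basic-block definition concrete, here is a minimal Python sketch of the classic 'leaders' method for cutting a linear instruction list into basic blocks: a block starts at the first instruction, at every branch target, and right after every terminator. The tuple format and the opcode names `br`, `condbr`, and `ret` are hypothetical stand-ins, not LLVM's data structures; in LLVM IR the blocks are written out explicitly with labels, so this only illustrates the definition.

```python
# Hypothetical toy format: (label, opcode, branch_target); label/target may be None.
TERMINATORS = {"br", "condbr", "ret"}


def split_basic_blocks(instrs):
    """Split a linear instruction list into basic blocks using 'leaders'."""
    targets = {target for (_, _, target) in instrs if target is not None}
    leaders = {0} if instrs else set()
    for i, (label, op, _) in enumerate(instrs):
        if label in targets:
            leaders.add(i)  # a branch target starts a new block
        if op in TERMINATORS and i + 1 < len(instrs):
            leaders.add(i + 1)  # so does the instruction after a terminator
    starts = sorted(leaders)
    ends = starts[1:] + [len(instrs)]
    return [instrs[s:e] for s, e in zip(starts, ends)]


program = [
    (None, "add", None),
    (None, "condbr", "L1"),
    (None, "sub", None),
    (None, "br", "L2"),
    ("L1", "mul", None),
    ("L2", "ret", None),
]
blocks = split_basic_blocks(program)
print([len(b) for b in blocks])  # [2, 2, 1, 1]
```

The third block (`L1: mul`) simply falls through into `L2`. LLVM does not allow that, so in real LLVM IR it would have to end with an explicit terminator such as `br label %L2`.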
Instead of registers (or a stack), LLVM has an infinite set of variables. These variables must be used in SSA (static single assignment) form, which means that each variable is assigned exactly once; you cannot assign to the same variable a second time.
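A minimal sketch of what the SSA restriction means for straight-line code: each assignment gets a fresh numbered name, and each use refers to the most recent version. The statement format and the `to_ssa` helper are hypothetical; real SSA construction must also insert phi nodes where control flow merges, which this sketch does not attempt.

```python
def to_ssa(stmts):
    """Rename straight-line assignments so that every name is assigned once.

    stmts: list of (dest, op, args); each arg is a variable name or an int.
    Control-flow merges (which need phi nodes) are deliberately not handled.
    """
    version = {}  # variable name -> latest version number
    out = []

    def use(arg):
        # Integers are constants; names refer to their most recent version.
        return arg if isinstance(arg, int) else f"{arg}{version[arg]}"

    for dest, op, args in stmts:
        new_args = [use(a) for a in args]  # rename uses before the new definition
        version[dest] = version.get(dest, 0) + 1
        out.append((f"{dest}{version[dest]}", op, new_args))
    return out


code = [("x", "add", [1, 2]), ("x", "mul", ["x", 3]), ("y", "add", ["x", "x"])]
for stmt in to_ssa(code):
    print(stmt)
# ('x1', 'add', [1, 2])
# ('x2', 'mul', ['x1', 3])
# ('y1', 'add', ['x2', 'x2'])
```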
ARM is a family of hardware microprocessors. We will look at the instruction set architecture (ISA) of the ARM Cortex M0, which has one of the smallest instruction sets of any ARM processor.
As a general-purpose hardware microprocessor ISA, ARM is intended to efficiently execute any high-level programming language.
ARM Cortex M0 has 16 numbered 32-bit registers, of which 13 are GPRs and 3 are special (PC, link register, stack pointer). There are three other misc. registers (Program Status Register (PSR), PRIMASK (exception mask), CONTROL (stack control)). [4]
Although the JVM is now host to multiple high-level languages, it was originally designed to run Java.
The JVM is considered a 'higher-level' virtual machine; it provides high-level-language data structures. The JVM is a sandbox that allows untrusted code to be executed without giving that code arbitrary access to the host machine.
The JVM is a stack machine.
LuaJIT 2 is considered a 'higher-level' virtual machine.
LuaJIT 2 is a register machine.
Note: It was a difficult choice whether to analyze the Lua 5.1 bytecode instruction set or the LuaJIT 2 bytecode instruction set. The Lua 5.1 bytecode instruction set is smaller and better documented (see the excellent http://luaforge.net/docman/83/98/ANoFrillsIntroToLua51VMInstructions.pdf ). However, as the LuaJIT 2 instruction set is optimized and had the opportunity to learn from the Lua 5.1 bytecode, I was interested in seeing what choices were made with regard to questions like: which instructions should have immediate constant forms? Which branching conditionals should exist?
The LuaJIT 2 instruction set is clearly based on the Lua 5.1 instruction set, and seems to be for the most part an elaboration of it, adding various specialized instructions, immediate-constant forms, branching conditionals, 'marker' instructions to help with the JIT process, etc.
The CLI (Common Language Infrastructure) is considered a 'higher-level' virtual machine. It provides high-level-language data structures and enables interoperation between high-level languages. The CLI provides facilities to allow untrusted code to be executed without giving that code arbitrary access to the host machine.
The CLI is a stack machine.
The instruction set of the CLI is called the CIL (Common Intermediate Language).
Generally in this document we consider the dividing line between instructions to be separate mnemonics.
Sometimes if a platform (e.g. LLVM) lists a group of related instructions together in its documentation, then we consider that whole group as one instruction (for instance some LLVM intrinsics have a type name in their mnemonic, but in the LLVM documentation, rather than having a separate section for each type variant, they are all listed together with an asterisk representing the type name).
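RISC-V has immediate variants of many arithmetic and logical instructions (for example, ADDI alongside ADD). WASM doesn't: its arithmetic instructions take all their operands from the stack, and constants are pushed separately with i32.const, i64.const, f32.const, and f64.const.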
RISC-V is presented as a base integer instruction set (either RV32I (with 32-bit registers), RV32E (slimmed down embedded variant of RV32I), or RV64I (with 64-bit registers)), on top of which may be layered various extensions (such as M for multiplication, A for atomics, F for single-precision floating point, and D for double-precision floating point).
We will be assuming a processor with RV64IMAFD (also called RV64G), that is, a RISC-V processor (RV) with 64-bit integer registers (RV64I) plus the integer multiply/divide (M), atomic (A), single-precision floating point (F), and double-precision floating point (D) extensions. The IMAFD selection of extensions can be abbreviated as "G" because it represents the "standard general-purpose ISA", according to the RISC-V spec.
We will be mainly discussing RV64I, in which the registers are 64-bits, but RISC-V also has a 32-bit variant (RV32I) in which the registers are all 32-bits instead of 64. Many operations are only available in 64-bit form in RV64I, but since RV32I would provide them in 32-bit form, we will usually mark these as supporting both 32- and 64-bits, for easier comparison with instruction sets like WASM which provide explicit 32- and 64- bit forms throughout.
RISC-V floating point operations are only included in the F (single) and D (double) extensions, not the base integer instruction set.
The RISC-V instructions are untyped and operate on memory and registers, so since i want to give types for everything (to facilitate comparison), here is the convention that i will use. The types indicated in the following will be i64 by default (64 because the RV64I registers are 64-bits). However, if the operation is clearly intended to work with unsigned quantities, the type shown will be 'u' instead of 'i'; and if the operation is clearly intended to work with less than 64 bits, 32, 16, or 8 will be shown instead of 64. TODO enforce this convention.
Below, we'll say 'ARM' although we only mean ARM Cortex M0.
Sometimes i write '(polymorphic)' next to an instruction. I usually only do that when the same instruction will be appearing multiple times in the same section. There are other instructions which are also polymorphic which do not get this marking.
This document is probably full of errors. You would probably want this sort of comparison to be written by someone who has written code in each of the platform instruction sets being compared, or who at least has carefully read the documentation and specifications. Instead, what you are reading was written by someone who has merely glanced through the documentation, and who in many cases is making assumptions based only on the names of the instructions, without reading their definitions or using them in code!
This is, to some extent, an apples-to-oranges comparison. The different platforms being compared have different purposes and are different kinds of things.
These particular instruction sets were chosen for comparison because they possess the following properties:
There is necessarily some subjective preference involved in grouping similar functionality together. In this document the choice of grouping affects how widely functionality is considered to be shared, and hence how 'popular' it is considered to be.
For example, LLVM 'constrained' arithmetic is somewhat similar to RISC-V arithmetic in that both offer a way of choosing rounding and exception modes; however LLVM offers a constrained floating-point remainder function whereas RISC-V does not offer a floating-point remainder function. Does that mean that LLVM's constrained.frem intrinsic should be counted as something supported by only one platform? I have instead chosen to only consider the larger grouping 'floating-point arithmetic with rounding/exception mode control', which is listed as something supported by two platforms, with a note that RISC-V does not offer a floating-point remainder instruction.
For example, i have chosen to group ARM and CIL's bitwise NOT with LuaJIT 2's boolean NOT.
For example, i have chosen to group RISC-V's 'classify' operation with CIL's 'ckfinite', even though 'classify' provides much more functionality.
For example, i have chosen to group RISC-V's load-reserved/store-conditional (LR/SC) with LLVM's compare-and-swap, because even though these are very different operations, each can be used as a primitive upon which to build synchronization/consensus/atomicity.
For example, i have chosen to break out 'constant loads from a constant pool/table' as separate functionality, but then also consider 'constant loads' as something supported by all 7 platforms (mixing together immediate constants and constant pools), even though there are many differences in the details (particularly bitwidths).
For example, i have chosen to group ARM's 'reverse subtraction' (rsbs) with integer negation.
For example, i have chosen to group together a wide variety of fence, synchronization, and volatile instructions/prefixes.
For example, i have chosen to group together a wide variety of memory-allocation-related instructions.
Sometimes i have chosen to list the same instruction in multiple places, causing it to be 'double-counted' in a sense. For example, RISC-V's unconditional jumps are listed once under 'instructions supported by two platforms', in a section noting their link-register functionality, and again later for their unconditional jump functionality.
Instruction lists were collected from these references in May 2019.
Short excerpts of text describing the semantics of instructions are taken, without quoting, from these references.
RISC-V:
Web Assembly (WASM):
LLVM:
ARM Cortex M0:
JVM:
CIL:
LuaJIT 2:
To follow. Full concordance is at the end.
Every platform in this study has:
Every platform in this study has some mechanism for specifying constants/literals. However, in terms of instructions, LLVM does not have a constant load instruction because it doesn't need one; constants can be assigned to a variable in the LLVM IR without an instruction. For this reason, constant load instructions are listed below in the section on instruction classes supported by six platforms.
Every platform in this study has instructions for addition, subtraction, and multiplication. However, one platform (ARM) is integer-only, whereas another (LuaJIT 2) is floating-point-only. For this reason, these instructions are listed below in the section on instruction classes supported by six platforms.
Every platform in this study provides unconditional jumps/direct branches to an immediate or a label.
RISC-V and ARM provide link register variants of their indirect branch instructions.
NOTE: when i say that a jump is 'PC-relative', often it is actually relative to the instruction following the jump instruction, rather than to the jump instruction itself. Sometimes i provide that level of detail, but often i omit it.
In RISC-V, the JAL instruction is a PC-relative jump with a 20-bit signed immediate. The immediate is in units of 2 bytes, so JAL can encode a jump offset of +-1MiB. Jumps anywhere in a 32-bit absolute address range are available using fixed two-instruction sequences: a 20-bit immediate constant load (LUI, or AUIPC for a PC-relative target) followed by an indirect jump (JALR), which takes a base register and a 12-bit displacement. JAL writes the address of the instruction following the jump into its destination register.
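JAL's immediate is stored in a scrambled bit order inside the instruction word. The following Python sketch encodes `JAL rd, offset`, assuming the standard J-type layout (imm[20], imm[10:1], imm[11], imm[19:12], then rd and the 0b1101111 opcode). `encode_jal` is an illustrative helper, not part of any toolchain.

```python
def encode_jal(rd: int, offset: int) -> int:
    """Encode RISC-V `JAL rd, offset` (J-type) as a 32-bit instruction word."""
    if offset % 2 != 0:
        raise ValueError("JAL offsets must be a multiple of 2 bytes")
    if not -(1 << 20) <= offset < (1 << 20):
        raise ValueError("offset outside JAL's +-1 MiB range")
    imm = offset & 0x1FFFFF  # 21-bit two's complement; bit 0 is implicitly 0
    word = ((imm >> 20) & 0x1) << 31  # imm[20]
    word |= ((imm >> 1) & 0x3FF) << 21  # imm[10:1]
    word |= ((imm >> 11) & 0x1) << 20  # imm[11]
    word |= ((imm >> 12) & 0xFF) << 12  # imm[19:12]
    word |= (rd & 0x1F) << 7  # link register (x0 for a plain jump)
    word |= 0b1101111  # JAL opcode
    return word


print(hex(encode_jal(1, 8)))   # 0x8000ef   (jal ra, 8)
print(hex(encode_jal(0, -4)))  # 0xffdff06f (j -4)
```

The range check shows where the +-1MiB figure comes from: 20 signed bits in units of 2 bytes cover offsets from -2^20 up to 2^20 - 2 bytes.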
In WASM, i'm not quite sure that i understand the WASM BR instruction, but i think its immediate argument is a label index; label indices do NOT "reference program positions in the instruction stream but instead reference outer control constructs by relative nesting depth" [5]. Recall that WASM only has structured control flow. In other words, a WASM BR can only target an enclosing block, loop, or if construct, and the immediate argument is how many levels of nesting to break out of. A branch to a block or if continues after that construct's end, whereas a branch to a loop jumps back to the loop's beginning. The immediate argument is an unsigned 32-bit integer.
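The relative-depth addressing described above can be modeled as indexing into a stack of enclosing constructs. This Python sketch is hypothetical: instruction positions are plain integers, and the validation a real WASM engine performs (operand types, stack heights) is omitted.

```python
def resolve_br(control_stack, depth):
    """Return the position that a WASM `br depth` transfers control to.

    control_stack: enclosing constructs as (kind, start, end) tuples,
    outermost first; kind is "block", "loop", or "if". Positions are
    hypothetical instruction indices, not real WASM byte offsets.
    """
    if not 0 <= depth < len(control_stack):
        raise ValueError("br depth exceeds the current nesting depth")
    kind, start, end = control_stack[-1 - depth]  # depth 0 = innermost
    # A loop's label is its beginning (br re-enters the loop);
    # a block's or if's label is just after its matching `end`.
    return start if kind == "loop" else end


stack = [("block", 0, 20), ("loop", 2, 15), ("block", 4, 10)]
print([resolve_br(stack, d) for d in range(3)])  # [10, 2, 20]
```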
In LLVM, the immediate of the BR instruction is of type 'label'. I couldn't find any specification on how many labels are allowed in a program, and LLVM's bitcode representation has variable-width integers, so presumably these labels can be represented by these variable length integers and there is no limit to how many there can be, although i didn't look into it too closely.
In ARM Cortex M0, the branch-with-link variant of the B instruction has a range of +-~16MiB, relative to the PC register [6].
In the JVM, the goto instruction has a PC-relative signed range of 16 bits (+-~32KiB), and must be confined to the same method. goto_w has a PC-relative signed range of 32 bits (+-~2GiB), HOWEVER the JVM currently limits a method's bytecode to 64KiB, and goto_w also must not cross methods, so effectively the range of goto_w is +-~64KiB [7].
In LuaJIT 2, the JMP instruction has a PC-relative signed range of 16 bits; since the offset counts instructions rather than bytes, that is +-~32K instructions [8] (interestingly, this field seems to have been 18 bits in the original Lua that LuaJIT 2 is based upon [9]).
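Both Lua 5.1 and LuaJIT 2 store the signed jump offset in an unsigned field using a bias (excess-K) encoding, which is why the ranges are slightly asymmetric. The sketch below assumes a 16-bit field with bias 0x8000 for LuaJIT 2 (as far as i know this matches `BCBIAS_J` in LuaJIT's `lj_bc.h`) and an 18-bit field with bias 131071 for Lua 5.1 (`MAXARG_sBx` in `lopcodes.h`); these values are assumptions worth checking against the sources, and the helper names are made up for illustration.

```python
# Assumed parameters (check against lj_bc.h and Lua 5.1's lopcodes.h):
LUAJIT_J_BITS, LUAJIT_J_BIAS = 16, 0x8000  # LuaJIT 2: D field, BCBIAS_J
LUA51_SBX_BITS = 18
LUA51_SBX_BIAS = ((1 << LUA51_SBX_BITS) - 1) >> 1  # MAXARG_sBx = 131071


def encode_biased(offset: int, bits: int, bias: int) -> int:
    """Store a signed jump offset (counted in instructions) as an unsigned field."""
    raw = offset + bias
    if not 0 <= raw < (1 << bits):
        raise ValueError("jump offset out of range")
    return raw


def decode_biased(raw: int, bias: int) -> int:
    """Recover the signed offset from the biased field."""
    return raw - bias


def offset_range(bits: int, bias: int):
    """Smallest and largest encodable offsets."""
    return -bias, (1 << bits) - 1 - bias


print(offset_range(LUAJIT_J_BITS, LUAJIT_J_BIAS))    # (-32768, 32767)
print(offset_range(LUA51_SBX_BITS, LUA51_SBX_BIAS))  # (-131071, 131072)
```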
In CIL, the BR instruction has a PC-relative signed range of 32 bits (+-~2GiB), and the BR.S instruction has a PC-relative signed range of 8 bits (+-~128 bytes); in both cases the offset is measured from the start of the instruction following the branch. BR instructions may not jump into or out of exception-handling blocks ("try, catch, filter, and finally blocks").
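Because CIL branch offsets are relative to the next instruction, the offset to encode depends on which form (the 2-byte br.s or the 5-byte br) is chosen. A minimal Python sketch of using the short form when it fits, assuming opcodes 0x2B (br.s) and 0x38 (br) with little-endian offsets; `encode_cil_br` is hypothetical, not an API from any CIL toolkit.

```python
BR_S, BR = 0x2B, 0x38  # CIL opcodes: br.s <int8>, br <int32>


def encode_cil_br(instr_addr: int, target: int) -> bytes:
    """Encode an unconditional CIL branch located at instr_addr to target.

    The offset is measured from the start of the *next* instruction, so it
    depends on whether the 2-byte or the 5-byte form is used.
    """
    short_off = target - (instr_addr + 2)  # 1 opcode byte + 1 offset byte
    if -128 <= short_off <= 127:
        return bytes([BR_S]) + short_off.to_bytes(1, "little", signed=True)
    long_off = target - (instr_addr + 5)  # 1 opcode byte + 4 offset bytes
    return bytes([BR]) + long_off.to_bytes(4, "little", signed=True)


print(encode_cil_br(10, 0).hex())   # 2bf4 (br.s -12)
print(encode_cil_br(0, 1000).hex()) # 38e3030000 (br 995)
```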
CIL's JMP is a higher-level unconditional direct jump which exits the current method and jumps to another method with the same calling convention, number and type of arguments as the current method.
Every platform in this study offers some form of unconditional indirect branching. However, LuaJIT 2 only offers a higher-level form of indirect branching via its function CALL instructions. For this reason, low-level unconditional indirect branch instructions are listed below in the section on instruction classes supported by six platforms.
Every platform in this study offers some form of branch-if-true, or branch-if-not-equal-to-zero, or branch-if-non-null. However, RISC-V does not need a separate instruction for this; it can use the 'zero register', which is a 'register' which always holds the value constant zero, as an operand to its BNE (branch-if-not-equal) instruction. For this reason, branch-if-not-equal-to-zero instructions are listed below in the section on instruction classes supported by six platforms.
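The zero-register trick can be modeled in a few lines of Python. `RegisterFile`, `bne_taken`, and `bnez_taken` are hypothetical names; the point is only that a branch-if-not-zero needs no separate opcode, because comparing against x0 (which always reads as zero) gives the same result. RISC-V assemblers accept `bnez rs, label` as a pseudo-instruction for exactly this `bne rs, x0, label`.

```python
class RegisterFile:
    """Toy RISC-V integer register file in which x0 is hardwired to zero."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, r: int) -> int:
        return 0 if r == 0 else self.regs[r]

    def write(self, r: int, value: int) -> None:
        if r != 0:  # writes to x0 are silently discarded
            self.regs[r] = value


def bne_taken(rf: RegisterFile, rs1: int, rs2: int) -> bool:
    """Would `bne rs1, rs2, label` branch?"""
    return rf.read(rs1) != rf.read(rs2)


def bnez_taken(rf: RegisterFile, rs: int) -> bool:
    """`bnez rs, label` is just `bne rs, x0, label`."""
    return bne_taken(rf, rs, 0)


rf = RegisterFile()
rf.write(0, 42)  # ignored
rf.write(5, 7)
print(rf.read(0), bnez_taken(rf, 5), bnez_taken(rf, 6))  # 0 True False
```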
Above, we discussed conditional branches. However, some platforms offer few compare-and-branch instructions, but also offer non-control-flow-altering compare instructions. In this section, we consider both compare and compare-and-branch instructions, and look at which types of comparisons are shared by all platforms, whether that comparison is offered as a (non-control-flow-altering) compare, or as a compare-and-branch.
Every platform in this study offers both of the following comparisons on integers if they support integers, and on floats if they support floats:
In addition to the above, instructions for each of the following are provided by all but one platform in this study:
Arithmetic:
Control flow:
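WASM, JVM, and CIL have instructions to directly load i32, i64, f32, and f64 constants. None of them has separate unsigned constant loads; an unsigned value is loaded as the integer with the same bit pattern. RISC-V can load 12-bit and 20-bit immediates in a single instruction, but a larger integer constant up to the register width (i64 on RV64I, or i32 if the chip is RV32I) must be synthesized from a multi-instruction sequence or loaded from memory; everything else must be synthesized/coerced.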
Lua has f64 constants only (in addition to nils, booleans, strings), but LuaJIT 2 does have an instruction to load 16-bit immediate constants.
JVM does provide immediate constants of 8 and 16 bits (sign-extended to 32-bit ints) via bipush and sipush.
ARM can load constants of up to 32 bits with LDR, by loading them PC-relative from a literal pool.
LLVM doesn't need constant loads because constants can be assigned to a variable in the LLVM IR without an instruction.
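In RISC-V, ADDI and ORI can be used with the zero register as the source operand to load sign-extended 12-bit immediates. LUI loads a 20-bit immediate into bits 31:12 of the destination register and zeros the low 12 bits (on RV64I the result is then sign-extended to 64 bits). AUIPC is similar to LUI, but it adds the resulting value to the address of the AUIPC instruction itself and writes the sum to the destination register; it does not change the PC.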
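When a 32-bit constant is built with LUI followed by ADDI, the split is slightly subtle because ADDI sign-extends its 12-bit immediate. A minimal Python sketch, assuming the common convention of adding 0x800 before taking the upper 20 bits; it models only the low 32 bits of the register (on RV64I, LUI's sign extension also sets the upper 32 bits). The same split applies to AUIPC + JALR pairs. Both helper names are hypothetical.

```python
def split_hi_lo(value: int):
    """Split a 32-bit constant into (hi20, lo12) for `LUI rd, hi20; ADDI rd, rd, lo12`.

    ADDI sign-extends its 12-bit immediate, so when bit 11 of the constant is
    set, lo12 is negative and hi20 must be one larger to compensate; adding
    0x800 before shifting does exactly that.
    """
    value &= 0xFFFFFFFF
    hi20 = ((value + 0x800) >> 12) & 0xFFFFF
    lo12 = value & 0xFFF
    if lo12 >= 0x800:
        lo12 -= 0x1000  # reinterpret as a signed 12-bit immediate
    return hi20, lo12


def low32_after_lui_addi(hi20: int, lo12: int) -> int:
    """Low 32 bits of the register after LUI hi20 then ADDI lo12."""
    return ((hi20 << 12) + lo12) & 0xFFFFFFFF


hi, lo = split_hi_lo(0x12345FFF)
print(hex(hi), lo)  # 0x12346 -1
```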
In WASM, there are instructions to push integer and floating point 32-bit and 64-bit constants onto the stack.
In ARM, a single-byte constant (0..255) can be loaded using the MOVS instruction with an immediate operand. Otherwise, the constant must be placed in a literal pool within the instruction stream, and loaded from that literal pool using LDR (load) with a PC-relative offset. Literal pools can be placed explicitly using the LTORG assembler directive (the assembler also places one at the end of each section) [10], and must be within +1020 bytes of the current instruction [11]. The ARM assembler pseudo-instruction "LDR Rd, =const" chooses between MOV and LDR, and places the constant into a literal pool as needed (an error is generated if a literal pool is required but none is near enough).
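A rough Python model of the choice `LDR Rd, =const` makes on Cortex M0, assuming the Thumb rules that MOVS takes an 8-bit unsigned immediate and that a literal LDR adds a word-aligned 0..1020 byte offset to the instruction address plus 4, rounded down to a multiple of 4. The function names are hypothetical, and a real assembler considers more cases.

```python
def movs_encodable(value: int) -> bool:
    """MOVS Rd, #imm8 (Thumb) accepts 0..255."""
    return 0 <= value <= 255


def literal_reachable(ldr_addr: int, literal_addr: int) -> bool:
    """Can a Thumb `LDR Rt, [PC, #imm]` at ldr_addr reach literal_addr?

    Assumed rule: base = (ldr_addr + 4) rounded down to a multiple of 4,
    and the offset is a forward, word-aligned 0..1020.
    """
    base = (ldr_addr + 4) & ~3
    offset = literal_addr - base
    return literal_addr % 4 == 0 and 0 <= offset <= 1020


def choose_constant_load(value: int, ldr_addr: int, pool_addr):
    """Rough model of what `LDR Rd, =value` might expand to."""
    if movs_encodable(value):
        return "MOVS"
    if pool_addr is not None and literal_reachable(ldr_addr, pool_addr):
        return "LDR literal"
    raise ValueError("literal pool required but none is near enough")


print(choose_constant_load(7, 0x100, None))           # MOVS
print(choose_constant_load(0x12345678, 0x100, 0x200)) # LDR literal
```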
In the JVM, there is a constant pool and instructions ldc, ldc_w, and ldc2_w to load from it. There are also short instructions iconst_m1..iconst_5 to push the values -1..5 onto the stack as ints, as well as aconst_null to push the null reference, dconst_0 and dconst_1 to push 0.0 and 1.0 doubles, fconst_0..fconst_2 to push 0.0, 1.0, 2.0 floats, and lconst_0, lconst_1 to push 0 and 1 longs. There are also bipush and sipush instructions for pushing 8-bit and 16-bit integers onto the stack.
In LuaJIT 2, there is a constant pool, and instructions KSTR, KCDATA, and KNUM to load from it. KPRI loads a primitive value (nil, false, or true) that is encoded in the instruction itself. There is also a KSHORT instruction to load a 16-bit signed immediate integer.
In CIL, there are short instructions ldc.i4.m1..ldc.i4.8 to push the 32-bit integers -1..8 onto the stack. There is also ldc.i4.s to push an immediate signed byte onto the stack as a 32-bit integer, and ldc.i4, ldc.i8, ldc.r4, ldc.r8 to push immediate integer and floating-point constants, both 32-bit and 64-bit. There is also ldnull to push the null reference, and ldstr to push a string literal; ldstr's operand is a metadata token referring to the string in the assembly's user-string heap, rather than the string's bytes inline in the instruction stream.
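The JVM and CIL both offer a ladder of shortest-possible pushes for an integer constant, but the cut-off points differ (-1..5 versus -1..8 for the one-byte forms, and only CIL inlines a full 32-bit immediate). A small Python sketch of that selection, assuming a 32-bit int constant; the functions are hypothetical helpers that return mnemonic strings, and `ldc` is shown with the value rather than the constant-pool index it really takes.

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def jvm_int_push(v: int) -> str:
    """Pick the shortest JVM instruction that pushes int constant v."""
    if not INT32_MIN <= v <= INT32_MAX:
        raise ValueError("not a 32-bit int")
    if -1 <= v <= 5:
        return "iconst_m1" if v == -1 else f"iconst_{v}"
    if -128 <= v <= 127:
        return f"bipush {v}"
    if -32768 <= v <= 32767:
        return f"sipush {v}"
    return f"ldc {v}"  # really a constant-pool index (ldc_w for large indices)


def cil_int_push(v: int) -> str:
    """Pick the shortest CIL instruction that pushes int32 constant v."""
    if not INT32_MIN <= v <= INT32_MAX:
        raise ValueError("not a 32-bit int")
    if -1 <= v <= 8:
        return "ldc.i4.m1" if v == -1 else f"ldc.i4.{v}"
    if -128 <= v <= 127:
        return f"ldc.i4.s {v}"
    return f"ldc.i4 {v}"  # 32-bit immediate stored inline


for v in (-1, 8, 1000, 100000):
    print(v, jvm_int_push(v), "|", cil_int_push(v))
```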
Every platform in this study provides integer operations, except for LuaJIT 2, which only provides floating point. Every platform in this study provides floating-point operations, except for ARM, which only provides integers.
Every integer platform in this study provides both 32-bit and 64-bit integers, except for ARM, which only provides 32-bit integers, and RISC-V, which only provides integers of its register size (either 32-bit or 64-bit).
Every floating-point platform in this study provides both 32-bit and 64-bit floating-point, except for LuaJIT 2, which only provides 64-bit floating-point.
Every integer platform in this study provides signed integer addition, subtraction, and multiplication (this multiplication returns the lower half of the resulting bits; the "low order bits"; equivalently, the result mod 2^bitwidth). Every floating-point platform in this study provides floating-point addition, subtraction, multiplication, and division.
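"Returns the low-order bits" can be checked with a few lines of Python, whose integers are unbounded: mask the exact product to the register width, then reinterpret the top bit as a sign. `mul_low` is a hypothetical helper.

```python
def mul_low(a: int, b: int, bits: int = 32) -> int:
    """Low `bits` bits of a*b, reinterpreted as a signed two's-complement value."""
    mask = (1 << bits) - 1
    r = (a * b) & mask  # the exact product mod 2**bits
    return r - (1 << bits) if r >> (bits - 1) else r


print(mul_low(0x10000, 0x10000))  # 0   (2**32 wraps to 0)
print(mul_low(2**31 - 1, 2))      # -2
print(mul_low(-3, 5))             # -15
```

Because only the low bits are kept, one instruction serves for both signed and unsigned operands: `mul_low(0xFFFFFFFD, 5)` and `mul_low(-3, 5)` both give -15, since 0xFFFFFFFD and -3 have the same 32-bit pattern.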
Andreas Olofsson of Adapteva noted in a blog post that RISC-V's FDIV (floating-point division) instruction is "expensive", and that it was a "tough call" whether to include such an instruction in his Epiphany ISA [12].