Cross-platform concordance of instructions from seven intermediate- or assembly- language architectures: RISC-V, WASM, LLVM, ARM Cortex M0, JVM, LuaJIT2, CLI

Introduction

This document surveys the instruction sets of seven intermediate-language, virtual machine, or assembly-language platforms: RISC-V, WASM, ARM Cortex M0, LLVM, JVM, LuaJIT2, CIL. It groups instructions together by purpose, referencing their analogs in each of those systems (hence the word 'concordance'). In this way, it shows which instructions in one platform correspond to which instructions in other platforms.

The aim is to provide insight into which sorts of instruction are 'popular' (common to many platforms), and what the most 'popular' choices for the semantics of those instructions are.

The organization of this document is:

introduction to each of the platforms
concordance of instructions, grouped into sections in decreasing order of popularity (so we start with those instruction classes common to all 7 platforms, and end with instructions found on only 1 platform). Some of the most popular instructions have some details provided on their semantics in these sections (not in the full concordance below).
full concordance of instructions (all instructions on all platforms, grouped by purpose)
for each platform, a concise listing of all instructions on that platform
discussion

Introduction to the platforms

Introduction to RISC-V

RISC-V is an instruction set architecture (ISA) for hardware microprocessors [1].

As a general-purpose hardware microprocessor ISA, RISC-V is intended to efficiently execute any high-level programming language.

RISC-V is a register machine with a set of integer registers with a fixed bitwidth (in this document we'll look at the 64-bit variant of RISC-V), and also another set of floating point registers (if the floating point extension is being used).

Introduction to WASM

WASM is a virtual machine designed to enable high performance applications on the Web [2]. It is designed to support many different high-level-languages.

WASM is considered to be higher-level than RISC-V and ARM, but lower-level than JVM, CIL, and LuaJIT2. WASM is a sandbox that allows untrusted code to be executed without giving that code arbitrary access to the host machine.

WASM is a stack machine.

WASM has loads and stores for working with memory, but it also has local and global variables.

Unusually for a virtual machine that seeks to be a low-level target platform for many different high-level-languages, WASM has some block-structured control flow, and corresponding restrictions on jump and branch targets.

Introduction to LLVM

LLVM is considered to be higher-level than RISC-V and ARM, but lower-level than JVM, CIL, and LuaJIT2.

Although LLVM was always intended to be a low-level target platform for many different high-level-languages, it was first used to support the compilation of C and C++.

LLVM code is structured into basic blocks. A basic block is "a straight-line code sequence with no branches in except to the entry and no branches out except at the exit." [3]. Some of the LLVM instructions (for example, branch or return) are called 'terminator instructions', and each LLVM basic block must end with one of these 'terminator instructions'.

Instead of registers (or a stack), LLVM has an infinite set of variables. These variables must be used in SSA (static single assignment) form, which means that each variable is assigned exactly once and cannot be reassigned.
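To make the SSA restriction concrete, here is a minimal sketch of renaming straight-line code so that every name is assigned only once. The (target, operands) statement format and the function name to_ssa are assumptions made for illustration; this is not LLVM's IR or its API, and it ignores control flow (which in real SSA requires phi nodes).

```python
def to_ssa(statements):
    """Rename assignments in straight-line code so each name is assigned once.

    `statements` is a list of (target, operand_names) pairs. This toy format
    is a hypothetical stand-in for real IR, used only for illustration.
    """
    version = {}  # variable name -> number of times it has been assigned so far
    out = []
    for target, operands in statements:
        # Operands refer to the most recent version of each variable.
        # Names that were never assigned (inputs) are left unversioned.
        renamed = [f"{v}{version[v]}" if v in version else v for v in operands]
        version[target] = version.get(target, 0) + 1
        out.append((f"{target}{version[target]}", renamed))
    return out


# x is assigned twice in the input, so it becomes x1 and x2 in the output.
program = [("x", ["a", "b"]), ("x", ["x", "c"]), ("y", ["x", "x"])]
ssa = to_ssa(program)
```

Each reassignment of x introduces a fresh name, and later uses refer to the latest one.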

Introduction to ARM Cortex

ARM is a family of hardware microprocessors. We will look at the instruction set architecture (ISA) for the ARM Cortex M0, which is the ARM processor with the smallest instruction set.

As a general-purpose hardware microprocessor ISA, ARM is intended to efficiently execute any high-level programming language.

ARM Cortex M0 has 16 numbered 32-bit registers, of which 13 are GPRs and 3 are special (PC, link register, stack pointer). There are three other misc. registers (Program Status Register (PSR), PRIMASK (exception mask), CONTROL (stack control)). [4]

Introduction to JVM

Although the JVM is now host to multiple high-level languages, it was originally designed to run Java.

The JVM is considered a 'higher-level' virtual machine; it provides high-level-language data structures. The JVM is a sandbox that allows untrusted code to be executed without giving that code arbitrary access to the host machine.

The JVM is a stack machine.

Introduction to LuaJIT2

LuaJIT2 is considered a 'higher-level' virtual machine.

LuaJIT2 is a register machine.

Note: It was a difficult choice whether to analyze the Lua 5.1 bytecode instruction set or the LuaJIT 2 bytecode instruction set. The Lua 5.1 bytecode instruction set is smaller and better documented (see the excellent http://luaforge.net/docman/83/98/ANoFrillsIntroToLua51VMInstructions.pdf ). However, as the LuaJIT 2 instruction set is optimized and had the opportunity to learn from the Lua 5.1 bytecode, I was interested in seeing what choices were made with regards to things like: which instructions should have immediate constant forms? which branching conditionals should exist?

The LuaJIT 2 instruction set is clearly based on the Lua 5.1 instruction set, and seems to be for the most part an elaboration of it, with various additional specialization instructions added, various immediate constants added, some branching conditionals added, various 'marker' instructions to help with the JIT process added, etc.

Introduction to CLI

The CLI is considered a 'higher-level' virtual machine. It provides high-level-language data structures and enables interoperation between high-level languages. The CLI provides facilities to allow untrusted code to be executed without giving that code arbitrary access to the host machine.

The CLI is a stack machine.

The instruction set of the CLI is called the CIL (Common Intermediate Language).

Minutiae

What is an instruction?

Generally in this document we consider the dividing line between instructions to be separate mnemonics.

Sometimes if a platform (e.g. LLVM) lists a group of related instructions together in its documentation, then we consider that whole group as one instruction (for instance some LLVM intrinsics have a type name in their mnemonic, but in the LLVM documentation, rather than having a separate section for each type variant, they are all listed together with an asterisk representing the type name).

RISC-V conventions

RISC-V variants

RISC-V has immediate variants. WASM doesn't have immediate variants (todo: is this correct?).

RISC-V is presented as a base integer instruction set (either RV32I (with 32-bit registers), RV32E (slimmed down embedded variant of RV32I), or RV64I (with 64-bit registers)), on top of which may be layered various extensions (such as M for multiplication, A for atomics, F for single-precision floating point, and D for double-precision floating point).

We will be assuming a processor with RV64IMAFD (also called RV64G), that is, a RISC-V processor (RV) with 64-bit integer registers (RV64I) with 64-bit multiply, atomic, single-precision floating point, and double-precision floating point extensions (IMAFD; the IMAFD selection of extensions can be abbreviated as "G" because IMAFD represents the "standard general-purpose ISA", according to the RISC-V spec).

We will be mainly discussing RV64I, in which the registers are 64-bits, but RISC-V also has a 32-bit variant (RV32I) in which the registers are all 32-bits instead of 64. Many operations are only available in 64-bit form in RV64I, but since RV32I would provide them in 32-bit form, we will usually mark these as supporting both 32- and 64-bits, for easier comparison with instruction sets like WASM which provide explicit 32- and 64- bit forms throughout.

RISC-V floating point operations are only included in the F (single) and D (double) extensions, not the base integer instruction set.

RISC-V typing

The RISC-V instructions are untyped and operate on memory and registers, so since i want to give types for everything (to facilitate comparison), here is the convention that i will use. The types indicated in the following will be i64 by default (64 because the RV64I registers are 64-bits). However, if the operation is clearly intended to work with unsigned quantities, the type shown will be 'u' instead of 'i'; and if the operation is clearly intended to work with less than 64 bits, 32, 16, or 8 will be shown instead of 64. TODO enforce this convention.

ARM conventions

Below, we'll say 'ARM' although we only mean ARM Cortex M0.

Polymorphic

Sometimes i write '(polymorphic)' next to an instruction. I usually only do that when the same instruction will be appearing multiple times in the same section. There are other instructions which are also polymorphic which do not get this marking.

Warnings, excuses, and qualifications

This document is probably full of errors. You would probably want this sort of comparison to be written by someone who has written code in each of the platform instruction sets being compared, or who at least has carefully read the documentation and specifications. Instead, what you are reading was written by someone who has merely glanced through the documentation, and who in many cases is making assumptions based only on the names of the instructions, without reading their definitions or using them in code!

This is, to some extent, an apples-to-oranges comparison. The different platforms being compared have different purposes and are different kinds of things.

These particular instruction sets were chosen for comparison because they possess the following properties:

They are general purpose
They have a relatively small number of instructions
They were developed or reimagined relatively recently (so presumably their designers learned from the lessons of the past) (the JVM is the exception here; it was introduced in 1994)
They are relatively widely-used (or at least much talked-about) (LuaJIT2 is the exception here, but when people do talk about it, it seems to me to be widely regarded as well-designed)
Their description is free and open, and i find it easy to read

There is necessarily some subjective preference involved in grouping similar functionality together. In this document the choice of grouping affects how widely functionality is considered to be shared, and hence how 'popular' it is considered to be.

For example, LLVM 'constrained' arithmetic is somewhat similar to RISC-V arithmetic in that both offer a way of choosing rounding and exception modes; however LLVM offers a constrained floating-point remainder function whereas RISC-V does not offer a floating-point remainder function. Does that mean that LLVM's constrained.frem intrinsic should be counted as something supported by only one platform? I have instead chosen to only consider the larger grouping 'floating-point arithmetic with rounding/exception mode control', which is listed as something supported by two platforms, with a note that RISC-V does not offer a floating-point remainder instruction.

For example, i have chosen to group ARM and CIL's bitwise NOT with LuaJIT's boolean NOT.

For example, i have chosen to group RISC-V's 'classify' operation with CIL's 'ckfinite', even though 'classify' provides much more functionality.

For example, i have chosen to group RISC-V's load-reserved/store-conditional with LLVM's compare-and-swap, because even though these are very different operations, they can each be used as a primitive upon which to build synchronization/consensus/atomicity.

For example, i have chosen to break out 'constant loads from a constant pool/table' as separate functionality, but then also consider 'constant loads' as something supported by all 7 platforms (mixing together immediate constants and constant pools), even though there are many differences in the details (particularly bitwidths).

For example, i have chosen to group ARM's 'reverse subtraction' (rsbs) with integer negation.

For example, i have chosen to group together a wide variety of fence, synchronization, and volatile instructions/prefixes.

For example, i have chosen to group together a wide variety of memory-allocation-related instructions.

Sometimes i have chosen to list the same instruction in multiple places, causing it to be 'double-counted' in a sense. For example, RISC-V's unconditional jumps are listed once in the 'instructions supported by two platforms' in a section noting the link register functionality, but listed again later for their unconditional jump functionality.

References

Instruction lists were collected from these references in May 2019.

Short excerpts of text describing the semantics of instructions are taken, without quoting, from these references.

RISC-V:

https://content.riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf

Web Assembly (WASM):

LLVM:

https://llvm.org/docs/LangRef.html

ARM Cortex M0:

JVM:

https://en.wikipedia.org/wiki/Java_bytecode_instruction_listings
http://www.cs.vsb.cz/benes/vyuka/pre/lab/jvm/ (not sure if this is authoritative but it's informative)

CIL:

LuaJIT2:

http://wiki.luajit.org/Bytecode-2.0
and, although this discusses a different (but related and better documented) instruction set, for context you may wish to read: http://luaforge.net/docman/83/98/ANoFrillsIntroToLua51VMInstructions.pdf .

Concordance, divided into sections by popularity

To follow. Full concordance is at the end.

Concordance of instructions supported by all seven platforms in this study

Every platform in this study has:

some mechanism for specifying constants/literals
addition, subtraction, multiplication
instructions for a jump/unconditional branch to an immediate or label
some form of unconditional indirect branching
some form of branch-if-not-equal-to-zero/branch-if-true

Arithmetic

Constant loads

Every platform in this study has some mechanism for specifying constants/literals; however, in terms of INSTRUCTIONS, LLVM does not have a constant load instruction because it doesn't need one: constants can be assigned to a variable in the LLVM IR without an instruction. For this reason, constant load instructions are listed below in the section on instruction classes supported by six platforms.

Add, subtract, multiply

Every platform in this study has instructions for addition, subtraction, and multiplication. However, one platform (ARM) is integer-only, whereas another platform (LuaJIT2) is floating-point only. For this reason, these instructions are listed below in the section on instruction classes supported by six platforms.

Control flow

Jumps / unconditional branch

Jump to immediate / direct branch

Every platform in this study provides unconditional jumps/direct branches to an immediate or a label.

RISC-V and ARM provide link register variants of their direct branch instructions.

JAL (Jump and Link) (32b) (RISC-V RV32I)
BR (WASM)
BR (unconditional form) (LLVM)
B BL (ARM)
goto goto_w (JVM)
jsr jsr_w (JVM) (deprecated)
jmp (LuaJIT2)
br br.s (CIL)
jmp (CIL)

NOTE: when i say that a jump is 'PC-relative', often it is actually relative to the instruction following the jump instruction, rather than to the jump instruction itself. Sometimes i provide that level of detail, but often i omit it.

In RISC-V, the JAL instruction is a PC-relative jump with a 20-bit signed immediate. The immediate is in units of 2 bytes, so JAL can encode a jump offset of +-1MiB. However, jumps to anywhere in a 32-bit absolute address range are available using fixed two-instruction sequences involving a 20-bit immediate constant load followed by an indirect jump (JALR), which contains a base register and a 12-bit displacement. RISC-V JAL writes the address following the jump instruction into a destination register.
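The arithmetic behind that range can be checked with a short sketch. The helper name signed_immediate_range is hypothetical; it simply evaluates the two's-complement bounds of an n-bit immediate scaled by a unit size.

```python
def signed_immediate_range(bits, unit_bytes=1):
    """Return (min_offset, max_offset) in bytes for a two's-complement immediate.

    `bits` is the width of the immediate field and `unit_bytes` is the size
    of one unit of the immediate (2 bytes for RISC-V JAL, per the text above).
    """
    lowest = -(1 << (bits - 1)) * unit_bytes
    highest = ((1 << (bits - 1)) - 1) * unit_bytes
    return lowest, highest


# RISC-V JAL: 20-bit signed immediate counted in 2-byte units.
jal_lo, jal_hi = signed_immediate_range(20, unit_bytes=2)
MIB = 1 << 20
```

The negative bound is exactly 1 MiB and the positive bound is one unit (2 bytes) short of 1 MiB, which is why the range is usually quoted as roughly +-1MiB.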

In WASM, i'm not quite sure that i understand the BR instruction, but i think its immediate argument is a label index; label indices do NOT "reference program positions in the instruction stream but instead reference outer control constructs by relative nesting depth" [5]. Recall that WASM only has structured control flow. In other words, a WASM BR can only target an enclosing control construct (the end of an enclosing block, or the start of an enclosing loop), and the immediate argument is how many enclosing constructs to break out of. The immediate argument is 32 bits.
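As a rough illustration of 'relative nesting depth', the sketch below resolves a label index against a list of enclosing constructs. The function name resolve_br and the $outer/$top/$inner labels are made up for this example; this is a model of the idea, not a WASM implementation.

```python
def resolve_br(enclosing, depth):
    """Return the enclosing construct targeted by `br depth`.

    `enclosing` lists the control constructs around the branch from outermost
    to innermost; depth 0 names the innermost one, depth 1 the next one out.
    """
    if depth < 0 or depth >= len(enclosing):
        raise ValueError("label index exceeds nesting depth")
    return enclosing[len(enclosing) - 1 - depth]


# Hypothetical nesting around a branch instruction, outermost first.
nesting = ["block $outer", "loop $top", "block $inner"]
innermost = resolve_br(nesting, 0)
outermost = resolve_br(nesting, 2)
```

Because the index counts outward from the branch, the same immediate value names different constructs depending on where the BR sits.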

In LLVM, the immediate of the BR instruction is of type 'label'. I couldn't find any specification on how many labels are allowed in a program, and LLVM's bitcode representation has variable-width integers, so presumably these labels can be represented by these variable length integers and there is no limit to how many there can be, although i didn't look into it too closely.

In ARM Cortex M0, the branch-with-link variant of the B instruction has a range of +-~16MiB, relative to the PC register [6].

In the JVM, the goto instruction has a PC-relative signed range of 16 bits (+-~32KiB), but must be confined to the same method. goto_w has a PC-relative signed range of 32 bits (+-~2GiB), HOWEVER currently the JVM has a limit on method size of 64KiB, and goto_w also has the restriction that it must not cross methods, so effectively the limit of goto_w is +-~64KiB [7].

In LuaJIT2, the JMP instruction has a PC-relative signed range of 16 bits (+-~32KiB) [8] (interestingly, this seems to have been 18-bits in the original Lua that LuaJIT2 is based upon [9]).

In CIL, the BR instruction has a PC-relative signed range of 32 bits (+-~2GiB), and the BR.S instruction has a PC-relative signed range of 8 bits (+-~128). BR instructions may not jump into or out of exception-handling blocks ("try, catch, filter, and finally blocks").

CIL's JMP is a higher-level unconditional direct jump which exits the current method and jumps to another method with the same calling convention, number and type of arguments as the current method.

Unconditional indirect branches

Every platform in this study offers some form of unconditional indirect branching. However, LuaJIT2 only offers a higher-level form of indirect branching via its function CALL instructions. For this reason, low-level unconditional indirect branch instructions are listed below in the section on instruction classes supported by six platforms.

Conditional branches

Every platform in this study offers some form of branch-if-true, or branch-if-not-equal-to-zero, or branch-if-non-null. However, RISC-V does not need a separate instruction for this; it can use the 'zero register', which is a 'register' which always holds the value constant zero, as an operand to its BNE (branch-if-not-equal) instruction. For this reason, branch-if-not-equal-to-zero instructions are listed below in the section on instruction classes supported by six platforms.
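The zero-register trick can be modelled in a few lines. This is a toy sketch, not a RISC-V simulator: the class and function names are invented for illustration, and a fixed 4-byte instruction length is assumed for the fall-through case.

```python
class RegisterFile:
    """Toy integer register file whose register 0 always reads as zero."""

    def __init__(self, count=32):
        self._regs = [0] * count

    def read(self, index):
        return 0 if index == 0 else self._regs[index]

    def write(self, index, value):
        if index != 0:  # writes to the zero register are discarded
            self._regs[index] = value


def bne(regs, rs1, rs2, pc, offset):
    """Return the next PC for branch-if-not-equal (4-byte instructions assumed)."""
    return pc + offset if regs.read(rs1) != regs.read(rs2) else pc + 4


def bnez(regs, rs, pc, offset):
    """Branch-if-not-zero, expressed as BNE against the zero register."""
    return bne(regs, rs, 0, pc, offset)


regs = RegisterFile()
regs.write(5, 7)                    # register 5 holds a non-zero value
taken = bnez(regs, 5, 100, 16)      # branch is taken
not_taken = bnez(regs, 6, 100, 16)  # register 6 is still zero, so fall through
```

No separate branch-if-non-zero opcode is needed; the comparison against a constant zero falls out of the general two-register BNE.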

Comparisons

Above, we discussed conditional branches. However, some platforms offer few compare-and-branch instructions, but also offer non-control-flow-altering compare instructions. In this section, we consider both compare and compare-and-branch instructions, and look at which types of comparisons are shared by all platforms, whether that comparison is offered as a (non-control-flow-altering) compare, or as a compare-and-branch.

Every platform in this study offers both of the following comparisons on integers if they support integers, and on floats if they support floats:

equality comparison
less-than comparison

Concordance of instructions supported by six platforms

In addition to the above, instructions for each of the following are provided by all but one platform in this study:

Arithmetic:

both 32-bit and 64-bit arithmetic
loading constants/literals
signed integer addition, subtraction, multiplication
floating-point addition, subtraction, multiplication, division
bitwise shifts: left shift, right shift signed (shift right arithmetic), right shift unsigned (shift right logical)
bitwise logical: and, or, xor

Control flow:

unconditional indirect branch
branch: branch-if-not-equal-to-zero/branch-if-true/branch-if-non-null
NOP (note: only an intrinsic in LLVM)

Arithmetic

Constant loads

WASM, JVM, CIL have instructions to directly load i32, i64, f32, f64 constants (but not unsigned?). RISC-V has various instructions that can be specialized to directly load integer constants the size of the registers (i64 if RV64I, or i32 if the chip is RV32I instead of RV64I); everything else must be synthesized/coerced.

Lua has f64 constants only (in addition to nils, booleans, strings), but LuaJIT2 does have an instruction to load 16-bit immediate constants.

JVM does provide immediate constants of 8- and 16-bits (extended to ints of 32 bits) via bipush and sipush.

ARM has up to 32-bit constant loads via immediate mode of some load instructions.

LLVM doesn't need constant loads because constants can be assigned to a variable in the LLVM IR without an instruction.

(pseudoinstruction using ADDI or ORI) (i64, i32) (RISC-V RV32I/RV64I)
i32.const, i64.const, f32.const, f64.const (WASM)
MOV/MVN or LDR (ARM)
bipush sipush (JVM)
kcdata kshort knum kpri knil (LuaJIT2)
ldc.i4 ldc.i4.s ldc.i8 ldc.r4 ldc.r8 (CIL)

In RISC-V, ADDI and ORI can be used with the zero register to load 12-bit immediates. LUI loads a 20-bit immediate in the most-significant bits and zeros the other 12 bits. AUIPC is similar to LUI but then it also adds this offset to the PC.
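A constant wider than 12 bits is commonly built by pairing LUI (upper 20 bits) with ADDI (lower 12 bits). The sketch below shows how a 32-bit constant might be split for such a pair; the function names are hypothetical, and only the low 32 bits of the result are modelled (on RV64 the result would additionally be sign-extended to 64 bits). The subtle part is that ADDI sign-extends its immediate, so the upper part has to be adjusted when bit 11 of the constant is set.

```python
def split_for_lui_addi(value):
    """Split a 32-bit constant into a 20-bit upper part and a signed 12-bit lower part."""
    value &= 0xFFFFFFFF
    lo12 = value & 0xFFF
    if lo12 >= 0x800:  # ADDI sign-extends its 12-bit immediate
        lo12 -= 0x1000
    # Subtracting the (possibly negative) low part bumps the upper part by one
    # when needed, compensating for the sign extension.
    hi20 = ((value - lo12) >> 12) & 0xFFFFF
    return hi20, lo12


def rebuild(hi20, lo12):
    """Model LUI followed by ADDI, keeping only the low 32 bits of the result."""
    return ((hi20 << 12) + lo12) & 0xFFFFFFFF


hi, lo = split_for_lui_addi(0xDEADBEEF)
```

For 0xDEADBEEF the low 12 bits (0xEEF) have bit 11 set, so the lower part becomes negative and the upper part is 0xDEADC rather than 0xDEADB.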

In WASM, there are instructions to push integer and floating point 32-bit and 64-bit constants onto the stack.

In ARM, a single-byte constant can be loaded using the MOVS instruction with an immediate constant. Otherwise, the constant must be placed in a literal pool within the instruction stream, and loaded from that literal pool using LDR (load) with a PC-relative offset. Literal pools must be manually placed using the LTORG assembler directive [10], and must be within +1020 bytes of the current instruction [11]. The ARM assembler pseudo-instruction "LDR Rd, =const" chooses between MOV and LDR and places the constant into the literal pool as needed (an error is generated if a literal pool is required but none can be found near enough).

In JVM, there is a constant pool and instructions ldc, ldc_w, ldc2_w to load from it. There are also short instructions iconst_m1..iconst_5 to push the values -1..5 onto the stack as integers, as well as aconst_null to push the null reference, dconst_0 and dconst_1 to push 0.0 and 1.0 doubles, fconst_0..fconst_2 to push 0.0, 1.0, 2.0 floats, lconst_0, lconst_1 to push 0, 1 longs. There are also bipush and sipush instructions for pushing 8-bit and 16-bit integers onto the stack.
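To show how these forms fit together for int constants, here is a sketch of how a code generator might choose among them, preferring the shortest encoding. The function jvm_int_push is hypothetical and returns mnemonic text only; it does not build a constant pool or emit bytecode.

```python
def jvm_int_push(value):
    """Pick the shortest JVM instruction form for pushing a 32-bit int constant."""
    if -1 <= value <= 5:
        # Dedicated one-byte instructions for the values -1..5.
        return "iconst_m1" if value == -1 else f"iconst_{value}"
    if -128 <= value <= 127:
        return f"bipush {value}"  # signed 8-bit immediate
    if -32768 <= value <= 32767:
        return f"sipush {value}"  # signed 16-bit immediate
    # Anything larger comes from the constant pool; the pool index is omitted.
    return "ldc"


choices = [jvm_int_push(v) for v in (-1, 5, 6, 1000, 100000)]
```

The cascade mirrors the instruction listing above: tiny values get dedicated opcodes, small values get immediates, and everything else goes through the constant pool.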

In LuaJIT2, there is a constant pool, and instructions KSTR, KCDATA, KNUM, KPRI to load from it. KPRI can load nil, false, true. There is also a KSHORT instruction to load a 16-bit signed immediate integer.

In CIL, there are the short instructions ldc.i4.m1..ldc.i4.8 to push the 32-bit integers -1..8 onto the stack. There are also ldc.i4.s to push an immediate byte onto the stack as a 32-bit integer, and ldc.i4, ldc.i8, ldc.r4, ldc.r8 to push immediate constant integers and floats, both 32-bit and 64-bit. There is also ldnull to push the null reference, and ldstr to push an immediate constant string (it's not clear to me if this is actually stored in the instruction stream, or if it's in a constant pool, although it seems to me that the spec says that the constant pool is not available at runtime).

Add, subtract, multiply, divide

Every platform in this study provides integer operations, except for LuaJIT2, which only provides floating point. Every platform in this study provides floating-point operations, except for ARM, which only provides integers.

Every integer platform in this study provides both 32-bit and 64-bit integers, except for ARM, which only provides 32-bit integers, and RISC-V, which only provides integers of its register size (either 32-bit or 64-bit).

Every floating-point platform in this study provides both 32-bit and 64-bit floating-point, except for LuaJIT2, which only provides 64-bit floating-point.

Every integer platform in this study provides signed integer addition, subtraction, and multiplication (this multiplication returns the lower half of the resulting bits; the "low order bits"; equivalently, the result mod 2^bitwidth). Every floating-point platform in this study provides floating-point addition, subtraction, multiplication, and division.
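The 'low order bits' behaviour of integer multiplication can be sketched as follows. The helper mul_low is a hypothetical name; it models the result mod 2^bitwidth and then reinterprets it as a signed two's-complement value.

```python
def mul_low(a, b, bits=32):
    """Return the low `bits` bits of a*b, reinterpreted as a signed integer."""
    mask = (1 << bits) - 1
    product = (a * b) & mask  # same as (a * b) mod 2**bits
    if product >= 1 << (bits - 1):
        product -= 1 << bits  # top bit set: value is negative in two's complement
    return product


wrapped = mul_low(0x7FFFFFFF, 2)  # overflows 32 bits and wraps around
vanished = mul_low(65536, 65536)  # 2**32: every low-order bit is zero
```

The upper half of the full product is simply discarded, which is why a platform that wants it needs a separate 'multiply high' style of instruction.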

Andreas Olofsson of Adapteva noted in a blog post that RISC-V's FDIV (floating point division) instruction is "expensive" and that it was a "tough call" whether to include such an instruction in his Epiphany ISA [12].
