proj-plbook-plChTargetLanguagesConcordance3

Continued from Target Languages Concordance part II

Instruction lists from each platform

When instruction counts are given, we count mnemonics. Sometimes similar mnemonics are grouped together and counted as one.

RISC-V instruction list

RV32I (base 32-bit integer set; 47 instructions):

RV64I (base 64-bit integer set; 59 instructions) (note: RV64I includes everything in RV32I, but adapted for 64-bit, plus these) (12 new instructions and 3 new encodings of old instructions) (note: the instructions with 'W' at the end of their name are 32-bit versions of the instructions, since the un-suffixed instructions inherited from RV32I change to 64-bit in RV64I; the way i think about this is that un-suffixed instructions operate with whatever bitwidth the registers are, which is 32-bits in RV32I and 64-bits in RV64I, unless they are specifically made to operate on a certain bitwidth, in which case this is indicated with a suffix to the instruction name):

RV32M (multiply extension; 8 instructions):

RV64M (multiply extension; 13 instructions) (note: RV64M includes everything in RV32M, but adapted for 64-bit, plus these) (5 new instructions):

RV32A (32-bit atomics extension; 11 instructions):

RV64A (64-bit atomics extension; 22 instructions) (note: RV64A includes everything in RV32A, plus these) (11 new instructions):

RV32F (32-bit/single-precision floating point extension for RV32I: 26 instructions):

RV64F (32-bit/single-precision floating point extension for RV64I: 30 instructions) (note: RV64F includes everything in RV32F, plus these) (4 new instructions):

RV32D (64-bit/double-precision floating point extension for RV32I: 26 instructions):

RV64D (64-bit/double-precision floating point extension for RV64I: 32 instructions) (note: RV64D includes everything in RV32D, plus these) (6 new instructions):

(so, RV64IMAFD, otherwise known as RV64G, contains 156 instructions in total)
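As a rough illustration of the 'W' suffix described in the RV64I note above, here is a small Python sketch that models ADD and ADDW on RV64. The helper names (sign_extend, add_rv64, addw_rv64) are made up for this example, and registers are modeled as unsigned 64-bit integers; this is a model of the semantics, not of any real simulator.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def add_rv64(a, b):
    """ADD on RV64I: wraparound addition at the full register width (64 bits)."""
    return (a + b) & MASK64


def addw_rv64(a, b):
    """ADDW on RV64I: add the low 32 bits, then sign-extend the 32-bit
    result into the 64-bit destination register."""
    return sign_extend((a + b) & MASK32, 32) & MASK64


# The un-suffixed ADD sees no overflow here, but ADDW wraps at 32 bits.
print(hex(add_rv64(0x7FFFFFFF, 1)))
print(hex(addw_rv64(0x7FFFFFFF, 1)))
```

The two results differ only because ADDW treats its operands as 32-bit quantities; for inputs whose sum fits in a signed 32-bit value, both functions return the same register contents.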

WASM instruction list (172 instructions)

control:

parametric:

constant loads:

loads and stores:

comparisons:

arithmetic:

conversions:

allocation: memory.size memory.grow

LLVM instruction list

LLVM instructions (64 instructions):

LLVM intrinsics (185 intrinsics, if the families denoted by the '*'s below are each grouped together and counted as one):

ARM Cortex M0 instruction list (59 instructions)

All instructions are 16-bit Thumb (Thumb-1) instructions except for the 32-bit Thumb instructions (Thumb-2) indicated.

These instructions are: "all of the 16-bit Thumb instructions from ARMv7-M excluding CBZ, CBNZ and IT" plus "the 32-bit Thumb instructions BL, DMB, DSB, ISB, MRS and MSR" [1].

Note: the ARM instruction mnemonics listed in [2] often have the letter 'S' at the end of them; for instance, 'ANDS' is the mnemonic for logical AND. In addition, sometimes there are two variants of an instruction, one with an 'S' and one without, for instance, MOVS and MOV. In these cases, the 'S' suffix means that the flags are updated. This 'S' suffix causes the instruction listing that we use here (from [3]) to differ slightly from (a) the instruction listing in https://en.wikipedia.org/wiki/ARM_Cortex-M#Instruction_sets and (b) the instruction names in the headings (but not the bodies) of sections within [4], both of which use mnemonics without this 'S' suffix (and combine mnemonics that differ only in the inclusion of this 'S').

Note: the set of mnemonics found at [5] is identical to that found at [6] except that the latter includes YIELD. [7] also includes YIELD. We do not include YIELD here because it is not in [8], and because [9] indicates in a footnote that it executes as NOP.

Note: as of this writing, the set of mnemonics found at [10] includes just one 'CPS' whereas the other sources have both CPSID and CPSIE. Here we include both CPSID and CPSIE.

JVM instruction list (206 instructions)

note: jsr, jsr_w, ret have effectively been deprecated; see [11]

JVM instruction list discussion:

Many of the JVM's instructions are organized around the types reference (also called address ('a'), or objectref), array, byte, char, double, float, integer, long, short.

The array type is a data structure. There are instructions to create new arrays and to get their length. For each of the other types, there are instructions to load items of that type from arrays and to store them into arrays.

There are a few types for which few type-specific instructions are provided. These are byte, char, and short. Each of these has instructions to load and store the type from/into an array. Each of these has an instruction to convert an integer into a value of the type. Byte and short have instructions to push an immediate constant of that type onto the stack.
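The integer-to-narrow-type conversions mentioned above (i2b, i2c, i2s in the JVM) can be sketched in Python as follows. The functions model only the value semantics: truncate to the narrow width, then sign-extend (byte, short) or zero-extend (char) back to an integer. Python integers are unbounded, so the 32-bit input range is an assumption of the caller rather than something the code enforces.

```python
def i2b(x):
    """int -> byte: keep the low 8 bits, then sign-extend."""
    return ((x & 0xFF) ^ 0x80) - 0x80


def i2c(x):
    """int -> char: keep the low 16 bits, zero-extended (char is unsigned)."""
    return x & 0xFFFF


def i2s(x):
    """int -> short: keep the low 16 bits, then sign-extend."""
    return ((x & 0xFFFF) ^ 0x8000) - 0x8000


for value in (127, 200, -1, 40000):
    print(value, i2b(value), i2c(value), i2s(value))
```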

This leaves what i'll refer to as the 'main types': reference, double, float, integer, long.

The main numerical types can be grouped into floating-point (double, float), and integral (integer, long). Out of the main numerical types, integer is the primary workhorse.

The reference type is a pointer. It has a distinguished element 'null'. This is the only type that can be thrown. Like the main numerical types, there are instructions to load references from/store them to variables, and return them from methods. There are compare-and-branch instructions that branch based on the equality, or lack of equality, of two references, and that branch based on whether or not a reference is null.

Each of the main numerical types has instructions to push constants 0 and 1 in that type onto the stack. Floats also have an instruction to push 2, and integers also have instructions to push -1, 2, 3, 4, and 5. Each of the main numerical types has instructions for addition, subtraction, multiplication, division, and remainder. Except for integers, they each have compare instructions, which push an integer (-1, 0, or 1) to indicate the result of the comparison (presumably integers don't need this because the main use of these comparison results is as inputs to compare-and-branch instructions, which take integer arguments directly). The floating-point (double and float) types have two variants of these compare instructions, which differ only in their behavior on NaNs. Each of the main numerical types has instructions to load from/store to local variables. Each of the main numerical types has instructions to convert to each of the other main numerical types, and integers can also be converted to bytes, chars, and shorts. The integral types have bitwise logical and bitshift operators.
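To make the two floating-point compare variants concrete, here is a minimal Python sketch of the JVM's fcmpl/fcmpg (dcmpl/dcmpg behave the same way on doubles). Both push -1, 0, or 1 for ordered operands; they differ only when at least one operand is NaN, where the 'l' variant pushes -1 and the 'g' variant pushes 1. The Python function names simply reuse the mnemonics for readability.

```python
import math


def _ordered_compare(a, b):
    """-1, 0, or 1 for a < b, a == b, a > b (operands must not be NaN)."""
    return (a > b) - (a < b)


def fcmpl(a, b):
    """Compare variant that yields -1 if either operand is NaN."""
    if math.isnan(a) or math.isnan(b):
        return -1
    return _ordered_compare(a, b)


def fcmpg(a, b):
    """Compare variant that yields 1 if either operand is NaN."""
    if math.isnan(a) or math.isnan(b):
        return 1
    return _ordered_compare(a, b)


nan = float("nan")
print(fcmpl(1.0, 2.0), fcmpg(1.0, 2.0))
print(fcmpl(nan, 2.0), fcmpg(nan, 2.0))
```

Having both variants lets a compiler pick the one for which a following integer compare-and-branch treats NaN as "comparison failed", whichever direction the source-level comparison goes.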

Integers have compare-and-branch instructions that branch based on a comparison of two integers: ==, !=, <=, <, >, >=. There are also compare-and-branch instructions that branch based on a comparison of one integer against zero (again, ==, !=, <=, <, >, >=). There is an increment operator that operates directly on integer variables.

There are instructions for loading constants from the constant table. There are a number of polymorphic stack instructions; various 'dup's, pops, and swap. Instructions for control flow include jumps and switches. There are many OOP instructions for working with objects, and for invoking and returning from methods. Finally, there are also synchronization and miscellaneous operations.

LuaJIT2 instruction list (94 instructions)

Comparison: islt isge isle isgt iseqv isnev iseqs isnes iseqn isnen iseqp isnep

Unary Test and Copy: istc isfc ist isf

Unary: mov not unm len

Binary: addvn subvn mulvn divvn modvn addnv subnv mulnv divnv modnv addvv subvv mulvv divvv modvv pow cat

Constant: kstr kcdata kshort knum kpri knil

Upvalue and Function: uget usetv usets usetn usetp uclo fnew

Table: tnew tdup gget gset tgetv tgets tgetb tsetv tsets tsetb tsetm

Calls and Vararg Handling: callm call callmt callt iterc itern varg isnext

Returns: retm ret ret0 ret1

Loops and branches: fori jfori forl iforl jforl iterl iiterl jiterl loop iloop jloop jmp

Function headers: funcf ifuncf jfuncf funcv ifuncv jfuncv funcc funccw func

Discussion:

As an optimized instruction set, LuaJIT2 includes various 'immediate' instructions, such as TGETB and TSETB, which index into a table data structure with an 8-bit immediate constant integer index.

The LuaJIT2 instruction set is based upon the Lua instruction set, version 5.1 of which is documented at http://luaforge.net/docman/83/98/ANoFrillsIntroToLua51VMInstructions.pdf .

CIL instruction list (229 instructions)

addition, subtraction, multiplication, division:

logical:

function calling:

conditional branches:

misc:

oop:

arrays:

jump:

comparisons:

floating-point specific:

conversions:

misc memory ops:

stack ops:

exception handling:

constant loads:

loads and stores:

allocation:

shifts:



Misc tables

List of conditionals by type, including both compares and compare-and-branches

In most of the previous sections, we separated comparison operations (with no control flow) from conditional compare-and-branch operations. Since some platforms have a greater variety of compare instructions and fewer compare-and-branch instructions, and others have fewer compare instructions and a greater variety of compare-and-branch instructions, this makes it more difficult to see the popularity of different comparison relations.

So, in this section we group instructions by which relation they test, regardless of whether they are comparison operations or compare-and-branch operations.

For each grouping, we give a count of the platforms which offer an instruction in that group. Since ARM doesn't have floating-point instructions and LuaJIT2 doesn't have integer instructions, these counts are never more than 6. If the count is 5, we indicate which platforms are missing (which will always be one of ARM, LuaJIT2, and then one other platform).

Equality

Equals

integer equals (6):

float equals (6):

note: the branches of RISC-V and the JVM don't work on floats. LuaJIT's branches work only on floats (and pris and strings).

Not-equals

integer not-equals (6):

float not-equals (5) (missing RISC-V, ARM):

Comparison Inequalities

Less-than

integer less-than (6):

float less-than (6):

unsigned integer less-than (5) (missing JVM, LuaJIT2):

Less-than-or-equal-to

integer less-than-or-equal-to (5) (missing RISC-V, LuaJIT2):

float less-than-or-equal-to (5) (missing ARM, JVM):

unsigned integer less-than-or-equal-to (4):

Greater-than-or-equal-to

integer greater-than-or-equal-to (6):

float greater-than-or-equal-to (4):

unsigned integer greater-than-or-equal-to (5) (missing JVM, LuaJIT2):

Greater-than

integer greater-than (5) (missing RISC-V, LuaJIT2):

float greater-than (5) (missing RISC-V, ARM):

unsigned integer greater-than (4):

Compares against zero

Equals-zero

unary equals-zero, or null, or false:

integer (4) (or, 6 if RISC-V and LLVM are counted):

notes:

float equals-zero, or null, or false (1) (or, 2 if LLVM is counted):

note: This comparison is not needed in LLVM because LLVM can always just compare to a constant.

Not-equals-zero

unary not-equals-zero or non-null or true (also, boolean conditional branch) (5) (or, 6 if RISC-V is counted): integer:

notes:

float not-equals-zero (1) (or, 2 if LLVM is counted):

note: This comparison is not needed in LLVM because LLVM can always just compare to a constant.

Other comparison operations

unary integer compare <0:

unary integer compare >=0:

other integer:

other float compare:

Trinary integer compare:

Summary of the platform counts in the above table

Integer

Floating-point

Sum over both of integer, floating-point

Count of platforms with either integer or float of operation

Unsigned integer

'Other comparison operators' omitted because they are not very popular.

Calling conventions

Here we only look at the two processor ISAs; the other platforms are at a higher level of abstraction that does not require caller/callee-saving of registers.

Floating-point and vector registers will not be discussed; therefore references below to 'registers' may be read as 'integer registers'.

Registers listed with a special purpose (for example, a link/return address register or stack pointer), or argument-passing/return-value-passing registers, are not counted as either caller-saved or callee-saved (other sources might list the link register and argument-passing and return-value-passing as caller-saved, and the stack pointer as callee-saved, for example).

RISC-V

32 registers total.

Arguments are passed in 8 registers, x10-x17. Return values are passed in 2 registers, x10-x11.

7 registers are caller-saved: x5, x6, x7 and x28-x31. 11 registers are callee-saved: x9, x18-x27. The return address is in register x1. The stack pointer is in register x2. The remaining 4 registers are: zero register x0, global pointer x3, thread pointer x4, frame pointer x8.

References:

ARM Cortex M0

16 registers total.

Arguments are passed in 4 registers, r0 thru r3. Return values are passed in 2 registers, r0-r1.

1 register is caller-saved: r12. 7 registers are callee-saved: r4-r8 and r10-r11. r13 is the stack pointer, r14 is the link register, r15 is the program counter, r9 is the platform-specific 'platform register'.

References:

Others

For comparison, some other potentially relevant calling conventions include 64-bit ARM (AArch64), Microsoft x64, System V AMD64, and x86 (32-bit) cdecl:

ARM 64-bit (32 registers total) passes arguments and return values in 8 registers, 0-7. 7 registers are caller-saved (9-15). 10 registers are callee-saved (19-28). The remaining registers are: the "Indirect result location register" (8), the "Intra-Procedure-call scratch register" (16 and 17), the platform-specific register (18), the frame pointer (29), the link register (30), and the stack pointer (SP).

The Microsoft x64 convention on x86_64 (16 registers total) passes arguments in 4 registers (RCX, RDX, R8, R9), and returns values in 1 (separate) register (RAX). 2 registers are caller-saved (R10, R11). 8 registers are callee-saved (RBX, RBP, RDI, RSI, R12, R13, R14, and R15). RSP is the stack pointer.

The System V AMD64 convention on x86_64 (16 registers total) passes arguments in 6 registers (RDI, RSI, RDX, RCX, R8, R9), and returns values in 2 registers (RAX, RDX; RDX is used for values greater than 64 bits). 2 registers (R10, R11) are caller-saved. 6 registers are callee-saved (RBX, RBP, and R12–R15). RSP is the stack pointer.

The cdecl calling convention on 32-bit x86 (8 registers total) passes no arguments in registers, and returns values in 1 register (EAX). 2 registers are caller-saved (ECX and EDX). 4 registers are callee-saved (EBX, EBP, ESI, EDI). ESP is the stack pointer.

Summary table

==
total registers   argument/return registers   caller-saved   callee-saved   other                      name
8                 1                           2              4              1 (SP)                     x86 32-bit cdecl
16                5                           2              8              1 (SP)                     x86_64 microsoft
16                7                           2              6              1 (SP)                     x86_64 System V AMD64
16                4                           1              7              4 (SP, LR, ..)             ARM 32-bit
32                8                           7              10             7 (SP, LR, FP, PC, ..)     ARM 64-bit
32                8                           7              11             6 (SP, LR, FP, ..)         RISC-V
==
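As a sanity check on the bookkeeping used in this section (every register is counted in exactly one of the categories), here is a short Python sketch that re-adds the figures from the summary table. The tuple layout and the function name unaccounted are just for this example; the numbers are copied from the table, with the second column read as argument-passing plus return-value registers.

```python
# (name, total, argument/return, caller-saved, callee-saved, other)
CONVENTIONS = [
    ("x86 32-bit cdecl", 8, 1, 2, 4, 1),
    ("x86_64 Microsoft", 16, 5, 2, 8, 1),
    ("x86_64 System V AMD64", 16, 7, 2, 6, 1),
    ("ARM 32-bit", 16, 4, 1, 7, 4),
    ("ARM 64-bit", 32, 8, 7, 10, 7),
    ("RISC-V", 32, 8, 7, 11, 6),
]


def unaccounted(row):
    """Registers in `total` that no category claims (0 if the row adds up)."""
    _name, total, arg_ret, caller, callee, other = row
    return total - (arg_ret + caller + callee + other)


for row in CONVENTIONS:
    print(f"{row[0]:24s} unaccounted registers: {unaccounted(row)}")
```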

References



Discussion

From most popular to least

In the following, when we use terms like 'every platform' or 'whenever..', we are implicitly referring only to the 7 platforms in this study.

Every platform has facilities for:

The most common form of arithmetic is signed 32-bit integer; however, one high-level platform (LuaJIT2) only supports 64-bit floating point. 5 of the 7 platforms support all four combinations of 32- and 64-bit, integer and floating point. Some of the integer platforms support unsigned integer operations throughout, but others only support unsigned operations in some places.

Whenever integer arithmetic is supported, three bitwise shifts and three bitwise logical operations are supported:

Whenever floating-point arithmetic is supported, addition, subtraction, multiplication, division are all found.

All platforms that support both integer and floating point support conversions between signed integer and floating-point, and all platforms that support both 32-bit and 64-bit floating point support conversions between floating-point quantities of different bitwidths.

NOP is in all platforms except for LuaJIT.

Most of the platforms also support:

All register machines have MOV and all stack machines have DROP (also called POP). Both hardware processor ISAs and none of the others have:

Both hardware processor ISAs have supervisor call.

Instructions or intrinsics for each of the following are provided by three platforms in this study:

Arithmetic:

Memory access:

Control flow:

Allocation:

Data structures:

Instructions or intrinsics for each of the following are provided by two platforms in this study:

Arithmetic:

Memory access:

Stack ops:

Atomics and Sync:

Control flow:

Data structures:

Misc:

Arithmetic

Every platform has ways to specify constants/literals. Most platforms specify these only as immediates or directly in the IR, but two of them also have constant pools.

The most common number type is the 32-bit signed integer, but 64-bit floats are also popular, and most platforms offer all of 32- and 64-bit, integer and float. All platforms offer some way to convert between the various types that they have. Some platforms offer coercive casting between types.

Every platform has ways to add, subtract, and multiply. Most platforms also offer division.

Every platform with integers offers at least 6 bitwise operations: shift left, shift right unsigned, shift right signed, and, or, xor.
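For reference, the six common bitwise operations can be modeled on 32-bit values in Python as below. This is only a sketch: the function names are made up, values are represented as unsigned 32-bit integers, and shift amounts are assumed to satisfy 0 <= n < 32 (the platforms differ in how they treat larger shift amounts, which is not modeled here).

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Reinterpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def shl(x, n):
    """Shift left; bits shifted past bit 31 are discarded."""
    return (x << n) & MASK32


def shr_u(x, n):
    """Shift right unsigned (logical): zeros are shifted in at the top."""
    return (x & MASK32) >> n


def shr_s(x, n):
    """Shift right signed (arithmetic): copies of the sign bit are shifted in."""
    return (to_signed32(x) >> n) & MASK32


def and32(x, y):
    return (x & y) & MASK32


def or32(x, y):
    return (x | y) & MASK32


def xor32(x, y):
    return (x ^ y) & MASK32


print(hex(shr_u(0x80000000, 4)), hex(shr_s(0x80000000, 4)))
```

The two right shifts agree whenever the top bit of the input is clear, which is why a platform needs both only when it distinguishes signed from unsigned interpretation of the same bits.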

Most platforms offer some unsigned operations and some offer all unsigned operations.

Most platforms offer compares of integer less-than, floating-point equality, and floating-point less-than (WASM and LLVM use these for branching, see below, but RISC-V and CIL also offer them). Some platforms offer more compares.

Some platforms offer additional arithmetic operations:

Two floating-point platforms offer rounding and exception modes.

Memory access

Most platforms have some form of integer and floating-point loads and stores. Some platforms also support local variables loads/stores.

Some platforms have memory allocation instructions, but they differ widely.

Two platforms have global variable loads/stores.

Register and stack ops

All register machines have copy/MOV and all stack machines have DROP (also called POP). Most stack machines have DUP.

Atomics and sync

Most platforms have some form of atomic or sync functionality, but they differ widely.

A few platforms offer AMOs (SWAP, ADD, AND, OR, XOR, MIN, MAX, MINU, MAXU) and either load-reserved/store-conditional, or compare-and-swap.
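To show how these primitives relate, here is a rough Python sketch of an AMO-style fetch-and-add built from compare-and-swap with a retry loop. Everything here is hypothetical scaffolding: the Cell class and its methods are invented for the example, and a threading.Lock stands in for the atomicity that hardware or a runtime would provide. None of the platforms in this study is being quoted.

```python
import threading


class Cell:
    """A memory word with an atomic compare-and-swap (simulated with a lock)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the cell still holds `expected`; report success."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def fetch_add(cell, delta):
    """AMO-style ADD: atomically add delta and return the previous value."""
    while True:
        old = cell.load()
        if cell.compare_and_swap(old, old + delta):
            return old
        # Another thread changed the cell between load and CAS; retry.


counter = Cell(5)
print(fetch_add(counter, 3), counter.load())
```

The other AMOs (AND, OR, XOR, MIN, MAX, and so on) follow the same pattern with a different function applied to the old value, which is one reason a platform may offer only compare-and-swap or LR/SC and leave the rest to software.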

Control flow

Every platform has unconditional jump. Every platform has some form of branch-if-true, or branch-if-not-equal-to-zero, or branch-if-non-null.

Every platform has some form of unconditional indirect jump. Most platforms offer some form of 'switch'-statement-like indirect jump. The two hardware processor ISAs have unconstrained low-level indirect jumps, and the other platforms require all low-level indirect jumps to specify a list of all possible jump targets, although some of them offer higher-level unconstrained indirect jumps (to functions or methods).

Most platforms have branch on equality, inequality, less-than-or-equals, less-than, greater-than-or-equal, greater-than, false/equals-zero/is-null. Some platforms have branch on <0, >=0, unsigned <, unsigned >=.

Many platforms have compare-and-branch, but some platforms require two instructions for these: either a compare instruction which places a boolean in a register, followed by a branch instruction (WASM and LLVM do this), or an arithmetic instruction which sets a flag register, followed by a branch instruction which reads those flags (ARM does this). Some platforms (WASM and LLVM, the same ones which split compares and boolean-conditional-branches) offer a non-branching SELECT operation.
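The split between a compare step and a SELECT step can be sketched in Python as follows. The names lt_s, select, and smin are illustrative only, and the operand order of select is an arbitrary choice for this example rather than a claim about any platform's encoding. Python's conditional expression is used merely to model the result; the point is that both candidate values are already computed before the choice is made, so no control flow depends on the comparison.

```python
def lt_s(a, b):
    """Compare step: yields an integer boolean (1 or 0), with no control flow."""
    return 1 if a < b else 0


def select(if_nonzero, if_zero, cond):
    """Choose between two already-computed values based on an integer boolean."""
    return if_nonzero if cond != 0 else if_zero


def smin(a, b):
    """Signed minimum expressed as compare followed by select."""
    return select(a, b, lt_s(a, b))


print(smin(3, 5), smin(5, 3), smin(-1, 0))
```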

Most platforms have subroutine support, with some form of CALL, RETURN, and some way to do indirect CALLs.

Many platforms have restrictions on jumping across function boundaries.

Some platforms have:

Comparisons

This section considers both (non-control-flow) compares and compare-and-branches together, to try and get insight into which types of comparison are the most popular.

Every platform in this study offers both of the following comparisons on integers if they support integers, and on floats if they support floats:

Every integer platform has facilities for all of the following comparisons on integers, and the only non-integer platform (LuaJIT2) also supports all of these on floats:

You might be wondering what's so special about greater-than-or-equal-to; i actually think it might just be on this list due to chance. You see, RISC-V has integer compare-and-branch operations for beq, bne, blt, bge (equals, not-equals, less-than, greater-than-or-equal-to). As noted above, every platform in this study has less-than on integers if they support integers, and on floating-point if they support floats. Greater-than-or-equal-to is not like that though; only 4 of the six floating-point-capable platforms have a floating-point greater-than-or-equal-to primitive. For example, in floating-point, RISC-V has equals, less-than, and less-than-or-equal-to. I don't know why they chose to have greater-than-or-equal-to for integers and less-than-or-equal-to for floats; for integers, it's really just a convention, because by reversing the integer arguments in the RISC-V instruction you can get greater-than and less-than-or-equal-to in addition to less-than and greater-than-or-equal-to. But my conclusion is that greater-than-or-equal-to is probably not that special.
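The argument-reversal point can be checked with a few lines of Python. Here blt and bge model only the branch conditions of the two RISC-V instructions (returning whether the branch would be taken), and bgt and ble are derived purely by swapping operands; the exhaustive loop over a small range is just a sanity check, not a proof.

```python
def blt(a, b):
    """Branch condition of BLT: taken if a < b (signed)."""
    return a < b


def bge(a, b):
    """Branch condition of BGE: taken if a >= b (signed)."""
    return a >= b


def bgt(a, b):
    """Greater-than, obtained by swapping the operands of BLT."""
    return blt(b, a)


def ble(a, b):
    """Less-than-or-equal-to, obtained by swapping the operands of BGE."""
    return bge(b, a)


values = range(-3, 4)
ok = all(
    bgt(a, b) == (a > b) and ble(a, b) == (a <= b)
    for a in values
    for b in values
)
print("operand swap covers > and <=:", ok)
```

So with integer operands, any one of {less-than, greater-than} together with any one of {less-than-or-equal-to, greater-than-or-equal-to} yields all four relations, which supports reading the particular choice as a convention.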

If we had a much larger sample size (many more than 7 platforms) maybe we could get to the bottom of this, but from what we have here all that i can really conclude is that:

Data structures

Most platforms have vectors/arrays, and aggregates.

Some platforms have a 'length' instruction.

A few platforms have OOP, and strings.

Misc

Most platforms have NOP and breakpoint instructions.

The two hardware processor ISA platforms have link registers, special registers, and supervisor call.

A few platforms have cycle counters, and memory operations such as memcpy.

Related work

RISC-V Geneology

RISC-V Geneology surveys 18 instruction set architectures prior to RISC-V, "chosen primarily from earlier UC Berkeley RISC architectures and major proprietary RISC instruction sets".

They present a matrix of which instructions in each instruction set correspond to which RISC-V instructions, allowing me to present a count of the analogs of each instruction below.

The paper lists 98 instructions which have analogs that "appear in at least three" prior ISAs; they are (with counts of their prior analogs in parentheses, and grouped by function by me):

(note: the paper lists RDINSTRET as having at least 3 analogs, but in their matrix it has only 1; perhaps they made a mistake with that one; so the list above only has 97 instructions, not 98).

The instructions with >=3 analogs but <8 are:

Here is the subset of the instructions with at least 8 prior analogs found:

Note that now we have lost the unsigned div/rem instructions, all concurrency except LR/SC, cycle counts, fused multiply/add, and floating point sign instructions (perhaps some of the prior ISAs had instructions like FABS and FNEG, which are only assembler pseudoinstructions in RISC-V and hence not listed here, to replace the RISC-V sign instructions).

The instructions with >=8 analogs but <11 are:

Here is the subset of the instructions with at least 11 prior analogs found:

Note that now we have lost LUI, unsigned compare-and-branch instructions, the high-bits-mul instruction, division, all concurrency (perhaps some of the prior ISAs had other concurrency mechanisms though), breakpoints, sqrt, floating point LE comparison, and floating-point exception and rounding mode instructions.

The instructions with >=11 analogs but <16 are:

Here is the subset of the instructions with at least 16 prior analogs found:

Note that now we have lost JAL, compare-and-branch instructions except for (not)-equality predicates, loads and stores of all word sizes except for 32-bit, multiplication, immediate shift and logical (in fact, all immediates except for ADDI; presumably all ISAs provide some way to load immediates, however), and all of the floating-point loads/stores, conversions, and comparisons.

Let's take stock of what remains; these appear to be the common core instructions. I've left out the floating point add/sub/mul/div, because these appear to me to be pretty useless without any floating-point loads/stores, conversion, and comparisons; presumably older ISAs had some way to use them however.

This is 15 instructions.




TODO

goals for this document:

Revise the 'concordance' of RISC-V, WASM, ARM Cortex M0, LLVM, JVM, LuaJIT2 instructions into a readable list of instructions, grouped by type of purpose, referencing their analogs in each of those systems (hence the word 'concordance'), with a description of the semantics of each instruction (and how the various systems differ).

todo: