Bayle Shanks's website: proj-plbook-plChArmIsa

Table of Contents for Programming Languages: a survey

ARM: Intro

https://en.wikipedia.org/wiki/ARM_architecture#32-bit_architecture

http://users.ece.utexas.edu/~valvano/EE345M/Arm_EE382N_4.pdf

https://sourceware.org/cgen/gen-doc/arm-thumb-insn.html list of instructions with names, todo

A recent addition to the ARM ISA family is ARM64 (ARMv8 A64 / AArch64), described on the pages http://www.arm.com/products/processors/instruction-set-architectures/index.php http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0677b/ch01s01.html http://www.arm.com/files/downloads/ARMv8_Architecture.pdf http://www.cs.utexas.edu/~peterson/arm/DDI0487A_a_armv8_arm_errata.pdf http://www.arm.com/files/pdf/ARMv8R__Architecture_Oct13.pdf.

ARM has various versions and 3 profiles: A (full-featured, for use as e.g. the CPU of a smartphone or computer; has a virtual-addressing MMU), R (real-time, for use in e.g. car engines; has deterministic (I think) physical addressing, with a memory protection unit rather than an MMU), M (microcontroller; only supports the Thumb ISA). When this was written, the latest version was v8, and according to the ARM Wikipedia page only the A and R profiles were (yet) available for v8; ARMv8-M came later (see the Cortex-M23/M33 notes below). v7 has all 3 profiles (e.g. http://web.eecs.umich.edu/~prabal/teaching/eecs373-f10/readings/ARMv7-M_ARM.pdf ). There's also ARMv7E-M, which is like ARMv7-M plus a DSP extension.

ARM Thumb: "The Thumb instruction set is a subset of the most commonly used 32-bit ARM instructions." -- (ARM7TDMI Technical Reference Manual Revision: r4p1) "The Thumb instruction set provides better code density, at the expense of inferior performance....Thumb-2, a major enhancement of the Thumb instruction set. Thumb-2 provides almost exactly the same functionality as the ARM instruction set. It has both 16-bit and 32-bit instructions, and achieves ARM-like performance with Thumb-like code density." -- (RealView Compilation Tools Assembler Guide Version 4.0) https://en.wikipedia.org/wiki/ARM_Cortex-M

"The biggest register difference involves the SP register. The Thumb state has unique stack mnemonics (PUSH, POP) that don't exist in the ARM state. These instructions assume the existence of a stack pointer, for which R13 is used. They translate into load and store instructions in the ARM state. " -- http://www.embedded.com/electronics-blogs/beginner-s-corner/4024632/Introduction-to-ARM-thumb

"The original Thumb-Instruction set only contained 16-bit instructions. Thumb2 introduced mixed 16/32 bit instructions....The ARM processor has 2 instruction sets, the traditional ARM set, where the instructions are all 32-bit long, and the more condensed Thumb(2) set, where most common instructions are 16-bit long (and some are 32-bit long)." -- http://stackoverflow.com/questions/10638130/thumb-instruction-in-arm

Some instructions have immediate addressing modes and others do not. I won't bother to include that information because my interest here is mainly in the instruction set. I leave out some instructions that are, to me, uninteresting variants of existing ones. Note that the purpose of these listings is not accuracy, but rather to get a sense of what sorts of instructions are in RISC-ish CPU instruction sets.

Note that in Thumb2, instructions cannot reference the PC (program counter) or SP (stack pointer) as operands, including destination operand, unless noted. Note that every instruction that returns a result takes an operand specifying the destination register; operations are NOT done in place on the input registers (except when the destination register given is the same as an input register).

ARM has 'barrel shifting', meaning that shifts and rotates can be performed on operands without issuing separate instructions.
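To make the idea concrete, here is a minimal C sketch (not taken from any ARM source) of what the barrel shifter does to a data-processing instruction's second operand. It assumes an immediate shift amount of 1..31; the real encodings give amount 0 and some shift types special meanings (e.g. RRX), which are ignored here. `barrel_shift` and `add_lsl` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical names for the four shift types. */
typedef enum { SHIFT_LSL, SHIFT_LSR, SHIFT_ASR, SHIFT_ROR } shift_type;

/* Apply a shift of `amount` (taken mod 32) to `value`, as the barrel
   shifter does to the second operand. Amount 0 returns the value unchanged. */
static uint32_t barrel_shift(uint32_t value, shift_type type, unsigned amount)
{
    amount &= 31u;
    if (amount == 0)
        return value;
    switch (type) {
    case SHIFT_LSL:
        return value << amount;
    case SHIFT_LSR:
        return value >> amount;
    case SHIFT_ASR: /* copy the sign bit into the vacated high bits */
        return (value >> amount) |
               ((value & 0x80000000u) ? ~(0xFFFFFFFFu >> amount) : 0u);
    case SHIFT_ROR:
        return (value >> amount) | (value << (32u - amount));
    }
    return value;
}

/* ADD rd, rn, rm, LSL #shift: the shift costs no extra instruction. */
static uint32_t add_lsl(uint32_t rn, uint32_t rm, unsigned shift)
{
    return rn + barrel_shift(rm, SHIFT_LSL, shift);
}
```

For example, `ADD r0, r1, r2, LSL #2` computes `r1 + 4*r2`, the address of element `r2` of a word array based at `r1`, in one instruction.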

It has a clever way of representing 32-bit immediate values with only 12 bits: an 8-bit value plus a 4-bit rotation field (the 8-bit value is rotated right by twice the 4-bit amount), which allows it to represent, among other things, any power of 2 as an immediate value: http://alisdair.mcdiarmid.org/2014/01/12/arm-immediate-value-encoding.html . "Thumb-2 immediate encoding is even more gleeful--in addition to allowing rotation, it also allows for spaced repetition of any 8-bit pattern (common in low level hack patterns, like from [1]) to be encoded in single instructions." -- https://news.ycombinator.com/item?id=7046803 . If the value you want isn't accessible as an immediate, you can load it from a constant table, compute it, or, on instruction sets that have MOVW and MOVT, construct it from two 16-bit immediates (MOVW sets the low half, MOVT the high half). Some assemblers let you just specify the immediate and the assembler figures out how to get it ( https://news.ycombinator.com/item?id=7045898 ).
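A C sketch of the encodability checks, assuming the classic ARM (A32) rule "an 8-bit value rotated right by an even amount 0..30" and the Thumb-2 rule "one of three replicated-byte patterns, or an 8-bit value with its top bit set, rotated right by 8..31". The helper names are hypothetical, and a real assembler also tries tricks such as turning MOV into MVN with the inverted value, which this ignores.

```c
#include <stdbool.h>
#include <stdint.h>

static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* A32 immediates: imm8 rotated right by 2*rot, rot in 0..15.
   On success, reports one such (imm8, rot) pair. */
static bool arm_imm_encode(uint32_t value, uint32_t *imm8, unsigned *rot)
{
    for (unsigned r = 0; r < 16u; r++) {
        /* Undo a right-rotation by 2*r: rotate right by 32 - 2*r. */
        uint32_t candidate = ror32(value, 32u - 2u * r);
        if (candidate <= 0xFFu) {
            *imm8 = candidate;
            *rot = r;
            return true;
        }
    }
    return false;
}

/* Thumb-2 immediates: replicated byte patterns, or 1bcdefgh rotated
   right by 8..31 (plain values 0..0xFF are the first pattern). */
static bool thumb2_imm_encodable(uint32_t value)
{
    uint32_t b0 = value & 0xFFu, b1 = (value >> 8) & 0xFFu;
    if (value <= 0xFFu)                    return true; /* 0x000000XY */
    if (value == (b0 | (b0 << 16)))        return true; /* 0x00XY00XY */
    if (value == ((b1 << 8) | (b1 << 24))) return true; /* 0xXY00XY00 */
    if (value == b0 * 0x01010101u)         return true; /* 0xXYXYXYXY */
    for (unsigned n = 8; n < 32u; n++) {
        uint32_t c = ror32(value, 32u - n); /* undo rotate-right by n */
        if (c >= 0x80u && c <= 0xFFu)
            return true;
    }
    return false;
}

/* MOVW/MOVT fallback: any 32-bit constant as two 16-bit halves. */
static void movw_movt_split(uint32_t value, uint16_t *movw_imm, uint16_t *movt_imm)
{
    *movw_imm = (uint16_t)(value & 0xFFFFu); /* MOVW rd, #low  (clears the top half) */
    *movt_imm = (uint16_t)(value >> 16);     /* MOVT rd, #high (keeps the low half)  */
}
```

With these rules, 0xFF000000 is encodable in both forms, 0x00AB00AB only in Thumb-2, and 0x12345678 in neither, so it needs MOVW+MOVT, a literal-pool load, or a computed sequence.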

ARM instructions traditionally encoded a conditional execution field, allowing instructions to be skipped depending on the flags, without doing a branch. On ARM64 this has been changed:

" arm64 ... sort of ditches conditional execution. It’s not on every instruction any more, but it’s still available on more instructions than on most other arches.

To the usual complement of typical conditional instructions (branch, add/sub with carry, select and set), arm64 adds select with increment, negate, or inversion, the ability to conditionally set to -1 as well as +1, and the ability to conditionally compare and merge the flags in a fairly flexible manner (it’s really a conditional select of condition flags between the result of a comparison and an immediate). This actually preserves most of the power of conditional execution (except for really exotic hand-coded usages), while taking up much less encoding space. " -- stephencanon , https://news.ycombinator.com/item?id=7047762
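The semantics of the select family can be sketched in C as below. This is a model, not ARM code: `cond` stands for a condition already evaluated from the flags, and the function names just mirror the mnemonics (CSEL, CSINC, CSINV, CSNEG) plus the CSET/CSETM aliases.

```c
#include <stdbool.h>
#include <stdint.h>

/* `cond` is the named condition (EQ, LT, ...) already evaluated against NZCV. */
static uint64_t csel (bool cond, uint64_t n, uint64_t m) { return cond ? n : m; }
static uint64_t csinc(bool cond, uint64_t n, uint64_t m) { return cond ? n : m + 1u; }
static uint64_t csinv(bool cond, uint64_t n, uint64_t m) { return cond ? n : ~m; }
static uint64_t csneg(bool cond, uint64_t n, uint64_t m) { return cond ? n : 0u - m; }

/* Aliases built from the above using the zero register XZR:
   CSET  Xd, cond == CSINC Xd, XZR, XZR, invert(cond)  -> 1 or 0
   CSETM Xd, cond == CSINV Xd, XZR, XZR, invert(cond)  -> all-ones (-1) or 0 */
static uint64_t cset (bool cond) { return csinc(!cond, 0u, 0u); }
static uint64_t csetm(bool cond) { return csinv(!cond, 0u, 0u); }

/* Branch-free absolute value: CMP x, #0 then CSNEG x, x, x, GE. */
static uint64_t abs_via_csneg(int64_t x)
{
    uint64_t ux = (uint64_t)x;
    return csneg(x >= 0, ux, ux);
}
```

This is why a compiler can turn `x < 0 ? -x : x` into a compare plus one conditional-select instruction instead of a branch.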

ARM has 8 operating modes. "Each mode has its own mode-specific registers, including a status register":

User – normal operation
Fast interrupt – handling of ”fast” interrupts
Interrupt – handling of all other interrupts
Supervisor – operating system protected mode
Abort – abortion of memory access
System – operating system privileged mode
Undefined – invalid instruction in stream
Secure monitor – on-chip security features

(descriptions from http://www.cs.virginia.edu/~skadron/cs433_s09_processors/arm11.pdf )

Addressing modes ( http://www.cs.uregina.ca/Links/class-info/301/ARM-addressing/lecture.html ):

register
absolute
immediate
register indirect
register indirect with immediate offset
register indirect preincrementing by immediate offset
register indirect postincrementing by immediate offset
register indirect with register offset
register indirect with register offset with scaling

For ARM64 (AArch64), see also https://developer.arm.com/documentation/102374/0101/Loads-and-stores---addressing , which presents just 4 addressing modes applied only to loads/stores:

simple (register), offset (register + immediate offset), pre-indexed (register += immediate offset), post-indexed (like pre-indexed except the address used is before adding the offset)
the AArch64 spec ( https://developer.arm.com/documentation/ddi0487/latest/ ) also mentions a PC-relative addressing mode called 'literal'
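A toy C model of the four load addressing modes, assuming memory is a plain byte array indexed by register value (all names are hypothetical). The point is only where the address comes from and when the base register is updated.

```c
#include <stdint.h>
#include <string.h>

/* Toy model: `mem` stands in for memory; register values are offsets into it. */
static uint32_t load32(const uint8_t *mem, uint64_t addr)
{
    uint32_t v;
    memcpy(&v, mem + addr, sizeof v); /* host byte order */
    return v;
}

/* LDR Wt, [Xn]         simple: address = Xn */
static uint32_t ldr_base(const uint8_t *mem, uint64_t xn)
{
    return load32(mem, xn);
}

/* LDR Wt, [Xn, #imm]   offset: address = Xn + imm; Xn unchanged */
static uint32_t ldr_offset(const uint8_t *mem, uint64_t xn, int64_t imm)
{
    return load32(mem, xn + (uint64_t)imm);
}

/* LDR Wt, [Xn, #imm]!  pre-indexed: Xn += imm, then load from the new Xn */
static uint32_t ldr_pre(const uint8_t *mem, uint64_t *xn, int64_t imm)
{
    *xn += (uint64_t)imm;
    return load32(mem, *xn);
}

/* LDR Wt, [Xn], #imm   post-indexed: load from the old Xn, then Xn += imm */
static uint32_t ldr_post(const uint8_t *mem, uint64_t *xn, int64_t imm)
{
    uint32_t v = load32(mem, *xn);
    *xn += (uint64_t)imm;
    return v;
}
```

Post-indexing matches a walking-pointer loop like `sum += *p++`; pre-indexing matches `*++p`.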

The AArch64 spec ( https://developer.arm.com/documentation/ddi0487/latest/ ) speaks of other "addressing modes", but afaict from section C1.3 "Address generation" subsection "Address calculation", these are just ways to compute addresses with instructions like ADD, rather than ways to avoid using a separate instruction to compute an address.

The notes in section C1.3 "Address generation" subsection "Address calculation" indicate that when using an ADD instruction to add an immediate offset to a base address, the size of the immediate is 12 bits.

I can't tell if there is a way to use a single ADD instruction to compute (base + scale*index + immediate_offset), but it appears to me that this would require two instructions, one to add the scaled index, and a second to add the immediate offset.

ARM: 16-bit Thumb2 instructions

MOV LSL (logical shift left; LSL r1 r2 r3 means r1 := r2 << r3) LSR (logical shift right) ASR (arithmetic shift right) ADD (note: the source and/or destination operands for ADD can include SP, the stack pointer; in this way you can get the SP into a register) SUB (note: the source and destination operands for SUB can include SP, the stack pointer)

ADR (Add immediate to program counter; in this way you can get the PC into a register; useful for getting the address of a 'label' if your assembler translates labels to relative offsets )

CMP

AND EOR (xor)

ADC (Add with Carry; a + b + carry bit) SBC (Subtract with Carry; a - b - NOT(carry bit), because on ARM the carry flag after a subtraction means "no borrow") ROR (Rotate Right) TST (Test bits: TST Rn Rm updates the condition code flags on Rn AND Rm) RSB (Reverse Subtract; the 16-bit form subtracts from zero, i.e. negates) CMP (update condition code flags on Rn - Rm) CMN (Compare Negative; update condition code flags on Rn + Rm) ORR (or) MUL BIC (Bit Clear: x AND (NOT y)) MVN (Move Not: bitwise NOT)
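A C sketch of why ADC and SBC exist: chaining a flag-setting add or subtract on the low words into ADC/SBC on the high words gives 64-bit arithmetic on 32-bit registers. The helpers model the carry flag by hand and are hypothetical; note the ARM convention that after a subtraction C = 1 means "no borrow", so SBC subtracts NOT(C).

```c
#include <stdint.h>

typedef struct { uint32_t lo, hi; } u64pair; /* a 64-bit value in two registers */

/* ADDS: add and set C to the carry out of bit 31. */
static uint32_t adds(uint32_t a, uint32_t b, unsigned *c)
{
    uint32_t r = a + b;
    *c = r < a;
    return r;
}
/* ADC: a + b + C */
static uint32_t adc(uint32_t a, uint32_t b, unsigned c) { return a + b + c; }

/* SUBS: subtract and set C = 1 when there was NO borrow (ARM convention). */
static uint32_t subs(uint32_t a, uint32_t b, unsigned *c)
{
    *c = a >= b;
    return a - b;
}
/* SBC: a - b - NOT(C) */
static uint32_t sbc(uint32_t a, uint32_t b, unsigned c) { return a - b - (1u - c); }

static u64pair add64(u64pair x, u64pair y)
{
    unsigned c;
    u64pair r;
    r.lo = adds(x.lo, y.lo, &c); /* ADDS rlo, xlo, ylo */
    r.hi = adc(x.hi, y.hi, c);   /* ADC  rhi, xhi, yhi */
    return r;
}

static u64pair sub64(u64pair x, u64pair y)
{
    unsigned c;
    u64pair r;
    r.lo = subs(x.lo, y.lo, &c); /* SUBS rlo, xlo, ylo */
    r.hi = sbc(x.hi, y.hi, c);   /* SBC  rhi, xhi, yhi */
    return r;
}
```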

BL (branch with link; BL <label>: LR register = address of next instruction, PC = label)

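BX (Branch and Exchange: branch to an address held in a register; bit 0 of that address selects Thumb (1) or ARM (0) state, so this is used to enter/exit "thumb state") BLX (Branch with Link and Exchange: the same, but also saves the return address in LR, like BL)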

Load and store:

STR (Store word. Addressing modes include immediate, register offset, SP offset. Can store list of multiple registers (STMIA).) also STRH for store halfword, STRB for byte

LDR (Load word. Addressing modes include immediate, register offset, SP offset. Can load list of multiple registers (LDMIA).) also LDRH for Load unsigned halfword, LDRSH for signed halfword, LDRB for unsigned byte, LDRSB for signed byte

LDR (load from the literal pool, i.e. a PC-relative load of a constant)

B (unconditional and conditional branch instructions. A conditional branch takes a 'condition field', which is not a flag itself but names a test on the condition code flags: equal, not equal, Carry Set / Unsigned higher or same, Carry Clear / Unsigned lower, Negative, Positive or zero, Overflow, No overflow, Unsigned higher, Unsigned lower or same, Signed greater than or equal, Signed less than, Signed greater than, Signed less than or equal, or always)
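The condition tests can be written out in C against the N, Z, C, V flags. This is a sketch for illustration: the enum order follows the usual 4-bit encoding, `cmp_flags` computes the flags CMP would set for 32-bit operands, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } nzcv; /* condition code flags */

/* The 15 conditions, in their usual 4-bit encoding order (EQ = 0 ... AL = 14). */
typedef enum { EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE, AL } cond;

static bool cond_holds(cond c, nzcv f)
{
    switch (c) {
    case EQ: return f.z;                /* equal */
    case NE: return !f.z;               /* not equal */
    case CS: return f.c;                /* carry set / unsigned higher or same */
    case CC: return !f.c;               /* carry clear / unsigned lower */
    case MI: return f.n;                /* negative */
    case PL: return !f.n;               /* positive or zero */
    case VS: return f.v;                /* overflow */
    case VC: return !f.v;               /* no overflow */
    case HI: return f.c && !f.z;        /* unsigned higher */
    case LS: return !f.c || f.z;        /* unsigned lower or same */
    case GE: return f.n == f.v;         /* signed greater than or equal */
    case LT: return f.n != f.v;         /* signed less than */
    case GT: return !f.z && f.n == f.v; /* signed greater than */
    case LE: return f.z || f.n != f.v;  /* signed less than or equal */
    case AL: return true;               /* always */
    }
    return false;
}

/* The flags CMP a, b sets: it computes a - b and discards the result. */
static nzcv cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    nzcv f;
    f.n = (r >> 31) != 0;
    f.z = (r == 0);
    f.c = (a >= b);                         /* C = 1 means "no borrow" */
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0; /* signed overflow */
    return f;
}
```

For example, comparing 0x80000000 (INT_MIN) with 1 sets V, and LT still comes out true because it tests N != V rather than N alone.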

SVC (supervisor call, i.e. a system call; formerly SWI) SETEND (set endianness) CPS (change processor state; enables and disables specified interrupts) BKPT (software breakpoint) IT (If-Then; "Makes up to four following instructions conditional, according to pattern. pattern is a string of up to three letters. Each letter can be T (Then) or E (Else).")

Adjust stack pointer instructions: ADD (SP plus immediate; increments the stack pointer) SUB (SP minus immediate; decrements the stack pointer)

Sign or zero extend instructions (these are used to convert a signed or unsigned value of a certain byte width into a value of a larger byte width, e.g. to convert a signed byte representing "-10" to a signed word representing "-10"; see http://odellconnie.blogspot.com/2012/03/sign-extension-zero-extension.html ): SXTH (Signed Extend Halfword to Word: SXTH Rd Rm: Rd[31:0] := SignExtend(Rm[15:0])) SXTB (Signed Extend Byte to Word: Rd[31:0] := SignExtend(Rm[7:0])) UXTH (Unsigned Extend Halfword to Word: Rd[31:0] := ZeroExtend(Rm[15:0])) UXTB (Unsigned Extend Byte to Word: Rd[31:0] := ZeroExtend(Rm[7:0]))
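In C, the four extend instructions amount to the following (a sketch; the function names just mirror the mnemonics). The sign-extension trick avoids relying on implementation-defined signed conversions.

```c
#include <stdint.h>

/* Sign-extend the low byte / halfword: flipping the sign bit and then
   subtracting it maps 0x80..0xFF onto 0xFFFFFF80..0xFFFFFFFF (mod 2^32)
   and leaves 0x00..0x7F unchanged. */
static uint32_t sxtb(uint32_t rm) { return ((rm & 0xFFu) ^ 0x80u) - 0x80u; }
static uint32_t sxth(uint32_t rm) { return ((rm & 0xFFFFu) ^ 0x8000u) - 0x8000u; }

/* Zero-extend: just keep the low byte / halfword. */
static uint32_t uxtb(uint32_t rm) { return rm & 0xFFu; }
static uint32_t uxth(uint32_t rm) { return rm & 0xFFFFu; }
```

For the example above, `sxtb(0xF6)` gives 0xFFFFFFF6, which is -10 as a 32-bit word, while `uxtb(0xF6)` gives 246.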

Compare and branch on (non-)zero instructions CBZ (Compare and branch on zero; CBZ r <label>: if r == 0, goto <label>) CBNZ (Compare and branch on non-zero)

PUSH (push selected registers onto stack) POP (pop selected registers from stack)

Reverse byte instructions: REV (Byte-Reverse Word, i.e. reverse the ordering of the four bytes in the word and put the result in the destination register) REV16 (Byte-Reverse Packed Halfword, i.e. reverse the ordering of the two bytes in each halfword) REVSH (Byte-Reverse Signed Halfword, i.e. reverse the bytes in the low halfword and sign extend the result to fill the whole word)
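A C sketch of the three byte-reverse operations (names mirror the mnemonics). REV is what a compiler can use for something like `ntohl` on a little-endian ARM, converting a big-endian word received from the network.

```c
#include <stdint.h>

/* REV: reverse all four bytes. */
static uint32_t rev(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* REV16: swap the two bytes inside each halfword. */
static uint32_t rev16(uint32_t x)
{
    return ((x >> 8) & 0x00FF00FFu) | ((x << 8) & 0xFF00FF00u);
}

/* REVSH: swap the bytes of the low halfword, then sign-extend from bit 15. */
static uint32_t revsh(uint32_t x)
{
    uint32_t swapped = ((x >> 8) & 0xFFu) | ((x & 0xFFu) << 8);
    return (swapped ^ 0x8000u) - 0x8000u;
}
```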

NOP-compatible hint instructions: NOP YIELD (Yield control to alternative thread) WFE (Wait For Event) WFI (Wait For Interrupt) SEV (Send event; signal event in multiprocessor system)

ARM: 32-bit Thumb2 instructions

ORN (OR (not)) TEQ (update condition code flags on a XOR b) MOVT (move the source halfword into the top halfword of the destination register) BFC (Bit Field Clear; set specified bits to zero; takes a starting bit and a bitwidth) BFI (Bit Field Insert; set specified bits to specified values; takes a starting bit and a bitwidth and a source value)
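BFC and BFI replace the usual mask-and-merge sequence for bit fields. A C sketch of what they compute, assuming valid field parameters (width at least 1 and lsb + width at most 32); the names mirror the mnemonics and `field_mask` is a hypothetical helper.

```c
#include <stdint.h>

/* Mask of `width` ones starting at bit `lsb`. */
static uint32_t field_mask(unsigned lsb, unsigned width)
{
    uint32_t ones = (width >= 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return ones << lsb;
}

/* BFC Rd, #lsb, #width: clear the field, leave the other bits alone. */
static uint32_t bfc(uint32_t rd, unsigned lsb, unsigned width)
{
    return rd & ~field_mask(lsb, width);
}

/* BFI Rd, Rn, #lsb, #width: copy the low `width` bits of Rn into the field. */
static uint32_t bfi(uint32_t rd, uint32_t rn, unsigned lsb, unsigned width)
{
    uint32_t m = field_mask(lsb, width);
    return (rd & ~m) | ((rn << lsb) & m);
}
```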

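SBFX (Signed Bit Field extract) SSAT (Signed saturate, with an optional LSL or ASR shift applied to the input first) SSAT16 (Signed saturate 16-bit) UBFX (Unsigned Bit Field extract) USAT (Unsigned saturate, with an optional LSL or ASR shift applied to the input first) USAT16 (Unsigned saturate 16-bit)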
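A C sketch of the extract and saturate operations, with names mirroring the mnemonics. It leaves out SSAT/USAT's optional input shift and the Q flag they set when they saturate.

```c
#include <stdint.h>

/* UBFX Rd, Rn, #lsb, #width: the field, zero-extended (1 <= width, lsb + width <= 32). */
static uint32_t ubfx(uint32_t rn, unsigned lsb, unsigned width)
{
    uint32_t v = rn >> lsb;
    return (width >= 32u) ? v : (v & ((1u << width) - 1u));
}

/* SBFX: the same field, sign-extended from its top bit. */
static uint32_t sbfx(uint32_t rn, unsigned lsb, unsigned width)
{
    uint32_t sign = 1u << (width - 1u);
    return (ubfx(rn, lsb, width) ^ sign) - sign;
}

/* SSAT Rd, #n, Rm (n in 1..32): clamp to [-2^(n-1), 2^(n-1) - 1]. */
static int32_t ssat(int32_t x, unsigned n)
{
    int64_t hi = ((int64_t)1 << (n - 1u)) - 1;
    int64_t lo = -((int64_t)1 << (n - 1u));
    return (x > hi) ? (int32_t)hi : (x < lo) ? (int32_t)lo : x;
}

/* USAT Rd, #n, Rm (n in 0..31): clamp to [0, 2^n - 1]. */
static uint32_t usat(int32_t x, unsigned n)
{
    int64_t hi = ((int64_t)1 << n) - 1;
    return (x < 0) ? 0u : (x > hi) ? (uint32_t)hi : (uint32_t)x;
}
```

A typical use is clamping a wide intermediate result to a 16-bit audio sample with `ssat(x, 16)`.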

PKH (Pack halfword, BT, TB) RRX (Rotate Right with Extend)

Signed and unsigned extend instructions with optional addition: SXTAB (Signed extend byte and add) SXTAB16 (Signed extend two bytes to halfwords, and add) SXTAH (Signed extend halfword and add) SXTB16 (Signed extend two bytes to halfwords) UXTAB (Unsigned extend byte and add) UXTAB16 (Unsigned extend two bytes to halfwords, and add) UXTAH (Unsigned extend halfword and add) UXTB16 (Unsigned extend two bytes to halfwords)

SIMD add and subtract: QADD16, UADD16, QADD8, UADD8, QASX, UASX, QSUB16, UHADD16, QSUB8, UHADD8, QSAX, UHASX, SADD16, UHSUB16, SADD8, UHSUB8, SASX, UHSAX, SHADD16, UQADD16, SHADD8, UQADD8, SHASX, UQASX, SHSUB16, UQSUB16, SHSUB8, UQSUB8, SHSAX, UQSAX, SSUB16, USUB16, SSUB8, USUB8, SSAX

Mnemonic elements and their meanings:

Q prefix: Signed saturating arithmetic.
S prefix: Signed arithmetic, modulo 2^8 or 2^16.
SH prefix: Signed halving arithmetic. The result of the calculation is halved.
U prefix: Unsigned arithmetic, modulo 2^8 or 2^16.
UH prefix: Unsigned halving arithmetic. The result of the calculation is halved.
UQ prefix: Unsigned saturating arithmetic.
16 suffix: The instruction performs two 16-bit calculations.
8 suffix: The instruction performs four 8-bit calculations.
ASX mnemonic: The instruction performs one 16-bit addition and one 16-bit subtraction. The X indicates that the halfwords of the second operand are exchanged before the operation.
SAX mnemonic: The instruction performs one 16-bit subtraction and one 16-bit addition. The X indicates that the halfwords of the second operand are exchanged before the operation.
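To show what the U, UQ and UH prefixes and the 8 suffix mean, here is a C sketch of three byte-lane adds: UADD8 (modulo 2^8), UQADD8 (saturating) and UHADD8 (halving). Each is a single instruction on the hardware; the loops are only a model, and UADD8's side effect on the GE flags (read by SEL) is not modeled.

```c
#include <stdint.h>

/* UADD8: four independent byte additions, each wrapping modulo 2^8. */
static uint32_t uadd8(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (unsigned i = 0; i < 32u; i += 8u)
        r |= (((a >> i) + (b >> i)) & 0xFFu) << i;
    return r;
}

/* UQADD8: the same lanes, but each result saturates at 0xFF. */
static uint32_t uqadd8(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (unsigned i = 0; i < 32u; i += 8u) {
        uint32_t s = ((a >> i) & 0xFFu) + ((b >> i) & 0xFFu);
        r |= (s > 0xFFu ? 0xFFu : s) << i;
    }
    return r;
}

/* UHADD8: the same lanes, each 9-bit sum halved (so it never overflows). */
static uint32_t uhadd8(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (unsigned i = 0; i < 32u; i += 8u) {
        uint32_t s = ((a >> i) & 0xFFu) + ((b >> i) & 0xFFu);
        r |= (s >> 1) << i;
    }
    return r;
}
```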

CLZ (Count Leading Zeros; just what it sounds like) QADD (Saturating Add) QDADD (Saturating Double and Add) QDSUB (Saturating Double and Subtract) QSUB (Saturating Subtract) RBIT (Reverse Bits) SEL (Select bytes: the four GE flags, set by an earlier SIMD add or subtract, control, for each of the four byte positions of the output, which of the two input registers contributes that byte)

multiply and accumulate (add a product to, or subtract it from, an accumulator operand), with various different byte widths of the operands and destination register(s): MLA (multiply and accumulate; x + (y*z)) MLS (multiply and subtract; x - (y*z)) SMLAxy (Signed 16 x 16-bit multiply plus 32-bit accumulate; x and y select the bottom or top halfword of each operand) SMLAD (Signed Dual Multiply-Accumulate Add) SMLAWx (Signed 32 x 16-bit multiply, top 32 bits of the 48-bit product, plus accumulate) SMLSD (Signed Dual Multiply Subtract and Accumulate) SMMLA (Signed 32 + 32 x 32-bit, most significant word) SMMLS (Signed 32 – 32 x 32-bit, most significant word) SMMUL (Signed 32 x 32-bit, most significant 32-bit word) SMUAD (Signed Dual Multiply Add) SMULxy (Signed 16 x 16-bit multiply) SMULWx (Signed 32 x 16-bit multiply, top 32 bits of the 48-bit product) SMUSD (Signed Dual Multiply Subtract) USAD8 (Unsigned Sum of Absolute Differences) USADA8 (Unsigned Accumulate Absolute Differences)

with 64-bit results (two registers to hold the result): SMULL (Signed multiply with double-length result) UMULL (Unsigned multiply with double-length result) SMLALxy (Signed 16 x 16-bit multiply with 64-bit accumulate) SMLALD (Signed Multiply Accumulate Long Dual) SMLSLD (Signed Multiply Subtract accumulate Long Dual) UMLAL (Unsigned 64 + 32 x 32) UMAAL (Unsigned 32 x 32 multiply plus two 32-bit accumulators, 64-bit result). SDIV (Signed divide) and UDIV (Unsigned divide) also belong to this area, but they produce an ordinary single-register 32-bit quotient.
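A C model of a few of the long multiplies, assuming the 64-bit result is split into RdLo/RdHi as the mnemonics' operand names suggest; the function names are hypothetical. UMAAL is interesting because its inputs are bounded so that the sum can never overflow 64 bits, which makes it handy for multi-precision arithmetic.

```c
#include <stdint.h>

/* UMULL RdLo, RdHi, Rn, Rm: full 64-bit unsigned product. */
static void umull(uint32_t *lo, uint32_t *hi, uint32_t n, uint32_t m)
{
    uint64_t p = (uint64_t)n * m;
    *lo = (uint32_t)p;
    *hi = (uint32_t)(p >> 32);
}

/* SMULL: the same with signed operands (two's-complement 64-bit result). */
static void smull(uint32_t *lo, uint32_t *hi, int32_t n, int32_t m)
{
    uint64_t p = (uint64_t)((int64_t)n * m);
    *lo = (uint32_t)p;
    *hi = (uint32_t)(p >> 32);
}

/* UMLAL: RdHi:RdLo += Rn * Rm (wraps modulo 2^64). */
static void umlal(uint32_t *lo, uint32_t *hi, uint32_t n, uint32_t m)
{
    uint64_t acc = ((uint64_t)*hi << 32) | *lo;
    acc += (uint64_t)n * m;
    *lo = (uint32_t)acc;
    *hi = (uint32_t)(acc >> 32);
}

/* UMAAL: RdHi:RdLo = Rn * Rm + RdHi + RdLo. The maximum,
   (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1, always fits. */
static void umaal(uint32_t *lo, uint32_t *hi, uint32_t n, uint32_t m)
{
    uint64_t r = (uint64_t)n * m + *hi + *lo;
    *lo = (uint32_t)r;
    *hi = (uint32_t)(r >> 32);
}

/* SMMUL Rd, Rn, Rm: the most significant 32 bits of the signed product. */
static uint32_t smmul(int32_t n, int32_t m)
{
    return (uint32_t)((uint64_t)((int64_t)n * m) >> 32);
}
```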

loads and stores:

add versions for postindexing, and for double words
PLD, PLI (preload)

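LDRD (load double) STRD (store double) LDREX (load exclusive word: loads a word and marks its address for exclusive access) STREX (store exclusive word: stores only if nothing has broken the exclusive reservation since the matching LDREX, and reports success or failure in a register; together with LDREX this is how atomic read-modify-write operations, locks, and semaphores are built) CLREX (clear local processor exclusive tag, abandoning any open reservation)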
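A toy C model of how LDREX/STREX are used, assuming a single core and simulating interference by hand. All names here are hypothetical; real hardware tracks reservations in local and global exclusive monitors, and compilers for ARMv7 typically turn C11 atomic read-modify-write operations into this kind of load-exclusive/store-exclusive retry loop.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one core's exclusive monitor. */
typedef struct {
    bool armed;           /* is an exclusive reservation open? */
    const uint32_t *addr; /* which address it covers */
} monitor;

/* LDREX: load and open a reservation on the address. */
static uint32_t ldrex(monitor *mon, const uint32_t *p)
{
    mon->armed = true;
    mon->addr = p;
    return *p;
}

/* STREX: store only if the reservation is still intact.
   Returns 0 on success and 1 on failure, like STREX's status register. */
static uint32_t strex(monitor *mon, uint32_t *p, uint32_t value)
{
    bool ok = mon->armed && mon->addr == p;
    mon->armed = false; /* the reservation is consumed either way */
    if (!ok)
        return 1u;
    *p = value;
    return 0u;
}

/* CLREX: drop any open reservation. */
static void clrex(monitor *mon) { mon->armed = false; }

/* Simulated write by another observer, which breaks the reservation. */
static void other_core_write(monitor *mon, uint32_t *p, uint32_t value)
{
    *p = value;
    if (mon->addr == p)
        mon->armed = false;
}

/* Atomic increment: retry until the STREX succeeds; returns the old value. */
static uint32_t atomic_increment(monitor *mon, uint32_t *p)
{
    uint32_t old;
    do {
        old = ldrex(mon, p);
    } while (strex(mon, p, old + 1u) != 0u);
    return old;
}
```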

TBB (Table Branch Byte) TBH (Table Branch Halfword)
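TBB and TBH are mainly for compiling `switch` statements: the compiler bounds-checks the index, then branches through a table of forward offsets. A C sketch of the target computation, assuming Thumb state, where PC reads as the instruction's address + 4 and table entries count halfwords; the function names are hypothetical.

```c
#include <stdint.h>

/* TBB [Rn, Rm]: branch to PC + 2 * (byte entry Rm of the table at Rn). */
static uint32_t tbb_target(uint32_t tbb_address, const uint8_t *table, uint32_t index)
{
    return tbb_address + 4u + 2u * table[index];
}

/* TBH [Rn, Rm, LSL #1]: the same with 16-bit entries, allowing longer jumps. */
static uint32_t tbh_target(uint32_t tbh_address, const uint16_t *table, uint32_t index)
{
    return tbh_address + 4u + 2u * table[index];
}
```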

LDMDB / LDMEA (Load Multiple Decrement Before / Empty Ascending) RFE (Return From Exception) SRS (Store Return State) STMDB / STMFD (Store Multiple Decrement Before / Full Descending)

MRS (Move from Status register to ARM Register, e.g. put the condition codes into a register) MSR (Move from ARM register to Status register, e.g. copy a register over the condition codes) SUBS (as SUBS PC, LR, #imm: Return From Exception without using the stack)

DBG (Debug hint)

Special control operations: CLREX (Clear Exclusive) DSB (Data Synchronization Barrier) DMB (Data Memory Barrier) ISB (Instruction Synchronization Barrier)

Coprocessor instructions: not listed

ARM: Cortex M profile

Cortex M0, M0+, and M1 only have these instructions:

16-bit: ADC, ADD, ADR, AND, ASR, B, BIC, BKPT, BLX, BX, CMN, CMP, CPS, EOR, LDM, LDR, LDRB, LDRH, LDRSB, LDRSH, LSL, LSR, MOV, MUL, MVN, NOP, ORR, POP, PUSH, REV, REV16, REVSH, ROR, RSB, SBC, SEV, STM, STMIA, STR, STRB, STRH, SUB, SVC, SXTB, SXTH, TST, UXTB, UXTH, WFE, WFI, YIELD

32-bit: BL (branch with link), DMB (Data Memory Barrier; Ensure the order of observation of memory accesses), DSB (Data Synchronization Barrier; Ensure the completion of memory accesses), ISB (Instruction Synchronization Barrier; flush processor pipeline and branch prediction logic), MRS (Move from Status register), MSR (move to status register)

Note that this 16-bit instruction set is identical to the 16-bit Thumb-2 instruction set above, except that it lacks SETEND (set endianness), IT (if-then), CBZ (Compare and branch on zero), and CBNZ. (Also, BL appears here only as a 32-bit instruction, whereas it was listed above among the 16-bit instructions. In the original Thumb ISA, BL was encoded as a pair of 16-bit halfwords that together form one 32-bit instruction, so both listings describe the same thing.) IT, CBZ, CBNZ are added in the Cortex M3, as well as a bunch of 32-bit instructions:

new 32-bit instructions in the Cortex M3: BFC (Bit Field Clear), BFI (Bit Field Insert), CDP (coprocessor data processing), CLREX (clear local processor exclusive tag), CLZ (count leading zeros), DBG (debug hint), various loads (LDC, LDMA, LDMDB, LDRBT, LDRD, LDREX, LDREXB, LDREXH, LDRHT, LDRSB, LDRSBT, LDRSHT, LDRT), MCR (move to coprocessor from ARM register), MLS (multiply and subtract), MCRR (move to coprocessor from two ARM registers), MLA (multiply and accumulate; x + (y*z)), MOVT (move the source halfword into the top halfword of the destination register), MRC (move to ARM register from coprocessor), MRRC (move to two ARM registers from coprocessor), ORN (x OR (NOT y)), PLD (preload data), PLDW, PLI (preload instructions), RRX (Rotate Right with Extend), SBFX (Signed Bit Field extract), SDIV (Signed divide), SMLAL (an SMULL-like thingee that also accumulates), SMULL, SSAT (signed saturate), STC (store from coprocessor), various stores (STMDB, STRBT, STRD, STREX, STREXB, STREXH, STRHT, STRT), TBB (Table Branch Byte), TBH (Table Branch Halfword), TEQ (update condition code flags on a XOR b), UBFX (Unsigned Bit Field extract), UDIV (Unsigned divide), other multiply, multiply-accumulate, and saturate instructions (UMLAL, UMULL, USAT)

Note that http://www.eetimes.com/document.asp?doc_id=1319726 claims that "SoCs based on ARM's M0+ Flycatcher core will not run Linux, although they do hit the sub-50-cent price point for the IoT, including security engines and targeted peripherals."

As of this writing, the Cortex M0+ seems to be the leading design for 32-bit tiny low-power devices. There are very small versions of them, e.g. http://cache.freescale.com/files/microcontrollers/doc/fact_sheet/KINETISKL02CSPFS.pdf?fpsp=1 which is 16 mm^2. This device runs at about 48 MHz and the M0+ design yields about 1 MIPS/MHz (so roughly 48 MIPS), which means that according to http://www.roylongbottom.org.uk/mips.htm it's about as powerful as a 486! It has 32 KB of flash (presumably for program storage) and 4 KB of RAM. Intel recently released a small low-power chip called the Quark, which is an SoC with a 486 ISA, 512 KB SRAM, 16 KB cache.

ARM Cortex M0 instruction list

from http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0432c/CHDCICDF.html and http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0497a/CIHJJEIH.html

move: mov movs
arithmetic: add(s) adcs (add with carry) adr (PC-relative Address to Register) sub subs sbcs (sub with carry) rsbs (reverse subtract; negate) muls (multiply 32-bit with 32-bit result)
compare: cmp cmn (compare negative)
logical: ands (and) orrs (or) eors (xor) bics (bit clear) mvns (move NOT) tst (AND test)
bit shifts: lsls lsrs asrs rors
loads and stores: ldr (load) ldr(b/h/sb/sh) (load byte/halfword/signed byte/signed halfword) ldm (load multiple) str (store) str(b/h) (store byte/halfword) stm (store multiple) push/pop (push/pop registers onto/from stack)
control: b (branch, conditional or unconditional) bl (branch with link) bx (branch with exchange) blx (branch with link and exchange)
extend: (u/s)xt(b/h) (extend unsigned/signed byte/halfword)
byte-reverse: rev (reverse bytes in word) rev16 (reverse bytes in both halfwords) revsh (reverse signed bottom half word)
State change: svc (supervisor call) cpsi(d/e) (disable/enable interrupts) mrs/msr (read/write special register) bkpt (breakpoint)

Hint/events: sev (Send event) wfe (wait for event) wfi (wait for interrupt) yield (this is a no-op) nop
barriers: isb (instruction sync barrier) dmb (Data Memory Barrier) dsb (data sync barrier)

More notes on ARM Cortex Ms

from https://en.m.wikipedia.org/wiki/ARM_Cortex-M

" The Cortex-M0 / M0+ / M1 implement the ARMv6-M architecture,[9] the Cortex-M3 implements the ARMv7-M architecture,[10] and the Cortex-M4 / M7 implements the ARMv7E-M architecture.[10] The architectures are binary instruction upward compatible from ARMv6-M to ARMv7-M to ARMv7E-M. Binary instructions available for the Cortex-M0 / M0+ / M1 can execute without modification on the Cortex-M3 / M4 / M7. Binary instructions available for the Cortex-M3 can execute without modification on the Cortex-M4 / M7 / M33.[9][10] Only Thumb-1 and Thumb-2 instruction sets are supported in Cortex-M architectures, but the legacy 32-bit ARM instruction set isn't supported.

All six Cortex-M cores implement a common subset of instructions that consists of most Thumb-1, some Thumb-2, including a 32-bit result multiply. The Cortex-M0 / M0+ / M1 / M23 were designed to create the smallest silicon die, thus having the fewest instructions of the Cortex-M family.

The Cortex-M0 / M0+ / M1 include Thumb-1 instructions, except new instructions (CBZ, CBNZ, IT) which were added in ARMv7-M architecture. The Cortex-M0 / M0+ / M1 include a minor subset of Thumb-2 instructions (BL, DMB, DSB, ISB, MRS, MSR). The Cortex-M3 / M4 / M7 / M33 have all base Thumb-1 and Thumb-2 instructions. The Cortex-M3 adds three Thumb-1 instructions, all Thumb-2 instructions, hardware integer divide, and saturation arithmetic instructions. The Cortex-M4 adds DSP instructions and an optional single-precision floating-point unit (VFPv4-SP). The Cortex-M7 adds an optional double-precision FPU (VFPv5).[9][10] ...

    The 32-bit ARM instruction set is not included in Cortex-M cores.
    Endianness is chosen at silicon implementation in Cortex-M cores. Legacy cores allowed "on-the-fly" changing of the data endian mode.
    Co-processors aren't supported on Cortex-M cores.

" SysTick timer: A 24-bit system timer that extends the functionality of both the processor and the Nested Vectored Interrupt Controller (NVIC). When present, it also provides an additional configurable priority SysTick interrupt.[9][10][11] Though the SysTick timer is optional, it is very rare to find a Cortex-M microcontroller without it. "

" Memory Protection Unit (MPU): Provides support for protecting regions of memory through enforcing privilege and access rules. It supports up to eight different regions, each of which can be split into a further eight equal-size sub-regions.[9][10][11] " -- Cortex-M3, M4, M7, and M23 have an MPU option

M0, M1, M0+ and the new M23 are Von Neumann; M3, M4, M7 are Harvard.

" The Cortex-M0 core is optimized for small silicon die size and use in the lowest price chips. (ARMv6-M) ... The Cortex-M0+ is an optimized superset of the Cortex-M0. (ARMv6-M architecture) ... The Cortex-M1 is an optimized core especially designed to be loaded into FPGA chips. (ARMv6-M) ... (Cortex-M3 is ARMv7-M) ... Conceptually the Cortex-M4 is a Cortex-M3 plus DSP instructions, and optional floating-point unit (FPU). If a core contains an FPU, it is known as a Cortex-M4F, otherwise it is a Cortex-M4. (ARMv7E-M) ... The Cortex-M7 is a high-performance core with almost double the power efficiency of the older Cortex-M4. (ARMv7E-M) ... The Cortex-M23 core was announced in October 2016[23] and based on the newer ARMv8-M architecture that was previously announced in November 2015.[24] Conceptually the Cortex-M23 is similar to a Cortex-M0+ plus integer divide instructions and TrustZone security features, and also has a 2-stage instruction pipeline. ... The Cortex-M33 core was announced in October 2016[23] and based on the newer ARMv8-M architecture that was previously announced in November 2015.[24] Conceptually the Cortex-M33 is similar to a Cortex-M4 plus TrustZone security features, and also has a 3-stage instruction pipeline. "

note: the 32-bit multiply on the Cortex-M23 only gives a 32-bit result! (the lower 32 bits)
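Because MULS on these cores keeps only the low 32 bits, a full 32 x 32 -> 64-bit product has to be built in software. A C sketch of one way to do it, splitting each operand into 16-bit halves so that every partial product fits in 32 bits; the function name is hypothetical and real runtime helpers may differ in detail.

```c
#include <stdint.h>

/* 32 x 32 -> 64-bit unsigned multiply using only multiplies whose
   results fit in 32 bits (what MULS provides on M0/M0+/M23). */
static uint64_t umull_from_muls(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    /* Each partial product is at most (2^16 - 1)^2 < 2^32. */
    uint32_t lo_lo = a_lo * b_lo;
    uint32_t hi_lo = a_hi * b_lo;
    uint32_t lo_hi = a_lo * b_hi;
    uint32_t hi_hi = a_hi * b_hi;

    /* Add the middle terms in 32-bit steps that cannot overflow. */
    uint32_t mid  = hi_lo + (lo_lo >> 16);
    uint32_t mid2 = (mid & 0xFFFFu) + lo_hi;
    uint32_t lo = (mid2 << 16) | (lo_lo & 0xFFFFu);
    uint32_t hi = hi_hi + (mid >> 16) + (mid2 >> 16);
    return ((uint64_t)hi << 32) | lo;
}
```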

"The Cortex-M0 / M0+ / M1 / M23 only has 32-bit multiply instructions with a lower-32-bit result (32bit × 32bit = lower 32bit), where as the Cortex-M3 / M4 / M7 / M33 includes additional 32-bit multiply instructions with 64-bit results (32bit × 32bit = 64bit)."

relevant parts of table "ARM Cortex-M instruction groups":

all Cortex-M parts have the following Thumb1 instrs (16-bit):

ADC, ADD, ADR, AND, ASR, B, BIC, BKPT, BLX, BX, CMN, CMP, CPS, EOR, LDM, LDR, LDRB, LDRH, LDRSB, LDRSH, LSL, LSR, MOV, MUL, MVN, NOP, ORR, POP, PUSH, REV, REV16, REVSH, ROR, RSB, SBC, SEV, STM, STMIA, STR, STRB, STRH, SUB, SVC, SXTB, SXTH, TST, UXTB, UXTH, WFE, WFI, YIELD (52 instrs)

and the following Thumb2 instrs (32-bit):

BL, DMB, DSB, ISB, MRS, MSR (6 instrs)

The M23 but not the M0+ has the following Thumb1 (16-bit):

CBNZ, CBZ

and the following Thumb2 (32-bit):

SDIV, UDIV

The M23 and M33 only have the following trustzone instrs:

16-bit: BLXNS, BXNS 32-bit: SG, TT, TTT, TTA, TTAT

Links:

https://en.wikipedia.org/wiki/ARM_Cortex-M

ARM Cortex M4 floating-point

"The FPU fully supports single-precision add, subtract, multiply, divide, multiply and accumulate, and square root operations. It also provides conversions between fixed-point and floating-point data formats, and floating-point constant instructions."

Instructions:

arithmetic: VADD, VSUB, VMUL, VDIV, VNEG, VABS, VSQRT
comparisons: VCMP, VCMPE (compare; the E variant raises an exception even if either operand is a quiet NaN, otherwise an exception is raised only if either operand is a signaling NaN)
conversions: VCVT, VCVTR (convert; 'R' means to use the rounding mode from the FPSCR instead of rounding towards zero; only applicable to conversion from float to int)
loads and stores and movs: VLDR, VSTR, VMOV, multiple load/store: VLDM, VSTM, special register load/store: VMRS (load from special), VMSR (store to special)
multiply variants: VMLA (then accumulate), VMLS (then subtract), VNMLA (then negate, then accumulate), VNMLS (then subtract, then negate), VNMUL (negate then multiply)
stack ops: VPOP, VPUSH,

" 7.2.5 Complete implementation of the IEEE 754 standard

The Cortex‑M4 FPU supports fused MAC operations as described in the IEEE standard. For complete implementation of the IEEE 754-2008 standard, floating-point functionality must be augmented with library functions. The Cortex‑M4 floating point instruction set does not support all operations defined in the IEEE 754-2008 standard. Unsupported operations include, but are not limited to the following:

    Remainder.
    Round floating-point number to integer-valued floating-point number.
    Binary-to-decimal conversions.
    Decimal-to-binary conversions.
    Direct comparison of single-precision and double-precision values."

" The FPU sets the cumulative exception status flag in the FPSCR register as required for each instruction, in accordance with the FPv4 architecture. The FPU does not support exception traps. The processor also has six output pins, FPIXC, FPUFC, FPOFC, FPDZC, FPIDC, and FPIOC, that each reflect the status of one of the cumulative exception flags. See the Cortex®‑M4 Integration and Implementation Manual for a description of these outputs. "

ARM interrupts

ARM provides nested vectored interrupts. "Nested" because if another interrupt occurs while the first one is executing, the currently executing interrupt may itself be interrupted. "Vectored" because each interrupt causes the code at the corresponding interrupt handler entry point to be executed [2] (as opposed to the alternative, "polled" interrupts, in which, during an interrupt, the system calls each handler out of a large group of handlers until one handler 'claims' the interrupt).

ARM has some built-in interrupt types (these have IRQ numbers less than 0); see [3] for a list of built-in interrupt types [4]. It also supports vendor specific interrupt types (which have non-negative IRQ numbers) "typically for devices like UART/I²C/USB/etc" [5].

Interrupts have priority levels. In ARM, a lower priority number is more urgent (that is, an interrupt may interrupt a currently executing interrupt if the currently executing interrupt has a higher priority number). There must be at least 4 priority levels available on any Cortex M0 or M0+ device; more on Cortex M3/M4/M7 devices. There are also 'subpriority' levels, which are used to determine which interrupt goes first when multiple interrupts are pending.

You can temporarily disable all interrupts. Disabling interrupts is sometimes called "masking" them [6], but other times a distinction is made between disabling (where the interrupt is never emitted or is completely ignored) and masking (where the interrupt is emitted but held for later) [7].

On Cortex M3/M4/M7, but not on M0/M0+, you can also temporarily disable all interrupts at or below a certain priority (that is, those whose priority number is greater than or equal to a given value: equally or LESS urgent), using the BASEPRI register.
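The priority rules above can be sketched in C as follows. This is a simplified model with hypothetical names: priorities are plain small numbers (real NVIC registers store them in the high bits of a byte), and BASEPRI is not available on M0/M0+.

```c
#include <stdbool.h>
#include <stdint.h>

/* Lower numbers are more urgent. */
typedef struct {
    uint8_t group; /* preemption (group) priority */
    uint8_t sub;   /* subpriority: only orders interrupts that are pending together */
} prio;

/* A pending interrupt interrupts the running handler only if its group
   priority number is strictly lower (more urgent). */
static bool preempts(prio pending, prio running)
{
    return pending.group < running.group;
}

/* Order between two pending interrupts: group priority first, then
   subpriority. (A full tie is broken by exception number; not modeled.) */
static bool runs_first(prio a, prio b)
{
    if (a.group != b.group)
        return a.group < b.group;
    return a.sub < b.sub;
}

/* Masking: PRIMASK set blocks every interrupt with a configurable priority;
   a nonzero BASEPRI blocks those whose priority number is >= BASEPRI,
   i.e. equally or less urgent. BASEPRI == 0 masks nothing. */
static bool is_masked(uint8_t priority, uint8_t basepri, bool primask)
{
    if (primask)
        return true;
    return basepri != 0 && priority >= basepri;
}
```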

For each interrupt, you can also:

enable/disable it (individually)
change its priority (except Reset (-3), NMI (-2) and HardFault (-1), which "have a fixed (negative) priority and cannot be disabled" [8]). Reset is "invoked on power up or a warm reset". NMI: "A Non Maskable Interrupt (NMI) can be signalled by a peripheral or triggered by software". "A HardFault is an exception that occurs because of an error during exception processing, or because an exception cannot be managed by any other exception mechanism" [9].
mark it as pending, or clear this mark

The other built-in interrupts are: [10]

(not M0) MemManage "a memory protection related fault"
(not M0) BusFault "a memory related fault for an instruction or data memory transaction. This might be from an error detected on a bus in the memory system."
(not M0) UsageFault "an undefined instruction, an illegal unaligned access, invalid state on instruction execution, an error on exception return", and maybe "unaligned address on word and halfword memory access", and maybe "division by zero."
SVCall "supervisor call (SVC) is an exception that is triggered by the SVC instruction. In an OS environment, applications can use SVC instructions to access OS kernel functions and device drivers."
PendSV "an interrupt-driven request for system-level service. In an OS environment, use PendSV for context switching when no other exception is active."
SysTick (optional on M0) (IRQ #-1) "an exception the system timer generates when it reaches zero. Software can also generate a SysTick exception. In an OS environment, the processor can use this exception as system tick."
On M3/M4/M7 there is also a negative IRQ reserved for debugging [11]

Cortex-M0/M0+/M1 can have up to 32 interrupts, M3/M4/M7/M23 can have up to 240, M33 up to 480. [12].

Links:

ARM history

Acorn always had a reputation for weirdness, and I suppose this was the ultimate. While everyone else went 16-bit (or disappeared altogether), Acorn just kept selling variations on the same 8-bit theme. Then, all of a sudden, in 1987, they launched a machine known as Archimedes. It was based on an entirely new processor; the Acorn Risc Machine. This was fully 32-bit data, although it only boasted a 26-bit (equivalent) address bus. It was the first RISC-based home micro in production.

" The ARM chip owed a lot to the experience of its designers with the 6502 upon which its instruction set was based, but it introduced a couple of new ideas. First it had four processor modes with 16 general-purpose registers available. Some of the 16 were different in each mode. It also introduced conditional execution of instructions, avoiding many jumps in code, and helping increase the efficiency of the pipeline. The other interesting feature was its ability to use a barrel-shifter on one of the operands of an instruction with no performance penalty. In other words, a multiply and add can be done in one instruction. This is the kind of technology that Intel are hyping with their 'MMX' Pentiums. Yes, I know MMX is more than that, but it does say something...

Variants

The first ARM chip was available as a second processor for Acorn's 8-bit micros. The ARM chip in the Archimedes was an ARM 2 which ran at 8 MHz. The ARM 3 was installed in several later machines running at speeds up to 25 MHz. Its greatest performance boost came from a simple onboard 4k cache. It was after this that ARM Ltd was spun off from Acorn and started licensing the designs. They came up with the ARM 6 macrocell (what happened to 4 and 5?) and turned it into the ARM 610 processor used in the first Risc PCs. It was coupled with an 8k cache, full 32-bit addressing mode, better cache algorithms and 30 MHz clock. The ARM 710 soon followed with a few performance tweaks, running at 40 MHz, and the ARM 810 was announced.

Then along came Digital. I'm not sure who initiated the pairing, but somehow Digital Equipment Corp, makers of the blindingly fast Alpha processors, got hold of the ARM designs, and built a processor using their semiconductor expertise. The result was the StrongARM; a processor that functionally is little different from the ARM 710 except that it is (internally) clocked at 202 MHz. Oh yes, it also has two 8k caches; one for instructions and one for data. Rumour has it that the interpreter of RiscOS's built-in BASIC fits neatly into the instruction cache. If this is the case, it explains why interpreted BBC BASIC V is so flippin' fast. The other thing, and this is the cause of most of the few software problems, is that the length of the pipeline has been increased, so that self-modifying code which relies on knowing the length of the pipeline to calculate the PC gets in a real mess."

-- http://www.landley.net/history/mirror/acorn/processors.html
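On the quote's point about conditional execution: in classic 32-bit ARM, most instructions carry a 4-bit condition field that is checked against the N, Z, C and V flags, and the instruction behaves as a no-op when the check fails. A Python sketch of that check using the standard condition mnemonics, plus a helper computing the flags a 32-bit `CMP a, b` would set; both function names are hypothetical. The example shows why signed and unsigned conditions differ: after comparing -1 with 1, LT (signed) and HI (unsigned) are both true.

```python
MASK32 = 0xFFFF_FFFF


def cmp_flags(a: int, b: int) -> tuple[bool, bool, bool, bool]:
    """Flags (N, Z, C, V) set by CMP a, b, which computes a - b on 32-bit values."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    n = bool(result >> 31)
    z = result == 0
    c = a >= b  # ARM's carry flag means "no borrow" for subtraction
    # Signed overflow: operands differ in sign and the result's sign differs from a.
    v = bool(((a ^ b) & (a ^ result)) >> 31)
    return n, z, c, v


def condition_passed(cond: str, n: bool, z: bool, c: bool, v: bool) -> bool:
    """Evaluate an ARM condition code against the N, Z, C, V flags."""
    table = {
        "EQ": z,                   # equal
        "NE": not z,               # not equal
        "CS": c, "HS": c,          # carry set / unsigned higher or same
        "CC": not c, "LO": not c,  # carry clear / unsigned lower
        "MI": n,                   # negative
        "PL": not n,               # positive or zero
        "VS": v,                   # overflow
        "VC": not v,               # no overflow
        "HI": c and not z,         # unsigned higher
        "LS": (not c) or z,        # unsigned lower or same
        "GE": n == v,              # signed greater than or equal
        "LT": n != v,              # signed less than
        "GT": (not z) and n == v,  # signed greater than
        "LE": z or n != v,         # signed less than or equal
        "AL": True,                # always
    }
    return table[cond]


flags = cmp_flags(-1, 1)
print(condition_passed("LT", *flags), condition_passed("HI", *flags))  # True True
```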

ARM opinions

" I'll just cover those things I really like about ARM in general :)

1. load/store multiple of any arbitrary register combination Yes, that's right. One can do "STM r0, {r0-r15}" if they want to and save every register. LDM is the same.

2. Address updates available for every memory instruction Reusing STM from above, "STM r0!, {r1-r15}", will write the final address to r0 (I've forgotten the exact specifics here). Pretty much every memory op supports this.

3. The stack is my territory, and mine alone The processor will never touch the stack. I don't have to deal with processor built stack frames. This greatly simplifies some things

4. Pre-shifts available on all basic ALU instructions (Where "basic ALU" is defined as pretty much everything except MUL. ARM doesn't have division)

This is an incredibly useful feature, though it does make the instructions occasionally look like huge monstrosities! It also means that ARM's ADD instruction can double for most architectures' LEA.

5. Three operand instruction set Well, that one should be reasonably clear ;)

6. No mode flags (or those which exist are implicit) For example, while there are both the ARM and Thumb instruction sets, they're designated by the least significant bit of the branch target address. The BX/BLX instructions automatically move this bit into the current program status register (CPSR)

7. PC is in the register file Yes, you can do "MOV pc, lr" (this is the traditional way to return), and can use the ALU operations for relative branches.

(Caveat: On machines prior to ARMv7 [ARM11 and older processors], these instructions will not transition to/from Thumb mode and the result of loading the least significant bits of PC is Unpredictable. ARMv7 makes them interwork properly with Thumb)

(By the way - when ARM say Unpredictable they mean "May raise a trap, may do something completely unrelated, may be a NOP - behaviour is undefined except that it cannot cause a security hole", and the behaviour may be redefined by future revisions) " -- http://forum.6502.org/viewtopic.php?t=1594
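Item 4's pre-shifted operand (the barrel shifter also mentioned in the Acorn history above) can be modelled in a few lines. In classic ARM, `ADD r0, r1, r2, LSL #2` computes r1 + (r2 << 2) in a single instruction, which is the LEA-style use: the address of element r2 of an array of 4-byte values starting at r1. Likewise `ADD r0, r1, r1, LSL #2` multiplies r1 by 5 without a MUL. This sketch only covers immediate shift amounts 1 to 31 (ROR #0 really encodes RRX, and register-specified shifts behave differently), and the function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def shift(value: int, kind: str, amount: int) -> int:
    """Apply an ARM immediate shift (amount 1..31) to a 32-bit value."""
    value &= MASK32
    if kind == "LSL":
        return (value << amount) & MASK32
    if kind == "LSR":
        return value >> amount
    if kind == "ASR":
        signed = value - (1 << 32) if value >> 31 else value
        return (signed >> amount) & MASK32  # Python >> on negatives is arithmetic
    if kind == "ROR":
        return ((value >> amount) | (value << (32 - amount))) & MASK32
    raise ValueError(f"unknown shift {kind}")


def add_shifted(rn: int, rm: int, kind: str, amount: int) -> int:
    """ADD rd, rn, rm, <kind> #amount (result wraps to 32 bits)."""
    return ((rn & MASK32) + shift(rm, kind, amount)) & MASK32


base, index = 0x8000, 3
print(hex(add_shifted(base, index, "LSL", 2)))  # 0x800c: &array[3] for 4-byte elements
print(add_shifted(7, 7, "LSL", 2))              # 35: 7 * 5 without a MUL
```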

"ARM wasn't really a pure RISC from the beginning (e.g. multicycle instructions like LDM/STM, pre/post-increment addressing modes, built-in shifts)..." [13]

" Yes, although the new instruction set in ARMv8 removes several of the things that made programming in 'classic' ARM assembly such fun on the Acorn, such as the free barrel shifter on most arithmetic ops, conditional execution on all instructions and fast multiple loads/stores with groups of registers. These have gone for various reasons; the fully-flexible barrel shifter is awkward at high frequencies with deep pipelines, the conditional execution flags became a waste of opcode space as branch prediction improved and the load/store multiples required microcode on modern implementations and so increased complexity. " [14]

ARM: Links

ARM: summary

It seems like the 'core' instruction set is indeed the set found in Cortex M0, M0+, and M1. This is a subset of the 16-bit Thumb-2 instructions, plus a few 32-bit instructions.

Those instructions are: MOV, arithmetic (ADD, ADC, SUB, SBC, RSB, MUL), bitwise arithmetic (LSL, LSR, ASR, AND, ORR, EOR, ROR, BIC, MVN), byte reversals (REV, REV16, REVSH), PC-relative address generation (ADR), get/set special registers (MRS, MSR), comparisons (CMP, CMN, TST), branching (B, BL), load/stores with immediate, register offset, PC, SP offset, and multiple registers, push/pop, extension (SXTH, SXTB, UXTH, UXTB), misc control (SVC, NOP), multiprocessing (YIELD, WFE, WFI, SEV, DMB, DSB), and a few other misc instructions (ISB and some others).
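The byte reversals in that list are mostly used for endianness conversion. A Python model of what each computes on a 32-bit register, assuming the usual definitions: REV reverses all four bytes, REV16 swaps the two bytes inside each halfword, and REVSH swaps the bytes of the low halfword and sign-extends the result. The function names simply mirror the mnemonics.

```python
MASK32 = 0xFFFF_FFFF


def rev(x: int) -> int:
    """REV: reverse the byte order of a 32-bit word."""
    return int.from_bytes((x & MASK32).to_bytes(4, "little"), "big")


def rev16(x: int) -> int:
    """REV16: swap the two bytes inside each 16-bit halfword."""
    x &= MASK32
    return ((x & 0x00FF00FF) << 8) | ((x & 0xFF00FF00) >> 8)


def revsh(x: int) -> int:
    """REVSH: swap the low halfword's bytes, then sign-extend to 32 bits."""
    half = ((x & 0xFF) << 8) | ((x >> 8) & 0xFF)
    if half & 0x8000:
        half -= 0x10000
    return half & MASK32


print(hex(rev(0x12345678)), hex(rev16(0x12345678)), hex(revsh(0x000080FF)))
# 0x78563412 0x34127856 0xffffff80
```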

When we get to the Cortex M3 we add 32-bit instructions for bit fields (BFC/BFI, SBFX, UBFX), multiprocessing (LDREX, STREX, CLREX), bitwise arithmetic (CLZ, MOVT, ORN, RRX, saturating versions of things), comparisons (TEQ), various loads and stores (with post-indexing and various widths), arithmetic (division, multiply-accumulate (add/subtract) operations with various widths), branch tables (TBB, TBH), and some other misc instructions (DBG, PLD, PLI).
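The M3 bit-field instructions replace common shift-and-mask sequences. A sketch of their semantics on 32-bit values, assuming the operand orders `UBFX Rd, Rn, #lsb, #width`, `SBFX Rd, Rn, #lsb, #width` and `BFI Rd, Rn, #lsb, #width` (insert the low `width` bits of Rn into Rd starting at bit `lsb`), together with CLZ; the Python functions mirror the mnemonics but are otherwise hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def ubfx(rn: int, lsb: int, width: int) -> int:
    """Unsigned bit-field extract: bits [lsb, lsb+width) of rn, zero-extended."""
    return (rn >> lsb) & ((1 << width) - 1)


def sbfx(rn: int, lsb: int, width: int) -> int:
    """Signed bit-field extract: like UBFX, but sign-extended to 32 bits."""
    field = ubfx(rn, lsb, width)
    if field >> (width - 1):  # top bit of the field is the sign bit
        field -= 1 << width
    return field & MASK32


def bfi(rd: int, rn: int, lsb: int, width: int) -> int:
    """Bit-field insert: copy the low `width` bits of rn into rd at `lsb`."""
    mask = ((1 << width) - 1) << lsb
    return ((rd & ~mask) | ((rn << lsb) & mask)) & MASK32


def bfc(rd: int, lsb: int, width: int) -> int:
    """Bit-field clear: zero bits [lsb, lsb+width) of rd."""
    return bfi(rd, 0, lsb, width)


def clz(x: int) -> int:
    """Count leading zero bits of a 32-bit value (CLZ of 0 is 32)."""
    return 32 - (x & MASK32).bit_length()


print(hex(ubfx(0xABCD1234, 8, 8)), hex(sbfx(0x0000F000, 12, 4)),
      hex(bfi(0xFFFFFFFF, 0x0, 4, 8)), clz(1))
```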

ARM64 instruction list

General instructions:

ADC Add with Carry
ADCS Add with Carry, setting flags
ADD (extended register) Add (extended register)
ADD (immediate) Add (immediate)
ADD (shifted register) Add (shifted register)
ADDS (extended register) Add (extended register), setting flags
ADDS (immediate) Add (immediate), setting flags
ADDS (shifted register) Add (shifted register), setting flags
ADR Form PC-relative address
ADRL pseudo-instruction Load a PC-relative address into a register
ADRP Form PC-relative address to 4KB page
AND (immediate) Bitwise AND (immediate)
AND (shifted register) Bitwise AND (shifted register)
ANDS (immediate) Bitwise AND (immediate), setting flags
ANDS (shifted register) Bitwise AND (shifted register), setting flags
ASR (register) Arithmetic Shift Right (register)
ASR (immediate) Arithmetic Shift Right (immediate)
ASRV Arithmetic Shift Right Variable
AT Address Translate
AUTDA, AUTDZA Authenticate Data address, using key A
AUTDB, AUTDZB Authenticate Data address, using key B
AUTIA, AUTIZA, AUTIA1716, AUTIASP, AUTIAZ Authenticate Instruction address, using key A
AUTIB, AUTIZB, AUTIB1716, AUTIBSP, AUTIBZ Authenticate Instruction address, using key B
B.cond Branch conditionally
B Branch
BFC Bitfield Clear, leaving other bits unchanged
BFI Bitfield Insert
BFM Bitfield Move
BFXIL Bitfield extract and insert at low end
BIC (shifted register) Bitwise Bit Clear (shifted register)
BICS (shifted register) Bitwise Bit Clear (shifted register), setting flags
BL Branch with Link
BLR Branch with Link to Register
BLRAA, BLRAAZ, BLRAB, BLRABZ Branch with Link to Register, with pointer authentication
BR Branch to Register
BRAA, BRAAZ, BRAB, BRABZ Branch to Register, with pointer authentication
BRK Breakpoint instruction
CBNZ Compare and Branch on Nonzero
CBZ Compare and Branch on Zero
CCMN (immediate) Conditional Compare Negative (immediate)
CCMN (register) Conditional Compare Negative (register)
CCMP (immediate) Conditional Compare (immediate)
CCMP (register) Conditional Compare (register)
CINC Conditional Increment
CINV Conditional Invert
CLREX Clear Exclusive
CLS Count leading sign bits
CLZ Count leading zero bits
CMN (extended register) Compare Negative (extended register)
CMN (immediate) Compare Negative (immediate)
CMN (shifted register) Compare Negative (shifted register)
CMP (extended register) Compare (extended register)
CMP (immediate) Compare (immediate)
CMP (shifted register) Compare (shifted register)
CNEG Conditional Negate
CRC32B, CRC32H, CRC32W, CRC32X CRC32 checksum performs a cyclic redundancy check (CRC) calculation on a value held in a general-purpose register
CRC32CB, CRC32CH, CRC32CW, CRC32CX CRC32C checksum performs a cyclic redundancy check (CRC) calculation on a value held in a general-purpose register
CSEL Conditional Select
CSET Conditional Set
CSETM Conditional Set Mask
CSINC Conditional Select Increment
CSINV Conditional Select Invert
CSNEG Conditional Select Negation
DC Data Cache operation
DCPS1 Debug Change PE State to EL1
DCPS2 Debug Change PE State to EL2
DCPS3 Debug Change PE State to EL3
DMB Data Memory Barrier
DRPS Debug restore process state
DSB Data Synchronization Barrier
EON (shifted register) Bitwise Exclusive OR NOT (shifted register)
EOR (immediate) Bitwise Exclusive OR (immediate)
EOR (shifted register) Bitwise Exclusive OR (shifted register)
ERET Returns from an exception
ERETAA, ERETAB Exception Return, with pointer authentication
ESB Error Synchronization Barrier
EXTR Extract register
HINT Hint instruction
HLT Halt instruction
HVC Hypervisor call to allow OS code to call the Hypervisor
IC Instruction Cache operation
ISB Instruction Synchronization Barrier
LSL (register) Logical Shift Left (register)
LSL (immediate) Logical Shift Left (immediate)
LSLV Logical Shift Left Variable
LSR (register) Logical Shift Right (register)
LSR (immediate) Logical Shift Right (immediate)
LSRV Logical Shift Right Variable
MADD Multiply-Add
MNEG Multiply-Negate
MOV (to or from SP) Move between register and stack pointer
MOV (inverted wide immediate) Move (inverted wide immediate)
MOV (wide immediate) Move (wide immediate)
MOV (bitmask immediate) Move (bitmask immediate)
MOV (register) Move (register)
MOVK Move wide with keep
MOVL pseudo-instruction Load a register with either a 32-bit or 64-bit immediate value or any address
MOVN Move wide with NOT
MOVZ Move wide with zero
MRS Move System Register
MSR (immediate) Move immediate value to Special Register
MSR (register) Move general-purpose register to System Register
MSUB Multiply-Subtract
MUL Multiply
MVN Bitwise NOT
NEG (shifted register) Negate (shifted register)
NEGS Negate, setting flags
NGC Negate with Carry
NGCS Negate with Carry, setting flags
NOP No Operation
ORN (shifted register) Bitwise OR NOT (shifted register)
ORR (immediate) Bitwise OR (immediate)
ORR (shifted register) Bitwise OR (shifted register)
PACDA, PACDZA Pointer Authentication Code for Data address, using key A
PACDB, PACDZB Pointer Authentication Code for Data address, using key B
PACGA Pointer Authentication Code, using Generic key
PACIA, PACIZA, PACIA
PACIB, PACIZB, PACIB
PSB Profiling Synchronization Barrier
RBIT Reverse Bits
RET Return from subroutine
RETAA, RETAB Return from subroutine, with pointer authentication
R
REV32 Reverse bytes in 32-bit words
REV64 Reverse Bytes
REV Reverse Bytes
ROR (immediate) Rotate right (immediate)
ROR (register) Rotate Right (register)
RORV Rotate Right Variable
SBC Subtract with Carry
SBCS Subtract with Carry, setting flags
SBFIZ Signed Bitfield Insert in Zero
SBFM Signed Bitfield Move
SBFX Signed Bitfield Extract
SDIV Signed Divide
SEV Send Event
SEVL Send Event Local
SMADDL Signed Multiply-Add Long
SMC Supervisor call to allow OS or Hypervisor code to call the Secure Monitor
SMNEGL Signed Multiply-Negate Long
SMSUBL Signed Multiply-Subtract Long
SMULH Signed Multiply High
SMULL Signed Multiply Long
SUB (extended register) Subtract (extended register)
SUB (immediate) Subtract (immediate)
SUB (shifted register) Subtract (shifted register)
SUBS (extended register) Subtract (extended register), setting flags
SUBS (immediate) Subtract (immediate), setting flags
SUBS (shifted register) Subtract (shifted register), setting flags
SVC Supervisor call to allow application code to call the OS
SXTB Signed Extend Byte
SXTH Sign Extend Halfword
SXTW Sign Extend Word
SYS System instruction
SYSL System instruction with result
TBNZ Test bit and Branch if Nonzero
TBZ Test bit and Branch if Zero
TLBI TLB Invalidate operation
TST (immediate) Test bits (immediate), setting the condition flags and discarding the result
TST (shifted register) Test (shifted register)
UBFIZ Unsigned Bitfield Insert in Zero
UBFM Unsigned Bitfield Move
UBFX Unsigned Bitfield Extract
UDIV Unsigned Divide
UMADDL Unsigned Multiply-Add Long
UMNEGL Unsigned Multiply-Negate Long
UMSUBL Unsigned Multiply-Subtract Long
UMULH Unsigned Multiply High
UMULL Unsigned Multiply Long
UXTB Unsigned Extend Byte
UXTH Unsigned Extend Halfword
WFE Wait For Event
WFI Wait For Interrupt
XPACD, XPACI, XPACLRI Strip Pointer Authentication Code
YIELD YIELD
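Several entries above (MOVZ, MOVK, MOV wide immediate, the MOVL pseudo-instruction) exist because A64 instructions are a fixed 32 bits wide, so an arbitrary 64-bit constant has to be built 16 bits at a time: MOVZ writes one 16-bit chunk at shift 0, 16, 32 or 48 and zeroes the rest, and each MOVK overwrites one chunk while keeping the others. The sketch below plans such a sequence and replays it to check the result. It is one simple strategy under stated assumptions (skip zero chunks, always start with MOVZ); a real assembler may instead pick MOVN or a bitmask-immediate ORR, and the function names are hypothetical.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def plan_mov64(value: int, reg: str = "x0") -> list[str]:
    """Return a MOVZ/MOVK sequence that loads `value` into a 64-bit register."""
    value &= MASK64
    if value == 0:
        return [f"movz {reg}, #0x0"]
    insns = []
    for shift in (0, 16, 32, 48):
        chunk = (value >> shift) & 0xFFFF
        if chunk == 0:
            continue  # MOVZ already zeroed this chunk
        op = "movk" if insns else "movz"
        insns.append(f"{op} {reg}, #{chunk:#x}, lsl #{shift}")
    return insns


def run(insns: list[str]) -> int:
    """Replay the planned sequence to check that it rebuilds the value."""
    reg = 0
    for insn in insns:
        op, rest = insn.split(" ", 1)
        parts = [p.strip() for p in rest.split(",")]
        imm = int(parts[1].lstrip("#"), 16)
        shift = int(parts[2].split("#")[1]) if len(parts) > 2 else 0
        if op == "movz":
            reg = imm << shift  # every other bit becomes zero
        else:  # movk keeps the other 48 bits
            reg = (reg & ~(0xFFFF << shift) & MASK64) | (imm << shift)
    return reg


seq = plan_mov64(0x1234_0000_5678_9ABC)
print("\n".join(seq))
print(hex(run(seq)))
```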

Data transfer instructions:

CASA, CASAL, CAS, CASL Compare and Swap word or doubleword in memory
CASAB, CASALB, CASB, CASLB Compare and Swap byte in memory
CASAH, CASALH, CASH, CASLH Compare and Swap halfword in memory
CASPA, CASPAL, CASP, CASPL Compare and Swap Pair of words or doublewords in memory
LDADDA, LDADDAL, LDADD, LDADDL Atomic add on word or doubleword in memory
LDADDAB, LDADDALB, LDADDB, LDADDLB Atomic add on byte in memory
LDADDAH, LDADDALH, LDADDH, LDADDLH Atomic add on halfword in memory
LDAPR Load-Acquire RCpc Register
LDAPRB Load-Acquire RCpc Register Byte
LDAPRH Load-Acquire RCpc Register Halfword
LDAR Load-Acquire Register
LDARB Load-Acquire Register Byte
LDARH Load-Acquire Register Halfword
LDAXP Load-Acquire Exclusive Pair of Registers
LDAXR Load-Acquire Exclusive Register
LDAXRB Load-Acquire Exclusive Register Byte
LDAXRH Load-Acquire Exclusive Register Halfword
LDCLRA, LDCLRAL, LDCLR, LDCLRL Atomic bit clear on word or doubleword in memory
LDCLRAB, LDCLRALB, LDCLRB, LDCLRLB Atomic bit clear on byte in memory
LDCLRAH, LDCLRALH, LDCLRH, LDCLRLH Atomic bit clear on halfword in memory
LDEORA, LDEORAL, LDEOR, LDEORL Atomic exclusive OR on word or doubleword in memory
LDEORAB, LDEORALB, LDEORB, LDEORLB Atomic exclusive OR on byte in memory
LDEORAH, LDEORALH, LDEORH, LDEORLH Atomic exclusive OR on halfword in memory
LDLAR Load LOAcquire Register
LDLARB Load LOAcquire Register Byte
LDLARH Load LOAcquire Register Halfword
LDNP Load Pair of Registers, with non-temporal hint
LDP Load Pair of Registers
LDPSW Load Pair of Registers Signed Word
LDR (immediate) Load Register (immediate)
LDR (literal) Load Register (literal)
LDR pseudo-instruction Load a register with either a 32-bit or 64-bit immediate value or any address
LDR (register) Load Register (register)
LDRAA, LDRAB Load Register, with pointer authentication
LDRB (immediate) Load Register Byte (immediate)
LDRB (register) Load Register Byte (register)
LDRH (immediate) Load Register Halfword (immediate)
LDRH (register) Load Register Halfword (register)
LDRSB (immediate) Load Register Signed Byte (immediate)
LDRSB (register) Load Register Signed Byte (register)
LDRSH (immediate) Load Register Signed Halfword (immediate)
LDRSH (register) Load Register Signed Halfword (register)
LDRSW (immediate) Load Register Signed Word (immediate)
LDRSW (literal) Load Register Signed Word (literal)
LDRSW (register) Load Register Signed Word (register)
LDSETA, LDSETAL, LDSET, LDSETL Atomic bit set on word or doubleword in memory
LDSETAB, LDSETALB, LDSETB, LDSETLB Atomic bit set on byte in memory
LDSETAH, LDSETALH, LDSETH, LDSETLH Atomic bit set on halfword in memory
LDSMAXA, LDSMAXAL, LDSMAX, LDSMAXL Atomic signed maximum on word or doubleword in memory
LDSMAXAB, LDSMAXALB, LDSMAXB, LDSMAXLB Atomic signed maximum on byte in memory
LDSMAXAH, LDSMAXALH, LDSMAXH, LDSMAXLH Atomic signed maximum on halfword in memory
LDSMINA, LDSMINAL, LDSMIN, LDSMINL Atomic signed minimum on word or doubleword in memory
LDSMINAB, LDSMINALB, LDSMINB, LDSMINLB Atomic signed minimum on byte in memory
LDSMINAH, LDSMINALH, LDSMINH, LDSMINLH Atomic signed minimum on halfword in memory
LDTR Load Register (unprivileged)
LDTRB Load Register Byte (unprivileged)
LDTRH Load Register Halfword (unprivileged)
LDTRSB Load Register Signed Byte (unprivileged)
LDTRSH Load Register Signed Halfword (unprivileged)
LDTRSW Load Register Signed Word (unprivileged)
LDUMAXA, LDUMAXAL, LDUMAX, LDUMAXL Atomic unsigned maximum on word or doubleword in memory
LDUMAXAB, LDUMAXALB, LDUMAXB, LDUMAXLB Atomic unsigned maximum on byte in memory
LDUMAXAH, LDUMAXALH, LDUMAXH, LDUMAXLH Atomic unsigned maximum on halfword in memory
LDUMINA, LDUMINAL, LDUMIN, LDUMINL Atomic unsigned minimum on word or doubleword in memory
LDUMINAB, LDUMINALB, LDUMINB, LDUMINLB Atomic unsigned minimum on byte in memory
LDUMINAH, LDUMINALH, LDUMINH, LDUMINLH Atomic unsigned minimum on halfword in memory
LDUR Load Register (unscaled)
LDURB Load Register Byte (unscaled)
LDURH Load Register Halfword (unscaled)
LDURSB Load Register Signed Byte (unscaled)
LDURSH Load Register Signed Halfword (unscaled)
LDURSW Load Register Signed Word (unscaled)
LDXP Load Exclusive Pair of Registers
LDXR Load Exclusive Register
LDXRB Load Exclusive Register Byte
LDXRH Load Exclusive Register Halfword
PRFM (immediate) Prefetch Memory (immediate)
PRFM (literal) Prefetch Memory (literal)
PRFM (register) Prefetch Memory (register)
PRFUM (unscaled offset) Prefetch Memory (unscaled offset)
STADD, STADDL Atomic add on word or doubleword in memory, without return
STADDB, STADDLB Atomic add on byte in memory, without return
STADDH, STADDLH Atomic add on halfword in memory, without return
STCLR, STCLRL Atomic bit clear on word or doubleword in memory, without return
STCLRB, STCLRLB Atomic bit clear on byte in memory, without return
STCLRH, STCLRLH Atomic bit clear on halfword in memory, without return
STEOR, STEORL Atomic exclusive OR on word or doubleword in memory, without return
STEORB, STEORLB Atomic exclusive OR on byte in memory, without return
STEORH, STEORLH Atomic exclusive OR on halfword in memory, without return
STLLR Store LORelease Register
STLLRB Store LORelease Register Byte
STLLRH Store LORelease Register Halfword
STLR Store-Release Register
STLRB Store-Release Register Byte
STLRH Store-Release Register Halfword
STLXP Store-Release Exclusive Pair of registers
STLXR Store-Release Exclusive Register
STLXRB Store-Release Exclusive Register Byte
STLXRH Store-Release Exclusive Register Halfword
STNP Store Pair of Registers, with non-temporal hint
STP Store Pair of Registers
STR (immediate) Store Register (immediate)
STR (register) Store Register (register)
STRB (immediate) Store Register Byte (immediate)
STRB (register) Store Register Byte (register)
STRH (immediate) Store Register Halfword (immediate)
STRH (register) Store Register Halfword (register)
STSET, STSETL Atomic bit set on word or doubleword in memory, without return
STSETB, STSETLB Atomic bit set on byte in memory, without return
STSETH, STSETLH Atomic bit set on halfword in memory, without return
STSMAX, STSMAXL Atomic signed maximum on word or doubleword in memory, without return
STSMAXB, STSMAXLB Atomic signed maximum on byte in memory, without return
STSMAXH, STSMAXLH Atomic signed maximum on halfword in memory, without return
STSMIN, STSMINL Atomic signed minimum on word or doubleword in memory, without return
STSMINB, STSMINLB Atomic signed minimum on byte in memory, without return
STSMINH, STSMINLH Atomic signed minimum on halfword in memory, without return
STTR Store Register (unprivileged)
STTRB Store Register Byte (unprivileged)
STTRH Store Register Halfword (unprivileged)
STUMAX, STUMAXL Atomic unsigned maximum on word or doubleword in memory, without return
STUMAXB, STUMAXLB Atomic unsigned maximum on byte in memory, without return
STUMAXH, STUMAXLH Atomic unsigned maximum on halfword in memory, without return
STUMIN, STUMINL Atomic unsigned minimum on word or doubleword in memory, without return
STUMINB, STUMINLB Atomic unsigned minimum on byte in memory, without return
STUMINH, STUMINLH Atomic unsigned minimum on halfword in memory, without return
STUR Store Register (unscaled)
STURB Store Register Byte (unscaled)
STURH Store Register Halfword (unscaled)
STXP Store Exclusive Pair of registers
STXR Store Exclusive Register
STXRB Store Exclusive Register Byte
STXRH Store Exclusive Register Halfword
SWPA, SWPAL, SWP, SWPL Swap word or doubleword in memory
SWPAB, SWPALB, SWPB, SWPLB Swap byte in memory
SWPAH, SWPALH, SWPH, SWPLH Swap halfword in memory
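Most of the atomics in this list (the CAS*, LD<op>*, ST<op>* and SWP* families) come from the ARMv8.1 Large System Extensions: each one does a read-modify-write on memory in a single instruction, and the LD forms return the old value in a register. A single-threaded Python model of CAS and LDADD, ignoring the A/L suffixes (which only add acquire and release ordering) and using a dict as memory; the function names are hypothetical. The last helper shows the usual retry loop that builds an atomic add out of CAS. In this single-threaded model the loop always succeeds on its first pass; the loop only matters when another core can change the value between the read and the CAS.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def cas(memory: dict[int, int], address: int, expected: int, new: int) -> int:
    """Model CAS Xs, Xt, [Xn]: store Xt if memory equals Xs; return the old value.

    The returned old value is what the hardware writes back into Xs.
    """
    old = memory[address]
    if old == expected:
        memory[address] = new & MASK64
    return old


def ldadd(memory: dict[int, int], address: int, addend: int) -> int:
    """Model LDADD Xs, Xt, [Xn]: memory += Xs; return the old value (into Xt)."""
    old = memory[address]
    memory[address] = (old + addend) & MASK64
    return old


def add_via_cas(memory: dict[int, int], address: int, addend: int) -> int:
    """Atomic add built from a CAS retry loop; returns the old value."""
    while True:
        seen = memory[address]
        if cas(memory, address, seen, seen + addend) == seen:
            return seen  # nobody changed the value between the read and the CAS


mem = {0x100: 40}
print(ldadd(mem, 0x100, 1), mem[0x100])        # 40 41
print(cas(mem, 0x100, 0, 99), mem[0x100])      # 41 41 (comparison failed)
print(add_via_cas(mem, 0x100, 1), mem[0x100])  # 41 42
```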

A64 floating-point instructions:

FABS (scalar) Floating-point Absolute value (scalar)
FADD (scalar) Floating-point Add (scalar)
FCCMP Floating-point Conditional quiet Compare (scalar)
FCCMPE Floating-point Conditional signaling Compare (scalar)
FCMP Floating-point quiet Compare (scalar)
FCMPE Floating-point signaling Compare (scalar)
FCSEL Floating-point Conditional Select (scalar)
FCVT Floating-point Convert precision (scalar)
FCVTAS (scalar) Floating-point Convert to Signed integer, rounding to nearest with ties to Away (scalar)
FCVTAU (scalar) Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (scalar)
FCVTMS (scalar) Floating-point Convert to Signed integer, rounding toward Minus infinity (scalar)
FCVTMU (scalar) Floating-point Convert to Unsigned integer, rounding toward Minus infinity (scalar)
FCVTNS (scalar) Floating-point Convert to Signed integer, rounding to nearest with ties to even (scalar)
FCVTNU (scalar) Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (scalar)
FCVTPS (scalar) Floating-point Convert to Signed integer, rounding toward Plus infinity (scalar)
FCVTPU (scalar) Floating-point Convert to Unsigned integer, rounding toward Plus infinity (scalar)
FCVTZS (scalar, fixed-point) Floating-point Convert to Signed fixed-point, rounding toward Zero (scalar)
FCVTZS (scalar, integer) Floating-point Convert to Signed integer, rounding toward Zero (scalar)
FCVTZU (scalar, fixed-point) Floating-point Convert to Unsigned fixed-point, rounding toward Zero (scalar)
FCVTZU (scalar, integer) Floating-point Convert to Unsigned integer, rounding toward Zero (scalar)
FDIV (scalar) Floating-point Divide (scalar)
FJCVTZS Floating-point Javascript Convert to Signed fixed-point, rounding toward Zero
FMADD Floating-point fused Multiply-Add (scalar)
FMAX (scalar) Floating-point Maximum (scalar)
FMAXNM (scalar) Floating-point Maximum Number (scalar)
FMIN (scalar) Floating-point Minimum (scalar)
FMINNM (scalar) Floating-point Minimum Number (scalar)
FMOV (register) Floating-point Move register without conversion
FMOV (general) Floating-point Move to or from general-purpose register without conversion
FMOV (scalar, immediate) Floating-point move immediate (scalar)
FMSUB Floating-point Fused Multiply-Subtract (scalar)
FMUL (scalar) Floating-point Multiply (scalar)
FNEG (scalar) Floating-point Negate (scalar)
FNMADD Floating-point Negated fused Multiply-Add (scalar)
FNMSUB Floating-point Negated fused Multiply-Subtract (scalar)
FNMUL (scalar) Floating-point Multiply-Negate (scalar)
FRINTA (scalar) Floating-point Round to Integral, to nearest with ties to Away (scalar)
FRINTI (scalar) Floating-point Round to Integral, using current rounding mode (scalar)
FRINTM (scalar) Floating-point Round to Integral, toward Minus infinity (scalar)
FRINTN (scalar) Floating-point Round to Integral, to nearest with ties to even (scalar)
FRINTP (scalar) Floating-point Round to Integral, toward Plus infinity (scalar)
FRINTX (scalar) Floating-point Round to Integral exact, using current rounding mode (scalar)
FRINTZ (scalar) Floating-point Round to Integral, toward Zero (scalar)
FSQRT (scalar) Floating-point Square Root (scalar)
FSUB (scalar) Floating-point Subtract (scalar)
LDNP (SIMD and FP) Load Pair of SIMD and FP registers, with Non-temporal hint
LDP (SIMD and FP) Load Pair of SIMD and FP registers
LDR (immediate, SIMD and FP) Load SIMD and FP Register (immediate offset)
LDR (literal, SIMD and FP) Load SIMD and FP Register (PC-relative literal)
LDR (register, SIMD and FP) Load SIMD and FP Register (register offset)
LDUR (SIMD and FP) Load SIMD and FP Register (unscaled offset)
SCVTF (scalar, fixed-point) Signed fixed-point Convert to Floating-point (scalar)
SCVTF (scalar, integer) Signed integer Convert to Floating-point (scalar)
STNP (SIMD and FP) Store Pair of SIMD and FP registers, with Non-temporal hint
STP (SIMD and FP) Store Pair of SIMD and FP registers
STR (immediate, SIMD and FP) Store SIMD and FP register (immediate offset)
STR (register, SIMD and FP) Store SIMD and FP register (register offset)
STUR (SIMD and FP) Store SIMD and FP register (unscaled offset)
UCVTF (scalar, fixed-point) Unsigned fixed-point Convert to Floating-point (scalar)
UCVTF (scalar, integer) Unsigned integer Convert to Floating-point (scalar)
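The FCVT*S (scalar) entries differ only in rounding mode. A Python sketch of the five signed variants converting to a 32-bit integer, assuming the A64 behaviour that out-of-range results saturate to the nearest representable value and NaN converts to 0; floating-point exception flags are not modelled, and `fcvt_s32` is a hypothetical name.

```python
import math

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def _ties_away(x: float) -> int:
    """Round to nearest, with exact halves rounded away from zero."""
    t = math.trunc(x)
    if abs(x - t) >= 0.5:  # x - t is exact, so this comparison is reliable
        t += 1 if x > 0 else -1
    return t


ROUNDING = {
    "FCVTNS": round,       # Python's round() is ties-to-even
    "FCVTAS": _ties_away,  # ties away from zero
    "FCVTMS": math.floor,  # toward minus infinity
    "FCVTPS": math.ceil,   # toward plus infinity
    "FCVTZS": math.trunc,  # toward zero
}


def fcvt_s32(op: str, x: float) -> int:
    """Convert x to a signed 32-bit integer the way the named instruction rounds."""
    if math.isnan(x):
        return 0
    if math.isinf(x):
        return INT32_MAX if x > 0 else INT32_MIN
    return max(INT32_MIN, min(INT32_MAX, ROUNDING[op](x)))


for op in ROUNDING:
    print(op, [fcvt_s32(op, v) for v in (2.5, -2.5, 3.7, -3.7)])
```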

A64 SIMD scalar instructions:

ABS (scalar) Absolute value (vector)
ADD (scalar) Add (vector)
ADDP (scalar) Add Pair of elements (scalar)
CMEQ (scalar, register) Compare bitwise Equal (vector)
CMEQ (scalar, zero) Compare bitwise Equal to zero (vector)
CMGE (scalar, register) Compare signed Greater than or Equal (vector)
CMGE (scalar, zero) Compare signed Greater than or Equal to zero (vector)
CMGT (scalar, register) Compare signed Greater than (vector)
CMGT (scalar, zero) Compare signed Greater than zero (vector)
CMHI (scalar, register) Compare unsigned Higher (vector)
CMHS (scalar, register) Compare unsigned Higher or Same (vector)
CMLE (scalar, zero) Compare signed Less than or Equal to zero (vector)
CMLT (scalar, zero) Compare signed Less than zero (vector)
CMTST (scalar) Compare bitwise Test bits nonzero (vector)
DUP (scalar, element) Duplicate vector element to scalar
FABD (scalar) Floating-point Absolute Difference (vector)
FACGE (scalar) Floating-point Absolute Compare Greater than or Equal (vector)
FACGT (scalar) Floating-point Absolute Compare Greater than (vector)
FADDP (scalar) Floating-point Add Pair of elements (scalar)
FCMEQ (scalar, register) Floating-point Compare Equal (vector)
FCMEQ (scalar, zero) Floating-point Compare Equal to zero (vector)
FCMGE (scalar, register) Floating-point Compare Greater than or Equal (vector)
FCMGE (scalar, zero) Floating-point Compare Greater than or Equal to zero (vector)
FCMGT (scalar, register) Floating-point Compare Greater than (vector)
FCMGT (scalar, zero) Floating-point Compare Greater than zero (vector)
FCMLA (scalar, by element) Floating-point Complex Multiply Accumulate (by element)
FCMLE (scalar, zero) Floating-point Compare Less than or Equal to zero (vector)
FCMLT (scalar, zero) Floating-point Compare Less than zero (vector)
FCVTAS (scalar) Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector)
FCVTAU (scalar) Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector)
FCVTMS (scalar) Floating-point Convert to Signed integer, rounding toward Minus infinity (vector)
FCVTMU (scalar) Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector)
FCVTNS (scalar) Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector)
FCVTNU (scalar) Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector)
FCVTPS (scalar) Floating-point Convert to Signed integer, rounding toward Plus infinity (vector)
FCVTPU (scalar) Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector)
FCVTXN (scalar) Floating-point Convert to lower precision Narrow, rounding to odd (vector)
FCVTZS (scalar, fixed-point) Floating-point Convert to Signed fixed-point, rounding toward Zero (vector)
FCVTZS (scalar, integer) Floating-point Convert to Signed integer, rounding toward Zero (vector)
FCVTZU (scalar, fixed-point) Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector)
FCVTZU (scalar, integer) Floating-point Convert to Unsigned integer, rounding toward Zero (vector)
FMAXNMP (scalar) Floating-point Maximum Number of Pair of elements (scalar)
FMAXP (scalar) Floating-point Maximum of Pair of elements (scalar)
FMINNMP (scalar) Floating-point Minimum Number of Pair of elements (scalar)
FMINP (scalar) Floating-point Minimum of Pair of elements (scalar)
FMLA (scalar, by element) Floating-point fused Multiply-Add to accumulator (by element)
FMLS (scalar, by element) Floating-point fused Multiply-Subtract from accumulator (by element)
FMUL (scalar, by element) Floating-point Multiply (by element)
FMULX (scalar, by element) Floating-point Multiply extended (by element)
FMULX (scalar) Floating-point Multiply extended
FRECPE (scalar) Floating-point Reciprocal Estimate
FRECPS (scalar) Floating-point Reciprocal Step
FRSQRTE (scalar) Floating-point Reciprocal Square Root Estimate
FRSQRTS (scalar) Floating-point Reciprocal Square Root Step
MOV (scalar) Move vector element to scalar
NEG (scalar) Negate (vector)
SCVTF (scalar, fixed-point) Signed fixed-point Convert to Floating-point (vector)
SCVTF (scalar, integer) Signed integer Convert to Floating-point (vector)
SHL (scalar) Shift Left (immediate)
SLI (scalar) Shift Left and Insert (immediate)
SQABS (scalar) Signed saturating Absolute value
SQADD (scalar) Signed saturating Add
SQDMLAL (scalar, by element) Signed saturating Doubling Multiply-Add Long (by element)
SQDMLAL (scalar) Signed saturating Doubling Multiply-Add Long
SQDMLSL (scalar, by element) Signed saturating Doubling Multiply-Subtract Long (by element)
SQDMLSL (scalar) Signed saturating Doubling Multiply-Subtract Long
SQDMULH (scalar, by element) Signed saturating Doubling Multiply returning High half (by element)
SQDMULH (scalar) Signed saturating Doubling Multiply returning High half
SQDMULL (scalar, by element) Signed saturating Doubling Multiply Long (by element)
SQDMULL (scalar) Signed saturating Doubling Multiply Long
SQNEG (scalar) Signed saturating Negate
SQRDMLAH (scalar, by element) Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (by element)
SQRDMLAH (scalar) Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector)
SQRDMLSH (scalar, by element) Signed Saturating Rounding Doubling Multiply Subtract returning High Half (by element)
SQRDMLSH (scalar) Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector)
SQRDMULH (scalar, by element) Signed saturating Rounding Doubling Multiply returning High half (by element)
SQRDMULH (scalar) Signed saturating Rounding Doubling Multiply returning High half
SQRSHL (scalar) Signed saturating Rounding Shift Left (register)
SQRSHRN (scalar) Signed saturating Rounded Shift Right Narrow (immediate)
SQRSHRUN (scalar) Signed saturating Rounded Shift Right Unsigned Narrow (immediate)
SQSHL (scalar, immediate) Signed saturating Shift Left (immediate)
SQSHL (scalar, register) Signed saturating Shift Left (register)
SQSHLU (scalar) Signed saturating Shift Left Unsigned (immediate)
SQSHRN (scalar) Signed saturating Shift Right Narrow (immediate)
SQSHRUN (scalar) Signed saturating Shift Right Unsigned Narrow (immediate)
SQSUB (scalar) Signed saturating Subtract
SQXTN (scalar) Signed saturating extract Narrow
SQXTUN (scalar) Signed saturating extract Unsigned Narrow
SRI (scalar) Shift Right and Insert (immediate)
SRSHL (scalar) Signed Rounding Shift Left (register)
SRSHR (scalar) Signed Rounding Shift Right (immediate)
SRSRA (scalar) Signed Rounding Shift Right and Accumulate (immediate)
SSHL (scalar) Signed Shift Left (register)
SSHR (scalar) Signed Shift Right (immediate)
SSRA (scalar) Signed Shift Right and Accumulate (immediate)
SUB (scalar) Subtract (vector)
SUQADD (scalar) Signed saturating Accumulate of Unsigned value
UCVTF (scalar, fixed-point) Unsigned fixed-point Convert to Floating-point (vector)
UCVTF (scalar, integer) Unsigned integer Convert to Floating-point (vector)
UQADD (scalar) Unsigned saturating Add
UQRSHL (scalar) Unsigned saturating Rounding Shift Left (register)
UQRSHRN (scalar) Unsigned saturating Rounded Shift Right Narrow (immediate)
UQSHL (scalar, immediate) Unsigned saturating Shift Left (immediate)
UQSHL (scalar, register) Unsigned saturating Shift Left (register)
UQSHRN (scalar) Unsigned saturating Shift Right Narrow (immediate)
UQSUB (scalar) Unsigned saturating Subtract
UQXTN (scalar) Unsigned saturating extract Narrow
URSHL (scalar) Unsigned Rounding Shift Left (register)
URSHR (scalar) Unsigned Rounding Shift Right (immediate)
URSRA (scalar) Unsigned Rounding Shift Right and Accumulate (immediate)
USHL (scalar) Unsigned Shift Left (register)
USHR (scalar) Unsigned Shift Right (immediate)
USQADD (scalar) Unsigned saturating Accumulate of Signed value
USRA (scalar) Unsigned Shift Right and Accumulate (immediate)
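
The saturating, doubling and rounding qualifiers in the names above are easiest to see from the arithmetic. Below is a minimal C sketch of the per-element behaviour of SQADD, SQDMULH and SQRDMULH on 16-bit (H) elements. It is based on my reading of the Arm Architecture Reference Manual pseudocode: widen, double, optionally add a rounding constant of 1 << 15, keep the high half, then clamp. The function names are made up for illustration. The cumulative saturation flag (FPSR.QC) is not modelled. The sketch also assumes `>>` on a negative int64_t is an arithmetic shift, as it is on GCC and Clang.

```c
#include <stdint.h>

/* Clamp a wide intermediate into the signed 16-bit range, as the SQ... forms do. */
static int16_t sat16(int64_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* SQADD Hd, Hn, Hm: signed saturating add. */
int16_t sqadd16(int16_t a, int16_t b) {
    return sat16((int64_t)a + b);
}

/* SQDMULH Hd, Hn, Hm: double the product, keep the high half, saturate. */
int16_t sqdmulh16(int16_t a, int16_t b) {
    int64_t prod = 2 * (int64_t)a * b;
    return sat16(prod >> 16);          /* assumes arithmetic right shift */
}

/* SQRDMULH Hd, Hn, Hm: as SQDMULH, but adds 1 << 15 first so the high half is rounded. */
int16_t sqrdmulh16(int16_t a, int16_t b) {
    int64_t prod = 2 * (int64_t)a * b + (1 << 15);
    return sat16(prod >> 16);
}
```

Read as Q15 fixed point, sqdmulh16(16384, 16384) is 0.5 * 0.5 = 8192. sqdmulh16(INT16_MIN, INT16_MIN) saturates to 32767 because +1.0 is not representable.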

A64 SIMD Vector instructions:

ABS (vector) Absolute value (vector)
ADD (vector) Add (vector)
ADDHN, ADDHN2 (vector) Add returning High Narrow
ADDP (vector) Add Pairwise (vector)
ADDV (vector) Add across Vector
AND (vector) Bitwise AND (vector)
BIC (vector, immediate) Bitwise bit Clear (vector, immediate)
BIC (vector, register) Bitwise bit Clear (vector, register)
BIF (vector) Bitwise Insert if False
BIT (vector) Bitwise Insert if True
BSL (vector) Bitwise Select
CLS (vector) Count Leading Sign bits (vector)
CLZ (vector) Count Leading Zero bits (vector)
CMEQ (vector, register) Compare bitwise Equal (vector)
CMEQ (vector, zero) Compare bitwise Equal to zero (vector)
CMGE (vector, register) Compare signed Greater than or Equal (vector)
CMGE (vector, zero) Compare signed Greater than or Equal to zero (vector)
CMGT (vector, register) Compare signed Greater than (vector)
CMGT (vector, zero) Compare signed Greater than zero (vector)
CMHI (vector, register) Compare unsigned Higher (vector)
CMHS (vector, register) Compare unsigned Higher or Same (vector)
CMLE (vector, zero) Compare signed Less than or Equal to zero (vector)
CMLT (vector, zero) Compare signed Less than zero (vector)
CMTST (vector) Compare bitwise Test bits nonzero (vector)
CNT (vector) Population Count per byte
DUP (vector, element) Duplicate vector element to vector
DUP (vector, general) Duplicate general-purpose register to vector
EOR (vector) Bitwise Exclusive OR (vector)
EXT (vector) Extract vector from pair of vectors
FABD (vector) Floating-point Absolute Difference (vector)
FABS (vector) Floating-point Absolute value (vector)
FACGE (vector) Floating-point Absolute Compare Greater than or Equal (vector)
FACGT (vector) Floating-point Absolute Compare Greater than (vector)
FADD (vector) Floating-point Add (vector)
FADDP (vector) Floating-point Add Pairwise (vector)
FCADD (vector) Floating-point Complex Add
FCMEQ (vector, register) Floating-point Compare Equal (vector)
FCMEQ (vector, zero) Floating-point Compare Equal to zero (vector)
FCMGE (vector, register) Floating-point Compare Greater than or Equal (vector)
FCMGE (vector, zero) Floating-point Compare Greater than or Equal to zero (vector)
FCMGT (vector, register) Floating-point Compare Greater than (vector)
FCMGT (vector, zero) Floating-point Compare Greater than zero (vector)
FCMLA (vector) Floating-point Complex Multiply Accumulate
FCMLE (vector, zero) Floating-point Compare Less than or Equal to zero (vector)
FCMLT (vector, zero) Floating-point Compare Less than zero (vector)
FCVTAS (vector) Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector)
FCVTAU (vector) Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector)
FCVTL, FCVTL2 (vector) Floating-point Convert to higher precision Long (vector)
FCVTMS (vector) Floating-point Convert to Signed integer, rounding toward Minus infinity (vector)
FCVTMU (vector) Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector)
FCVTN, FCVTN2 (vector) Floating-point Convert to lower precision Narrow (vector)
FCVTNS (vector) Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector)
FCVTNU (vector) Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector)
FCVTPS (vector) Floating-point Convert to Signed integer, rounding toward Plus infinity (vector)
FCVTPU (vector) Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector)
FCVTXN, FCVTXN2 (vector) Floating-point Convert to lower precision Narrow, rounding to odd (vector)
FCVTZS (vector, fixed-point) Floating-point Convert to Signed fixed-point, rounding toward Zero (vector)
FCVTZS (vector, integer) Floating-point Convert to Signed integer, rounding toward Zero (vector)
FCVTZU (vector, fixed-point) Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector)
FCVTZU (vector, integer) Floating-point Convert to Unsigned integer, rounding toward Zero (vector)
FDIV (vector) Floating-point Divide (vector)
FMAX (vector) Floating-point Maximum (vector)
FMAXNM (vector) Floating-point Maximum Number (vector)
FMAXNMP (vector) Floating-point Maximum Number Pairwise (vector)
FMAXNMV (vector) Floating-point Maximum Number across Vector
FMAXP (vector) Floating-point Maximum Pairwise (vector)
FMAXV (vector) Floating-point Maximum across Vector
FMIN (vector) Floating-point minimum (vector)
FMINNM (vector) Floating-point Minimum Number (vector)
FMINNMP (vector) Floating-point Minimum Number Pairwise (vector)
FMINNMV (vector) Floating-point Minimum Number across Vector
FMINP (vector) Floating-point Minimum Pairwise (vector)
FMINV (vector) Floating-point Minimum across Vector
FMLA (vector, by element) Floating-point fused Multiply-Add to accumulator (by element)
FMLA (vector) Floating-point fused Multiply-Add to accumulator (vector)
FMLS (vector, by element) Floating-point fused Multiply-Subtract from accumulator (by element)
FMLS (vector) Floating-point fused Multiply-Subtract from accumulator (vector)
FMOV (vector, immediate) Floating-point move immediate (vector)
FMUL (vector, by element) Floating-point Multiply (by element)
FMUL (vector) Floating-point Multiply (vector)
FMULX (vector, by element) Floating-point Multiply extended (by element)
FMULX (vector) Floating-point Multiply extended
FNEG (vector) Floating-point Negate (vector)
FRECPE (vector) Floating-point Reciprocal Estimate
FRECPS (vector) Floating-point Reciprocal Step
FRECPX (vector) Floating-point Reciprocal exponent (scalar)
FRINTA (vector) Floating-point Round to Integral, to nearest with ties to Away (vector)
FRINTI (vector) Floating-point Round to Integral, using current rounding mode (vector)
FRINTM (vector) Floating-point Round to Integral, toward Minus infinity (vector)
FRINTN (vector) Floating-point Round to Integral, to nearest with ties to even (vector)
FRINTP (vector) Floating-point Round to Integral, toward Plus infinity (vector)
FRINTX (vector) Floating-point Round to Integral exact, using current rounding mode (vector)
FRINTZ (vector) Floating-point Round to Integral, toward Zero (vector)
FRSQRTE (vector) Floating-point Reciprocal Square Root Estimate
FRSQRTS (vector) Floating-point Reciprocal Square Root Step
FSQRT (vector) Floating-point Square Root (vector)
FSUB (vector) Floating-point Subtract (vector)
INS (vector, element) Insert vector element from another vector element
INS (vector, general) Insert vector element from general-purpose register
LD1 (vector, multiple structures) Load multiple single-element structures to one, two, three, or four registers
LD1 (vector, single structure) Load one single-element structure to one lane of one register
LD1R (vector) Load one single-element structure and Replicate to all lanes (of one register)
LD2 (vector, multiple structures) Load multiple 2-element structures to two registers
LD2 (vector, single structure) Load single 2-element structure to one lane of two registers
LD2R (vector) Load single 2-element structure and Replicate to all lanes of two registers
LD3 (vector, multiple structures) Load multiple 3-element structures to three registers
LD3 (vector, single structure) Load single 3-element structure to one lane of three registers
LD3R (vector) Load single 3-element structure and Replicate to all lanes of three registers
LD4 (vector, multiple structures) Load multiple 4-element structures to four registers
LD4 (vector, single structure) Load single 4-element structure to one lane of four registers
LD4R (vector) Load single 4-element structure and Replicate to all lanes of four registers
MLA (vector, by element) Multiply-Add to accumulator (vector, by element)
MLA (vector) Multiply-Add to accumulator (vector)
MLS (vector, by element) Multiply-Subtract from accumulator (vector, by element)
MLS (vector) Multiply-Subtract from accumulator (vector)
MOV (vector, element) Move vector element to another vector element
MOV (vector, from general) Move general-purpose register to a vector element
MOV (vector) Move vector
MOV (vector, to general) Move vector element to general-purpose register
MOVI (vector) Move Immediate (vector)
MUL (vector, by element) Multiply (vector, by element)
MUL (vector) Multiply (vector)
MVN (vector) Bitwise NOT (vector)
MVNI (vector) Move inverted Immediate (vector)
NEG (vector) Negate (vector)
NOT (vector) Bitwise NOT (vector)
ORN (vector) Bitwise inclusive OR NOT (vector)
ORR (vector, immediate) Bitwise inclusive OR (vector, immediate)
ORR (vector, register) Bitwise inclusive OR (vector, register)
PMUL (vector) Polynomial Multiply
PMULL, PMULL2 (vector) Polynomial Multiply Long
RADDHN, RADDHN2 (vector) Rounding Add returning High Narrow
RBIT (vector) Reverse Bit order (vector)
REV16 (vector) Reverse elements in 16-bit halfwords (vector)
REV32 (vector) Reverse elements in 32-bit words (vector)
REV64 (vector) Reverse elements in 64-bit doublewords (vector)
RSHRN, RSHRN2 (vector) Rounding Shift Right Narrow (immediate)
RSUBHN, RSUBHN2 (vector) Rounding Subtract returning High Narrow
SABA (vector) Signed Absolute difference and Accumulate
SABAL, SABAL2 (vector) Signed Absolute difference and Accumulate Long
SABD (vector) Signed Absolute Difference
SABDL, SABDL2 (vector) Signed Absolute Difference Long
SADALP (vector) Signed Add and Accumulate Long Pairwise
SADDL, SADDL2 (vector) Signed Add Long (vector)
SADDLP (vector) Signed Add Long Pairwise
SADDLV (vector) Signed Add Long across Vector
SADDW, SADDW2 (vector) Signed Add Wide
SCVTF (vector, fixed-point) Signed fixed-point Convert to Floating-point (vector)
SCVTF (vector, integer) Signed integer Convert to Floating-point (vector)
SDOT (vector, by element) Dot Product signed arithmetic (vector, by element)
SDOT (vector) Dot Product signed arithmetic (vector)
SHADD (vector) Signed Halving Add
SHL (vector) Shift Left (immediate)
SHLL, SHLL2 (vector) Shift Left Long (by element size)
SHRN, SHRN2 (vector) Shift Right Narrow (immediate)
SHSUB (vector) Signed Halving Subtract
SLI (vector) Shift Left and Insert (immediate)
SMAX (vector) Signed Maximum (vector)
SMAXP (vector) Signed Maximum Pairwise
SMAXV (vector) Signed Maximum across Vector
SMIN (vector) Signed Minimum (vector)
SMINP (vector) Signed Minimum Pairwise
SMINV (vector) Signed Minimum across Vector
SMLAL, SMLAL2 (vector, by element) Signed Multiply-Add Long (vector, by element)
SMLAL, SMLAL2 (vector) Signed Multiply-Add Long (vector)
SMLSL, SMLSL2 (vector, by element) Signed Multiply-Subtract Long (vector, by element)
SMLSL, SMLSL2 (vector) Signed Multiply-Subtract Long (vector)
SMOV (vector) Signed Move vector element to general-purpose register
SMULL, SMULL2 (vector, by element) Signed Multiply Long (vector, by element)
SMULL, SMULL2 (vector) Signed Multiply Long (vector)
SQABS (vector) Signed saturating Absolute value
SQADD (vector) Signed saturating Add
SQDMLAL, SQDMLAL2 (vector, by element) Signed saturating Doubling Multiply-Add Long (by element)
SQDMLAL, SQDMLAL2 (vector) Signed saturating Doubling Multiply-Add Long
SQDMLSL, SQDMLSL2 (vector, by element) Signed saturating Doubling Multiply-Subtract Long (by element)
SQDMLSL, SQDMLSL2 (vector) Signed saturating Doubling Multiply-Subtract Long
SQDMULH (vector, by element) Signed saturating Doubling Multiply returning High half (by element)
SQDMULH (vector) Signed saturating Doubling Multiply returning High half
SQDMULL, SQDMULL2 (vector, by element) Signed saturating Doubling Multiply Long (by element)
SQDMULL, SQDMULL2 (vector) Signed saturating Doubling Multiply Long
SQNEG (vector) Signed saturating Negate
SQRDMLAH (vector, by element) Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (by element)
SQRDMLAH (vector) Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector)
SQRDMLSH (vector, by element) Signed Saturating Rounding Doubling Multiply Subtract returning High Half (by element)
SQRDMLSH (vector) Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector)
SQRDMULH (vector, by element) Signed saturating Rounding Doubling Multiply returning High half (by element)
SQRDMULH (vector) Signed saturating Rounding Doubling Multiply returning High half
SQRSHL (vector) Signed saturating Rounding Shift Left (register)
SQRSHRN, SQRSHRN2 (vector) Signed saturating Rounded Shift Right Narrow (immediate)
SQRSHRUN, SQRSHRUN2 (vector) Signed saturating Rounded Shift Right Unsigned Narrow (immediate)
SQSHL (vector, immediate) Signed saturating Shift Left (immediate)
SQSHL (vector, register) Signed saturating Shift Left (register)
SQSHLU (vector) Signed saturating Shift Left Unsigned (immediate)
SQSHRN, SQSHRN2 (vector) Signed saturating Shift Right Narrow (immediate)
SQSHRUN, SQSHRUN2 (vector) Signed saturating Shift Right Unsigned Narrow (immediate)
SQSUB (vector) Signed saturating Subtract
SQXTN, SQXTN2 (vector) Signed saturating extract Narrow
SQXTUN, SQXTUN2 (vector) Signed saturating extract Unsigned Narrow
SRHADD (vector) Signed Rounding Halving Add
SRI (vector) Shift Right and Insert (immediate)
SRSHL (vector) Signed Rounding Shift Left (register)
SRSHR (vector) Signed Rounding Shift Right (immediate)
SRSRA (vector) Signed Rounding Shift Right and Accumulate (immediate)
SSHL (vector) Signed Shift Left (register)
SSHLL, SSHLL2 (vector) Signed Shift Left Long (immediate)
SSHR (vector) Signed Shift Right (immediate)
SSRA (vector) Signed Shift Right and Accumulate (immediate)
SSUBL, SSUBL2 (vector) Signed Subtract Long
SSUBW, SSUBW2 (vector) Signed Subtract Wide
ST1 (vector, multiple structures) Store multiple single-element structures from one, two, three, or four registers
ST1 (vector, single structure) Store a single-element structure from one lane of one register
ST2 (vector, multiple structures) Store multiple 2-element structures from two registers
ST2 (vector, single structure) Store single 2-element structure from one lane of two registers
ST3 (vector, multiple structures) Store multiple 3-element structures from three registers
ST3 (vector, single structure) Store single 3-element structure from one lane of three registers
ST4 (vector, multiple structures) Store multiple 4-element structures from four registers
ST4 (vector, single structure) Store single 4-element structure from one lane of four registers
SUB (vector) Subtract (vector)
SUBHN, SUBHN2 (vector) Subtract returning High Narrow
SUQADD (vector) Signed saturating Accumulate of Unsigned value
SXTL, SXTL2 (vector) Signed extend Long
TBL (vector) Table vector Lookup
TBX (vector) Table vector lookup extension
TRN1 (vector) Transpose vectors (primary)
TRN2 (vector) Transpose vectors (secondary)
UABA (vector) Unsigned Absolute difference and Accumulate
UABAL, UABAL2 (vector) Unsigned Absolute difference and Accumulate Long
UABD (vector) Unsigned Absolute Difference (vector)
UABDL, UABDL2 (vector) Unsigned Absolute Difference Long
UADALP (vector) Unsigned Add and Accumulate Long Pairwise
UADDL, UADDL2 (vector) Unsigned Add Long (vector)
UADDLP (vector) Unsigned Add Long Pairwise
UADDLV (vector) Unsigned sum Long across Vector
UADDW, UADDW2 (vector) Unsigned Add Wide
UCVTF (vector, fixed-point) Unsigned fixed-point Convert to Floating-point (vector)
UCVTF (vector, integer) Unsigned integer Convert to Floating-point (vector)
UDOT (vector, by element) Dot Product unsigned arithmetic (vector, by element)
UDOT (vector) Dot Product unsigned arithmetic (vector)
UHADD (vector) Unsigned Halving Add
UHSUB (vector) Unsigned Halving Subtract
UMAX (vector) Unsigned Maximum (vector)
UMAXP (vector) Unsigned Maximum Pairwise
UMAXV (vector) Unsigned Maximum across Vector
UMIN (vector) Unsigned Minimum (vector)
UMINP (vector) Unsigned Minimum Pairwise
UMINV (vector) Unsigned Minimum across Vector
UMLAL, UMLAL2 (vector, by element) Unsigned Multiply-Add Long (vector, by element)
UMLAL, UMLAL2 (vector) Unsigned Multiply-Add Long (vector)
UMLSL, UMLSL2 (vector, by element) Unsigned Multiply-Subtract Long (vector, by element)
UMLSL, UMLSL2 (vector) Unsigned Multiply-Subtract Long (vector)
UMOV (vector) Unsigned Move vector element to general-purpose register
UMULL, UMULL2 (vector, by element) Unsigned Multiply Long (vector, by element)
UMULL, UMULL2 (vector) Unsigned Multiply long (vector)
UQADD (vector) Unsigned saturating Add
UQRSHL (vector) Unsigned saturating Rounding Shift Left (register)
UQRSHRN, UQRSHRN2 (vector) Unsigned saturating Rounded Shift Right Narrow (immediate)
UQSHL (vector, immediate) Unsigned saturating Shift Left (immediate)
UQSHL (vector, register) Unsigned saturating Shift Left (register)
UQSHRN, UQSHRN2 (vector) Unsigned saturating Shift Right Narrow (immediate)
UQSUB (vector) Unsigned saturating Subtract
UQXTN, UQXTN2 (vector) Unsigned saturating extract Narrow
URECPE (vector) Unsigned Reciprocal Estimate
URHADD (vector) Unsigned Rounding Halving Add
URSHL (vector) Unsigned Rounding Shift Left (register)
URSHR (vector) Unsigned Rounding Shift Right (immediate)
URSQRTE (vector) Unsigned Reciprocal Square Root Estimate
URSRA (vector) Unsigned Rounding Shift Right and Accumulate (immediate)
USHL (vector) Unsigned Shift Left (register)
USHLL, USHLL2 (vector) Unsigned Shift Left Long (immediate)
USHR (vector) Unsigned Shift Right (immediate)
USQADD (vector) Unsigned saturating Accumulate of Signed value
USRA (vector) Unsigned Shift Right and Accumulate (immediate)
USUBL, USUBL2 (vector) Unsigned Subtract Long
USUBW, USUBW2 (vector) Unsigned Subtract Wide
UXTL, UXTL2 (vector) Unsigned extend Long
UZP1 (vector) Unzip vectors (primary)
UZP2 (vector) Unzip vectors (secondary)
XTN, XTN2 (vector) Extract Narrow
ZIP1 (vector) Zip vectors (primary)
ZIP2 (vector) Zip vectors (secondary)
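
Several vector entries above are pure lane shuffles (ZIP, UZP, TRN) or lane-wise arithmetic (the halving adds). Below is a rough C sketch of ZIP1, UZP1 and TRN1 on the 4S arrangement (four 32-bit lanes) and URHADD on 16B (sixteen bytes), with each 128-bit register modelled as a plain array. The lane orders follow my reading of the Arm instruction descriptions. The function names are hypothetical. Unlike the hardware, the sketch requires that the destination not alias either source.

```c
#include <stdint.h>

/* ZIP1 Vd.4S, Vn.4S, Vm.4S: interleave the low halves -> {n0, m0, n1, m1} */
void zip1_4s(uint32_t d[4], const uint32_t n[4], const uint32_t m[4]) {
    for (int i = 0; i < 2; i++) {
        d[2 * i]     = n[i];
        d[2 * i + 1] = m[i];
    }
}

/* UZP1 Vd.4S, Vn.4S, Vm.4S: even-numbered lanes of n, then of m -> {n0, n2, m0, m2} */
void uzp1_4s(uint32_t d[4], const uint32_t n[4], const uint32_t m[4]) {
    for (int i = 0; i < 2; i++) {
        d[i]     = n[2 * i];
        d[2 + i] = m[2 * i];
    }
}

/* TRN1 Vd.4S, Vn.4S, Vm.4S: even lanes of n into even slots, even lanes of m into odd slots
   -> {n0, m0, n2, m2} */
void trn1_4s(uint32_t d[4], const uint32_t n[4], const uint32_t m[4]) {
    for (int i = 0; i < 2; i++) {
        d[2 * i]     = n[2 * i];
        d[2 * i + 1] = m[2 * i];
    }
}

/* URHADD Vd.16B, Vn.16B, Vm.16B: per-lane (a + b + 1) >> 1, computed in a wider type
   so the intermediate sum cannot overflow */
void urhadd_16b(uint8_t d[16], const uint8_t a[16], const uint8_t b[16]) {
    for (int i = 0; i < 16; i++)
        d[i] = (uint8_t)(((unsigned)a[i] + b[i] + 1) >> 1);
}
```

The ...2 variants (ZIP2, UZP2, TRN2) do the same with the upper halves or the odd-numbered lanes. UHADD and SHADD are the halving adds without the +1 rounding term.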

ARM64 things removed compared to ARM32

" If you are familiar with ARMv7-A, you’ll know that many instructions can be conditionally executed. In A32, this is supported via a condition field in the instruction itself; in T32, we have the IT (if-then) instruction for building conditional sequences. This isn’t supported in A64 and we have a different set of specific conditional instructions. You can find examples below.

The ability to “embed” shift and rotate operations into data processing instructions is not supported in the same way in A64, although it is still possible to shift, rotate and sign-extend or zero-extend the second operand.

The Program Counter (PC) is no longer generally accessible. In particular, it can’t be read or modified like other general purpose registers. There are pseudo-instructions which can be used to use it indirectly (for instance, to generate PC-relative addresses at run-time).

Historically, the ARM instruction set has included a space for «coprocessors». Originally, these were external blocks of logic which were connected to the core via a dedicated coprocessor interface. More recently, this support for external coprocessors has been dropped and the instruction set space is used for extension instructions. One specific use of it has been to provide for system configuration and control operations via the notional «coprocessor 15». You won’t find anything like this in A64.

The load and store multiple instructions have been replaced with instructions which load and store pairs of 64-bit registers. These are used for stack operations as well, in place of the earlier PUSH and POP. " -- https://community.arm.com/cfs-file/__key/telligent-evolution-components-attachments/01-2142-00-00-00-00-52-01/Porting-to-ARM-64_2D00_bit.pdf
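
The quote above names three things A64 offers instead of the old features: dedicated conditional instructions, a shifted or extended second operand, and PC-relative address generation. The examples from the quoted PDF are not reproduced here. The following is instead a rough C model of some of these instructions, written from my reading of the Arm Architecture Reference Manual and not taken from the quoted document. Condition flags are reduced to a plain `bool` and registers to `uint64_t` with wrap-around. Every function name is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Conditional select family: Rd = cond ? Rn : f(Rm) */
uint64_t csel(bool cond, uint64_t rn, uint64_t rm)  { return cond ? rn : rm; }
uint64_t csinc(bool cond, uint64_t rn, uint64_t rm) { return cond ? rn : rm + 1; }
uint64_t csinv(bool cond, uint64_t rn, uint64_t rm) { return cond ? rn : ~rm; }
uint64_t csneg(bool cond, uint64_t rn, uint64_t rm) { return cond ? rn : 0 - rm; }

/* Two common aliases: CSINC with the condition inverted (XZR reads as 0) */
uint64_t cset(bool cond)              { return csinc(!cond, 0, 0); }   /* CSET Rd, cond */
uint64_t cinc(bool cond, uint64_t rn) { return csinc(!cond, rn, rn); } /* CINC Rd, Rn, cond */

/* ADD Xd, Xn, Xm, LSL #shift: shifted-register second operand */
uint64_t add_lsl(uint64_t xn, uint64_t xm, unsigned shift) {
    return xn + (xm << (shift & 63));
}

/* ADD Xd, Xn, Wm, SXTW #amount: sign-extend the 32-bit Wm to 64 bits,
   then shift left by amount (0..4 in the real encoding) */
uint64_t add_sxtw(uint64_t xn, uint32_t wm, unsigned amount) {
    uint64_t ext = (uint64_t)(int64_t)(int32_t)wm;   /* sign extension */
    return xn + (ext << amount);
}

/* ADRP Xd, label: the PC's 4 KiB page plus a signed page offset from the instruction */
uint64_t adrp(uint64_t pc, int64_t page_delta) {
    return (pc & ~(uint64_t)0xFFF) + ((uint64_t)page_delta << 12);
}
```

For example, `x = (a < b) ? x + 1 : x` fits a CMP followed by CINC, with no branch. Indexing a 64-bit array with a signed 32-bit index, `base + (int64_t)i * 8`, matches `add_sxtw(base, i, 3)`. Whether a given compiler actually selects these instructions is not guaranteed.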

A comment on the good changes ARMv8 made, from https://www.anandtech.com/Show/Index/15036?cPage=5&all=False&sort=0&page=1&slug=sifive-announces-first-riscv-ooo-cpu-core-the-u8series-processor-ip:

" ... There is a HUGE amount of learning that informed ARMv8, from the dropping of predication and shifting everywhere, to the way constants are encoded, to high-impact ideas like load/store pair and their particular version of conditional selection, to the codification of the memory ordering rules. Look at SVE as the newest version of something very different from what they were doing earlier. " -- name99

ARM64 calling convention

https://c9x.me/compile/bib/abi-arm64.pdf

Misc

"ARM/64 fact: integer divide-by-zero doesn't cause an exception. The Microsoft C compiler implicitly inserts a check if the divisor is 0 and triggers a pseudo-div-by-0 exception. Clang and GCC compile the code as is, so you always get 0." [15]

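C leaves division by zero (and INT64_MIN / -1) undefined, so portable code cannot rely on the hardware behaviour described above. Below is a small sketch of what A64 SDIV and UDIV return. It assumes the Architecture Reference Manual semantics: a zero divisor gives 0, the one overflowing signed case wraps, and quotients round toward zero. The function names are hypothetical.

```c
#include <stdint.h>

/* Result of A64 SDIV Xd, Xn, Xm; the instruction never traps */
int64_t a64_sdiv(int64_t n, int64_t m) {
    if (m == 0)
        return 0;                 /* divide by zero: result is 0 */
    if (n == INT64_MIN && m == -1)
        return INT64_MIN;         /* true quotient 2^63 wraps to INT64_MIN */
    return n / m;                 /* C also truncates toward zero */
}

/* Result of A64 UDIV Xd, Xn, Xm */
uint64_t a64_udiv(uint64_t n, uint64_t m) {
    return m == 0 ? 0 : n / m;
}
```

A program that wants a trap, like the check the quote says MSVC inserts, has to test the divisor itself before dividing.
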
Opinions

"...ARM was a lot more pleasant than MIPS specifically because MIPS was overly minimalistic, so it’d take a surprising number of instructions to get stuff done." gecko
"...the driving success of ARM was its ability to run small, compact code held in cheap, small memory. ARM was a success because it made the most of limited resources. Not because it was the perfect on-paper design." [16]
"ARM was a pretty damn fine on-paper design (still is). And it was one of the fastest designs you could get back in the day. ARM gives you anything you need to make it fast (like advanced addressing modes and complex instructions) while still admitting simple implementations with good performance." [17]
"> Most current ISAs have hundreds of instructions which will never be generated by compilers. The only ISA with this problem is x86 and compilers have gotten better at making use of the instruction set. If you want to see what an instruction set optimised for compilers looks like, check out ARM64. It has instructions like “conditional select and increment if condition” which compiler writers really love." -- FUZxxl
"ARM64 (aka AArch64) is the best version of x86 yet. It's clean, very little warts, they learned from their mistakes with ARMv7/Thumb2 (specifically the IT instruction)." [18] (note: ARM64 is NOT a version of x86, they are being sarcastic)
"Thumb2 was good. The main issue with AArch64 is that they dropped the variable instruction length. As such all instructions are huge and this significantly slows down code, especially after a mispredicted branch. I'm observing on average a 20% performance loss from thumb2 to aarch64 on the exact same CPU and same kernel, just switching executables, and 40% larger code or so. Also something to consider, an A53 can only read 64 bits per cycle from the cache, i.e. just two instructions. That doesn't even allow it to fetch a bit more and start to decode in advance. " [19]
"The original ARM ISA felt very VAX-inspired to me, such as the elegant (but ultimately inefficient) use of a general-purpose register for the program counter. I've only just started looking at AArch64 but I agree that it feels a lot more like MIPS though. I think that's a good thing." [20]
"It is interesting to watch ARM finally adopting many of the great architectural solutions that MIPS used 22 years ago, back in 1991, when it launched the MIPS R4000 family of 64 bit processors. [21]" [22]
" > This killed the basic advantage of RISC. The "all instructions the same length" concept really killed it - it meant 2x code bloat. That meant bigger caches or worse cache performance. It meant more RAM and more RAM bandwidth or worse memory performance. The x86 instruction set, for all its faults, is compact. I'm not really sure that is a big deal today, however. Maybe it was when caches were much smaller, but AArch64 went back from a variable width encoding scheme (Thumb) to uniform width instructions in 64-bit mode, without any problems that I'm aware of, and the performance is quite good. At the same time, the x86-64 ISA has gotten quite a bit less space-efficient: because of the extension to 16 registers, REX prefixes are everywhere and eat up lots of bytes of the instruction stream. " [23]
"AArch64 and x86-64 have about the same code size" -- pcwalton
" > in fact I'd say one of the reasons ARM remained competitive is because of conditional execution, the "free" barrel shifter No compiler developer would agree with you. The conditional execution wreaks havoc with dependencies, and branches are very cheap if correctly predicted. The barrel shifter is not as useful as you would think (what fraction of instructions are shl or shr?) Thumb mode does help code density, but not as much as you might think due to Thumb-1 not being practical and Thumb-2 being fairly large. AArch64 is quite a bit denser than x86-64 already. It is true that the ISA doesn't matter too much from a performance point of view. But why not take advantage of the necessary compatibility break to clean things up? There's a lot of needless complexity in our ISAs from the programmer's point of view, and cleaning it up is just good engineering practice. Let's not saddle future generations with the mistakes of the 1980s. ... (further, down, replying to a comment about the barrel shifter on immediate value encodings being useful) ... The immediate value encoding is still there. What's gone is the barrel shifter on arithmetic instructions, other than those that explicitly mention that they perform a shift. " -- pcwalton
"...generally speaking ARM v8 is pretty damn well designed..." -- ksec
"The two modern ARM instruction sets, the 16-bit-encoded ARMv7-M / ARMv8-M (for microcontrollers) and the 64-bit (32-bit-encoded) ARMv8-A, are very different from the traditional ARM ISA and they both are very well designed, incomparably better than RISC-V." -- adrian_b
"ARMv8.2 or newer is a very well designed ISA, while RISC-V is a very bad ISA and I would hate to be forced to use it. OpenPOWER is a far better ISA than RISC-V, but unfortunately most developers do not have any experience with POWER and they have the wrong belief that POWER is some antique ISA while RISC-V must be some modern fashionable ISA. Therefore even if OpenPOWER is much better, it is less likely than RISC-V to be used as a replacement for ARM." -- adrian_b
adrian_b on ARMv8 vs POWER: "ARMv8 was a clean design not constrained by compatibility with the past and it was created by people having a lot of experience with the implementation of ISAs in hardware and in software tools. Therefore there is no surprise that it is an efficient ISA. The only significant flaw in its first version was the lack of atomic instructions, but that was corrected in the subsequent versions. The 32-bit POWER was a very nice ISA, but it was not designed to be extendable to 64-bit. It had blocks of the encoding space reserved for future extensions, but various details of the instruction word formats depended on the fact that the size of the registers was 32 bit. When POWER was extended to 64-bit, much earlier than IBM expected, i.e. only 5 years after the introduction of the 32-bit variant, the extension was constrained because IBM has chosen to not have a mode switch like ARM but they have chosen to make a compatible ISA extension, i.e. which has the original POWER ISA as an instruction subset. This has constrained the instruction encodings, so the 64-bit POWER ISA has some parts that seem more clumsy than in ARMv8 and the result is that programs for POWER are usually slightly larger than their ARMv8 equivalents. However, the hardware implementation effort for equivalent performance levels should be very similar for POWER and ARM and significantly less for both than for x86. POWER also had a compressed encoding variant, but that was implemented in very few chips. Now the latest ISA variant has introduced 2 instruction word lengths, i.e. both 64-bit and 32-bit long instructions, instead of just 32-bit long instructions. This allows the embedding of large immediate constants in the instructions, which is an important advantage of x86 vs. traditional RISC ISAs. This might help to reduce the sizes of many POWER programs. " -- adrian_b
"AArch64 is a pretty well-designed instruction set that learns a lot of lessons from AArch32 and other competing ISAs." -- David Chisnall
"I’m not really a fan of RISC-V, but RISC-V manages to copy MIPS while avoiding the most awful parts of MIPS. If you want to learn a simple RISC assembly language, RISC-V is a better choice than MIPS. If you want to learn assembly language for a well-designed ISA, learn AArch64. If you want to learn assembly language that’s a joy to write, learn AArch32 (things like stm and ldm, predication, and the fact that $pc is a general-purpose register are great to use for assembly programmers, difficult to use for compilers, and awful to implement)." -- David Chisnall
" There are cases when cbz/tbz are very useful, but for loops they do not help at all. All the ARMv8 loops need 2 instructions, i.e. 8 bytes, instead of the single compare-and-branch of RISC-V. There are 2 ways to do simple loops in ARM, you can either use an addition that stores the flags, then a conditional branch, or you can use an addition that does not store the flags, then a CBNZ (which tests whether the loop counter is null). Both ways need a pair of instructions. Nevertheless, ARM has an unused opcode space equal in size to the space used by CBNZ/CBZ/TBNZ/TBZ (bits 29 to 31 equal to 3 or 7 instead of 1 or 5). In that unused opcode space, 4 pairs of compare-and-branch instructions could be encoded (3 pairs corresponding to those of RISC-V plus 1 pair of test-under-mask, corresponding to the TEST instruction of x86; each pair being for a condition and its negation). All 4 pairs of compare-and-branch would have 14-bit offsets, like TBZ/TBNZ, i.e. a range larger than that of the RISC-V branches. This addition to the ARM ISA would decrease the code size by 4 bytes for each 25 to 30 bytes, so a 10% to 15% improvement. " -- adrian_b
"arm64 does seem constrained by compatibility with arm32 -- at least in that they until now (ten years later) usually have to share an execution pipeline and register set. Is it really conceivable that the arm64 designers had free rein to make the choice whether to use condition codes or not on a purely technical basis? I don't think so. Even if they thought -- as all other designers of ISAs intended for high performance since 1990 have (Alpha, Itanium, RISC-V) -- that it's better not to use condition codes, I don't think they would have been free to make that choice. The same goes for whether to expose instructions using the "free" shift on the 2nd ALU input. It's not really free -- it's paid for with a longer clock cycle or an extra pipeline stage or splitting instructions into uops. And since it was there for 32 bit they might as well use it in 64 bit as well. And the same for the complex addressing modes. " -- brucehoult
(commenting upon https://gmplib.org/list-archives/gmp-devel/2021-September/006013.html ) "Funny, I thought the whole thing was bitching that RISC V has no carry flag which obviously causes multi word arithmetic to take more instructions. The obvious workaround is to use half-words and use the upper half for carry. There may be better solutions, but at twice the number of instructions this "dumb" method is better than what the author did. Flags were removed because they cause a lot of unwanted dependencies and contention in hardware designs and they aren't even part of any high level language. I still think instead of compare-and-branch they should have made "if" which would execute the following instruction only if true. But that's just an opinion. I also hate the immediate constants (12 bits?) Inside the instruction. Nothing wrong with 16 32 or 64bit immediate data after the opcode. I hope RISC 6 will come along down the road (not soon) and fix a few things. But I like the lack of flags... " -- [24]
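
The carry-flag argument in this last quote concerns multi-word (bignum) addition. On A64, the first limb can be added with ADDS and each later limb with ADCS, which both consumes and produces the carry flag. Without flags there are at least two approaches, sketched in C below. The first recovers each carry with an unsigned compare, roughly what RISC-V code does with sltu. The second is the commenter's "half-word" scheme, which keeps 32-bit limbs in 64-bit words so the carry simply appears in the upper half. Function names are hypothetical, and limbs are stored least significant first.

```c
#include <stddef.h>
#include <stdint.h>

/* r = a + b over n 64-bit limbs; returns the final carry (0 or 1).
   Carries are recovered with unsigned compares instead of a flags register. */
uint64_t add_n_compare(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n) {
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + b[i];
        uint64_t c1 = s < a[i];        /* carry out of a + b */
        uint64_t t = s + carry;
        uint64_t c2 = t < s;           /* carry out of adding the incoming carry */
        r[i] = t;
        carry = c1 | c2;               /* at most one of the two can be set */
    }
    return carry;
}

/* "Half-word" scheme: 32-bit limbs summed in a 64-bit accumulator, so two limbs
   plus a carry never overflow and the carry is just the upper half. */
uint32_t add_n_halfword(uint32_t *r, const uint32_t *a, const uint32_t *b, size_t n) {
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (uint64_t)a[i] + b[i];
        r[i] = (uint32_t)acc;          /* low half is the result limb */
        acc >>= 32;                    /* high half is the carry (0 or 1) */
    }
    return (uint32_t)acc;
}
```

The half-word version handles 32 bits per iteration instead of 64, which is where the commenter's "twice the number of instructions" comes from.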

Links

A64:

Arm v1: