proj-plbook-plChIsaComparisons

What is RISC?

Alan Clements wrote a well-written, fun-to-read writeup of RISC. Read that instead of, or at least before, this.

RISC is "Reduced Instruction Set Computer", in contrast to CISC, "Complex Instruction Set Computer". The difference between RISC and CISC is not necessarily that RISC has fewer instructions (although it often does), but rather that RISC instructions are less complex and can typically be executed within a single data memory cycle ( http://en.wikipedia.org/wiki/Reduced_instruction_set_computing#Instruction_set ). As a consequence, RISC instruction sets often avoid instructions that both access main memory and do something else; instead, load and store instructions are the only way to access main memory.

The term is not very well defined: "the statement in the 70s about (801/)RISC was that it could be done in a single chip. later in the 80s, (801/)RISC was instructions that could be executed in single machine cycle. Over the decades, the definition of RISC has been somewhat fluid ... especially as the number of circuits in a chip has dramatically increased." -- Lynn Wheeler, https://www.semipublic.comp-arch.net/wiki/RISC_versus_CISC

Here's an attempt to define it:

" what exactly is a RISC processor? This turns out to be quite hard to answer. Here is a list of possible criteria that have been used in the past.

    Instructions are conceptually simple — that is, no baroque things like `evaluate polynomial', or `edit string', both of which were found in the VAX.
    Instructions are uniform length — as opposed, to say, the VAX or M68000 which have a wide range of instruction lengths.
    Instructions use one, or very few, formats — again, unlike the VAX or M68000.
    The instruction set is orthogonal — that is, there are no special rules about what operations are permitted with particular addressing modes (which would complicate the life of a compiler writer).
    There is one, or very few, addressing modes.
    The architecture is load-and-store — that is, only load and store operations access memory — all operate instructions (e.g. arithmetic) only operate on registers.
    The architecture supports two (or perhaps a few more) datatypes — integer and floating point usually." -- http://euler.mat.uson.mx/~havillam/ca/CS323/0708.cs-323004.html
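To make the load-and-store criterion concrete, here is a minimal Python sketch. It assumes a toy machine with hypothetical instruction names (LOAD, ADD, STORE are illustrative, not any real ISA's encoding): a CISC-style memory-to-memory add performs its three memory accesses inside one instruction, while the load/store version splits the same work so that only LOAD and STORE touch memory and the ADD works purely on registers.

```python
# Toy model of the load/store criterion. The instruction names recorded in
# the trace are hypothetical and not tied to any real ISA.

def mem_to_mem_add(memory: dict, dst: int, src1: int, src2: int) -> None:
    """CISC style: one instruction reads two memory operands and writes a
    third, i.e. three memory accesses plus an addition in a single step."""
    memory[dst] = memory[src1] + memory[src2]


def load_store_add(memory: dict, regs: list, dst: int, src1: int, src2: int) -> list:
    """RISC style: only LOAD and STORE touch memory; ADD reads and writes
    registers only. Returns the sequence of instructions 'executed'."""
    trace = []
    regs[1] = memory[src1]
    trace.append("LOAD  r1, [src1]")
    regs[2] = memory[src2]
    trace.append("LOAD  r2, [src2]")
    regs[3] = regs[1] + regs[2]
    trace.append("ADD   r3, r1, r2")
    memory[dst] = regs[3]
    trace.append("STORE r3, [dst]")
    return trace
```

Both versions leave the same value in memory; the difference is that the load/store version needs four simpler instructions, each of which does at most one memory access.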

(note:

orthogonality: in processor ISAs, 'orthogonal' seems to refer to:

)

What are popular RISC architectures that might be worth looking at?

As of this writing, ARM is the most commercially successful RISC architecture. Other often-noted ones are SPARC, PowerPC, and MIPS. Of these, some say that MIPS is the prototypical, most elegant example of RISC:

"MIPS is the cleanest successful RISC. PowerPC and (32-bit) ARM have so many extra instructions (even a few operating modes, 32-bit ARM especially) that you could almost call them CISC. SPARC has a few odd features and Itanium is composed entirely of odd features. The latter two are more dead than MIPS." -- http://stackoverflow.com/a/2653951/171761

"Answering now your first question: the reason that MIPS features so prominently in books is that it is almost a perfect exemplar of a RISC system. It is a small, relatively pure RISC implementation that is easily understood and that illustrates RISC concepts well. For pedagogical purposes it is probably the best real-world architecture to show the nature of RISC, along with its warts. Other processors thought of as RISC (ARM, SPARC, Alpha, etc.) are more pragmatic and complicated, obfuscating RISC concepts with some more CISC-like enhancements for better performance or other benefits." -- http://stackoverflow.com/a/2796869/171761

"Almost every instruction found in the MIPS core is found in the other architectures" -- http://www.cis.upenn.edu/~milom/cis501-Fall05/papers/RISC-appendix-C.pdf

"MIPS is the most elegant among the effective RISC architectures; even the competition thought so, as evidenced by the strong MIPS influence to be seen in later architectures like DEC’s Alpha and HP’s Precision. Elegance by itself doesn’t get you far in a competitive marketplace, but MIPS microprocessors have generally managed to be among the most efficient of each generation by remaining among the simplest" --- http://v5.books.elsevier.com/bookscat/samples/9780120884216/9780120884216.PDF

What are popular MCU architectures that might be worth looking at?

In addition, there are 8-bit microcontrollers ("MCUs"), which are not considered in the same class as CPUs but which also have interesting, small instruction sets. The PIC and the AVR architectures are popular ones. (The 8051 is also popular, but it is older, is CISC, and does not seem to be recommended as often; however, PIC and AVR are each manufactured only by their respective developers, whereas 8051-compatibles are manufactured by a number of different companies.) Note that Arduino, which you may have heard of, uses AVR or ARM. Many people comment that the AVR is easier to program than the (8-bit) PIC ( http://stackoverflow.com/questions/140049/avr-or-pic-to-start-programming-microcontroller , http://www.ladyada.net/library/picvsavr.html ), but others say that the PIC is simpler (e.g. http://www.8051projects.net/lofiversion/t17539/what039s-diff039-between-8051pic-avr.html ); i suspect they mean that the PIC has fewer instructions and a simpler architecture outside of the ISA, while the AVR has a more uniform architecture and more accessible C compilers, but i'm not too sure, since i've never used either. Some call both the PIC and the AVR RISC, but the AVR has the more RISC-y design (the PIC has indirect addressing), even though it also has a larger instruction set.

AVR, PIC, ARM summary

The AVR, the PIC, and the ARM all have:

mov, jump, call, addition, subtraction, bitwise arithmetic (and/or/not/xor), rotate right, bit clears, NOP, a way to make some hardware-specific calls, branch on zero, branch on condition flag, ways to get and set the condition flags, and operands specifying a destination register.

All but the PIC have load/store, relative jumps (higher-end PICs have this), <= etc. comparisons/branching (higher-end PICs have this), shifts (LSL, LSR, ASR; higher-end PICs have this), negation, carry/no-carry forms of addition and subtraction (higher-end PICs have this), access to the stack pointer (higher-end PICs have this), and register indirect addressing for loads (higher-end PICs have this too). The AVR Reduced Core, the ARM, and higher-end PICs have PUSH and POP.

The PIC doesn't have load/store because it maps memory into registers and uses banked memory to deal with the fact that it only has so many registers. The PIC is the only one with banked memory. The ARM doesn't have single-instruction bit set/clear until you get to the Cortex-M3, but the PIC and the AVR do, and they also have skip/branch-if-bit. Only the ARM has MUL (although the AVR Enhanced Core does, as do higher-end PICs), width extension instructions, multiprocessing instructions, multi-register load/store/push/pop, and byte reversal; it lacks increment/decrement and swap nibbles, which the other two have. Higher-end AVRs and ARMs have post-increment addressing for loads and stores. Higher-end ARMs and PICs have multiply-accumulate and division.
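Two of the instructions named above are simple bit shuffles that are easy to state in code. This Python sketch models only their effect on a value, not any encoding: AVR's SWAP exchanges the two 4-bit halves of an 8-bit register, and ARM's REV (ARMv6 and later) reverses the four bytes of a 32-bit register, which is handy for endianness conversion. The function names are made up for the sketch.

```python
def swap_nibbles(b: int) -> int:
    """Effect of AVR SWAP: exchange the high and low nibbles of an 8-bit value."""
    b &= 0xFF
    return ((b & 0x0F) << 4) | (b >> 4)


def reverse_bytes32(x: int) -> int:
    """Effect of ARM REV: reverse the byte order of a 32-bit value."""
    x &= 0xFFFFFFFF
    return (
        ((x & 0x000000FF) << 24)
        | ((x & 0x0000FF00) << 8)
        | ((x >> 8) & 0x0000FF00)
        | (x >> 24)
    )


# On a machine without these instructions, each function becomes a short
# run of shift/AND/OR instructions.
print(hex(swap_nibbles(0xA5)))           # 0x5a
print(hex(reverse_bytes32(0x12345678)))  # 0x78563412
```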

Irregularities sometimes seen include: not letting every instruction use immediate (constant) addressing; having an accumulator register with a special role; having to move some values into a certain register first and then move them again from there to where you want them; and not having full access to the PC and SP.

In summary, it seems like a reasonable 'common core' would consist of:

mov, jump, call, addition, subtraction, bitwise arithmetic (and/or/not/xor), rotate right, bit clears, NOP, a way to make some hardware-specific calls, branch on zero, branch on condition flag, ways to get and set the condition flags, an operand to specify a destination register for each instruction, load/store, relative jumps, <= etc. comparisons/branching, LSL, LSR, ASR, carry/no-carry forms of addition and subtraction, access to the stack pointer, register indirect addressing for loads, PUSH, POP, single-instruction bit set/clear, and skip/branch-if-bit.

A slightly extended core would also have MUL, post-increment addressing, multiply-accumulate, division, increment, decrement, swap nibbles.
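The carry/no-carry forms of addition earn their place in the common core because they are how an 8-bit MCU adds numbers wider than a byte: add the low bytes with a plain ADD, then add the high bytes with ADC (add with carry), feeding in the carry produced by the first step. A minimal Python sketch of that pattern, modelling only the arithmetic and the carry flag (the function names are hypothetical and no real encodings are involved):

```python
def add8(a: int, b: int, carry_in: int = 0) -> tuple:
    """8-bit add returning (result, carry_out). With carry_in=0 this models
    a plain ADD; passing the previous carry models ADC (add with carry)."""
    total = (a & 0xFF) + (b & 0xFF) + carry_in
    return total & 0xFF, total >> 8


def add16_on_8bit(x: int, y: int) -> tuple:
    """Add two 16-bit numbers using only 8-bit operations.
    Returns (16-bit sum, final carry out)."""
    lo, carry = add8(x & 0xFF, y & 0xFF)                       # ADD: low bytes
    hi, carry = add8((x >> 8) & 0xFF, (y >> 8) & 0xFF, carry)  # ADC: high bytes + carry
    return (hi << 8) | lo, carry
```

For example, adding 0x12FF and 0x0001 produces a carry out of the low byte, which the ADC step turns into 0x1300. Without an ADC instruction the carry would have to be tested and added with a separate branch or compare.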

RISC links

Only tangentially of interest:

Summary of major different CPUs and MCU/MCPs

Please note that i know nothing about this stuff and am only repeating what i heard on the web.

timeline:

    6502: 1975
    PIC: 1975/1985
    Z80: 1976
    x86: 1978, 1985 (32-bit)
    68000: 1979
    8051: 1980
    MIPS: 1981
    ARM: 1986 (ARM6 1992)
    SPARC: 1987
    HC08, HC12: ?
    PowerPC: 1992
    MSP430: 1993
    AVR: 1996

Narrative:

The 6502 was a simple, low-cost CPU; its derivatives were used in Apple IIs and many other machines. It was mentioned in http://www.landley.net/history/mirror/acorn/processors.html .

The PIC MCU family became very popular. It is popular in http://www.embedded.com/electronics-blogs/other/4420311/MCU-popularity--Engineer-vs--provider-perceptions , and mentioned in http://en.wikibooks.org/wiki/Embedded_Systems/Particular_Microprocessors , http://www.eejournal.com/archives/articles/20120822-armchoice/ , and indeed almost everywhere.

The Z80 was used in the Sinclair machines and is still a sometimes remarked-upon yet not too popular embedded CPU. It is mentioned in http://en.wikibooks.org/wiki/Embedded_Systems/Particular_Microprocessors .

The x86 became the most popular PC CPU. The 486 is the earliest version that modern Debian runs on, although Debian used to run on the 386. It is mentioned almost everywhere.

The 68000 was for a time a major competitor to the x86s. A simplified version remains under the name ColdFire. It is mentioned in http://www.eejournal.com/archives/articles/20120822-armchoice/ .

The 8051 MCU became very popular. It is mentioned in http://en.wikibooks.org/wiki/Embedded_Systems/Particular_Microprocessors , http://www.eejournal.com/archives/articles/20120822-armchoice/ .

MIPS is a RISC architecture that was once thought to be the future. It is sometimes described as the "cleanest successful RISC" ( http://stackoverflow.com/questions/2635086/mips-processors-are-they-still-in-use-which-other-architecture-should-i-learn , http://www.cpu-collection.de/?l0=cl&l1=MIPS%20Rx000 ). It is mentioned in http://www.eejournal.com/archives/articles/20120822-armchoice/ .

ARM is the most popular 32-bit CPU architecture. A slightly simplified version is the Cortex M0. It is mentioned almost everywhere, such as http://www.eejournal.com/archives/articles/20120822-armchoice/ .

SPARC is a RISC architecture that was once thought to be the future. It is mentioned in http://www.eejournal.com/archives/articles/20120822-armchoice/ .

HC08 and HC12 evolved from the 6800 family and were popular. This family is popular in http://www.embedded.com/electronics-blogs/other/4420311/MCU-popularity--Engineer-vs--provider-perceptions and mentioned in http://en.wikibooks.org/wiki/Embedded_Systems/Particular_Microprocessors.

PowerPC is a RISC ISA that for a time powered Apple Macintoshes.

MSP430 is a popular and relatively 'clean' 16-bit MCU with low power consumption. It is popular in http://www.embedded.com/electronics-blogs/other/4420311/MCU-popularity--Engineer-vs--provider-perceptions , and mentioned in http://en.wikibooks.org/wiki/Embedded_Systems/Particular_Microprocessors.

AVR is a relatively 'clean' 8-bit MCU architecture that is popular with hobbyists (but not so popular in industry). It is somewhat popular in http://www.embedded.com/electronics-blogs/other/4420311/MCU-popularity--Engineer-vs--provider-perceptions , and mentioned in http://en.wikibooks.org/wiki/Embedded_Systems/Particular_Microprocessors , and indeed almost everywhere.

The Cypress PSOC is relatively unpopular (later note: actually i hear about it a lot now) but interesting due to its reconfigurable analog and digital blocks, and the Propeller and the XMOS are unpopular but interesting due to their multiprocessor natures. They are mentioned in http://en.wikibooks.org/wiki/Embedded_Systems/Particular_Microprocessors

One missing piece of data in the above is that Renesas is at the top of the list for MCU revenue yet i can't figure out which of its products are most popular, so i omit them.

Later: in this blog post is a list of popular embedded systems:

    ARM Cortex-M 	
    AVR
    AVR32
    ColdFire
    HC12
    MSP430
    PIC18
    PIC24/dsPIC 	
    PIC32 (MIPS) 	
    PowerPC 	
    RL78 	
    RX100/600 	
    SH 	
    V850 	
    x86 	

The Renesas ones are RL78, RX100/600, SH, V850 (the HC12 is a Freescale (Motorola) processor derived from the 6801). Here's a Google search for those, omitting RX600:

https://www.google.com/search?q=HC12+RL78+RX100+SH+V850

Here's a similar search, but with RX200 instead of RX100: https://www.google.com/search?q=HC12+RL78+RX200+SH+V850

This turns up: http://techon.nikkeibp.co.jp/english/handbook/MCU/handbook_mcu.pdf

which has a discussion of the genealogy of Renesas's product lines starting on PDF page 25.

The RL78 is derived from a line of processors from NEC Electronics that originated with a Z80-compatible processor.

Although a line of high-end products was released, the low-end models, the 78K0 and 78K0S (initially called the 78K/0 and 78K/0S), turned out to be pervasive....

"The RL78 was the first product developed by Renesas Electronics after the merger. It combines the 78K0R core and the peripheral circuitry of the R8C. As of March 2011, however, it still uses 78K0R-based peripheral circuitry. A series of products remodeled with the use of R8C-based peripheral circuitry is slated for launch"

The V850 is derived from a line of processors from NEC Electronics that originated with an 8086-compatible processor. "They have been incorporated in various products and are still in use today".

The SH (Super-H) is derived from a Hitachi 32-bit RISC processor. "They are used for SoC in mobile phones and other devices".

"..the RX family, a line of entirely new CISC-based processors. The company first released a 32-bit processor called the RX600, followed by the 16-bit RX200 processor. The products were designed with the intention to integrate the company’s middle-range CISC architectures into one family in the future. However, the plan was somewhat derailed by the merger with NEC Electronics in 20."

(end 'later:')

here's a Google search for those without the HC12

https://www.google.com/search?q=RL78+RX200+SH+V850

Going by the number of search results found, the last two of those are the least popular. Changing 'SH' to 'superh' is about the same. However, eliminating the first two yields more search results than eliminating the last two. Changing RX200 to RX600 helps. Changing it to RX helps more. Out of RL78 RX superh v850, eliminating the first or the last helps the most (note that many compilers support RX and superh, i guess there is some commonality there). Note that V850 is being replaced by RH850. Replacing V850 with 850 generates more results. Sometimes the RL78 is called 78k. Replacing RL78 with 78 generates more results. Replacing superh with sh (going backwards..) generates more results. Adding the keyword 'renesas' is probably a good idea:

https://www.google.com/search?q=78+RX+sh+850++renesas

Removing RX generates the most results. I also searched for some of these plus "market share". One result was:

http://www.renesasinteractive.com/file.php/1/CoursePDFs/DevCon_On-the-Road/DevCon_On-the-Road/Computing_Architecture/MCU%20R%26D%20Strategies%20for%20the%20Smart%20Society.pdf

but i can't really find the revenue per line for these guys.

As of this writing, the MSP430, PIC, 6800 derivatives, and ARM appear to be the most popular MCUs in industry, the AVR is the most popular MCU among hobbyists, and ARM and x86 are the most popular CPUs.

The 6502 is probably also worth studying due to its simplicity. ColdFire is probably also worth studying because some people seem to like it. MIPS is probably also worth studying because people say it is a clean example of a successful RISC. The Cypress PSOC is worth studying because it is interesting, and likewise the Propeller and the XMOS.

So, my list of ISAs to explore is:

The Z80, 8051, SPARC, and PowerPC are left out of this list. They could also be studied if time permits.

Links:

which ones are popular?

archs supported by linux kernel:

as of this writing:

alpha, arc, arm, arm64, avr32, blackfin, c6x, cris, frv, hexagon, ia64, m32r, m68k, metag, microblaze, mips, mn10300, nios2, openrisc, parisc, powerpc, s390, score, sh, sparc, tile, um, unicore32, x86, xtensa [1]

Debian official ports:

amd64, armel, armhf, i386, ia64, kfreebsd, kfreebsd, mips, mipsel, powerpc, s390, s390x, sparc

-- [2]

Debian Unofficial ports:

alpha, arm64, hppa, hurd, m68k, powerpcspe, ppc64, ppc64el, sh4, sparc64, x32

-- [3]

but see also https://www.debian.org/ports/ which has more:

avr32, m32, or1k

consolidating these a little:

alpha, arc, arm, arm64, avr32, blackfin, c6x, cris, frv, hexagon, ia64, m32r, m68k, metag, microblaze, mips, mn10300, nios2, openrisc, parisc, powerpc, s390, score, sh, sparc, tile, um, unicore32, x86, xtensa (TENSILICA XTENSA), [4]

arm, x86, ia-64, mips, powerpc, s390 (z/Architecture), sparc

alpha, hppa, m68k, powerpc, sh4/sh (superH), avr, m32, or1k

some notes on more obscure linux kernel archs:

not surprisingly the Debian ones are a strict subset of the Linux kernel ones (with some renaming)

the combined, consolidated Debian ones are:

alpha, avr, arm, x86, ia-64, m32, m68k, mips, or1k, parisc (HP PA-RISC), powerpc, s390 (z/Architecture), sh (superH), sparc

and the official, consolidated Debian ones are:

arm, ia-64, mips, powerpc, s390 (z/Architecture), sparc, x86

---

http://www.cpushack.com/CPU/cpu.html

---

A Survey of RISC Architectures for Desktop, Server, and Embedded Computers by Steven Przybylskic

---

http://www.forwardcom.info/comparison.html

---

http://www.agner.org/optimize/calling_conventions.pdf https://en.wikipedia.org/wiki/Calling_convention

---

Links

AspenCore 2017 Embedded Markets Study: top answers for:

Which of the following 8-bit chip families would you consider for your next embedded project?

    Microchip PIC: 46%
    Atmel AVR: 43%
    STMicroelectronics ST6, ST7, ST8: 18%
    Freescale HC: 13%

Which of the following 16-bit chip families would you consider for your next embedded project?

    Microchip PIC24 / dsPIC: 45%
    TI MSP430: 42%
    STMicroelectronics ST9, ST10: 22%
    Freescale HC16: 15%

Which of the following 32-bit chip families would you consider for your next embedded project?

    STMicro STM32 (ARM): 30%
    Microchip PIC 32-bit (MIPS): 20%
    Xilinx Zynq (with dual ARM Cortex-A9): 17%
    Freescale i.MX (ARM): 17%

Which vendor has the best ecosystem for your needs?

    Microchip or Atmel (Microchip): 14%
    Texas Instruments (TI): 14%
    ST Microelectronics: 11%
    NXP/Freescale/Qualcomm: 11%
    Xilinx: 5%

-- [22]

---

Code Density Concerns for New Architectures

---

Back when architectures were designed for human assembly programmers and not just as compiler targets. Having a simple and elegant instruction set was considered a selling point.

If you’re interested in learning an assembly language, you’d be hard-pressed to find a better one than m68k.

calvin:
    Back when architectures were designed for human assembly programmers and not just as compiler targets. Having a simple and elegant instruction set was considered a selling point

Turns out it also makes it hard to make a fast processor. Mashey believed it wasn’t the number of instructions, but the ergonomic and symmetrical forms that lead to things like memory-to-memory instructions, which makes it harder to optimize. (Memory decode, dependencies, etc. when it gets broken down into µops…)

Ironically, the ugly duckling of CISC, x86, is ugly in ways that mostly don’t matter for performance. IBM System/3x0 is actually pretty clean, and arguably on the borderline of RISC with clean mostly fixed instruction forms and mostly schewing memory to memory instructions. (Arguably, the Model 44 comes pretty close to RISC!) I don’t think it’s a coincidence that x86 and z are around today while the more aggressively assembly-friendly architectures like VAX and 68k died.

dbremner:
    It isn’t at all a coincidence; x86 and the Z series are much easier to implement than the 68k or the VAX.
    John Mashey discusses the difficulties with implementing a high-speed VAX here.

hwj:
    If you’re interested in learning an assembly language, you’d be hard-pressed to find a better one than m68k.

Someone who wrote several assemblers thinks MSP430, MIPS, and AVR8 are the cleanest architectures:

https://github.com/mikeakohn/naken_asm/issues/60#issuecomment-471514168

david_chisnall:

Having worked on a MIPS implementation and the LLVM MIPS back end, I’d agree that MIPS is clean from the perspective of writing an assembler or instruction decoder, as long as we’re talking about MIPS IV and not the newer MIPS32 and MIPS64. That is; however, the only positive thing that I could think of to say about the ISA.

coypoop:

I think that refers to parsing by a machine, and indeed MIPS was designed to be very easy to parse (as were other RISC ISAs), but not to ease of writing the instructions by a human.

I’ve found MIPS to be somewhat obnoxious to write, but I realize my experiences refer to privileged code intended to work on multiple machines, so aren’t the typical MIPS experience.

lorddimwit:
    While I still love the m68k, See MIPS Run is the best processor architecture book I’ve ever read…

utz:

I think AVR is a good, modern-day contender that I would recommend to anyone looking to get started with Assembly.

[23]

---

"ARMv8.2 or newer is a very well designed ISA, while RISC-V is a very bad ISA and I would hate to be forced to use it." -- adrian_b

---

" The RISC-V project has had a weird mix from the start of explicitly saying that it’s not a research project and wants to be simple and also depending on research ideas. The core ISA is a fairly mediocre mid-90s ISA. Its fine, but turning it into something that’s competitive with modern x86 or AArch64 is a huge amount of work. Some ... AArch64 is a pretty well-designed instruction set that learns a lot of lessons from AArch32 and other competing ISAs. RISC-V is very close to MIPS III at the core. The extensions are somewhat better, but they’re squeezed into the tiny amount of left-over encoding space. The value of an ecosystem with no fragmentation is huge. For RISC-V to succeed, it needs to get a load of the important extensions standardised quickly, define and standardise the platform specs (underway, but slow, and without enough of the people who actually understand the problem space contributing, not helped by the fact that the RISC-V Foundation is set up to discourage contributions), and get software vendors to agree on those baselines. The problem is that, for a silicon vendor, one big reason to pick RISC-V over ARM is the ability to differentiate your cores by adding custom instructions. Every RISC-V vendor’s incentives are therefore diametrically opposed to the goals of the ecosystem as a whole. " -- David Chisnall

" If you want to get an understanding of a simple close-to-the-metal environment, RISC-V is fine. If you want to write assembly code, it’s painful. The lack of complex addressing modes means that you end up burning registers and doing arithmetic for simple tasks. If you want to do complex things like bitfield manipulation, you either need to write a lot of logic with shifts and masks or you need to use an extension (I think the bitmanip extension is standardised now, but the cores from ETH have their own variants). There are lots of clunky things in RISC-V.

ARM (AArch32 or AArch64) is much nicer to use as an assembly programmer. Both are big instruction sets, but the addressing modes on ARM are really nice to work with (it’s almost as if they, unlike the RISC-V project, followed the RISC I methodology of examining the output from compilers and working out what the common sequences of operations were, before designing an instruction set).

Note that ARM doesn’t call itself a RISC ISA anymore, it calls itself a load-store architecture. This is one of the key points of RISC (memory-register and memory-memory instructions make out-of-order execution difficult), but they’re definitely not a small ISA. They do have a much more efficient encoding than RISC-V (which, in a massive case of premature optimisation, optimised the ISA to be simple to decode in an in-order pipeline). " -- David Chisnall
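As an illustration of the bitfield point in the quote above, here is a Python sketch of bitfield extract and insert written the long way, with shifts and masks. The function names are hypothetical; they model the effect of AArch64's UBFX and BFI instructions. On an ISA without such instructions (for example, base RISC-V without a bit-manipulation extension), each line turns into one or more shift, AND, and OR instructions, often with extra registers to hold the masks.

```python
def bitfield_extract(value: int, lsb: int, width: int) -> int:
    """Unsigned extract of `width` bits starting at bit `lsb`
    (the effect of AArch64 UBFX): shift right, then mask."""
    mask = (1 << width) - 1
    return (value >> lsb) & mask


def bitfield_insert(dest: int, src: int, lsb: int, width: int) -> int:
    """Replace bits [lsb, lsb + width) of `dest` with the low `width` bits
    of `src` (the effect of AArch64 BFI): clear the field in `dest`, then
    OR in the shifted and masked source."""
    mask = ((1 << width) - 1) << lsb
    return (dest & ~mask) | ((src << lsb) & mask)
```

For instance, extracting 8 bits at bit 4 of 0xABCD gives 0xBC, and inserting 0x12 into that same field gives 0xA12D.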

---

"If you’re interested in learning an assembly language, you’d be hard-pressed to find a better one than m68k." [24]

"Someone who wrote several assemblers thinks MSP430, MIPS, and AVR8 are the cleanest architectures:

https://github.com/mikeakohn/naken_asm/issues/60#issuecomment-471514168"

"Well, it’s just my opinion of course.. but the cleanest to me are msp430, MIPS, and avr8. From having to write an assembler, MSP430 (without the 24 bit instructions) is a simple instruction set. MIPS too. All their instructions fit a very simple pattern. From a programmer’s point of view those 3 instruction sets make the most sense. Easy to memorize the syntax and instruction names. The negatives of those msp40 / MIPS being the syntax when doing indexing. Using ( ) around the indexing register instead of [ ] like Intel makes writing an assembler much more difficult. " [25]

---

RISC-V Genealogy surveys 18 instruction set architectures prior to RISC-V, "chosen primarily from earlier UC Berkeley RISC architectures and major proprietary RISC instruction sets", and presents a matrix of which instructions in each instruction set correspond to which RISC-V instructions.

---

somewhere i should make a shortlist of modern and older well-regarded smallish ISAs to look at:

---

Great Microprocessors of the Past and Present (V 13.4.0)

---

x86 vs ARM: " there are some microarchitectural decisions that the ISA forces. The big one for x86 is that you absolutely need to have an efficient microcode engine: it is completely impractical to implement the entire ISA in fixed-function silicon. This has a huge impact on the rest of the microarchitecture.

There are a bunch of different ways of implementing microcode. The simplest is to just crack it into a bunch of normal instructions using a little state machine that pumps out decoded instructions. You typically don’t want to use architectural registers for these, so you add a few names that the rename engine can use that aren’t exposed to non-microcoded instructions. This is very easy to do but has a couple of downsides. The first is that those registers, because they are not part of architectural state, cannot be saved on context switch. This means that you need to either ensure that the microcoded instruction sequences are idempotent, or you need to disable interrupts across the instructions. When you go down this path, you often have to pause the decoder and issue a single microcoded instruction at a time.

This approach is very low impact on the silicon but if your common workloads are very low on microcoded instructions. If you want to be able to execute multiple microcoded instructions in parallel then you need to have a lot more extra logic (for example, keeping enough state that you can completely roll back all side effects of an interrupted multi-cycle instruction).

In AArch32, about the only instructions that were typically microcoded were the load and store multiple. These were a big win in code size, because they were often a single 32-bit instruction for frame setup and tear down but were microarchitectually painful. They could span pages and so might fault in the middle, which led to some horrible complexity in implementations. AArch64 replaces these with load/store pair instructions that are valid only within a stack frame. These don’t give quite as dense code but are vastly simpler to implement. x86, on the other hand has a lot of common instructions that need to be implemented in microcode and so you need an efficient microcode benchmark.

There’s also complexity in terms of the decoder. This can be quite painful from a power perspective because the decoder has to be powered almost all of the time. The only time that you don’t need it on x86 is in high-end systems when you’re in a hot loop that lives entirely in the trace cache. Arm has three instruction sets, AArch32, Thumb-2, and AArch64. The first and last of these are fixed-width encodings and so are fairly trivial to decode, Thumb-2 instructions are 32- or 16- bits, but can all be fairly easily expanded to single AArch32 instructions. AArch64 has some complexity around SVE (it’s effectively a two-width instruction set, they just pretend it isn’t).

As a colleague once said, x86 chips don’t have an instruction decoder, they have an instruction parser. Any instruction is 1-15 bytes. You need a fairly complex state machine to decode it and, because of the way that prefixes can be chained, you need to parse the whole thing before you can figure out where the next one starts. Doing that on a superscalar chip that wants to issue multiple instructions is really hard and so Intel chips don’t actually do that, they decode into trace caches that contain fixed-width micro-ops and then try to execute from there. Arm cores don’t need any of that logic and can typically cache either raw instructions or the result of some very simple expansion.

The original RISC project was designed by looking at the instructions that C compilers actually generated on CISC systems and building an ISA optimised for those sequences. AArch32 was designed the same way. AArch64 used a variety of workloads including some managed languages to go through the same process. Somewhat depressingly, RISC-V did not. x86 gradually accreted over time. AArch64 and AArch32 in ARMv7, for example, have very powerful bitfield insert and extract instructions. These are really useful for a bunch of things (such as NaN boxing in JavaScript JITs), but are present only in very recent x86 chips.

Arm does not have subregisters. On x86, you have AL, AH, AX, EAX, and RAX all update the same registers. For anything shorter than RAX, this means you need a read-modify-write operation on the rename register. This adds complexity in the rename engine. Arm briefly had a floating-point mode that let you treat the FPU registers as either 32 64-bit or 16 64-bit floating point registers. This caused similar pain and was abandoned (32-bit FPU ops needed to track the enclosing register pair so that they correctly half-overwrote or were overwritten by 64-bit operations). Recent Intel chips make updating subregisters quite fast but at the expense of microarchitectural complexity (i.e. power).

The Arm instruction set is designed to avoid data-dependent exceptions. Only loads and stores will trap based on the data and loads and stores will only trap on the address, not the data (unless you have ECC memory or are Morello). x86, in contrast, has a load of instructions that can trap based on the value (e.g. integer division by zero). This means that you need to start a new speculation window every time you hit a divide instruction in x86, because you may have to reset the pipeline to the state at that instruction if you discover that it traps.

In summary, there are some fundamental differences between the two that mean that, within a fixed power and area budget, you should be able to make AArch64 faster. This matters somewhat less at the very high end, because a lot of these overheads are more-or-less fixed and so there isn’t a huge amount of difference when you scale everything else up. That said, instruction scheduling / register rename are some of the biggest fixed costs on a high-end core and anything that adds to that can end up being a bottleneck. x86 is made entirely out of things that add rename and scheduler complexity. If you’re building a laptop / tablet / phone core, these fixed costs are a very noticeable part of your power / area budget. Even without being especially clever, you can spend all of the saved transistor budget on extra cache and get a big speedup. " -- David Chisnall
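
A toy sketch of why finding instruction boundaries is sequential with a variable-length encoding. Everything here is hypothetical: the encoding (chainable prefix bytes 0xF0 to 0xFF, then an opcode byte whose top two bits give 0 to 3 operand bytes) is made up for illustration and is not real x86, and the function names are invented. It only reproduces the property described above: the start of instruction k depends on the lengths of all earlier instructions, whereas with fixed 4-byte instructions it is simply 4 × k.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-width ISA with 4-byte instructions (as in AArch64): instruction k
   starts at byte 4 * k, so every start can be computed independently. */
size_t fixed_width_start(size_t k) {
    return 4 * k;
}

/* Length of one instruction in a HYPOTHETICAL variable-length encoding
   (not real x86):
     - bytes 0xF0..0xFF are prefixes and any number may be chained;
     - the first non-prefix byte is the opcode, and its top two bits give
       the number of operand bytes that follow (0 to 3).
   Returns 0 if the bytes run out before the instruction is complete. */
size_t toy_insn_length(const uint8_t *p, size_t avail) {
    size_t i = 0;
    while (i < avail && p[i] >= 0xF0) {
        i++;                               /* skip chained prefixes */
    }
    if (i == avail) {
        return 0;                          /* no opcode byte */
    }
    size_t operands = p[i] >> 6;           /* top two bits: 0..3 */
    size_t len = i + 1 + operands;
    return len <= avail ? len : 0;
}

/* Offset of instruction k: every earlier instruction must be parsed first,
   because each start depends on the length of the one before it.
   Returns `size` if the code ends before instruction k. */
size_t variable_width_start(const uint8_t *code, size_t size, size_t k) {
    size_t off = 0;
    for (size_t n = 0; n < k; n++) {
        size_t len = toy_insn_length(code + off, size - off);
        if (len == 0) {
            return size;
        }
        off += len;
    }
    return off;
}
```

With this toy encoding the bytes 01 41 AA F0 F1 80 11 22 00 hold four instructions starting at offsets 0, 1, 3 and 8, and none of those offsets can be known without decoding everything before it.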
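
The bitfield and subregister points can be sketched together, because an x86 partial-register write behaves like a bitfield insert into the old register value. This is a minimal C model of architectural results only, assuming UBFX/BFI-like semantics; it says nothing about how any core implements them. The function names and the NaN-boxing layout (a 16-bit tag above a 48-bit payload) are hypothetical, chosen for illustration.

```c
#include <stdint.h>

/* Bitfield extract (UBFX-like): `width` bits of x starting at bit `lsb`.
   Assumes 1 <= width and lsb + width <= 64. */
uint64_t bitfield_extract(uint64_t x, unsigned lsb, unsigned width) {
    uint64_t mask = (width == 64) ? ~UINT64_C(0) : (UINT64_C(1) << width) - 1;
    return (x >> lsb) & mask;
}

/* Bitfield insert (BFI-like): replace `width` bits of dst starting at `lsb`
   with the low `width` bits of src; all other bits of dst are kept. */
uint64_t bitfield_insert(uint64_t dst, uint64_t src, unsigned lsb, unsigned width) {
    uint64_t mask = ((width == 64) ? ~UINT64_C(0) : (UINT64_C(1) << width) - 1) << lsb;
    return (dst & ~mask) | ((src << lsb) & mask);
}

/* Hypothetical NaN-boxing layout (not any particular engine's): a 16-bit
   tag in bits 48..63 above a 48-bit payload such as a pointer. */
uint64_t nan_box(uint16_t tag, uint64_t payload) {
    return bitfield_insert(payload, tag, 48, 16);
}
uint16_t nan_box_tag(uint64_t v)     { return (uint16_t)bitfield_extract(v, 48, 16); }
uint64_t nan_box_payload(uint64_t v) { return bitfield_extract(v, 0, 48); }

/* x86-64 partial-register writes, modelled architecturally: writing AL, AH
   or AX is a bitfield insert into the old RAX, so the result depends on the
   previous value. Writing EAX zero-extends and ignores the old value. */
uint64_t x86_write_al(uint64_t rax, uint8_t v)   { return bitfield_insert(rax, v, 0, 8); }
uint64_t x86_write_ah(uint64_t rax, uint8_t v)   { return bitfield_insert(rax, v, 8, 8); }
uint64_t x86_write_ax(uint64_t rax, uint16_t v)  { return bitfield_insert(rax, v, 0, 16); }
uint64_t x86_write_eax(uint64_t rax, uint32_t v) { (void)rax; return v; }
```

Without dedicated instructions, each helper is a short shift/mask/or sequence. The `x86_write_al`, `x86_write_ah` and `x86_write_ax` helpers take the old RAX as an input, which is the read-modify-write the renamer has to track.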

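A minimal sketch of the integer-division difference, modelling architectural results only. On the Arm side it assumes AArch64 SDIV/UDIV semantics (a zero divisor yields 0 and INT64_MIN / -1 wraps to INT64_MIN, with no trap). On the x86 side it simplifies IDIV to a 64-bit dividend rather than RDX:RAX and reports the #DE fault as a false return value. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* AArch64 UDIV: never traps; a zero divisor gives 0. */
uint64_t arm_udiv(uint64_t n, uint64_t d) {
    return d == 0 ? 0 : n / d;
}

/* AArch64 SDIV: never traps; a zero divisor gives 0 and INT64_MIN / -1
   wraps to INT64_MIN. That case is handled explicitly because it is
   undefined behavior in C. */
int64_t arm_sdiv(int64_t n, int64_t d) {
    if (d == 0) {
        return 0;
    }
    if (n == INT64_MIN && d == -1) {
        return INT64_MIN;
    }
    return n / d;   /* C also truncates toward zero */
}

/* Simplified x86-64 IDIV: the same two cases raise #DE instead of producing
   a value. Returns false for "would fault"; in hardware the instruction
   never completes, so the pipeline must be able to roll back to it. */
bool x86_idiv(int64_t n, int64_t d, int64_t *quotient) {
    if (d == 0 || (n == INT64_MIN && d == -1)) {
        return false;
    }
    *quotient = n / d;
    return true;
}
```

The point is the shape of the control flow: in this model the Arm helpers always produce a value, while the x86 helper has a second, faulting outcome that speculation has to allow for.
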
---

" How close is RISC-V to RISC-I? ... While architects talk about the differences (register windows, condition codes, delayed branch), it’s amazing that there aren’t more after 30 years of innovation in computer architecture fueled by Moore’s Law and Dennard Scaling (1981 to 2011). Common features of RISC-I and RV32I:

    A 32-bit byte-addressable address space
    All instructions are 32 bits long
    31 registers, with register 0 hardwired to zero, all 32 bits wide
    All operations are register-to-register (none are register-to-memory)
    The same arithmetic, logical, and shift operations
    The same load word and store word instructions
    Signed and unsigned versions of load and store byte and halfword (called “short” in RISC-I)
    Immediate option for all arithmetic, logical, and shift instructions
    Immediates are always sign-extended
    One data addressing mode (register + immediate)
    PC-relative branch addressing
    No multiply or divide instructions
    An instruction to load a wide immediate into the upper part of a register so that a 32-bit constant takes only two instructions

Below is the complete ISA for both architectures, aligned by operation.

https://web.archive.org/web/20220331041830im_/https://aspire.eecs.berkeley.edu/wp/wp-content/uploads/2017/06/RISC-IvRISC-VB.png

" -- [26]

"Interestingly, since this article most of the "In RISC-V, but not RISC-I" instructions have been moved to extensions: Zicsr for the CSR instructions and Zifencei for the FENCE.I instruction (which turns out not to be useful for user processes running on a multicore processor with a multitasking OS like FreeBSD or Linux). That leaves only AUIPC, SBREAK (now called EBREAK), and FENCE. So it's even closer now than it was in 02017. " -- [27]


https://web.eece.maine.edu/~vweaver/papers/iccd09/ll_document.pdf

ll: Exploring the Limits of Code Density. Vincent M. Weaver, University of Maine (vincent.weaver@maine.edu). May 26, 2017.

---

" Findecanor

A related question is that of code density, but that depends on more factors than just which raw arithmetic or bitwise ops you have available. For instance, having a greater number of registers can be a benefit for some algorithms. Another factor is how many bits are used to encode each instruction.

There have been studies comparing code-density of common algorithms compiled to or hand-optimised for different instruction sets. For instance: <https://web.eece.maine.edu/~vweaver/papers/iccd09/ll_documen...>

In general, the densest have been 2-address CISC ISAs such as Motorola 680x0 and x86 variants, followed by 2-address RISC ISAs for embedded processors such as SuperH, ARM Thumb and RISC-V's C extension.

" -- [28]

---

" Just for fun, let’s compare what happens on different processor architectures when you shift a register by more than the register size. ...

                           Unsigned                                                   Signed
    mod register size      Alpha AXP, x86-32, x86-64, MIPS, AArch64, SPARC, RISC-V    SH-4
    mod 2 × register size  PowerPC, 68000
    mod 256                8086, Thumb-2
    full value             ia64

For x86-32, I’m kind of cheating and ignoring the registers smaller than 32 bits.

Bonus chatter: The wide variety of behavior when shifting by more than the register size is one of the reasons why the C and C++ languages leave undefined what happens when you shift by more than the bit width of the shifted type. " -- Just for fun: What happens when you shift a register by more than the register size?
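
A sketch of the four shift-count rules listed above, for a 32-bit logical left shift with an unsigned count. It also shows the C side of the bonus chatter: `x << n` is undefined in C when n is 32 or more for a 32-bit x, so each model reduces or checks the count before shifting. The function names are hypothetical, and real instructions differ in details this ignores (flags, operand widths, SH-4's signed counts).

```c
#include <stdint.h>

/* mod register size (rule listed for x86-32, MIPS, RISC-V, ...):
   only the low 5 bits of the count are used. */
uint32_t shl_mod_reg_size(uint32_t x, uint32_t count) {
    return x << (count & 31);
}

/* mod 2 x register size (rule listed for PowerPC, 68000):
   the low 6 bits are used, so counts 32..63 shift everything out. */
uint32_t shl_mod_twice_reg_size(uint32_t x, uint32_t count) {
    uint32_t c = count & 63;
    return c >= 32 ? 0 : x << c;
}

/* mod 256 (rule listed for 8086, Thumb-2):
   the low 8 bits are used, so counts 32..255 shift everything out. */
uint32_t shl_mod_256(uint32_t x, uint32_t count) {
    uint32_t c = count & 255;
    return c >= 32 ? 0 : x << c;
}

/* full value (rule listed for ia64): any count of 32 or more gives 0. */
uint32_t shl_full_value(uint32_t x, uint32_t count) {
    return count >= 32 ? 0 : x << count;
}
```

Shifting 1 left by 33 gives 2, 0, 0 and 0 under the four rules, which is the kind of divergence that made C leave the result undefined.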

---

Links