proj-plbook-plChRiscvIsa

RISC-V

open source

clearest concise summary of major opcodes is in tables at the end of http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-146.pdf

Younger (2010) than SPARC V8 (1994) and OpenRISC (2000), and claims to have learned from both.

http://riscv.org/

http://riscv.org/download.html#tab_isaspec (base opcode listing in Chapter 8)

I'm not sure what addressing modes are supported, but i'm guessing it's non-uniform, with different opcodes for different modes, and mostly register, except for the 'immediate' opcodes which have an 'immediate' component, and loads and stores which have a base+offset mode, with base address in register rs1. Unconditional jumps have PC-relative addressing.
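The base+offset mode for loads and stores can be sketched as follows (an illustrative Python model, not spec text; the function names are made up):

```python
def sign_extend(value, bits):
    """Sign-extend a `bits`-wide field to a Python int."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def effective_address(rs1_value, imm12, xlen=32):
    """Effective address of a RISC-V load/store: rs1 plus a sign-extended
    12-bit immediate offset, wrapping at the register width."""
    return (rs1_value + sign_extend(imm12, 12)) & ((1 << xlen) - 1)

# e.g. lw t0, -4(sp) with sp = 0x1000 addresses 0x0FFC
assert effective_address(0x1000, 0xFFC) == 0x0FFC
```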

interesting comparison of RISC-V with Epiphany (the one used in Parallella) http://www.adapteva.com/andreas-blog/analyzing-the-risc-v-instruction-set-architecture/

note: RISC-V instructions that the Epiphany guy thought maybe could have been left out:

AUIPC (but in the comments a RISC-V guy says AUIPC was important for relocatable code), SLT/SLTI/SLTU/SLTIU (compare: set-less-than, with unsigned and immediate and unsigned-immediate variants), XORI/ORI/ANDI (boolean logic with immediate values), FENCE (mb; sync threads), MULH/MULHSU/MULHU (multiply variants with the 'upper half' variant), FSGNJ/FSGNJN/FSGNJX (Sign Inject: Sign source), FCLASS (Categorization: Classify Type). Then there was a bunch for which he said "Not needed for epiphany", which i dunno if he means 'this is good but since Epiphany had a restricted use-case target (DSP) we didn't include it'. These are: FLW/FSW (load/store 'W'; i didn't note this below), FMV.X/FMV.S (move from/to integer), FRSCSR (probably a typo for FRCSR, read control/status register; i didn't note this below), FSRM/FSRMI (swap rounding mode; i didn't note this below), FSFLAGS (swap flags; i didn't note this below). Then there are some for which he said 'Needed?'; these are: FNMSUB (Negative Multiply-SUBtract), FMIN/FMAX (min/max), FCMP (i can't find this in the table in http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-146.pdf , so i didn't note this below).

Epiphany instructions that the Epiphany guy said were good but that RISC-V left out: LDRD (load/store double), and LDR and STR with the POSTMOD addressing mode (postincrement).

RISC-V has a choice of 16 or 32 integer registers (32 is more typical, i think?) and also optionally, 32 additional floating point registers. Memory is addressed as 8-bit bytes. The instruction encoding is 32-bit, but the 'Compressed' instruction encoding has 16-bit instructions. Instructions tend to have 32-bit, 64-bit, and 128-bit variants; arithmetic is done in at least 32-bit width ("RISC-V can load and store 8 and 16-bit items, but it lacks 8 and 16-bit arithmetic, including comparison-and-branch instructions." [7] ). Register 0 is constant 0.

RISC; no indirect or memory-memory addr modes, but instead there are LOAD and STORE instructions. No autoincrement addr modes. Some opcodes indicate immediate addr mode, others indicate register direct. Little-endian. Branching is compare-and-branch. Variable-length encoding.

"The RISC-V ISA has been designed to include small, fast, and low-power real-world implementations,[2][3] but without over-architecting for a particular microarchitecture style." [8]

"the RISC-V instruction set is designed for practicality of implementation, with features to increase a computer's speed, while reducing its cost and power use. These include placing most-significant bits at a fixed location to speed sign-extension, and a bit-arrangement designed to reduce the number of multiplexers in a CPU." [9]

"RISC-V intentionally lacks condition codes, and even a carry bit.[3] The designers claim that this can simplify CPU designs by minimizing interactions between instructions.[3] Instead RISC-V builds comparison operations into its conditional-jumps.[3] Use of comparisons may slightly increase its power use in some applications. The lack of a carry bit complicates multiple-precision arithmetic. RISC-V does not detect or flag most arithmetic errors, including overflow, underflow, and divide by zero.[3] RISC-V also lacks the "count leading zero" and bit-field operations normally used to speed software floating-point in a pure-integer processor." [10]

" A load or store can add a twelve-bit signed offset to a register that contains an address. A further 20 bits (yielding a 32-bit address) can be generated at an absolute address.[3]

RISC-V was designed to permit position-independent code. It has a special instruction to generate 20 upper address bits that are relative to the program counter. The lower twelve bits are provided by normal loads, stores and jumps.[3] " [11]
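The 20+12 split has one subtlety: the low 12 bits are sign-extended when added, so the upper 20 bits must be rounded up whenever bit 11 of the constant is set. A hedged Python sketch of how a toolchain splits a 32-bit constant for a lui+addi pair (function name is made up):

```python
def lui_addi_split(x):
    """Split a 32-bit constant into the part LUI materializes (upper 20 bits,
    low 12 zero) and the signed 12-bit immediate ADDI adds afterwards."""
    x &= 0xFFFFFFFF
    lo = x & 0xFFF
    if lo >= 0x800:
        lo -= 0x1000               # addi will add a negative offset
    hi = (x - lo) & 0xFFFFFFFF     # what lui must produce
    return hi, lo

hi, lo = lui_addi_split(0xDEADBEEF)
assert (hi + lo) & 0xFFFFFFFF == 0xDEADBEEF
assert hi & 0xFFF == 0             # lui can only set the upper 20 bits
```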

" RISC-V does define a special set of integer multiplication instructions. This includes a recommended sequence of instructions that a compiler can generate and a CPU can interpret to perform a fused multiply-accumulate operation. Multiply-accumulate is a core primitive of numerical linear algebra, and so is incorporated as part of a common benchmark, Coremark.[3][15] " [12]

Tutorials:

Retrospectives

RISC-V Compressed (16-bit encoding) opcodes

From [13] (Draft version 1.9):

Summary of RISC-V Compressed (16-bit encoding) opcodes

MOVs and loads and stores and LOADK:

Jumps and branches:

Other stack-pointer-related:

Arithmetic and boolean logic:

Misc:

Details of RISC-V Compressed (16-bit encoding) opcodes

Variants are bit-width and integer vs. floating-point (there are (optional) floating-point registers in RISC-V).

C.(F)L(W|D|Q)SP: Load value from stack (stack-pointer + 6-bit offset) into register. (there is no FLQSP though)
C.(F)S(W|D|Q)SP: Store value from register to stack (stack-pointer + 6-bit offset). (there is no FSQSP though)
C.(F)L(W|D|Q): Load value from memory (memory address in a register, plus a 5-bit immediate offset) into register. (there is no FLQ though)
C.(F)S(W|D|Q): Store value from register into memory (memory address in a register, plus a 5-bit immediate offset). (there is no FSQ though)

C.J: Jump to an offset given as an immediate constant (PC-relative, signed, +-2k byte range (so +-1k 16-bit instructions)). C.JAL: Like C.J but also writes the address of the following instruction to the link register. C.JR: Jump to the address held in the given register. C.JALR: Like C.JR but also writes the address of the following instruction to the link register.

C.BEQZ: Branch if the value in the given register is zero. Offset is signed, +-256 bytes (so +-128 16-bit instructions). C.BNEZ: Like C.BEQZ but branch if NOT zero.

C.LI: Load 6-bit immediate into register. C.LUI: Load 6-bit immediate into bits 17-12 of register.

C.ADDI(W): Add 6-bit immediate to register (mutating the register) C.ADDI16SP: Scale 6-bit immediate by 16 then add to stack-pointer (mutating the stack pointer). "used to adjust the stack pointer in procedure prologues and epilogues". C.ADDI4SPN: Scale 8-bit immediate by 4, add to stack pointer, and write the result to register. "used to generate pointers to stack-allocated variables".

C.S(L|R)(L|A)I: (logical|arithmetic) (left|right)-shift of a register (mutating it), with a 6-bit immediate shift amount. These variants have a non-uniform scheme for interpreting the immediate to allow it to be most useful. (there is no SLAI though)

C.ANDI is bitwise AND of a register and a 6-bit immediate (mutating the register).

C.MV is register-register MOV.

C.(ADD|SUB)(W): adds/subtracts two registers and writes the result over one of the input registers.

C.AND, C.OR, C.XOR: bitwise AND/OR/XOR of two registers, writing the result over one of the input registers.

C.BAD, the all-zero instruction, is illegal (no mnemonic is given; i made up 'BAD')

C.NOP is NOP

C.EBREAK breaks into the debugging environment.

Base instructions (32-bit encoding)

from https://content.riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf ; see also http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-146.pdf or https://www.cl.cam.ac.uk/teaching/1617/ECAD+Arch/files/docs/RISCVGreenCardv8-20151013.pdf although they have an older version of the ISA:

Multiply-divide extension ('M') instructions

from https://content.riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf or http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-146.pdf :

(note: the Epiphany guy thought the 'upper half' multiply variants could have been left out)

(mul, mulh, mulhsu, mulhu, div, divu, rem, remu)

Floating-point extension ('F') instructions

from https://content.riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf or http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-146.pdf :

Load and store:

Arithmetic

Mul-Add:

Move (note: the Epiphany guy thought these could have been left out):

Sign Inject (note: the Epiphany guy thought these could have been left out):

Min/Max (note: the Epiphany guy thought these could have been left out):

Compare:

Convert:

Categorization (note: the Epiphany guy thought these could have been left out):

Configuration instructions (read/write the Floating-Point Control and Status Register, fcsr):

Configuration pseudo-op instructions:

(flw fsw fadd fsub fmul fdiv fsqrt fmadd fmsub fnmsub fnmadd fmv.w.x fmv.x.w fsgnj fsgnjn fsgnjx fmin fmax feq flt fle fcvt.s.w fcvt.s.wu fcvt.w.s fcvt.wu.s fclass frcsr fscsr frrm fsrm fsrmi frflags fsflags fsflagsi)

Atomicity extension opcodes

from http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-146.pdf :

suggested register usage

-- [14]

RISC-V interrupts

" In this second RISC-V article I talk about its interrupt and exception system and about SiFive‘s? FE310G, the first commercial silicon implementation of a RISC-V ...

RISC-V ISA defines two major interrupt types: global and local. Basically, global interrupts are designed for multicore environments, while local interrupts are always associated with one specific core. Local interrupts suffer less overhead as there is no need for arbitration (which is the case of global interrupts on multicore systems).

...

Local interrupt system is responsible for processing a limited (and usually small) number of interrupt sources. The CLINT (Coreplex Local Interrupts) module has three basic interrupt sources: software interrupt (SI), timer interrupt (TI) and external interrupt (EI). RISC-V ISA also defines sixteen other optional local interrupt sources (which are not present on E31). One important note: all global interrupts from PLIC (Platform-level Interrupt Controller) are applied to the external interrupt input within CLINT!

RISC-V interrupt system will suspend execution flow and branch to an ISR if a local interrupt source (as long as it is previously enabled) sets its pending interrupt flag. There is also a global interrupt enable bit (MIE/SIE/UIE according to the current mode) available on the MSTATUS register. This register also controls interrupt nesting, memory access privileges, etc. For further information, take a look at the RISC-V privileged instructions manual.

There are two ways to deal with interrupts on RISC-V: by using a single vector or multiple vectors. On the single vector mode, register MTVEC (CSR number 0x305) points to the ISR base address, that is, MTVEC points to the single/unique entry point for all ISR code. On the multiple vector mode, on the other hand, MTVEC works as a pointer to the vector table base address and the index for that table is taken from the MCAUSE register (CSR number 0x342). " [15]

RISC-V variants

DarkRiscV subset

I think it contains:

Note that it does not contain the fence*, e*, and csr* instructions (memory fences, privilege levels and configuration registers). I believe that it also omits the SCALL, SBREAK, and the counter (RD*) instructions. The above instructions are all of the RV32I instructions except for these omissions.

RISC-V links

RISC-V discussion

"

Lack of execute-only/read-only memory

" tropo 51 days ago [-]

Security:

It still won't do execute-only and true read-only memory. We've had true read-only for ages now on x86, and just got execute-only. You need these: rw- r-- --x

It still has poor support for ASLR, especially the limited-MMU variants. Even the most limited version should be able to require that the uppermost address bits be something randomish, even if it's only a per-priv-level random cookie. " -- [16]

Lack of overflow checks

" pizlonator 51 days ago [-]

"We did not include special instruction set support for overflow checks on integer arithmetic operations, as many overflow checks can be cheaply implemented using RISC-V branches."

False. For example, JavaScript add/sub will require 3x more instructions on RISC-V than x86 or ARM. Same will be true for any other language that requires (either implicitly, like JS, or explicitly, like .NET) overflow checking. Good luck with that, lol.

__s:

Many overflow checks can be removed with optimization. RISC-V's compressed encoding has been shown to be ~70-80% more compact, so it has room for overflow checks. The efficiency of the architecture can always compile it out by the time it hits microcode

http://joeduffyblog.com/2015/12/19/safe-native-code

pizlonator:

I pioneered most of WebKit's overflow check optimizations, and our compiler is bleeding-edge when it comes to eliminating them. Still, the overwhelming majority of the checks remain, because most integer values are not friendly to analysis (because they came from some heap location, or they came from some hard math, etc).

I doubt that the architecture will compile out signed integer addition overflow checks, which are the most common. They are brutal to express correctly without an overflow bit, and the architecture will have a hard time with this.

zxcdw:

Why do you suppose they left it out? Is it merely a matter of "cheap implementation" being purely subjective, and hence they might have thought it as cheap, while you seem to disagree? Or could there be a more pressing reason, but "Oh well, its cheap enough anyway" is more of an excuse?

pizlonator:

I don't think they knew that modern languages rely on overflow checks so heavily and that the perf of overflow checks dominates perf overall. " -- [17]
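(For concreteness, here is a sketch of the flag-free signed-overflow check being debated above. The standard RISC-V sequence for a checked a+b is roughly `add t0,a,b ; slti t1,b,0 ; slt t2,t0,a ; bne t1,t2,overflow` — overflow iff "b < 0" disagrees with "sum < a" under signed comparison. A Python model, assuming 32-bit registers:)

```python
BITS = 32
MASK = (1 << BITS) - 1

def to_signed(x):
    """Interpret a 32-bit pattern as a signed integer."""
    x &= MASK
    return x - (1 << BITS) if x >> (BITS - 1) else x

def checked_add(a, b):
    """Flags-free signed overflow check, modelled on the RISC-V sequence:
    add t0,a,b ; slti t1,b,0 ; slt t2,t0,a ; bne t1,t2,overflow"""
    s = (a + b) & MASK                                   # add  t0, a, b
    b_neg = int(to_signed(b) < 0)                        # slti t1, b, 0
    sum_lt_a = int(to_signed(s) < to_signed(a))          # slt  t2, t0, a
    if b_neg != sum_lt_a:                                # bne  t1, t2, overflow
        raise OverflowError
    return s

assert checked_add(1, 2) == 3
```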

Opinions

Overall opinions

http://www.adapteva.com/andreas-blog/analyzing-the-risc-v-instruction-set-architecture/

" There are a lot of things about the RISCV design that come from a very ideological place and hurt in a high end design. Yes, there are extensions and designs with high end features, that's certainly true, and I'm sure people someone will be making a high end version at some point. But the ISA isn't very well suited to it compared to Power or ARM.

By default, code density on RISC-V is pretty bad. You can try to solve that by using variable length instructions which many high end RISC-V projects intend to do but having variable length instructions means your front end is going to have to be more complicated to reach the same level of performance that a fixed width instruction machine can achieve.

More instructions for a task means your back end also has to execute more instructions to reach the same level of performance. One way to do better is to fuse together ISA-level instructions into a smaller number of more complex instructions that get executed in your core. This is something that basically every high end design does but RISC-V would have to do it far more extensively than other architectures to achieve a similar level of density on the back end which makes designing a high end core more complex and possibly uses extra pipeline stages making mispredicts more costly.

And more criticisms here: https://gist.github.com/erincandescent/8a10eeeea1918ee4f9d9982f7618ef68

EDIT: But in fairness it looks like conditional move might be getting added to the bit manipulation RISC-V extension which would fix one big pain point.

This isn't to say that RISC-V is bad. Its simplicity makes it wonderful for low end designs. Its extensibility makes it great for higher level embedded uses where you might want to add some instruction that makes your life easier for your hard drive controller or whatever, in a way that would require a very expensive architecture license if you were using ARM. It's open, which would be great if you were making a high end open-source core for other people to use, except the Power ISA just opened up, so if I were to start a project like that I'd use that instead. " -- Symmetry

"Aarch64 has more complex addressing modes (base + index<<shift in particular) whereas RISC-V needs both RVC and fusion to do the same with similar code size and execution slot occupation. Personally, I'm leaving towards thinking that it was a mistake for RISC-V to not support such addressing modes. Unless you're aiming for something really super-constrained in terms of gate counts, having an adder and small shifter as part of your memory pipeline(s) seem like an obvious choice. And thus, having single instructions to use those pipelines isn't really committing any sins against the RISC philosophy. " -- jabl

(On ARM) "NEON is guaranteed to exist on everything, and this means you're never going to see Aarch64 replace the Cortex M0 and M3. That's fragmentation right there. Severe fragmentation. Two completely incompatible ISAs. Small 32 bit RISC-V comes in smaller and lower power than An M0, and small 64 bit RISC-V is not much bigger than an M0 and is rather popular controlling something in the corner of a larger 64 bit SoC?." -- brucehoult

Misc opinions

Selected criticisms from https://gist.github.com/erincandescent/8a10eeeea1918ee4f9d9982f7618ef68 (Erin Shepherd):

ethoh:

Ignoring the parent and focusing on hard data instead, RV64GC has higher code density than ARM, x86 and even MIPS16, so the encoding they chose isn’t exactly bad, objectively speaking.

david_chisnall:

Note that Andrew’s dissertation is using integer-heavy, single-threaded, C code as the evaluation and even then, RISC-V does worse than Thumb-2 (see Figure 8 of the linked dissertation). Once you add atomics, higher-level languages, or vector instructions, you see a different story. For example, RISC-V made an early decision to make the offset of loads and stores scaled with the size of the memory value. Unfortunately, a lot of dynamic languages set one of the low bits to differentiate between a pointer and a boxed value. They then use a complex addressing mode to combine the subtraction of one with the addition of the field offset for field addressing. With RISC-V, this requires two instructions. You won’t see that pattern in pure C code anywhere but you’ll see it all over the place in dynamic language interpreters and JITs.
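(An illustrative sketch of the tagged-pointer pattern described above, with a made-up low-bit tag: on an ISA whose addressing mode can absorb the "subtract one" into the displacement, field access is a single load; when the displacement can't encode the odd offset, you must untag first and then load — two instructions:)

```python
TAG = 1  # low bit marks "boxed pointer" in many dynamic-language runtimes (assumption for illustration)

def field_address_fused(tagged_ptr, field_offset):
    """One instruction on ISAs whose load displacement can absorb the -TAG untag."""
    return tagged_ptr + (field_offset - TAG)

def field_address_two_step(tagged_ptr, field_offset):
    """Two instructions when the load's offset must stay aligned/scaled:
    untag first (addi t0, a0, -1), then load with the plain field offset."""
    untagged = tagged_ptr - TAG      # addi t0, a0, -1
    return untagged + field_offset   # lw/ld rd, field_offset(t0)

p = 0x1000 | TAG
assert field_address_fused(p, 8) == field_address_two_step(p, 8) == 0x1008
```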

ethoh:

Interesting. There’s work on an extension to help interpreters, JITs, which might or might not help mitigate this.

In any event, it is far from ready.

david_chisnall:

I was the chair of that working group but I stepped down because I was unhappy with the way the Foundation was being run.

The others involved are producing some interesting proposals though a depressing amount of it is trying to fix fundamentally bad design decisions in the core spec. For example, the i-cache is not coherent with respect to the d-cache on RISC-V. That means you need explicit sync instructions after every modification to a code page. The hardware cost of making them coherent is small (i-cache lines need to participate in cache coherency, but they can only ever be in shared state, so the cache doesn’t have to do much. If you have an inclusive L2, then the logic can all live in L2) but the overheads from not doing it are surprisingly high. SPARC changed this choice because the overhead on process creation, from the run-time linker having to do i-cache invalidates on every mapped page, was huge. Worse, RISC-V’s i-cache invalidate instruction is local to the current core. That means that you actually need to do a syscall, which does an IPI to all cores, which then invalidates the i-cache. That’s insanely expensive but the initial measurements were from C code on a port of Linux that didn’t do the invalidates (and didn’t break because the i-cache was so small you were never seeing the stale entries). " [35]

" It probably shouldn’t come as a surprise that saying to people ‘we need your expertise, please pay us money so that you can provide it’ didn’t lead to a huge influx of expert contributors. There were a few, but not enough. " [36]

https://www.youtube.com/watch?v=_6sh097Dk5k

Check at 51:30 where he says “I would Google RISC-V and find out all about it. They’ve done a fine instruction set, a fine job […] it’s the state of the art now for 32-bit general purpose instruction sets. And it’s got the 16-bit compressed stuff. So, yeah, learning about that, you’re learning from the best.” "

" Krste is a strong believer in macro-op fusion but I remain unconvinced. It requires decoder complexity (power and complexity), more i-cache space (power), trace caches if you want to avoid having it on the hot path in loops (power and complexity), weird performance anomalies when the macro-ops span a fetch granule and so the fusion doesn’t happen (software pain). And, in exchange for all of this, you get something that you could have got for free from a well-designed instruction set. " -- David Chisnall

" 1 smaddox 8 months ago

link

The paper linked in the article appears to show RV64GC, the compressed variant of RV64G, results in smaller program sizes than x86_64. If that’s true, wouldn’t that mean you would need less i-cache space? This isn’t my area of expertise, but I find it fascinating.

david_chisnall:

There are a lot of variables here. One is the input corpus. As I recall, that particular paper evaluated almost exclusively C code. The generated code for C++ will use a slightly different instruction mix, for other languages the difference is even greater. To give a concrete example, C/C++ do not have (in the standard) any checking for integer overflow. It either wraps for unsigned arithmetic or is undefined for signed. This means that a+b on any C integer type up to [u]int64_t is a single RISC-V instruction. A lot of other languages (including Rust, I believe, and the implementations of most dynamic languages) depend on overflow-checked arithmetic on their fast paths. With Arm or x86 (32- or 64-bit variants), the add instructions set a condition code that you can then branch on, accumulate in a GPR, or use in a conditional move instruction. If you want to have a good fast path, you accumulate the condition code after each arithmetic op in a hot path then branch at the end and hit a slow path if any of the calculations overflowed. This is very dense on x86 or Arm.

RISC-V does not have condition codes. This is great for microarchitects. Condition code registers are somewhat painful because they’re an implicit data dependency from any arithmetic instruction to a load of others. In spite of this, Arm kept them with AArch64 (though dramatically reduced the number of predicated instructions, to simplify the microarchitecture) because they did a lot of measurement and found that a carefully optimised compiler made significant use of them.

RISC-V also doesn’t have a conditional move instruction. Krste likes to cite a paper by the Alpha authors regretting their choice of a conditional move, because it required one extra read port on the register file. These days, conditional moves are typically folded into the register rename engine and so are quite cheap in the microarchitecture of anything doing a non-trivial amount of register rename (they’re just an update in the rename directory telling subsequent instructions which value to use). Compilers have become really good at if-conversion, turning small if blocks into a path that does both versions and selects the results. This is so common that LLVM has a select instruction in the IR. To do the equivalent with RISC-V, you need to have logic in decode that recognises a small branch forward and converts it into a predicated sequence. That’s a lot more difficult to do than simply having a conditional move instruction and reduces code density.

I had a student try adding a conditional move to a small RISC-V processor a few years ago and they reproduced the result that Arm used in making this decision: Without conditional moves, you need roughly four times as much branch predictor state to get the same overall performance.

Note, also, that these results predate any vector extensions for RISC-V. They are not comparing autovectorised code with SSE, AVX, Neon, or SVE. RISC-V has used up all of its 16-bit and most of its 32-bit instruction space and so can’t add the instructions that other architectures have introduced to improve code density without going into the larger 48-bit encoding space.
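(As an aside, the if-conversion mentioned above turns a small if-block into a branchless select — the operation a conditional-move instruction performs in one step. A minimal, ISA-agnostic Python sketch of the pattern:)

```python
MASK = 0xFFFFFFFF

def select(cond, a, b):
    """Branchless select: returns a when cond is true, else b.
    mask is all-ones when cond is true, all-zeros otherwise, so the result
    is computed without any branch -- what a cmov gives you in one instruction."""
    mask = (-int(bool(cond))) & MASK
    return (a & mask) | (b & ~mask & MASK)

assert select(True, 7, 9) == 7
assert select(False, 7, 9) == 9
```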

dbremner:
    The paper linked in the article appears to show RV64GC, the compressed variant of RV64G, results in smaller program sizes than x86_64

x86-64 has pretty large binary sizes; if your compressed instruction set doesn’t have smaller binaries than it should be redesigned.

I would take the measurements in that paper with a grain of salt; they aren’t comparing like-to-like. The Cortex A-15 Core benchmarks, for example, should have also been run in Thumb-2 mode. Thumb-2 causes substantial reductions in code size; it’s dubious to compare your compressed ISA to a competitor’s uncompressed ISA.

Forty-Bot:

This paper has some size comparisons against thumb (along with Huawei’s custom extensions). This page has some as well. " -- [37]

(the links in the last comment above are: 'HW/SW approaches for RISC-V code size reduction' and Zephyr code examples)

"The 2 most annoying missing features are the lack of support for multi-word operations, which are needed to compute with numbers larger than 64 bits, but also the lack of support for detecting overflow in the operations with standard-size integers. If you either want larger integers or safe computations with normal integers, the number of RISC-V instructions needed for implementation is very large compared to any other ISA." -- adrian_b

" aidenn0 * " There's a blog page somewhere that's a rant for implementing saturating and other arithmetic modes. Would be a really good idea. Main one is interrupt on overflow." -- R0b0t " I agree. A lot of software only wants protection against overflows but does not depend on them for functionality. If something wants to read out the carry bit, it should be explicit and although it is unfortunate, indicating that requires a full instruction. " [48]

RISC-V's RV32V vector extension vs fixed-width SIMD

A comment on that article with an opposing view:

" 1. If you work with long dense vectors and nothing else, you don't need any CPU instructions. GPGPUs win performance and power efficiency by a factor of magnitude.

Current SIMD can be used for more than that.

A register can be treated as a complete small vector as opposed to a chunk in a long vector. Try implementing 3D vectors cross product with your approach and you'll see.

A register can be treated as a 2D bitmap as opposed to vector, here's an example: https://github.com/Const-me/SimdIntroArticle/blob/master/FloodFill/Vector/vectorFill.cpp#L135-L138

2. But the main problem with vector architectures is this part: "Vector architectures then scatter the results back from the vector registers to main memory." Main memory is very slow. When you work on SIMD algorithms, you want to avoid main memory access, instead you do as much as possible with the data while it's in vector registers. The approach you're advocating for can't quite do that. You can't invent a sane calling convention for a function that takes or returns kilobytes of data. Current architectures all pass arguments and return values in these vector registers, because their count and size are part of the ISA, i.e. stable and known to compilers. " -- Soonts

A similar argument is made by glangdale at https://news.ycombinator.com/item?id=19198758 :

" This argument is less effective given that SIMD is not always a straightforward substitute for vector processing. Sometimes we want 128, 256 or 512 bits of processing as a unit and will follow it up with something different, not a repeated instance of that same process. ... We also used SIMD quite extensively as a 'wider GPR' - not doing stuff over tons of input characters but instead using the superior size of SIMD registers to implement things like bitwise string and NFA matchers.

A SIMD instruction can be a reasonable proxy for a wide vector processor but the reverse is not true - a specialized vector architecture is unlikely to be very helpful for this kind of 'mixed' SIMD processing. Almost any "argument from DAXPY" fails for the much richer uses of SIMD processing among active practitioners using modern SIMD. "

Down that thread, other commentators point out that permute/shuffle instructions are useful but don't scale up (the arbitrary-sized 'permute' is gather/scatter, but that uses main memory (or at least cache) which is slower).

maybe see also ARM SVE, SVE2 (Scalable Vector Extension)

---

https://gist.github.com/erincandescent/8a10eeeea1918ee4f9d9982f7618ef68

A post by 'erincandescent', a former ARM engineer, with some detailed complaints about RISC-V.

---

A negative comment on RISC-V, from https://www.anandtech.com/Show/Index/15036?cPage=5&all=False&sort=0&page=1&slug=sifive-announces-first-riscv-ooo-cpu-core-the-u8series-processor-ip:

"...RISC-V has insisted on a certain kind of intellectual purity that makes no sense in terms of commerce, or the future properties of CPU manufacturing (plentiful transistors)...." -- name99

"... My point is it's the same people repeating the exact same mistakes. It has the same issues as MIPS like no register offset addressing or base with update. Some things are worse, for example branch ranges and immediate ranges are smaller than MIPS. That's what you get when you're stuck in the 80's dogma of making decode as simple as possible... " -- Wilco1

"...Saving a fraction of a mm^2 due to simplified decode is a great marketing story without doubt. However if you look at a modern SoC?, typically less than 5% is devoted to the actual CPU cores. If the resulting larger codesize means you need to add more cache/flash/DRAM, increase clock frequency to deal with the extra instructions or makes it harder for a compiler to produce efficient code, is it really an optimal system-wide decision?" -- Wilco1

" RISC-V is very similar to MIPS - MIPS never was great at codesize. When optimizing for size, compilers call special library functions to emulate instructions which are available on Arm. So you pay for saving a few transistors with lower performance and higher power consumption. " -- Wilco1

"It's not a MIPS variant. MIPS is based on work at Stanford. RISC-V is the latest incarnation of the Berkeley RISC project. You are probably thinking of SPARC which is a derivative of earlier RISC project work. MIPS is only related in that it comes from similar ideas but the two projects, Stanford and Berkeley were different." -- zmatt

"You're applying 80's RISC dogma which are no longer relevant. Transistors are cheap and efficient today, so we don't need to minimize them. We no longer optimize just the core or decoder but optimize the system as a whole. Who cares if you saved a few mW in the decoder when moving the extra instructions between DRAM and caches costs 10-100 times as much?

The RISC-V focus on simple instructions and decode is as crazy as a cult. They even want to add instruction fusion for eg. indexed accesses. So first simplify decode by leaving out useful instructions, then make it more complex again to try to make up for the missing instructions..." -- wilco1

---

" The two modern ARM instruction sets, the 16-bit-encoded ARMv7-M / ARMv8-M (for microcontrollers) and the 64-bit (32-bit-encoded) ARMv8-A, are very different from the traditional ARM ISA and they both are very well designed, incomparably better than RISC-V.

RISC-V is primitive even compared to the instructions sets used 50 years ago. It includes a few good ideas and the RISC-V team has the merit of popularizing the fact that the older vector ISAs of the seventies were better than the more recent SIMD ISAs of the nineties, which lead to modern vector ISAs, e.g. the RISC-V vector extension and ARM SVE.

However the base RISC-V ISA is extremely weak and its only merit is that it is simple enough to be easy to implement in student projects. " -- adrian_b

--- why is it RISC-V?

i dunno, but this comment in the spec suggests that RISC-IV was SPUR: "Decoding register specifiers is usually on the critical paths in implementations, and so the instruction format was chosen to keep all register specifiers at the same position in all formats at the expense of having to move immediate bits across formats (a property shared with RISC-IV aka. SPUR [11])."

a commentator on an unrelated web discussion forum says:

"The previous design, RISC-IV (named SPUR), was a Lisp machine.

http://pages.cs.wisc.edu/~markhill/papers/computer86_spur.pd...

The basic instructions (Table 3) are close enough to RISC-V but it also had a few Lisp specific instructions (Table 4). Though the name J Extension for RISC-V might lead you to think it is only about Java or Javascript, the group working on it is also interested in making it as good as possible for Lisp. " -- https://news.ycombinator.com/item?id=19266152

---

(in the middle of a discussion about a RISC-V implementation that doesn't necessarily have RAM): "Many of the applications for a CPU like this don't need any state outside of the CPU registers, especially as RISC-V lets you do multiple levels of subroutine call without touching RAM if you manually allocate different registers and a different return address register for each function (which means programming in asm, not C). A lot of 8051 / PIC / AVR have been sold without any RAM (or with RAM == memory mapped registers)" -- https://lobste.rs/s/nqxfoc/serv_is_award_winning_bit_serial_risc_v#c_cv88ud


Links