Bayle Shanks's website: proj-oot-ootAssemblyNotes27

"Thumb-2 with its unequaled code density" -- https://news.ycombinator.com/item?id=21988614

---

"Interesting that the B5000 didn't make this list. Berkeley CS252 has been reading the Design of the B5000 System paper for years....

Aloha:

I was also surprised - but I wonder if those are the computer architectures that language designers like, not the ones computer architects like." -- https://news.ycombinator.com/item?id=21988096

---

https://people.cs.clemson.edu/~mark/admired_designs.html

" ... CDC-6600 and 7600 - listed by Fisher, Sites, Smith, Worley ... Cray-1 - listed by Hill, Patterson, Sites, Smith, Sohi, Wallach (also Bell, sorta) ... IBM S/360 and S/370 - listed by Alpert, Hill, Patterson, Sites (also Bell) ... MIPS - listed by Alpert, Hill, Patterson, Sohi, Worley ... "

" Related Lists

Perspectives from Eckert-Mauchly Award Winners

Bruce Shriver and Bennett Smith asked six Eckert-Mauchly Award winners what were the 5 or 6 most important books or articles that affected the way they approached the central issues of computer architecture and what 5 or 6 books or articles they would recommend for others to read because of likely impact on future architectures. See pp. 52-61 in Shriver and Smith, The Anatomy of a High-Performance Microprocessor: A Systems Perspective, IEEE Computer Society Press, 1998. The six award winners are: John Cocke, Harvey Cragon, Mike Flynn, Yale Patt, Dan Siewiorek, and Robert Tomasulo.

Processor design pitfalls

Grant Martin and Steve Leibson listed thirteen failed processor design styles in "Beyond the Valley of the Lost Processors: Problems, Fallacies, and Pitfalls in Processor Design." See Chapter 3 in Jari Nurmi (ed.), Processor Design: System-On-Chip Computing for ASICs and FPGAs, Springer, 2007.

    Designing a high-level ISA to support a specific language or language domain
    Use of intermediate ISAs to allow a simple machine to emulate its betters
    Stack machines
    Extreme CISC and extreme RISC
    VLIW
    Overly aggressive pipelining
    Unbalanced processor design
    Omitting pipeline interlocks
    Non-power-of-2 data-word widths for general-purpose computing
    Too small an address space
    Memory segmentation
    Multithreading
    Symmetric multiprocessing"

https://news.ycombinator.com/item?id=21988096

tachyonbeam:

I think one of the most influential designs of recent times has been the DEC Alpha lineage of 64-bit RISC processors[1].

bshanks, replying on: Which Machines Do Computer Architects Admire? (201...

The ones listed by 4 or more people (not including Bell) were:

CDC-6600 and 7600 - listed by Fisher, Sites, Smith, Worley
Cray-1 - listed by Hill, Patterson, Sites, Smith, Sohi, Wallach (also Bell, sorta)
IBM S/360 and S/370 - listed by Alpert, Hill, Patterson, Sites (also Bell)
MIPS - listed by Alpert, Hill, Patterson, Sohi, Worley

Special mention:

6502 - only listed by Wilson, but she was the chief architect of ARM so I think her choice is important to note

---

some notes on VM design and Forth implementations:

kazinator:

> Now onto the new instruction:

Adding a special VM instruction for such operations as inheriting from a class is a strikingly bad design decision.

You want to minimize the proliferation of VM instructions as much as possible.

A rule of thumb is: is it unreasonable to compile into a function call? Or else, is it likely to be heavily used in inner loops, requiring the fastest possible dispatch? If no, don't add an instruction for it. Compile it into an ordinary call of a run-time support function.

You're not going to inherit from a class hundreds of millions of times in some hot loop where the application spends 80% of its time; and if someone contrives such a thing, they don't necessarily deserve the language level support for cutting their run-time down.
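A minimal Python sketch of that rule of thumb, assuming a toy stack VM made up for illustration (the opcodes, `inherit`, and the `RUNTIME` table are hypothetical, not clox's actual bytecode): a hot operation like ADD keeps a dedicated opcode, while a rare construct like inheritance is compiled into one generic CALL_RUNTIME instruction that indexes a table of run-time support functions.

```python
# Toy stack VM: a hot operation (ADD) gets its own opcode, while rare
# constructs share one CALL_RUNTIME opcode. All names are hypothetical.
CONST, ADD, CALL_RUNTIME, HALT = range(4)


def inherit(subclass, superclass):
    """Run-time support function: copy down methods the subclass lacks."""
    for name, method in superclass["methods"].items():
        subclass["methods"].setdefault(name, method)
    return subclass


# Support-function table indexed by CALL_RUNTIME's operand;
# each entry is (function, number of arguments to pop).
RUNTIME = [(inherit, 2)]


def run(code, consts):
    stack, pc = [], 0
    while True:
        op = code[pc]
        pc += 1
        if op == CONST:
            stack.append(consts[code[pc]])
            pc += 1
        elif op == ADD:  # hot path: dedicated opcode, no call overhead
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == CALL_RUNTIME:  # cold path: one opcode, many constructs
            fn, nargs = RUNTIME[code[pc]]
            pc += 1
            args = [stack.pop() for _ in range(nargs)][::-1]
            stack.append(fn(*args))
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")


a = {"methods": {"init": "A.init", "speak": "A.speak"}}
b = {"methods": {"speak": "B.speak"}}
# class B < A  compiles to: CONST b, CONST a, CALL_RUNTIME 0
result = run([CONST, 0, CONST, 1, CALL_RUNTIME, 0, HALT], [b, a])
print(sorted(result["methods"].items()))  # [('init', 'A.init'), ('speak', 'B.speak')]
```

Under these assumptions, supporting another rare construct means adding an entry to the table rather than another case in the dispatch loop.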

munificent:

This is a good point. One of the things I have to balance with the book is teaching the optimal way to do something without dragging the reader through a large amount of verbose, grungy code. So sometimes (but not too often) I take a simpler approach even if it's not the best one.

With this VM, we have plenty of opcode space left, so it's natural to just add another instruction for inheritance, even if it does mean that the bytecode dispatch loop doesn't fit in the instruction cache quite as well.

I'll think about this some more. I'm very hesitant to make sweeping changes to the chapter (I want nothing more in life than for the book to be done), but this would be a good place to teach readers this fundamental technique of compiling language constructs to runtime function calls.

---

" The Classical Forth Registers

The classical Forth model has five "virtual registers." These are abstract entities which are used in the primitive operations of Forth. NEXT, ENTER, and EXIT were defined earlier in terms of these abstract registers.

Each of these is one cell wide -- i.e., in a 16-bit Forth, these are 16-bit registers. (There are exceptions to this rule, as you will see later.) These may not all be CPU registers. If your CPU doesn't have enough registers, some of these can be kept in memory. I'll describe them in the order of their importance; i.e., the registers at the bottom of this list are the best candidates to be stored in memory.

W is the Working register. It is used for many things, including memory reference, so it should be an address register; i.e., you must be able to fetch and store memory using the contents of W as the address. You also need to be able to do arithmetic on W. (In DTC Forths, you must also be able to jump indirect using W.) W is used by the interpreter in every Forth word. In a CPU having only one register, you would use it for W and keep everything else in memory (and the system would be incredibly slow).

IP is the Interpreter Pointer. This is used by every Forth word (through NEXT, ENTER, or EXIT). IP must be an address register. You also need to be able to increment IP. Subroutine threaded Forths don't need this register.

PSP is the Parameter Stack (or "data stack") Pointer, sometimes called simply SP. I prefer PSP because SP is frequently the name of a CPU register, and they shouldn't be confused. Most CODE words use this. PSP must be a stack pointer, or an address register which can be incremented and decremented. It's also a plus if you can do indexed addressing from PSP.

RSP is the Return Stack Pointer, sometimes called simply RP. This is used by colon definitions in ITC and DTC Forths, and by all words in STC Forths. RSP must be a stack pointer, or an address register which can be incremented and decremented.

If at all possible, put W, IP, PSP, and RSP in registers. The virtual registers that follow can be kept in memory, but there is usually a speed advantage to keeping them in CPU registers.
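A rough Python sketch of how NEXT, ENTER, and EXIT use these registers in an indirect-threaded Forth (not from Rodriguez's paper; the class, method, and word names are made up): memory is a list of cells, a code field holds a Python callable standing in for machine code, and PSP and RSP are implicit in two Python lists.

```python
class ITCForth:
    """Toy indirect-threaded (ITC) inner interpreter; a sketch, not a real Forth."""

    def __init__(self):
        self.mem = []      # cell-addressed memory holding code fields and threads
        self.ip = 0        # IP: address of the next cell in the current thread
        self.w = 0         # W: code-field address of the word being executed
        self.pstack = []   # parameter stack (PSP is implicit in the list length)
        self.rstack = []   # return stack (RSP is implicit in the list length)
        self.running = False

    def primitive(self, fn):
        """CODE word: its code field holds a callable taking the VM."""
        self.mem.append(fn)
        return len(self.mem) - 1

    def colon(self, *thread):
        """Colon definition: code field = ENTER, followed by the thread."""
        cfa = len(self.mem)
        self.mem.append(ITCForth.do_enter)
        self.mem.extend(thread)
        return cfa

    def do_next(self):
        # NEXT: W <- mem[IP], IP <- IP+1, then jump through W's code field
        # (the second indirection that gives ITC its name).
        self.w = self.mem[self.ip]
        self.ip += 1
        self.mem[self.w](self)

    def do_enter(self):
        # ENTER: push IP on the return stack; the new thread starts in the
        # cell right after the code field that W points at.
        self.rstack.append(self.ip)
        self.ip = self.w + 1

    def do_exit(self):
        # EXIT: pop the return stack back into IP.
        self.ip = self.rstack.pop()

    def run(self, cfa):
        halt = self.primitive(lambda vm: setattr(vm, "running", False))
        boot = len(self.mem)
        self.mem.extend([cfa, halt])  # two-cell boot thread: word, then halt
        self.ip, self.running = boot, True
        while self.running:
            self.do_next()
        return self.pstack


def lit(vm):   # LIT: push the inline cell that follows it in the thread
    vm.pstack.append(vm.mem[vm.ip])
    vm.ip += 1


def dup(vm):   # DUP
    vm.pstack.append(vm.pstack[-1])


def star(vm):  # *
    b, a = vm.pstack.pop(), vm.pstack.pop()
    vm.pstack.append(a * b)


vm = ITCForth()
LIT, DUP, STAR = vm.primitive(lit), vm.primitive(dup), vm.primitive(star)
EXIT = vm.primitive(ITCForth.do_exit)
SQUARE = vm.colon(DUP, STAR, EXIT)     # : SQUARE  DUP * ;
MAIN = vm.colon(LIT, 7, SQUARE, EXIT)  # : MAIN  7 SQUARE ;
print(vm.run(MAIN))  # [49]
```

In this sketch W and IP are touched on every NEXT, RSP only by ENTER and EXIT (that is, by colon definitions), and PSP by the primitives, roughly matching the usage pattern described above.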

X is a working register, not considered one of the "classical" Forth registers, even though the classical ITC Forths need it for the second indirection. In ITC you must be able to jump indirect using X. X may also be used by a few CODE words to do arithmetic and such. This is particularly important on processors that cannot use memory as an operand. For example, ADD on a Z80 might be (in pseudo-code)

   POP W
   POP X
   X+W -> W
   PUSH W

Sometimes another working register, Y, is also defined.

UP is the User Pointer, holding the base address of the task's user area. UP is usually added to an offset, and used by high-level Forth code, so it can be just stored somewhere. But if the CPU can do indexed addressing from the UP register, CODE words can more easily and quickly access user variables. If you have a surplus of address registers, use one for UP. Single-task Forths don't need UP.
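A minimal Python sketch of user-variable access, assuming a cell-addressed memory and a made-up offset for BASE (real Forths lay out their user areas differently): every user variable lives at UP plus a fixed offset, so two tasks can run the same code against separate variables, and a task switch only has to change UP.

```python
# Sketch only: cell-addressed RAM and a made-up user-variable offset.
memory = [0] * 1024          # cell-addressed RAM
USER_AREA_CELLS = 64         # assumed size of one task's user area
BASE = 0                     # hypothetical offset of the BASE user variable


class Task:
    def __init__(self, up):
        self.up = up         # UP: base address of this task's user area


def user_addr(task, offset):
    """A user variable's address is UP plus its fixed offset."""
    return task.up + offset


def store_user(task, offset, value):   # like  value BASE !  in that task
    memory[user_addr(task, offset)] = value


def fetch_user(task, offset):          # like  BASE @  in that task
    return memory[user_addr(task, offset)]


# Two tasks run the same code but own separate copies of BASE.
t1 = Task(up=256)
t2 = Task(up=256 + USER_AREA_CELLS)
store_user(t1, BASE, 10)
store_user(t2, BASE, 16)
print(fetch_user(t1, BASE), fetch_user(t2, BASE))  # 10 16
```

With UP in an address register, a CODE word could do the UP + offset step as a single indexed load, which is the speed-up the paragraph above refers to.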

...

Use of the Hardware Stack

Most CPUs have a stack pointer as part of their hardware, used by interrupts and subroutine calls. How does this map into the Forth registers? Should it be the PSP or the RSP?

The short answer is, it depends. It is said that the PSP is used more than the RSP in ITC and DTC Forths. If your CPU has few address registers, and PUSH and POP are faster than explicit reference, use the hardware stack as the Parameter Stack.

On the other hand, if your CPU is rich in addressing modes -- and allows indexed addressing -- there's a plus in having the PSP as a general-purpose address register. In this case, use the hardware stack as the Return Stack.

Sometimes you do neither! The TMS320C25's hardware stack is only eight cells deep -- all but useless for Forth. So its hardware stack is used only for interrupts, and both PSP and RSP are general-purpose address registers. (ANS Forth specifies a minimum of 32 cells of Parameter Stack and 24 cells of Return Stack; I prefer 64 cells of each.)

...

Top-Of-Stack in Register

Forth's performance can be improved considerably by keeping the top element of the Parameter Stack in a register!

...

If you have at least six cell-size CPU registers, I recommend keeping the TOS in a register. I consider TOS more important than UP to have in register, but less important than W, IP, PSP, and RSP. (TOS in register performs many of the functions of the X register.) It's useful if this register can perform memory addressing. PDP-11s, Z8s, and 68000s are good candidates.

Nine of the 19 IBM PC Forths studied by Guy Kelly [KEL92] keep TOS in register.
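A small Python sketch of why TOS in a register helps, using the number of accesses to an in-memory stack as a rough stand-in for cost (the class names and this cost model are assumptions, not measurements): the program `3 DUP + 4 +` runs once against a stack kept entirely in memory and once against a stack whose top element lives in an attribute playing the role of a register.

```python
class MemStack:
    """Parameter stack in memory; counts every push and pop as one access."""

    def __init__(self):
        self.cells, self.accesses = [], 0

    def push(self, x):
        self.accesses += 1
        self.cells.append(x)

    def pop(self):
        self.accesses += 1
        return self.cells.pop()


class NoTOS:
    """Every operand lives on the in-memory stack."""

    def __init__(self):
        self.s = MemStack()

    def lit(self, x):
        self.s.push(x)

    def dup(self):  # POP, PUSH, PUSH
        a = self.s.pop()
        self.s.push(a)
        self.s.push(a)

    def add(self):  # POP, POP, PUSH
        b, a = self.s.pop(), self.s.pop()
        self.s.push(a + b)

    def top(self):
        return self.s.cells[-1]


class WithTOS:
    """Top of stack cached in self.tos, standing in for a CPU register."""

    def __init__(self):
        # On an empty stack tos holds a dummy value; the first lit spills it.
        self.s, self.tos = MemStack(), 0

    def lit(self, x):  # spill the old TOS, keep the new value in the register
        self.s.push(self.tos)
        self.tos = x

    def dup(self):  # one PUSH; the register already holds the copy
        self.s.push(self.tos)

    def add(self):  # one POP; the result stays in the register
        self.tos = self.s.pop() + self.tos

    def top(self):
        return self.tos


def program(m):  # 3 DUP + 4 +
    m.lit(3)
    m.dup()
    m.add()
    m.lit(4)
    m.add()
    return m.top(), m.s.accesses


print(program(NoTOS()))    # (10, 11)
print(program(WithTOS()))  # (10, 5)
```

Under this crude model the cached version makes fewer than half the memory accesses; it ignores the register-to-register work a real CPU would still do.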

...

What about buffering two stack elements in registers? When you keep the top of stack in a register, the total number of operations performed remains essentially the same. A push remains a push, regardless of whether it is before or after the operation you're performing. On the other hand, buffering two stack elements in registers adds a large number of instructions -- a push becomes a push followed by a move. Only dedicated Forth processors like the RTX2000 and fantastically clever optimizing compilers can benefit from buffering two stack elements in registers.

Some examples

Here are the register assignments made by Forths for a number of different CPUs. Try to deduce the design decisions of the authors from this list.

             Figure 5. Register Assignments

            W       IP      PSP     RSP     UP      TOS      Source

8086[1]     BX      SI      SP      BP      memory  memory   [LAX84]
8086[2]     AX      SI      SP      BP      none    BX       [SER90]
68000       A5      A4      A3      A7=SP   A6      memory   [CUR86]
PDP-11      R2      R4      R5      R6=SP   R3      memory   [JAM80]
6809        X       Y       U       S       memory  memory   [TAL80]
6502        Zpage   Zpage   X       SP      Zpage   memory   [KUN81]
Z80         DE      BC      SP      IX      none    memory   [LOE81]
Z8          RR6     RR12    RR14    SP      RR10    RR8      [MPE92]
8051        R0,1    R2,3    R4,5    R6,7    fixed   memory   [PAY90]

[1] F83. [2] Pygmy Forth.

"SP" refers to the hardware stack pointer. "Zpage" refers to values kept in the 6502's memory page zero, which are almost as useful as -- sometimes more useful than -- values kept in registers; e.g., they can be used for memory addressing. "Fixed" means that Payne's 8051 Forth has a single, immovable user area, and UP is a hard-coded constant.

Narrow Registers

Notice anything odd in the previous list? The 6502 Forth -- a 16-bit model -- uses 8-bit stack pointers!

It is possible to make PSP, RSP, and UP smaller than the cell size of the Forth. This is because the stacks and user area are both relatively small areas of memory. Each stack may be as small as 64 cells in length, and the user area rarely exceeds 128 cells. You simply need to ensure that either a) these data areas are confined to a small area of memory, so a short address can be used, or b) the high address bits are provided in some other way, e.g., a memory page select.

In the 6502, the hardware stack is confined to page one of RAM (addresses 01xxh) by the design of the CPU. The 8-bit stack pointer can be used for the Return Stack. The Parameter Stack is kept in page zero of RAM, which can be indirectly accessed by the 8-bit index register X. (Question for the advanced student: why use the 6502's X, and not Y? Hint: look at the addressing modes available.) " -- https://www.bradrodriguez.com/papers/moving1.htm
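A short Python model of the two addressing tricks above (plain arithmetic over a 64 KiB byte array, not an emulator; the function names are made up): the 8-bit S register is combined with the fixed page-one high byte, and the Parameter Stack is indexed by X within page zero, where the sum wraps at 8 bits. The last function models the (zp,X) indexed-indirect mode, which the 6502 offers only with X; fetching through an address held on the stack this way is one plausible reading of the hint about X versus Y.

```python
ram = bytearray(0x10000)  # the 6502's 64 KiB address space


def stack_page_addr(s):
    """Hardware stack: 8-bit S with the high byte fixed at 01h (0100h-01FFh)."""
    return 0x0100 | (s & 0xFF)


def zp_x(base, x):
    """Zero page,X: the 8-bit sum wraps around inside page zero."""
    return (base + x) & 0xFF


def cell_at(x):
    """16-bit Parameter Stack cell whose low byte sits at zero-page address X."""
    return ram[zp_x(0, x)] | (ram[zp_x(1, x)] << 8)


def fetch_ind_x(x):
    """(zp,X) indexed indirect: use the stack cell at X as a pointer and
    read the byte it points to, roughly what LDA (0,X) does in a C@ word."""
    return ram[cell_at(x)]


ram[0x80], ram[0x81] = 0x34, 0x12   # stack cell at X = 80h holds 1234h
ram[0x1234] = 0x42
print(hex(stack_page_addr(0xFF)))   # 0x1ff
print(hex(cell_at(0x80)))           # 0x1234
print(hex(fetch_ind_x(0x80)))       # 0x42
```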

eForth (a (popular?) portable Forth with 31 primitives):

" The following registers are required for a virtual Forth computer:

Forth Register   8086 Register   Function
IP               SI              Interpreter Pointer
SP               SP              Data Stack Pointer
RP               RP              Return Stack Pointer
WP               AX              Word or Work Pointer
UP               (in memory)     User Area Pointer

...

eForth Kernel

System interface: BYE, ?rx, tx!, !io
Inner interpreters: doLIT, doLIST, next, ?branch, branch, EXECUTE, EXIT
Memory access: !, @, C!, C@
Return stack: RP@, RP!, R>, R@, >R
Data stack: SP@, SP!, DROP, DUP, SWAP, OVER
Logic: 0<, AND, OR, XOR
Arithmetic: UM+ " -- http://www.exemark.com/FORTH/eForthOverviewv5.pdf
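A Python sketch of how further arithmetic can be built from such a small kernel, assuming 16-bit cells and a Python list as the data stack: UM+ leaves a sum and a carry, and `+`, NOT, NEGATE, and `-` are then colon-style compositions of primitives. The definitions follow the style of eForth's high-level words, but the exact eForth source may differ; NOT is named `invert` here because `not` is a Python keyword.

```python
CELL = 0xFFFF   # mask for 16-bit cells (assumed cell width)
stack = []      # data stack; top of stack is stack[-1]


# --- a few kernel primitives ---
def um_plus():    # UM+ ( u1 u2 -- sum carry )
    b, a = stack.pop(), stack.pop()
    total = a + b
    stack.extend([total & CELL, total >> 16])


def drop():       # DROP ( x -- )
    stack.pop()


def xor():        # XOR ( x1 x2 -- x3 )
    b, a = stack.pop(), stack.pop()
    stack.append(a ^ b)


def lit(n):       # doLIT: push an inline literal, masked to one cell
    stack.append(n & CELL)


def zero_less():  # 0< ( n -- flag ): true (all ones) if the sign bit is set
    stack.append(CELL if stack.pop() & 0x8000 else 0)


# --- high-level words built only from primitives ---
def plus():       # : +  UM+ DROP ;
    um_plus()
    drop()


def invert():     # : NOT  -1 XOR ;
    lit(-1)
    xor()


def negate():     # : NEGATE  NOT 1 + ;
    invert()
    lit(1)
    plus()


def minus():      # : -  NEGATE + ;
    negate()
    plus()


lit(5)
lit(7)
minus()
print(hex(stack[-1]))  # 0xfffe, i.e. -2 in 16-bit two's complement
zero_less()
print(hex(stack[-1]))  # 0xffff (true)
```

Only UM+ needs to know about carries; everything above it is ordinary threaded code, which is how a kernel this small can stay portable.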

---

here are CamelForth?