" The following registers are required for a virtual Forth computer:
| Forth Register | 8086 Register | Function |
| -------------- | ------------- | -------- |
| IP | SI | Interpreter Pointer |
| SP | SP | Data Stack Pointer |
| RP | BP | Return Stack Pointer |
| WP | AX | Word or Work Pointer |
| UP | (in memory) | User Area Pointer |
...
eForth Kernel
System interface: BYE, ?rx, tx!, !io
Inner interpreters: doLIT, doLIST, next, ?branch, branch, EXECUTE, EXIT
Memory access: !, @, C!, C@
Return stack: RP@, RP!, R>, R@, >R
Data stack: SP@, SP!, DROP, DUP, SWAP, OVER
Logic: 0<, AND, OR, XOR
Arithmetic: UM+
" -- http://www.exemark.com/FORTH/eForthOverviewv5.pdf
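To make the roles of IP, SP and RP and of the doLIT, doLIST, next and EXIT words more concrete, here is a minimal Python sketch of a threaded-code inner interpreter. It is a toy model, not eForth: the token-list representation, the 16-bit cell size, and the names `run`, `PRIMITIVES` and `DOUBLE` are assumptions made for illustration.

```python
# Toy model of a threaded-code inner interpreter (not eForth itself).
# Assumptions: 16-bit cells, words named by strings, colon words as token lists.
MASK = 0xFFFF


def um_plus(data):
    """UM+ ( u1 u2 -- sum carry ): unsigned add that also returns the carry."""
    b, a = data.pop(), data.pop()
    total = a + b
    data.append(total & MASK)
    data.append(total >> 16)


PRIMITIVES = {
    "DUP": lambda d: d.append(d[-1]),
    "DROP": lambda d: d.pop(),
    "SWAP": lambda d: d.extend([d.pop(), d.pop()]),
    "UM+": um_plus,
}


def run(program, colon_words):
    data, returns = [], []          # data stack (SP role), return stack (RP role)
    code, ip = program, 0           # ip plays the Interpreter Pointer role
    while True:
        if ip >= len(code):         # ran off the end of the current token list
            return data
        token = code[ip]            # "next": fetch the token at IP...
        ip += 1                     # ...and advance IP past it
        if token == "doLIT":        # push the inline literal that follows
            data.append(code[ip])
            ip += 1
        elif token == "EXIT":       # unnest: restore the caller's position
            code, ip = returns.pop()
        elif token in colon_words:  # "doLIST": nest into a colon word
            returns.append((code, ip))
            code, ip = colon_words[token], 0
        else:                       # anything else is treated as a primitive
            PRIMITIVES[token](data)


colon_words = {"DOUBLE": ["DUP", "UM+", "DROP", "EXIT"]}
result = run(["doLIT", 21, "DOUBLE"], colon_words)
```

In this sketch the return stack holds (token list, index) pairs rather than raw addresses, which keeps the nesting logic visible without modelling memory.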
---
Here are CamelForth's ~70 primitives:
http://www.bradrodriguez.com/papers/glosslo.txt
Design rationale for this choice:
"
1. Fundamental arithmetic, logic, and memory operators are CODE.
2. If a word can't be easily or efficiently written (or written at all) in terms of other Forth words, it should be CODE (e.g., U<, RSHIFT).
3. If a simple word is used frequently, CODE may be worthwhile (e.g., NIP, TUCK).
4. If a word requires fewer bytes when written in CODE, do so (a rule I learned from Charles Curley).
5. If the processor includes instruction support for a word's function, put it in CODE (e.g. CMOVE or SCAN on a Z80 or 8086).
6. If a word juggles many parameters on the stack, but has relatively simple logic, it may be better in CODE, where the parameters can be kept in registers.
7. If the logic or control flow of a word is complex, it's probably better in high-level Forth.
" -- http://www.bradrodriguez.com/papers/moving5.htm
" ANS Forth Core words
These are required words whose definitions are specified by the ANS Forth document.
!        x a-addr --              store cell in memory
+        n1/u1 n2/u2 -- n3/u3     add n1+n2
+!       n/u a-addr --            add cell to memory
>        n1 n2 -- flag            test n1>n2, signed
>R       x --   R: -- x           push to return stack
?DUP     x -- 0 | x x             DUP if nonzero
ANS Forth Extensions
These are optional words whose definitions are specified by the ANS Forth document.
<>       x1 x2 -- flag            test not equal
BYE      i*x --                   return to CP/M
CMOVE    c-addr1 c-addr2 u --     move from bottom
CMOVE>   c-addr1 c-addr2 u --     move from top
KEY?     -- flag                  return true if char waiting
M+       d1 n -- d2               add single to double
NIP      x1 x2 -- x2              per stack diagram
TUCK     x1 x2 -- x2 x1 x2        per stack diagram
U>       u1 u2 -- flag            test u1>u2, unsigned
Private Extensions
These are words which are unique to CamelForth. Many of these are necessary to implement ANS Forth words, but are not specified by the ANS document. Others are functions I find useful.
(do)     n1|u1 n2|u2 --   R: -- sys1 sys2            run-time code for DO
(loop)   R: sys1 sys2 --  | sys1 sys2                run-time code for LOOP
(+loop)  n --   R: sys1 sys2 --  | sys1 sys2         run-time code for +LOOP
><       x1 -- x2                     swap bytes
?branch  x --                         branch if TOS zero
BDOS     DE C -- A                    call CP/M BDOS
branch   --                           branch always
lit      -- x                         fetch inline literal to stack
PC!      c p-addr --                  output char to port
PC@      p-addr -- c                  input char from port
RP!      a-addr --                    set return stack pointer
RP@      -- a-addr                    get return stack pointer
SCAN     c-addr1 u1 c -- c-addr2 u2   find matching char
SKIP     c-addr1 u1 c -- c-addr2 u2   skip matching chars
SP!      a-addr --                    set data stack pointer
SP@      -- a-addr                    get data stack pointer
S=       c-addr1 c-addr2 u -- n       string compare
                                      n<0: s1<s2, n=0: s1=s2, n>0: s1>s2
USER     n --                         define user variable 'n'
"
http://www.forthworks.com/retro https://forthworks.com/retro http://forthworks.com/retro/book.html http://forth.works/doc.html http://retroforth.org/Handbook-Latest.txt
The following notes are about Retro version 12.
Retro can be built in a reduced memory configuration that requires only 96 KiB of memory (~70 KiB for the image, 0.5 KiB for the data stack, 1 KiB for the address stack, and ~24 KiB of remaining RAM available for use by the VM).
By contrast, "The standard system is configured with a very deep data stack (around 2,000 items) and an address stack that is 3x deeper." [1]
Much of this section is copied from http://forthworks.com/retro/book.html
Input is divided into a series of whitespace delimited tokens. Each of these is then processed individually. There are no parsing words in RETRO.
Tokens may have a single character prefix, which RETRO will use to decide how to process the token.
The major prefixes are:
| Prefix | Used For |
| ------ | -------- |
| @ | Fetch from variable |
| ! | Store into variable |
| & | Pointer to named item |
Comments are delimited by ().
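The token-and-prefix model described above can be sketched in Python as follows. This is only an illustrative model, not Retro's implementation: the word set, the `Counter` variable, the tuple used to stand in for a pointer, and all function names are hypothetical. The `#` number prefix and the "try the dictionary, else report not found" fallback follow the description of interpret in the Retro book.

```python
def make_interpreter():
    """Build a tiny prefix-driven interpreter (illustrative model only)."""
    stack = []
    variables = {"Counter": 0}      # hypothetical named variable
    dictionary = {                  # hypothetical word set
        "dup": lambda: stack.append(stack[-1]),
        "+": lambda: stack.append(stack.pop() + stack.pop()),
    }

    def fetch(name):                # @name: fetch from variable
        stack.append(variables[name])

    def store(name):                # !name: store into variable
        variables[name] = stack.pop()

    def pointer(name):              # &name: pointer to named item (a tuple here)
        stack.append(("pointer", name))

    def number(text):               # #123: numeric literal
        stack.append(int(text))

    prefixes = {"@": fetch, "!": store, "&": pointer, "#": number}

    def interpret(token):
        handler = prefixes.get(token[0])
        if handler is not None:
            handler(token[1:])      # pass the rest of the token to the handler
        elif token in dictionary:
            dictionary[token]()
        else:
            raise LookupError("word not found: " + token)

    def evaluate(line):
        for token in line.split():  # whitespace-delimited, one token at a time
            interpret(token)
        return stack

    return evaluate, variables


evaluate, variables = make_interpreter()
final_stack = evaluate("#2 #3 + !Counter @Counter dup +")
```

Because the first character is checked against the prefix table before the dictionary is consulted, a word whose name begins with a prefix character would never be reached by name in this model.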
Constant literals:
Data types used in Retro's stack notation (provided via comments in many word definitions):
To add words (which are Forth's equivalent to functions) to the "dictionary", use the ':' prefix with the name of the word, then the word definition, then ';' to terminate, e.g.:
:palindrome dup s:reverse s:eq? ;
Most Retro code is written using a literate file format called Unu, in which only code within ~~~-delimited blocks is executed. Unu has #-style headings: "# This is a heading".
" Strings in RETRO are NULL terminated sequences of values representing characters. Being NULL terminated, they can’t contain a NULL (ASCII 0).
The character words in RETRO are built around ASCII, but strings can contain UTF8 encoded data if the host platform allows. Words like s:length will return the number of bytes, not the number of logical characters in this case.
...
Some RETRO systems include support for floating point numbers. When present, this is built over the system libm using the C double type. ... Floating point values exist on a separate stack, and are bigger than the standard memory cells, so can not be directly stored and fetched from memory.
The floating point system also provides an alternate stack that can be used to temporarily store values. ... Floating point words are in the f: namespace. There is also a related e: namespace for encoded values, which allows storing of floats in standard memory. ... A combinator is a function that consumes functions as input. They are used heavily by the RETRO system. " -- http://forthworks.com/retro/book.html
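As a small illustration of the note above that s:length counts bytes rather than logical characters, the following Python sketch measures a NULL-terminated UTF-8 string the way a byte-oriented length word would. The helper name `byte_length` is hypothetical and the code does not call into Retro.

```python
text = "na\u00efve"                         # five logical characters
encoded = text.encode("utf-8") + b"\x00"    # NULL-terminated byte sequence


def byte_length(buf):
    """Count bytes up to (not including) the first NULL byte."""
    return buf.index(0)


logical_chars = len(text)
stored_bytes = byte_length(encoded)
```

Here the accented character occupies two bytes in UTF-8, so the byte count is one larger than the character count.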
" Words are grouped into broad namespaces by attaching a short prefix string to the start of a name.
The common namespaces are:
Prefix Contains
The "Glossary" is the (documentation of the?) standard library. It can be browsed online at http://forthworks.com:9999/
The words that I noticed mentioned in http://forthworks.com/retro/book.html are (much of the following is copied from there):
Combinators:
Stack ops:
Arrays:
Buffers:
Simple linear LIFO buffer. RETRO provides words for operating on a linear memory area. This can be useful in building strings or custom data structures.
A buffer is a linear sequence of memory. The buffer words provide a means of incrementally storing and retrieving values from it. The buffer words keep track of the start and end of the buffer. They also ensure that an ASCII:NULL is written after the last value, which makes using them for string data easy.
Only one buffer can be active at a time. RETRO provides a buffer:preserve combinator to allow using a second one before returning to the prior one.
Example:
'Test d:create #1025 allot &Test buffer:set #100 buffer:add buffer:get n:put nl
(note: I think n:put prints numeric data, and nl prints a newline)
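The behaviour described above (a tracked start and end, LIFO access, and a NULL kept after the last value) can be modelled in Python roughly as follows. This is a sketch under assumptions: memory is a plain list of cells, and the class and method names are hypothetical stand-ins for buffer:set, buffer:add and buffer:get.

```python
class Buffer:
    """Toy model of a linear LIFO buffer inside a flat cell memory."""

    def __init__(self, memory):
        self.memory = memory
        self.start = self.end = 0

    def set(self, address):         # models buffer:set
        self.start = self.end = address
        self.memory[self.end] = 0   # an empty buffer is just the terminator

    def add(self, value):           # models buffer:add
        self.memory[self.end] = value
        self.end += 1
        self.memory[self.end] = 0   # keep a NULL after the last value

    def get(self):                  # models buffer:get
        self.end -= 1
        value = self.memory[self.end]
        self.memory[self.end] = 0   # the terminator moves back with the end
        return value


memory = [-1] * 16                  # -1 marks cells the buffer never touched
buf = Buffer(memory)
buf.set(4)
for code in b"hi":
    buf.add(code)
after_adds = memory[4:7]
last = buf.get()
after_get = memory[4:6]
```

Keeping the terminator in place after every operation is what lets the region from the start address be read as a string at any time.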
Characters:
The Dictionary:
The Dictionary is a linked list containing the dictionary headers.
'Dictionary' is a variable holding a pointer to the most recent header.
Dictionary Header Structure (may change in future)
| Offset | Contains |
| ------ | -------- |
| 0000 | Link to Prior Header |
| 0001 | Link to XT |
| 0002 | Link to Class Handler |
| 0003+ | Word name (null terminated) |
Don't use that structure directly; use the accessor words (see below).
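To show how a linked list of headers with this layout can be searched, here is a minimal Python sketch. It models memory as a list of cells and assumes that a link of 0 marks the end of the list; the function names are hypothetical and are not Retro's accessor words.

```python
def add_header(memory, latest, xt, klass, name):
    """Append a header; return its address (the new 'Dictionary' value)."""
    address = len(memory)
    memory.extend([latest, xt, klass])          # offsets 0, 1, 2
    memory.extend(ord(ch) for ch in name)       # offset 3+: name characters
    memory.append(0)                            # null terminator
    return address


def header_name(memory, header):
    """Read the null-terminated name stored at offset 3 of a header."""
    chars = []
    i = header + 3
    while memory[i] != 0:
        chars.append(chr(memory[i]))
        i += 1
    return "".join(chars)


def lookup(memory, latest, name):
    """Walk links from the most recent header; return its address or 0."""
    header = latest
    while header != 0:                          # assumption: 0 ends the list
        if header_name(memory, header) == name:
            return header
        header = memory[header]                 # offset 0: link to prior header
    return 0


memory = [0]                                    # cell 0 reserved so 0 can mean "none"
latest = 0
latest = add_header(memory, latest, 100, 1, "dup")
latest = add_header(memory, latest, 200, 1, "drop")
found = lookup(memory, latest, "dup")
xt_of_dup = memory[found + 1]                   # offset 1: link to XT
```

Because the search starts at the most recent header, a later definition with the same name would shadow an earlier one in this model.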
Floating point:
Numbers in RETRO are signed, 32 bit integers with a range of -2,147,483,648 to 2,147,483,647.
Pointers:
Variables:
Strings:
At the interpreter, strings get allocated in a rotating buffer. If you need to keep them around, use s:keep or s:copy to move them to more permanent storage. In a definition, the string is compiled inline and so is in permanent memory.
Strings are mutable.
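The consequence of the rotating buffer can be illustrated with a small Python model. The slot count, class name and method names are assumptions for illustration; the sketch only shows why a temporary string must be copied elsewhere if it needs to survive.

```python
class TempStrings:
    """Toy rotating pool: each new temporary string reuses the oldest slot."""

    def __init__(self, slots):
        self.pool = [None] * slots
        self.next = 0

    def temp(self, text):
        """Store text in the next slot; return the slot index as a 'pointer'."""
        slot = self.next
        self.pool[slot] = text
        self.next = (self.next + 1) % len(self.pool)
        return slot

    def fetch(self, slot):
        return self.pool[slot]


strings = TempStrings(slots=2)
first = strings.temp("one")
kept = strings.fetch(first)     # copy out while the slot is still intact
strings.temp("two")
strings.temp("three")           # wraps around and reuses the first slot
now_in_first = strings.fetch(first)
```

After the pool wraps, the old "pointer" still refers to a valid slot but the contents belong to a newer string, which is the situation s:keep and s:copy are described as avoiding.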
Printing output:
Memory:
Conventions for word naming: predicates end with '?'. Spread data flow combinators end with '*'. Apply data flow combinators end with '@'.
" Example
{{
  'A var
  :++A &A v:inc ;
---reveal---
  :B ++A ++A @A n:put nl ;
}}
In this example, the lexical namespace is created with {{. A variable (A) and word (++A) are defined. Then a marker is set with ---reveal---. Another word (B) is defined, and the lexical area is closed with }}.
The headers between {{ and ---reveal--- are then hidden from the dictionary, leaving only the headers between ---reveal--- and }} exposed.
Notes
This only affects word visibility within the scoped area. As an example:
:a #1 ;
{{
  :a #2 ;
---reveal---
  :b 'a s:evaluate n:put ;
}}
In this, after }} closes the area, the :a #2 ; is hidden and the s:evaluate will find the :a #1 ; when b is run. " -- http://forthworks.com/retro/book.html
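The visibility rule in the quote above can be modelled in Python as follows. This is a sketch, not Retro's mechanism: the dictionary is a plain list searched newest-first, hiding is done by deleting entries rather than relinking headers, and the class and method names are hypothetical.

```python
class Dictionary:
    """Toy dictionary with a {{ ... ---reveal--- ... }} style scope."""

    def __init__(self):
        self.headers = []           # oldest first; lookup searches newest first
        self.scope_start = None
        self.reveal_at = None

    def define(self, name, value):
        self.headers.append((name, value))

    def lookup(self, name):
        for entry_name, value in reversed(self.headers):
            if entry_name == name:
                return value
        return None

    def open_scope(self):           # models {{
        self.scope_start = len(self.headers)

    def reveal(self):               # models ---reveal---
        self.reveal_at = len(self.headers)

    def close_scope(self):          # models }} (assumes reveal() was called)
        del self.headers[self.scope_start:self.reveal_at]


d = Dictionary()
d.define("a", 1)
d.open_scope()
d.define("a", 2)                    # private: hidden once the scope closes
d.reveal()
seen_inside = d.lookup("a")
d.define("b", lambda: d.lookup("a"))    # looks up 'a' only when b runs
d.close_scope()
seen_by_b = d.lookup("b")()
```

Because `b` performs its lookup at run time, after the scope has closed, it finds the outer `a`, which mirrors the s:evaluate example in the quote.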
"...temporary strings are allocated in a rotating buffer." -- http://forthworks.com/retro/book.html
" Primitives
These are words that map directly to Nga instructions.
dup drop swap call eq? -eq? lt? gt? fetch store + - * /mod and or xor shift push pop 0;
Memory
fetch-next store-next , s,
Strings
s:to-number s:eq? s:length
Flow Control
choose if -if repeat again
Compiler & Interpreter
Compiler Heap ; [ ] Dictionary d:link d:class d:xt d:name d:add-header class:word class:primitive class:data class:macro prefix:: prefix:# prefix:& prefix:$ interpret d:lookup err:notfound
I could slightly reduce this. The $ prefix could be defined in higher level code, and I don’t strictly need to expose the fetch-next and store-next here. But since they are already implemented as dependencies of the words in the kernel, it would be a bit wasteful to redefine them later in higher level code.
With these words the rest of the language can be built up. Note that the Rx kernel does not provide any I/O words. It’s assumed that the RETRO interfaces will add these as best suited for the systems they run on. " -- http://forthworks.com/retro/book.html
The words that are not described or obvious from the above are:
" There is another small bit. All images start with a few key pointers in fixed offsets of memory. These are:
| Offset | Contains |
| ------ | --------------------------- |
| 0 | lit call nop nop |
| 1 | Pointer to main entry point |
| 2 | Dictionary |
| 3 | Heap |
| 4 | RETRO version identifier |
An interface can use the dictionary pointer and knowledge of the dictionary format for a specific RETRO version to identify the location of essential words like interpret and err:notfound when implementing the user facing interface. " -- http://forthworks.com/retro/book.html
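As an illustration of how an interface might read these fixed offsets, here is a short Python sketch. It assumes the image is available as raw bytes made of 32-bit signed little-endian cells, which is an assumption of this example rather than something stated above; the function name and the sample values are hypothetical.

```python
import struct


def read_image_header(raw):
    """Decode the first five cells of an image given as bytes.

    Assumption: each cell is a 32-bit signed little-endian integer.
    """
    cell0, entry, dictionary, heap, version = struct.unpack_from("<5i", raw, 0)
    return {
        "cell0": cell0,             # packed instructions, left undecoded here
        "entry": entry,             # offset 1: pointer to main entry point
        "dictionary": dictionary,   # offset 2: most recent dictionary header
        "heap": heap,               # offset 3: next free address
        "version": version,         # offset 4: version identifier
    }


# Made-up sample values, used only to exercise the function.
sample = struct.pack("<5i", 0, 100, 2000, 3000, 12)
header = read_image_header(sample)
```

With the dictionary pointer in hand, an interface could then walk the headers to locate words such as interpret, as the quote describes.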
" I’ve been pleased with Nga. On its own it really isn’t useful though. So with RETRO I embed it into a larger framework that adds some basic I/O functionality. The interfaces handle the details of passing tokens into the language and capturing any output. They are free to do this in whatever model makes most sense on a given platform.
So far I’ve implemented:
In all cases, the only common I/O word that has to map to an exposed instruction is putc, to display a single character to some output device. There is no requirement for a traditional keyboard input model.
By doing this I was able to solve the biggest portability issue with the RETRO 10/11 model (...RETRO 11 and the Ngaro VM assumed the existence of a console environment. All input was required to be input at the keyboard, and all output was to be shown on screen...), and make a much simpler, cleaner language in the end. " -- http://forthworks.com/retro/book.html
" RETRO does only minimal error checking.
Non-Fatal
A non-fatal error will be reported on word not found during interactive or compile time. Note that this only applies to calls: if you try to get a pointer to an undefined word, the returned pointer will be zero.
Fatal
A number of conditions are known to cause fatal errors. The main ones are stack overflow, stack underflow, and division by zero.
On these, RETRO will generally exit. For stack depth issues, the VM will attempt to display an error prior to exiting.
In some cases, the VM may get stuck in an endless loop. If this occurs, try using CTRL+C to kill the process, or kill it using whatever means your host system provides.
Rationale
Error checks are useful, but slow - especially on a minimal system like RETRO. The overhead of doing depth or other checks adds up quickly.
As an example, adding a depth check to drop increases the time to use it 250,000 times in a loop from 0.16 seconds to 1.69 seconds. " -- http://forthworks.com/retro/book.html
" Prefixes as a Language Element
A big change in RETRO 12 was the elimination of the traditional parser from the language. This was a sacrifice due to the lack of an I/O model. RETRO has no way to know how input is given to the interpret word, or whether anything else will ever be passed into it.
And so interpret operates only on the current token. The core language does not track what came before or attempt to guess at what might come in the future.
This leads into the prefixes. RETRO 11 had a complicated system for prefixes, with different types of prefixes for words that parsed ahead (e.g., strings) and words that operated on the current token (e.g., @). RETRO 12 eliminates all of these in favor of just having a single prefix model.
The first thing interpret does is look to see if the first character in a token matches a prefix: word. If it does, it passes the rest of the token as a string pointer to the prefix specific handler to deal with. If there is no valid prefix found, it tries to find it in the dictionary. Assuming that it finds the words, it passes the d:xt field to the handler that d:class points to. Otherwise it calls err:notfound.
This has an important implication: words can not reliably have names that start with a prefix character.
It also simplifies things. Anything that would normally parse becomes a prefix handler. So creating a new word? Use the : prefix. Strings? Use '. Pointers? Try &. And so on. E.g.,
| In ANS | In RETRO |
| ------ | -------- |
| : foo ... ; | :foo ... ; |
| ' foo | &foo |
| : bar ... ['] foo ; | :bar ... &foo ; |
| s" hello world!" | 'hello_world! |
If you are familiar with ColorForth, prefixes are a similar idea to colors, but can be defined by the user as normal words.
After doing this for quite a while I rather like it. I can see why Chuck Moore eventually went towards ColorForth as using color (or prefixes in my case) does simplify the implementation in many ways. " -- http://forthworks.com/retro/book.html
" The standard RETRO is not a good choice for applications needing to be highly secure.
Runtime Checks
The RETRO system performs only minimal checks. It will not load an image larger than the max set at build time. And stack over/underflow are checked for as code executes.
The system does not attempt to validate anything else, it’s quite easy to crash.
Isolation
The VM itself and the core code is self contained. Nga does not make use of malloc/free, and uses only standard system libraries. It’s possible for buffer overruns within the image (overwriting Nga code), but the RETRO image shouldn’t leak into the C portions.
I/O presents a bigger issue. Anything involving I/O, especially with the unix: words, may be a vector for attacks. " -- http://forthworks.com/retro/book.html
" Proven software techniques of forty years ago have yet to reach widespread use, in deference to the “latest and greatest” proprietary solutions of dubious value.
...
The Retro philosophy is a simple alternative for those willing to make a clean break with legacy software.
...
At first Retro will appeal to computer hobbyists and electronic engineers. Once the rough edges are smoothed out, it could catch on with ordinary folks who don’t like waiting five minutes just to check their email (not to mention the long hours of setup and maintenance). Game programmers who take their craft seriously may also be interested.
...
I strive to avoid the extraneous. That applies even to proven technologies, if I don’t need them. If my computer isn’t set up for people to log in over the network, I don’t want security features; they just get in the way.
...
The thousands of languages in existence all fall into a handful of archetypes: Assembler, LISP, FORTRAN and FORTH represent the earliest descendants of nearly all languages. I hesitate to name a definitive “object-oriented” language, and here’s why: Object-Oriented programming is just a technique, and any language will suffice, even Assembler. The complexities of fancy languages like Ada and C++ are a departure from reality – the reality of the actual physical machine. When it all boils down, even LISP, FORTRAN and FORTH are only extensions of the machine.
I chose FORTH as the “native tongue” of Retro. LISP, FORTRAN, and other languages can be efficiently implemented as extensions of FORTH, but the reverse isn’t so efficient. Theoretically all languages are equivalent, but when design time, compilation time, and complexity are accounted for, FORTH is most efficient. FORTH also translates most directly to the hardware. (In fact, FORTH has been implemented in hardware; these “stack machines” are extremely efficient.) FORTH is also the easiest language to implement from scratch - a major concern when you’re trying to make a clean break. So with simplicity in mind, FORTH was the obvious choice. ...
I’m perfectly happy working with text only, and I go to great lengths to avoid using the standard graphical environments, which have major problems: windows, pulldown menus, and mice. Windows can’t share the screen nicely; that idea is hopeless. Pulldowns are tedious. Mice get in the way of typing without reducing the need for it; all they give me is tendonitis. Their main use is for drawing.
Some of my favorite interfaces: Telix, Telegard BBS, Pine, Pico, Lynx, and ScreamTracker