proj-oot-lowEndTargets

Wouldn't it be interesting if Oot could be a good choice for embedded systems too? This is not one of my main goals, but Lua is inspiring in that it seems to have started out just trying to be a small, portable, simple language suitable for embedding into host applications, and it turned out to be suitable for running on low-end hardware too.

Oot has similar, if different, goals, so maybe we can think about targeting low-end hardware too. Our shared goals are simplicity and embeddability into host applications; it's hard to start using a new language if you have to write a project entirely in that language, but easier if you can start by adding new functionality to an existing project in the new language. Unlike Lua, however, we are not as concerned with performance.

Survey of potential low-end hardware targets

PIC, AVR, ARM, Intel Quark

ARM L1 cache

486 had 8K of L1 cache

OK, seems like we want the core of Oot's VM to fit in 16K or 32K or 64K or 128K. All of these fit in L2 cache on the Core i3s (which is 256k).

If we can get it into 16K, then it'll fit in L1 cache even in some lower-end solutions, and it'll fit in ROM in all but the lowest-end cards.

If we can get it into 32K, then it'll fit in L1 cache in Core i3, and it'll fit in ROM in most of the lowest-end cards.

If we can get it into 64K, it'll fit in ROM in most of the lower-end cards, but it'll take up most or all of the ROM in many of them.

If we can get it into 128K, it'll fit in ROM in some of the lower-end cards, but it'll take up most or all of the ROM in many of them.

(noting that even if the VM interpreter fits in the L1 cache, it would be nice if some of the application code would too, so that we could run small loops entirely in L1)

The embedded Javas and Pythons range from 16K to 64K.

The Apple IIe had 48K of non-bank-switched memory, plus a 16K ROM-like "language card". "Integer BASIC" fit into this 16K "language card" and could run using the other 48K as its RAM. Otoh that 16K/48K is for an 8-bit machine with 16-bit addresses, so for processors with larger instruction sizes and 32-bit or 64-bit addressing, one should scale that up:

so imo one should expect the 6502-era numbers to at least double on modern processors (16 -> 32 bit addresses, 1-byte -> 2-byte opcodes), so think 32K/96K = 128K total.

.NET Micro needs 256K so that's not crazy.

a "stripped down" eLua needs 128K so i doubt we'll do much better than that.

So let's dream of 16K, initially shoot for 32K, slip to 64K, and not be too unhappy if we end up at 128K. (of course in reality i'm not going to sweat these things too much so it'll probably end up much much larger, but it's interesting to think about). Note that it would be useful to have the most common loops fit into 32k even if the whole runtime or interpreter does not.

otoh if the ISA is more complex, that actually helps here, b/c denser code means programs fit in less memory

but the macro (custom instruction) facility should help

hmmm... i guess macros/custom instructions are no different than subroutines (where you push return addr and args onto the stack then JMP), except that they are more compact (higher code density)

i guess that is part of the genius of Forth.

---

looks like consumer-priced massively parallel computers are still not available. Afaict the Parallella project is only contemplating a 64-core board for $100, and that's the only one out there. Similarly, http://en.wikipedia.org/wiki/Intel_MIC has 32 cores. Some http://en.wikipedia.org/wiki/Nvidia_Tesla models at least offer on the order of 2048 cores -- but for a price of $3000.

so we're not getting much below $1/core yet in any offering. We'd need about 64k cores for $2000, i.e. about $0.03/core. That's on the order of $0.01/core, so let's just say we need "a penny per core". Actually, a penny makes sense: i was saying $2000 because a whole computer can cost $2000, but the CPU in that computer is much cheaper, on the order of $200 (retail), so even $600 for the processors is already asking a lot.

If we have 64k processors at a penny each, that's 65536 x $0.01 = $655.36. At that point, enough hobbyists will be able to purchase one for applications to start being discovered at a reasonable rate.

The Parallella 64-core is built on this chip:

http://www.adapteva.com/epiphanyiv/

which has "2 MB On-Chip Distributed Shared Memory". They say:

" Memory System: The Epiphany memory architecture is based on a flat memory map in which each compute node has a small amount of local memory as a unique addressable slice of the total 32-bit address space. A processor can access its own local memory and other processors memory through regular load/store instructions, with the only difference being the latency and effective throughput of the transactions. The local memory system is comprised of 4 separate banks, allowing for simultaneous memory access by the instruction fetch engine, local load-store instructions, and by load/store transactions initiated by other processors within system. "

so that's 32k per core (2 MB / 64 cores)

the new Parallax Propeller looks more minimal:

http://forums.parallax.com/showthread.php/155132-The-New-16-Cog-512KB-64-analog-I-O-Propeller-Chip

(32k per cog: 512KB / 16 cogs)

so it's anyone's guess how much memory per core we'll have when we have 64k cores for $600, but between 8k and 32k is a good guess; if anything, assume 32k.

so the Oot runtime, including the really core libraries, shouldn't take more than half of this: 16k. sheesh, that's small. Still, it's double what the old PDP BASIC had to work with (slightly more, 'cause i think user memory had to fit in the 8k along with the interpreter on that one). https://www.google.com/search?q=+basic+8k shows various BASIC versions fit in 8k. There's even some 4k BASICs: https://www.google.com/search?q=basic+4k

this suggests that if an Oot VM or Oot Assembly has (or initially has) fixed pseudo-pointer sizes, then 16-bit pseudo-pointers will be more than enough (especially since, if we make our unboxed primitive data elements a uniform sizeof larger than 1 byte, 16k bytes of memory holds fewer than 16k objects; e.g. 16k 16-bit objects take 32k bytes).

so, a 16-bit word size, and a corresponding 2^16 = 64k pseudo-memory size (ie limits such as no more than 64k local variables in a function, etc), seems reasonable for Oot Assembly.

a Parallax Propeller cog is a 32-bit CPU, btw. So if our VM is 16 bits, we're undershooting that. Really, i just like 16 because 2^(2^2) = 16.

The Lua 5.1 VM's 3-operand instruction format has operands of only 9 and 8 bits, so this is already bigger than that (although the 2-operand format has an 8-bit operand and an 18-bit operand).