proj-oot-lowEndTargets-embeddableLanguageImplementations

Embedded and low-end hardware survey: Embeddable language implementations

also libraries and OSs

JVM and java

in a separate file, [1].

Lua VMs for low-end embedded systems

according to http://lua-users.org/lists/lua-l/2007-11/msg00248.html , Lua was under 100k on cell phones, and according to http://www.lua.org/about.html , "Under Linux, the Lua interpreter built with all standard Lua libraries takes 182K...", and according to http://www.schulze-mueller.de/download/lua-poster-090207.pdf , Lua fit into 128k ROM.

http://www.luafaq.org/#T1.33 says "Embedding Lua will only add about 150-200K to your project, depending on what extra libraries are chosen. It was designed to be an extension language and it is straightforward to ensure that any user scripts operate in a 'safe' environment (see Sandboxing.) You do not even have to embed the compiler front-end of Lua, and just use the core with pre-compiled scripts. This can get the memory footprint down to about 40K."

and as noted above, there's also the eLua project:

http://www.eluaproject.net/

" It's hard to give a precise answer to this, because this is not only dependable on the footprint of eLua or it's resource requirements but on the final user applications as well. As a general rule, for a 32-bit CPU, we recommend at least 256k of Flash and at least 64k of RAM. However, this isn't a strict requirement. A stripped down, integer-only version of eLua can definitely fit in 128k of Flash and depending on your type of application, 32k of RAM might prove just fine. We have built eLua for targets with less than 10K RAM but you can't do much than blinking an LED with them. It really largely depends on your needs. "

note that instruction sizes affect these comparisons somewhat. If you measure code size in instructions rather than bytes, x86 has variable-length instructions, the PDP-11 used 16-bit instruction words (plus operand words), and the 6502 in the Apple II used 8-bit opcodes (with 0-2 operand bytes). Newer machines also need more bits per address, so the same number of instructions may take up more room on newer machines.

eLua

http://www.eluaproject.net/


" eLua is not a stripped down set of Lua to fit in the embedded environment. Much on the contrary, it strives to offer the same features as the desktop version of Lua, complementing them with specific features for embedded use and discarting the need of an operating system running on the microcontrollers. Besides offering different flavors of the full Lua implementation (like the possibility of choosing between an integer-only and a floating point numbers implementation), a lot of work was and will be done in the direction of making Lua more "embedded-friendly" by augmenting the core language with features that allow lower memory requirements and faster embedded performance. "

Python VMs for low-end embedded systems

PyMite

http://code.google.com/p/python-on-a-chip/

"

    Requires roughly 55 KB program memory
    Initializes in 4KB RAM; print "hello world" needs 5KB; 8KB is the minimum recommended RAM
    Supports integers, floats, tuples, lists, dicts, functions, modules, classes, generators, decorators and closures
    Supports 25 of 29 keywords and 89 of 112 bytecodes from Python 2.6
    Can run multiple stackless green threads (round-robin)
    Has a mark-sweep garbage collector
    Has a hosted interactive prompt for live coding
    Licensed under the GNU GPL ver. 2

The PyMite VM DOES NOT HAVE:

    A built-in compiler
    Any of Python's libraries (no batteries included)
    A ready-to-go solution for the beginner (you need to know C and how to work with microcontrollers) "

" Does the PyMite VM have a GIL?

No. "

"

Keywords
~~~~~~~~

PyMite supports the following subset of Python's keywords::

and assert break class continue def del elif else for from global if import in is lambda not or pass print raise return while yield

PyMite DOES NOT support these keywords::

except exec finally try

"

" PyMite DOES NOT support Long or Complex numbers "


" PyMite supports Dictionaries having up to 32767 key, value pairs. "

" PyMite DOES NOT support overriding type operators using the special forms of identifiers. For example, ``__add__()`` WILL NOT implement or override the ``+`` operator. "

"

Library Modules


PyMite DOES NOT offer any of the library modules from Python. Instead, PyMite offers its own set of library modules, some of which have the same name as a module name from Python.

PyMite offers the following library modules::

dict func list string sys

Idiom Hints


PyMite does NOT support the idiom ``if __name__ == "__main__":``; instead use ``if ismain():``, where the ``ismain()`` function is part of the builtins module. "
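a portability sketch: PyMite provides ``ismain()`` as a builtin, and on CPython (which doesn't have it) a small shim can emulate it, so the same script works on both. The CPython fallback here is my own shim, not part of PyMite:

```python
# If ismain() is not already provided as a builtin (as on PyMite),
# define a CPython stand-in that checks the caller's module name.
import sys

try:
    ismain  # PyMite provides this as a builtin
except NameError:
    def ismain():
        # On CPython, inspect the calling frame's __name__.
        return sys._getframe(1).f_globals.get("__name__") == "__main__"

if ismain():
    print("running as main script")
```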

bytecode interpreter:

http://code.google.com/p/python-on-a-chip/source/browse/src/vm/interp.c

micropython

http://docs.micropython.org/en/latest/pyboard/genrst/index.html

chillingeffect 1 hour ago


It appears from [0] that the chip of choice is the STM32F045RGT (datasheet [1]). This is from the Cortex M4f series, which includes such wonderful things as a hardware floating-point unit. That is wonderful news, although, this board appears to have no external memory, so it would be limited to 128kB.

[0] https://raw.githubusercontent.com/micropython/pyboard/master... [1] http://www.alldatasheet.com/datasheet-pdf/pdf/510587/STMICRO...


--

" Micro Python has the following features:

More info at:

http://micropython.org/

You can follow the progress and contribute at github:

www.github.com/micropython/micropython www.github.com/micropython/micropython-lib "

--

concerning the CircuitPython derivative of MicroPython, from https://learn.adafruit.com/adafruit-circuit-playground-express?view=all :

" Things that are Built In and Work

flow control

All the usual if, elif, else, for, while... work just as expected

math

import math will give you a range of handy mathematical functions

>>> dir(math)
['__name__', 'e', 'pi', 'sqrt', 'pow', 'exp', 'log', 'cos', 'sin', 'tan', 'acos', 'asin', 'atan', 'atan2', 'ceil', 'copysign', 'fabs', 'floor', 'fmod', 'frexp', 'ldexp', 'modf', 'isfinite', 'isinf', 'isnan', 'trunc', 'radians', 'degrees']

CircuitPython supports 30-bit wide floating point values so you can use int's and float's whenever you expect

tuples, lists, arrays, and dictionaries

You can organize data in ()'s, []'s, and {}'s, including strings, objects, floats, etc

classes/objects and functions

We use objects and functions extensively in our libraries so check out one of our many examples like this MCP9808 library for class examples

lambdas

Yep! You can create function-functions with lambda just the way you like em:

>>> g = lambda x: x**2
>>> g(8)
64
"

https://github.com/adafruit/circuitpython

" Differences from MicroPython ... Behavior

    The order that files are run and the state that is shared between them. CircuitPython's goal is to clarify the role of each file and make each file independent from each other.
        boot.py (or settings.py) runs only once on start up before USB is initialized. This lays the ground work for configuring USB at startup rather than it being fixed. Since serial is not available, output is written to boot_out.txt.
        code.py (or main.py) is run after every reload until it finishes or is interrupted. After it is done running, the vm and hardware is reinitialized. This means you cannot read state from code.py in the REPL anymore. CircuitPython's goal for this change includes reduce confusion about pins and memory being used.
        After code.py the REPL can be entered by pressing any key. It no longer shares state with code.py so it is a fresh vm.
        Autoreload state will be maintained across reload.
    Adds a safe mode that does not run user code after a hard crash or brown out. The hope is that this will make it easier to fix code that causes nasty crashes by making it available through mass storage after the crash. A reset (the button) is needed after its fixed to get back into normal mode.

API

    Unified hardware APIs: audioio, analogio, busio, digitalio, pulseio, touchio, microcontroller, board, bitbangio (Only available on atmel-samd21 and ESP8266 currently.)
    No machine API on Atmel SAMD21 port.

Modules

    No module aliasing. (uos and utime are not available as os and time respectively.) Instead os, time, and random are CPython compatible.
    New storage module which manages file system mounts. (Functionality from uos in MicroPython.)
    Modules with a CPython counterpart, such as time, os and random, are strict subsets of their CPython version. Therefore, code from CircuitPython is runnable on CPython but not necessarily the reverse.
    tick count is available as time.monotonic()

atmel-samd21 features

    RGB status LED
    Auto-reload after file write over mass storage. (Disable with samd.disable_autoreload())
    Wait state after boot and main run, before REPL.
    Main is one of these: code.txt, code.py, main.py, main.txt
    Boot is one of these: settings.txt, settings.py, boot.py, boot.txt"
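the "strict subset" point above means that timing code written against CircuitPython's time module also runs unchanged on CPython. A small sketch using the monotonic tick count (pure Python, nothing board-specific assumed):

```python
# time.monotonic() is the "tick count" on CircuitPython and is a strict
# subset of CPython's time module, so this runs on both.
import time

def wait_at_least(seconds):
    """Busy-wait using the monotonic clock; returns the elapsed time."""
    start = time.monotonic()
    while time.monotonic() - start < seconds:
        pass
    return time.monotonic() - start

elapsed = wait_at_least(0.01)
```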

.NET VMs for low-end embedded systems

" The .NET Micro Framework (NETMF) is an open source .NET platform for resource-constrained devices with at least 256 KBytes of flash and 64 KBytes of RAM. It includes a small version of the .NET CLR and supports development in C#, Visual Basic .NET, and debugging (in an emulator or on hardware) using Microsoft Visual Studio. NETMF features a subset of the .NET base class libraries (about 70 classes with about 420 methods), an implementation of Windows Communication Foundation (WCF), a GUI framework loosely based on Windows Presentation Foundation (WPF), and a Web Services stack based on SOAP and WSDL. NETMF also features additional libraries specific to embedded applications "

Forth for low-end embedded systems

"Bernard Hodson.. has a Forth interpreter and a library of subroutines that occupies less than 32K"

Languages just for low-end embedding

http://www.clifford.at/embedvm/

"The VM itself takes up about 3kB of program memory on an AVR microcontroller. "

" EmbedVM is a small embeddable virtual machine for microcontrollers with a C-like language frontend "

" The VM simulates a 16bit CPU that can access up to 64kB of memory. It can only operate on 16bit values and arrays of 16bit and 8bit values. There is no support for complex data structures (struct, objects, etc.). A function can have a maximum of 32 local variables and 32 arguments.

Besides the memory for the VM, a small structure holding the VM state and the reasonable amount of memory the EmbedVM functions need on the stack there are no additional memory requirements for the VM. Especially the VM does not depend on any dynamic memory management.

EmbedVM is optimized for size and simplicity, not execution speed. ...On an AVR ATmega168 running at 16MHz the VM can execute about 75 VM instructions per millisecond.

All memory accesses done by the VM are performed using user callback functions. So it is possible to have some or all of the VM memory on external memory devices, flash memory, etc. or "memory-map" hardware functions to the VM. "
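the reader/writer callback scheme can be sketched as follows. This is a Python model for illustration only, not EmbedVM's actual C API; the names (read8/write8 etc.) are invented. The point is that the VM core never touches memory directly, so the backing store can be RAM, flash, or memory-mapped hardware:

```python
# A model of a VM whose only access to memory is via user callbacks.
class VM:
    def __init__(self, reader, writer):
        self.read8 = reader    # callback: addr -> byte
        self.write8 = writer   # callback: (addr, byte) -> None

    # 16-bit accesses are big endian, per the EmbedVM spec quoted below.
    def read16(self, addr):
        return (self.read8(addr) << 8) | self.read8(addr + 1)

    def write16(self, addr, value):
        self.write8(addr, (value >> 8) & 0xFF)
        self.write8(addr + 1, value & 0xFF)

# Back the VM with a plain 64k byte array standing in for RAM:
ram = bytearray(64 * 1024)
vm = VM(lambda a: ram[a], lambda a, v: ram.__setitem__(a, v))
vm.write16(0x1000, 0x1234)
```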

http://svn.clifford.at/embedvm/trunk/README.VM

"

Address space and word width


The machine can address up to 64k of memory. The memory can be addressed in units of single bytes, but local variables are always 16bit words.

It operates on 16bit words, which are always interpreted as signed integers in contexts where signed/unsigned makes a difference (multiply/divide and the compare operators).

The memory is always accessed using reader/writer functions that are provided by the host environment.

All 16bit memory operations are performed in big endian byte order.

Machine State


The machine state itself is managed using three state variables:

IP (Instruction Pointer) The address of the next instruction to execute

SP (Stack Pointer) The address of the last pushed word on the stack.

SFP (Stack Frame Pointer) The address of the first local variable on the stack.

In addition to that there are three function pointers in the machine state for embedding the VM in a host environment: a reader and a writer for accessing the VM memory and a callback for calling user functions.

The Stack


The stack grows from high address to low addresses.

Each local variable is accessed using an SFA (Stack Frame Address). The address of a local variable is:

          SFA >= 0  ?  SFP - 2*SFA - 2 :  SFP - 2*SFA + 2

There is a maximum of 32 local variables in a stack frame and 32 arguments to a function (arguments have negative SFA).

Calculations are done in a stackmachine like manner (each operation pops its operands from the stack and pushes the result).

The return address for the current function is stored in the 16-bit word on SFP + 2 and the parent stack frame pointer at SFP + 4.

All variables on the stack must be aligned on even addresses. So the stack pointer must be initialized to an even address (LSB not set).
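the addressing rule above can be sketched as a function (a Python illustration of the quoted formula; locals have SFA >= 0, arguments have negative SFA):

```python
# Stack-frame addressing: the stack grows downward, so locals sit
# below SFP and arguments above it.
def sfa_address(sfp, sfa):
    if sfa >= 0:
        return sfp - 2 * sfa - 2   # local variable
    else:
        return sfp - 2 * sfa + 2   # argument (negative SFA)

# With SFP = 0x1000: first local at 0x0FFE, second local at 0x0FFC,
# first argument (SFA = -1) at 0x1004.
```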

Instruction Encoding


Most instructions fit in a single byte. Some have a 1 or 2 bytes argument. Some instructions push and/or pop values to/from the stack. Not all of the instructions are actually used by the compiler.

    MSB .. LSB           Instruction                               Off
    0 0 <SFA>            Push local variable to stack              00
    0 1 <SFA>            Pop local variable from stack             40
    1 0 0 0 0x0          Add (pop 2, push 1)                       80
    1 0 0 0 0x1          Sub (pop 2, push 1)
    1 0 0 0 0x2          Mul (pop 2, push 1)
    1 0 0 0 0x3          Div (pop 2, push 1)
    1 0 0 0 0x4          Mod (pop 2, push 1)
    1 0 0 0 0x5          Shift Left (pop 2, push 1)
    1 0 0 0 0x6          Shift Right (pop 2, push 1)
    1 0 0 0 0x7          Bitwise AND (pop 2, push 1)
    1 0 0 0 0x8          Bitwise OR (pop 2, push 1)
    1 0 0 0 0x9          Bitwise XOR (pop 2, push 1)
    1 0 0 0 0xa          Logic AND (pop 2, push 1)
    1 0 0 0 0xb          Logic OR (pop 2, push 1)
    1 0 0 0 0xc          Bitwise NOT (pop 1, push 1)
    1 0 0 0 0xd          Arithmetic invert (pop 1, push 1)
    1 0 0 0 0xe          Logic NOT (pop 1, push 1)
    1 0 0 0 0xf          Reserved
    1 0 0 1 0 <VAL>      Push immediate (VAL is signed)            90
    1 0 0 1 1 0 0 0      Push unsigned 1-byte argument             98
    1 0 0 1 1 0 0 1      Push signed 1-byte argument               99
    1 0 0 1 1 0 1 0      Push signed 2-byte argument               9a
    1 0 0 1 1 0 1 1      Return from function (pop 1)              9b
    1 0 0 1 1 1 0 0      Return from function without value        9c
    1 0 0 1 1 1 0 1      Drop value (pop 1)                        9d
    1 0 0 1 1 1 1 0      Call address (pop 1)                      9e
    1 0 0 1 1 1 1 1      Jump to address (pop 1)                   9f
    1 0 1 0 0 0 0 0      Jump (1-byte rel. address)                a0
    1 0 1 0 0 0 0 1      Jump (2-byte rel. address)
    1 0 1 0 0 0 1 0      Call (1-byte rel. address)
    1 0 1 0 0 0 1 1      Call (2-byte rel. address)
    1 0 1 0 0 1 0 0      Jump IF (pop 1, 1-byte rel. address)
    1 0 1 0 0 1 0 1      Jump IF (pop 1, 2-byte rel. address)
    1 0 1 0 0 1 1 0      Jump UNLESS (pop 1, 1-byte rel. addr.)
    1 0 1 0 0 1 1 1      Jump UNLESS (pop 1, 2-byte rel. addr.)
    1 0 1 0 1 0 0 0      Compare "<" (pop 2, push 1)               a8
    1 0 1 0 1 0 0 1      Compare "<=" (pop 2, push 1)
    1 0 1 0 1 0 1 0      Compare "==" (pop 2, push 1)
    1 0 1 0 1 0 1 1      Compare "!=" (pop 2, push 1)
    1 0 1 0 1 1 0 0      Compare ">=" (pop 2, push 1)
    1 0 1 0 1 1 0 1      Compare ">" (pop 2, push 1)
    1 0 1 0 1 1 1 0      Stack Pointer (push 1)                    ae
    1 0 1 0 1 1 1 1      Stack Frame Pointer (push 1)              af
    1 0 1 1 <Func-ID>    Call User Function                        b0
    1 1 0 0 0 <M>        Load 8u (push 1, M is addr mode)          c0
    1 1 0 0 1 <M>        Store 8u (pop 1, M is addr mode)          c8
    1 1 0 1 0 <M>        Load 8s (push 1, M is addr mode)          d0
    1 1 0 1 1 <M>        Store 8s (pop 1, M is addr mode)          d8
    1 1 1 0 0 <M>        Load 16 (push 1, M is addr mode)          e0
    1 1 1 0 1 <M>        Store 16 (pop 1, M is addr mode)          e8
    1 1 <K> 1 0 1        Bury dup of top in depth K (max 5)
    1 1 <K> 1 1 0        Dig up the element in depth K (max 5)
    1 1 <K> 1 1 1        Reserved
    1 1 1 1 0 <N>        Push N+1 zeros to the stack               f0
    1 1 1 1 1 <N>        Pop N+1 values but keep top               f8

Address Modes (M) for "Load" and "Store" instructions:

   0 ..... address as 1 byte (unsigned) argument
   1 ..... address as 2 byte argument
   2 ..... address as popped value (pop before data)
   3 ..... address as popped + 1 byte (unsigned) argument
   4 ..... address as popped + 2 byte argument
   5-7 ... shuffle instructions (not load/store)

Note: for 16bit load and store operations in mode 3 and 4 the popped data from the stack is multiplied by 2.

A "bury" with K=0 simply duplicates the top element on the stack; a "dig" with K=0 exchanges the two top elements on the stack.
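the first-byte decoding implied by the encoding table can be sketched as follows. This is a Python illustration only, not EmbedVM's actual decoder; the 3-bit signed-immediate convention is my assumption:

```python
# Decode a few groups of the first opcode byte per the encoding table.
def decode(opcode):
    assert 0 <= opcode <= 0xFF
    top2 = opcode >> 6
    if top2 == 0b00:
        return ("push_local", opcode & 0x3F)     # SFA in low 6 bits
    if top2 == 0b01:
        return ("pop_local", opcode & 0x3F)
    if opcode & 0xF0 == 0x80:                    # ALU group, 80..8f
        alu = ["add", "sub", "mul", "div", "mod", "shl", "shr",
               "and", "or", "xor", "land", "lor", "not", "neg",
               "lnot", "reserved"]
        return ("alu", alu[opcode & 0x0F])
    if opcode & 0xF8 == 0x90:                    # push immediate, 90..97
        val = opcode & 0x07                      # 3-bit value, signed
        if val >= 4:
            val -= 8
        return ("push_imm", val)
    if opcode & 0xF0 == 0xB0:                    # call user function
        return ("call_user", opcode & 0x0F)      # 4-bit function ID
    return ("other", opcode)
```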

Execution Model


The function embedvm_exec() executes one single VM instruction and then returns. The user provided callback functions are used to access the vm memory and call user functions.

"

libc

---

discussion on musl:

https://news.ycombinator.com/item?id=4058663

vasco 830 days ago


Don't really get the theory behind changing the default stack size for threads. Feels like they did it just to be different which might get someone scratching their heads for a bit.


dalias 830 days ago


The glibc default thread stack size is unacceptable/broken for a couple reasons. It eats memory like crazy (usually 8-10 megs per thread), and by memory I mean commit charge, which is a highly finite resource on a well-configured system without overcommit. Even if you allow overcommit, on 32-bit systems you'll exhaust virtual memory quickly, putting a low cap on the number of threads you can create (just 300 threads will use all 3GB of address space).

With that said, musl's current default is way too low. It's caused problems with several major applications such as git. We're in the process of trying to establish a good value for the default, which will likely end up being somewhere between 32k and 256k. I'm thinking 80k right now (96k including guard page and POSIX thread-local storage) but I would welcome evidence/data that helps make a good choice.

---

"I'm currently exploring the TI CC3200, it has a good price point (30$ via TI) and is quite capable (80Mhz M4 Cortex, 256k RAM) - for me its close to the sweet spot. If TI lowered the price even more, they'd own the market."

---

http://www.etalabs.net/compare_libcs.html

Bloat comparison:

                                          musl    uClibc    dietlibc   glibc
    Complete .a set                       412k    360k      120k       2.0M †
    Complete .so set                      516k    520k      185k       7.9M †
    Smallest static C program             1.8k    7k        0.2k       662k
    Static hello (using printf)           13k     51k       6k         662k
    Dynamic overhead (min. dirty)         20k     40k       40k        48k
    Static overhead (min. dirty)          8k      12k       8k         28k
    Static stdio overhead (min. dirty)    8k      20k       16k        36k
    Configurable featureset               no      yes       minimal    minimal
    License                               MIT     LGPL 2.1  GPL 2      LGPL 2.1+ w/exceptions

so we see from the above that even the smallest libc is probably too big for our 4k/8k/16k/128k dreams

lisp

lisp machines

random article on Lisp machines:

https://news.ycombinator.com/item?id=8340283

also mentioned the Reduceron which apparently ran Haskell or something like it

http://thorn.ws/reduceron/Reduceron/Practical_Reduceron.html

" Reduceron is a high performance FPGA soft-core for running lazy functional programs, complete with hardware garbage collection. Reduceron has been implemented on various FPGAs with clock frequency ranging from 60 to 150 MHz depending on the FPGA. A high degree of parallelism allows Reduceron to implement graph evaluation very efficiently. "

javascript (js)

http://duktape.org/

"

Duktape

Duktape is an embeddable Javascript engine, with a focus on portability and compact footprint.

Duktape is easy to integrate into a C/C++ project: add duktape.c and duktape.h to your build, and use the Duktape API to call Ecmascript functions from C code and vice versa.

Main features:

    Embeddable, portable, compact:
        200kB code
        46kB startup RAM (x86, default options)
        22kB startup RAM (x86, lowmem options)
        42kLoC source (excluding comments etc)
        Can run on platforms with 256kB flash and 96kB system RAM
    Ecmascript E5/E5.1 compliant, some features borrowed from E6 draft
    Built-in regular expression engine
    Built-in Unicode support
    Minimal platform dependencies
    Combined reference counting and mark-and-sweep garbage collection with finalization
    Custom features like coroutines, built-in logging framework, and built-in CommonJS-based module loading framework
    Property virtualization using a subset of Ecmascript E6 Proxy object"

userbinator 265 days ago


The source is around 60kLOC and 2MB, so it's somewhere between the big ones like V8 and the smaller JS interpreters I know of:

http://code.google.com/p/tiny-js/ (~2kLOC)

http://code.google.com/p/quad-wheel/

svaarala 10 days ago


Around 200kB of code (flash), and around 25kB of RAM after startup, with low memory optimizations in the master branch (for the upcoming Duktape 1.1 release). This doesn't include User Javascript, just what Duktape needs to start up.

You can run Duktape with 128kB RAM, and to some extent with 96kB, see e.g.: http://www.slideshare.net/seoyounghwang77/js-onmicrocontroll...


chubot 266 days ago


This looks really cool. I always wanted a small embeddable JS engine with a C interface. And it looks like you modelled the C API after the Lua C API? That is what I wanted as well.

saurik 266 days ago


JavaScriptCore (the engine from WebKit) also has a very simple C interface (and I've seen it embedded by people as a static library without serious difficulty).

mamod 8 days ago


I've been using duktape for some while now and it's totally amazing, for the wonder minds about why using duktape instead of V8 or SM here's some of my list

First of all duktape written in C not C++ I hope this is a good point for you as it was for me :)

Not everyone seeks speed, for me, simplicity and ease of use beats speed any given day, after all I'm not after complex computations and fib that loops for ever :) one file to include and pretty sure it will compile on many platforms to get a full working Ecmascript E5/E5.1 how awesome!

Compilation time, don't know about current SM compilation time but I tried once to compile V8 and I still feel sorry for doing that to my poor old machine :)

Memory management, I'm not sure how, but duktape's GC is amazing.

Last and the most appealing point for me "unfortunately no one mentioned that yet" is documentations, I don't mean just api, please visit duktape.org and see how beautiful well written everything there, Sami "the author" put a great effort not just to write duktape but it's documentation too, api, usage, comments, specifications, examples ... and best of all it was clear and very easy to follow.

The only thing I hope is more people join to help Sami maintain the project, currently the man is very active and responsive, but I think the project will expand in the next few weeks with more and more users, so I guess he will need some help then :)


SCHiM 10 days ago


You've got to love it when a library is at that point where ease-of-use and features perfectly intercept.

This library is great:

On the simplicity side:

Only include two files, no linking, no shared objects, no cruft.

On the feature side:

The functions are simple, the usage-pattern is familiar (alloc, use, free) and the code is fast (written in C, which _almost_ makes it fast by default).

Great library, will probably use it when I need to write some more scrapers in the future.


mempko 10 days ago


seems the C interface is modelled after Lua. Is there any inspiration there?


bodyfour 10 days ago


Yes, very much so. If you've embedded lua before it's very easy to do the same with duktape. A few conventions are different -- for instance stack indices are 0-based instead of 1-based (mirroring the difference in JS vs Lua) and some of the strategies for interfacing with native-code objects are different (JS prototypes aren't exact analogues to Lua metatables, etc)


RyJones 10 days ago


We use Duktape at the AllSeen Alliance (disclaimer: I work there) to put JavaScript bindings on small devices.

https://git.allseenalliance.org/cgit/core/alljoyn-js.git/sum...


errordeveloper 9 days ago


It's pretty cool indeed what you guys are doing with it, although currently there are not as many MCUs where Duktape fits... The STM32F4 Discovery board that AllJoyn+Duktape targets, is the same one Java Embedded also runs on, IIRC. From my knowledge the STM32F4's are some of the fatter micros currently on the market.


my notes on this being "some of the fatter micros currently on the market": the STM32F4 Discovery board (costs about $15) has an STM32F407VGT6 (costs about $8 to $12) with 1MB flash and 192KB RAM:

STM32F407VGT6 microcontroller featuring 32-bit ARM Cortex-M4F core, 1 MB Flash, 192 KB RAM in an LQFP100 package

according to the STM32 product page, the STM32s are divided by core and by frequency into 8 product lines. The F4s are described as "high-performance MCUs with DSP and FPU", with M4 cores. Going below that, we have the F3 ("mixed-signal MCUs with DSP and FPU"), also on the M4; then on the Cortex-M3, the F2 ("high-performance MCUs"), the F1 ("mainstream MCUs"), and the L1 ("ultra-low-power MCUs"); on the M0+, the L0 ("ultra-low-power MCUs"); and on the M0, the F0 ("entry-level MCUs").

in k:

    F0: 16-256 flash, 3-32 RAM, M0, "entry-level MCUs" (about $0.52 to $5.3, median about $3, 16-64 flash, 8 RAM)
    L0: 32-64 flash, 8 RAM, M0+, "ultra-low-power MCUs"
    L1: 32-512 flash, 4-80 RAM, M3, "ultra-low-power MCUs"
    F1: 16-512 flash, 4-96 RAM, M3, "mainstream MCUs" (about $1.7 to $15, median about $7, 64-512 flash, 20-64 RAM)
    F2: 128-1024 flash, 64-128 RAM, M3, "high-performance MCUs"
    F3: 32-512 flash, 16-80 RAM, M4, "mixed-signal MCUs with DSP and FPU"
    F4: 128-1024 flash, 96-256 RAM, M4, "high-performance MCUs with DSP and FPU" (about $3.5 to $20, median about $7.5, 512-1024 flash, 192 RAM)

the duktape specs (above) say "Can run on platforms with 256kB flash and 96kB system RAM", so the next step down would be to run on platforms with 128k flash and 64k RAM. That would put you in reach of the (larger half of the) median F1 "mainstream" series of STM32s. Duktape itself uses about 200k flash and 46k RAM, again according to the above specs (and that RAM figure is without the lowmem options). So, to leave at least 25% of flash and RAM free for the user program on a 128k flash / 64k RAM system, the interpreter would have to fit within about 96k flash and 48k RAM; to leave at least 25% of flash and 50% of RAM free, 96k flash and 32k RAM.

it would be really nice to fit in the F0, but going from 256k and 96k to 64k and 8k is a lot to ask

so if Duktape is too large because the F4s are too big to require, then the next step down would appear to be an interpreter that fits within 96k of flash (ARM M4 instruction set) and uses no more than 32k RAM upon startup. Even better would be 64k flash and 8k ram.
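the sizing arithmetic above, as a quick check:

```python
# How much flash/RAM an interpreter may use if a given fraction of a
# part's resources must stay free for the user program.
def interpreter_budget(flash_k, ram_k, flash_free_frac, ram_free_frac):
    return (flash_k * (1 - flash_free_frac), ram_k * (1 - ram_free_frac))

# 128k flash / 64k RAM part, 25% of each kept free:
assert interpreter_budget(128, 64, 0.25, 0.25) == (96.0, 48.0)
# same part, 25% of flash and 50% of RAM kept free:
assert interpreter_budget(128, 64, 0.25, 0.50) == (96.0, 32.0)
```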

BASIC

" Another note: In high school or my first year of college I told my dad that someday I'd own a 4K Data General NOVA. He said it cost as much as a down payment on an expensive house. I was stunned and told him I'd live in an apartment.

Why 4KB?

Because that was the minimum needed to run a higher level language. To me a computer had to have more than switches and lights. It had to be able to run programs.

" -- http://gizmodo.com/how-steve-wozniak-wrote-basic-for-the-original-apple-fr-1570573636/all

according to "dec-11-ajpb-d pdp-11 basic programming manual", available from http://bitsavers.trailing-edge.com/pdf/dec/pdp11/basic/DEC-11-AJPB-D_PDP-11_BASIC_Programming_Manual_Dec70.pdf , or as text at https://archive.org/stream/bitsavers_decpdp11baASICProgrammingManualDec70_5936477/DEC-11-AJPB-D_PDP-11_BASIC_Programming_Manual_Dec70_djvu.txt ,

" A. 2 USER STORAGE REQUIREMENTS

BASIC can be run in the minimal 4K PDP-11/20 configuration. With the BASIC program in core, and deducting space reserved for the Bootstrap and Absolute Loaders, approximately 450 words are left for total user storage (program storage plus working storage) . "

i believe this 4k is 4k WORDS, and each word is two bytes, so BASIC takes up most of 8k bytes; with about 900 bytes (450 words) to spare, BASIC takes about 7292 bytes, or just over 7k.
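the words-to-bytes arithmetic, as a quick check:

```python
# The PDP-11/20's minimal 4K configuration is 4096 16-bit words;
# the manual says roughly 450 words are left for user storage.
words_total = 4 * 1024          # 4K words
bytes_total = words_total * 2   # 16-bit words -> 8192 bytes
user_words = 450
basic_bytes = bytes_total - user_words * 2
assert bytes_total == 8192
assert basic_bytes == 7292      # just over 7k, as noted above
```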

Erlang

http://www.erlang.org/faq/implementations.html

" 8.9 Is Erlang small enough for embedded systems?

..

 Rule of thumb: if the embedded system can run an operating system like linux, then it is possible to get current implementations of Erlang running on it with a reasonable amount of effort.

Getting Erlang to run on, say, an 8 bit CPU with 32kByte of RAM is not feasible.

People successfully run the Ericsson implementation of Erlang on systems with as little as 16MByte of RAM. It is reasonably straightforward to fit Erlang itself into 2MByte of persistent storage (e.g. a flash disk).

 A 2MByte stripped Erlang system can include the beam emulator and almost all of the stdlib, sasl, kernel, inets and runtime_tools libraries, provided the libraries are compiled without debugging information and are compressed: "

Contiki OS

http://www.wired.com/2014/06/contiki

" While Linux requires one megabyte of RAM, Contiki needs just a few kilobytes to run. Its inventor, Adam Dunkels, has managed to fit an entire operating system, including a graphical user interface, networking software, and a web browser into less than 30 kilobytes of space. That makes it much easier to run on small, low powered chips–exactly the sort of things used for connected devices–but it’s also been ported to many older systems like the Apple IIe and the Commodore 64. "

Scheme

"tinyScheme, which is a BSD licensed, very small, very fast implementation of Scheme that can be compiled down into about a 20K executable if you know what you’re doing."


copx 21 hours ago [-]

> And it's small. Currently the interpreter, JIT, GC, and stdlib clock in at about 10.3MB once compiled down to an executable.

Oh how the definition of "small" has changed. I actually would like to know how they managed to make something like this so big.

To compare, LuaJIT is about 400 KB, and that includes the Lua standard library, a JIT almost certainly more advanced than Pixie's current one, an incremental GC, and a C FFI.

Neither compilers (well, except C++ ones), nor stuff you usually find in standard libraries, nor a GC should require much code to implement, relatively speaking (e.g. compared to a WYSIWYG word processor). These things are usually small. The compilers for almost every language were < 1 MB in size for the longest time.

I am not saying that Pixie being 10 MB in size is a problem. We have a lot more bandwidth and disk space nowadays, 10 MB is nothing. My point is that a "JIT, GC, and stdlib" package weighing this much cannot claim to be "small" for what it does.


haberman 17 hours ago [-]

I agree with your point completely. I just want to add that throwing out raw numbers like "10.3MB" or "400 KB" is not very precise. Binaries can vary immensely based on whether they have debug info, string tables, etc. or whether these have been stripped away.

I wrote a size profiling tool that can give much more precise measurements (like size(1) on steroids, see: https://github.com/google/bloaty). Here is output for LuaJIT:

    $ bloaty src/luajit -n 5
         VM SIZE                     FILE SIZE
     --------------               --------------
      74.3%   323Ki .text           323Ki  73.8%
      12.5%  54.5Ki .eh_frame      54.5Ki  12.4%
       7.6%  33.2Ki .rodata        33.2Ki   7.6%
       2.2%  9.72Ki [Other]        12.9Ki   2.9%
       2.1%  9.03Ki .eh_frame_hdr  9.03Ki   2.1%
       1.2%  5.41Ki .dynsym        5.41Ki   1.2%
     100.0%   435Ki TOTAL           438Ki 100.0%

And for Pixie:

    $ bloaty pixie/pixie-vm -n 5
         VM SIZE               FILE SIZE
     --------------         --------------
      57.5%  4.39Mi .text    4.39Mi  44.7%
      33.7%  2.58Mi .data    2.58Mi  26.3%
       0.0%       0 .symtab  1.31Mi  13.4%
       0.0%       0 .strtab   978Ki   9.7%
       8.8%   688Ki [Other]   595Ki   5.9%
       0.0%       8 [None]        0   0.0%
     100.0%  7.64Mi TOTAL    9.82Mi 100.0%

In this case, neither binary had debug info. Pixie does appear to have a symbol table though, which LuaJIT has mostly stripped.

In general, I think "VM size" is the best general number to cite when talking about binary size, since it avoids penalizing binaries for keeping around debug info or symbol tables. Symbol tables and debug info are useful; we don't want people to feel pressured to strip them just to avoid looking bad in conversations about binary size.


sabauma 15 hours ago [-]

Its worth noting that the design of the RPython JIT will always result in a large amount of static data in the resulting binary. The RPython translator basically generates a bytecode representation of most of your interpreter and bakes that into the binary. You can probably expect at least a 2x size increase in the size of your binary. As a reference point, after stripping [Pycket](https://github.com/pycket/pycket)'s binaries are 6.1Mi without the JIT and 16Mi with the JIT.


stcredzero 21 hours ago [-]

To compare, LuaJIT is about 400 KB, and that includes the Lua standard library, a JIT almost certainly more advanced than Pixie's current one, an incremental GC, and a C FFI.

Back in the day, Smalltalk was criticized as bloated because one ended up with stripped binary+image of about 2MB. Someone around that time got SqueakVM+image down to just under 400k. There were specialized Smalltalks in company R&D that got their image down to 45k.


oska 18 hours ago [-]

The Oberon Operating System and compiler was 131 KB. [1]

[1] http://users.cms.caltech.edu/~cs140/140a/Oberon/system_faq.h...


---