Bayle Shanks's website: proj-oot-lowEndTargets-lowEndTargetsUnsorted4

---

https://internalregister.github.io/2019/03/14/Homebrew-Console.html

64k program ram (of which 8k used by bootloader), z80

---

Philips Hue Bridge 2.0

Philips Zigbee IP Bridge 2.0 (2.1) Availability: now

Manuf/OEM/ODM Lite-On 324131201801

FCC approval date: 28 July 2015 (Est.) initial retail price (in USD): $60 UPC: 046677458478 (UPC DB, On eBay) EAN: 046677455286 (UPC DB, On eBay) Country of manuf.: China

Amazon image

ASIN B016H0QZ7I (Flag of the United States.svg, On Amazon, On CCC, multiple uses) multiple revisions of this device, use caution

Type: home automation switch, IoT? hub

FCC ID: O3M324131201801, 2AGBW324131201801, 2AGBW3241312018AX Industry Canada ID: 10469A-BRIDGE, 20812-2018X

Power: 5.0 VDC, 1 A Connector type: barrel CPU1: Qualcomm Atheros QCA4531 (650 MHz) FLA1: 128 MiB? (Winbond W25N01GVZEIG) RAM1: 64 MiB? (Winbond W9751G6KB-25)

Expansion IFs: none specified

WI1 chip1: Qualcomm Atheros QCA4531 WI1 802dot11 protocols: bgn WI1 MIMO config: 2x2:2 WI1 antenna connector: none

ETH chip1: Qualcomm Atheros QCA4531 LAN speed: 10/100 LAN ports: 1

bgn

Additional chips 2.4GHz ZigBee?/802.15.4 SoC?/RF Transceiver (ARM Cortex-M0+ MCU);Atmel (Microchip);ATSAMR21E;Atmel, ATSAMR21E, 18A-F;;1; 2.4GHz ZigBee?/802.15.4 Power Amplifier;Skyworks;SE2438T;;1; USB to Serial Bridge Controller;Prolific;PL2303SA;;1;

Third party firmware supported: OpenWrt?

Flags: ZigBee?, Alexa AVS

802dot11 OUI: 00:17:88 CPU1 brand WI1 chip1 brand WI1 chip2 brand Philips Hue Bridge STMicro Texas Instruments Texas Instruments Philips Hue Bridge 2.0 Qualcomm Atheros Qualcomm Atheros For a list of all currently documented Qualcomm Atheros chipsets with specifications, see Qualcomm Atheros.

Philips Hue Smart Bridge

    Product page (2.0)  • Spec.

Ethernet-controlled Lighting controller and ZigBee? bridge,

    part of the Philips Hue system.

Compatible with Amazon Alexa, Apple HomeKit? and Google Assistant. Specifications

    Philips Hue Bridge 2.0 Teardown on Reddit (images)

    ZigBee: Atmel (Microchip) ATSAMR21E18A SoC
    @48MHz ARM Cortex-M0+ (32-bit) MCU
    2.4GHz RF Transceiver, 802.15.4, ZigBee
    MAC Address: 00:17:88 (Philips Lighting BV)

    Phillips Hue Dimmers Disassembled (images)

See also

    Philips Hue Bridge - ZigBee: TI CC2530
    Philips Hue Bridge 2.0 - IC: 10469A-BRIDGE,

    FCC ID: O3M324131201801 (2015-07-28)
    FCC ID: 2AGBW324131201801 (2015-11-12)

    Philips Hue Bridge 2.1 - Lite-On, IC: 20812-2018X,

    FCC ID: 2AGBW3241312018AX (2016-07-20)

---

https://blog.adafruit.com/2016/06/14/teardown-of-a-philips-hue-led-lightbulb-with-zigbee-and-atmega2564-avr-iot-iotuesday/

---

https://www.engadget.com/2019/03/18/nvidia-jetson-nano-ai-computer/

---

rad-tolerant and rad-hard (i think these are terms without a precise defn but in this case i think the rad-hard one is a higher level of radiation tolerance/hardness):

https://www.microchip.com/wwwproducts/en/SAMV71Q21RT

https://www.microchip.com/wwwproducts/en/SAMRH71

the RH one doesn't have a datasheet but the other one has 2k flash and 384k RAM

ARM Cortex M7, 100 mhz (for the RH one)

---

"MicroPython?, a version of the Python 3.4 programming language customized to run on low-power microcontrollers with as little as 16KB of RAM."

---

https://hackaday.com/2019/02/04/openisa-launches-free-risc-v-vegaboard/

"72 MHz RISC-V RI5CY/ZERO-RISCY cores with up to 1280 KB and 384 KB of SRAM."

---

https://www.phoronix.com/scan.php?page=news_item&px=MIPS-Open-Source-2019

http://linuxgizmos.com/this-under-6-sbc-runs-linux-on-risc-v-based-c-sky-chip/ Linux 4.20~5.0 kernel support for its new C-SKY CK810 SoC? design based on its new C-SKY ISA architecture. Now, Hangzhou C-SKY has launched a development board that runs Linux on a similar CK610M SoC?. The C-SKY Linux Development Board sells for 39-40 Yuan ($5.60 to $7.05) on Taobao and $19.50 to $21.50 on AliExpress?. 64MB DDR2 RAM and 4MB SPI flash for bootloader and media player code

---

MAIX Development Boards with Sipeed M1 RISC-V AI Module Launched for $5 and Up (Crowdfunding) 8 MB general purpose SRAM including 5.9MB usable as AI SRAM memory Storage – micro SD card slot, 8MB SPI flash

---

https://vocore.io/v2.html

MEMORY 128MB, DDR2, 166MHz STORAGE 16M NOR on board, support SDXC up to 2TB MIPS

---

great awesome looking riscv risc-v fpga:

http://www.electronicdesign.com/embedded-revolution/risc-v-fpga-design-leaps-forward-mi-v

summarizing the interesting parts (for my purposes) of some of the above:

the IGLOO2-based Creative Board is a board with the Mi-V
- $100, Microchip 64Mbit serial flash (8 MB i guess), Alliance 32M x 16 bit DDR2 synchronous DRAM (SDRAM) (so is this 64 MB?)
3 popular interfaces are: PMOD, MikroBus?, Arduino Shields
2 popular standards for FPGA IP are: MISRA, QNX Neutrino
3 popular FOSS RTOSs are: FreeRTOS?, Huawei LiteOS?, Apache’s MyNewt?
"The Microsemi platforms actually have an advantage of being able to utilize the FPGA’s flash memory for code and data, enabling a self-contained platform. It also allows for a secure-boot implementation. In addition, Mi-V includes CoreBootStrap? support."
"The Mi_VRV32IMAC core can deliver 2.01 EMMBC CoreMarks?/MHz that bests the soft competition."

all of the three RISC-V cores noted have:

32-bit
compressed instruction set support

two of the 3 have:

8k i and d L1 cache (i assume this means 2 caches, each 8k?)
atomic instrution set support
multiply and divide instrution set support

and one of the 3 has:

single-precision floating-point support

---

random IOT protocol: http://mqtt.org/

---

even recent AVR lines have a multiply instruction now:

"Note that tinyAVR parts prior to the tinyAVR 1-Series are essentially completely different MCUs with a less-capable AVR core that has no multiplier."

---

https://en.wikipedia.org/wiki/MIFARE

https://www.indiegogo.com/projects/sipeed-maix-the-world-first-risc-v-64-ai-module#/

---

GAP8

https://www.cnx-software.com/2018/02/27/greenwaves-gap8-is-a-low-power-risc-v-iot-processor-optimized-for-artificial-intelligence-applications/

"Sub $15 machine vision and voice control solutions for consumer robotics"

" 1x extended RISC-V fabric controller core with 16 kB data and 4 kB instruction cache for system control 8x extended RISC-V compute cores with 64 kB shared data memory and 16 kB shared instruction cache "

"Another way to look at power consumption, is the company’s claim that the processor can classify a QVGA image every three minutes for 10 years on a small 3.6 Wh battery."

https://greenwaves-technologies.com/sdk-manuals/

" 1 + 8 high-performance cores: extended RISC-V ISA1 high performance micro-controller referred to as Fabric Controller or FC (150 MHz @ 1.0V; 250MHz @ 1.2V) 8 cores that execute in parallel for compute intensive tasks referred to as Cluster (87 MHz @ 1.0V; 170MHz @ 1.2V) Ultra low Power : maximum 25mA @ 1.0V

Memories: A level 2 Memory (512KB) for all the cores A level 1 Memory (64 KB) shared by all the cores in Cluster (0 wait state memory access) A level 1 memory (8 KB) owned by FC (0 wait state memory access) Memory Protection Unit HyperBus? Interface to connect external HyperFlash? or HyperRAM? "

https://www.cnx-software.com/2018/08/01/gapuino-gap8-risc-v-mcu-developer-kit-ai/ "GAPUINO GAP8 is a $229 RISC-V MCU Developer Kit for A.I. Applications

SoC? – GAP8 IoT? Application Processor with 8x RISC-V compute cores, 1x RISC-V fabric controller core delivering up to 200 MOPS at 1mW and >8 GOPS at a few tens of mW Memory / Storage – HyperBus? combo DRAM/Flash with 512 Mbit Flash + 64 Mbit DRAM; 256 Mbit Quad SPI flash

The design is based on RISC-V based Parallel Ultra Low Power (PULP) computing open-source platform.

https://www.cnx-software.com/2016/04/06/pulpino-open-source-risc-v-mcu-is-designed-for-iot-and-wearables/

---

good detailed 2006 slide presentation on the old Cell architecture, including a list of synchronization commands (page 18)

https://arcb.csc.ncsu.edu/~mueller/cluster/ps3/workshop/Day1_03_CourseCode_L1T1H1-10_CellArchitecture.pdf

" Synchronization Commands

Lockline (Atomic Update) Commands: getllar- DMA 128 bytes from EA to LS and set Reservation putllc- Conditionally DMA 128 bytes from LS to EA putlluc- Unconditionally DMA 128 bytes from LS to EA

barrier- all previous commands complete before subsiquent commands are started mfcsync- Results of all previous commands in Tag groupare remotely visible mfceieio- Results of all preceding Puts commands in same group visible with respect to succeeding Get commands

Command Parameters

LSA- Local Store Address (32 bit) EA- Effective Address (32 or 64 bit) TS- Transfer Size (16 bytes to 16K bytes) LS- DMA List Size (8 bytes to 16 K bytes) TG- Tag Group(5 bit) CL- Cache Management / Bandwidth Class "

data bus is 16-byte (128-bit)

" Power Processor Element (PPE):

General purpose, 64-bit RISCprocessor (PowerPC? AS 2.0.2)
2-Way hardware multithreaded•L1 : 32KB I ; 32KB D "

https://www.google.com/search?q=%22mfceieio%22+%22mfcsync%22+barrier+%22putlluc%22&oq=%22mfceieio%22+%22mfcsync%22+barrier+%22putlluc%22

---

really low-end (0.03 CPUs with 41-256 bytes of RAM and 0.5k to 2k words of flash):

https://cpldcpu-wordpress-com.cdn.ampproject.org/v/s/cpldcpu.wordpress.com/2019/08/12/the-terrible-3-cent-mcu/amp/?amp_js_v=0.1#referrer=https%3A%2F%2Fwww.google.com&amp_tf=From%20%251%24s&ampshare=https%3A%2F%2Fcpldcpu.wordpress.com%2F2019%2F08%2F12%2Fthe-terrible-3-cent-mcu%2F

hardware stacks around 2 to 8 levels

---

https://www.cnx-software.com/2019/08/23/gigadevice-gd32v-risc-v-mcu-development-board/

Memory – 8KB to 32KB SRAM Storage – 16KB to 128KB flash

"Best of all, you can already buy the microcontroller on Tmall, with for example GD32VF103TBU6 (QFN36, 64KB flash) going for 9 RMB ($1.27)"

GD32VF103C-START board sells for 79 RMB, or about $11.

If you are interested in GD32VF103C-EVAL, it can be yours for 325 RMB (~$46).

https://www-cnx--software-com.cdn.ampproject.org/v/s/www.cnx-software.com/2019/08/29/longan-nano-gd32v-risc-v-development-board-comes-with-lcd-display-and-enclosure/amp/?amp_js_v=0.1#referrer=https%3A%2F%2Fwww.google.com&amp_tf=From%20%251%24s&ampshare=https%3A%2F%2Fwww.cnx-software.com%2F2019%2F08%2F29%2Flongan-nano-gd32v-risc-v-development-board-comes-with-lcd-display-and-enclosure%2F

---

" The embedded systems littlefs targets are usually 32-bit microcontrollers with around 32 KiB? of RAM and 512 KiB? of ROM. These are often paired with SPI NOR flash chips with about 4 MiB? of flash storage. These devices are too small for Linux and most existing filesystems, requiring code written specifically with size in mind. "

---

https://blog.hackster.io/gigadevice-unveils-new-risc-v-based-gd32v-microcontroller-c0a2b147568b

"outfitted with 8Kb to 32Kb of RAM and 16Kb to 128Kb of Flash"

---

" The ANSI certified the Ada 83 specification in 1983; Intel’s 80286 had just been released and Motorola’s 68000 was still only four years old. It was the dawn of home computers, but it was also the awkward transition of the 1970s into the 1980s, when microcontrollers were becoming more popular. Think of the Intel 8051 and its amazing 4 kB EPROM and 128 bytes of RAM. "

---

https://fuse.wikichip.org/news/2659/ibm-introduces-next-gen-z-mainframe-the-z15-wider-cores-more-cores-more-cache-still-5-2-ghz/ section 'L1I Cache Comparison' shows L1i cache sizes of 32k-128k even on these mainframe chips

---

https://hackaday.io/project/167605-kobold-k2-risc-ttl-computer

---

Texas Instrument’s best-selling graphing calculator, the TI-84, is a woefully outdated piece of technology.

" Since its debut in 2004, its specs and components have remained virtually unchanged. With 24 kilobytes of RAM, a 96×64 pixel screen, and a power system that still relies on 4 AAA batteries, it has been usurped by hundreds of modern handheld devices. While " --- fpga

https://www-forbes-com.cdn.ampproject.org/v/s/www.forbes.com/sites/davealtavilla/2019/10/01/xilinx-unveils-vitis-disruptive-open-source-design-software-tools-for-adaptable-processing-engines/amp/?usqp=mq331AQCKAE%3D&amp_js_v=0.1#referrer=https%3A%2F%2Fwww.google.com&amp_tf=From%20%251%24s&ampshare=https%3A%2F%2Fwww.forbes.com%2Fsites%2Fdavealtavilla%2F2019%2F10%2F01%2Fxilinx-unveils-vitis-disruptive-open-source-design-software-tools-for-adaptable-processing-engines%2F

---

https://gobot.io/

great list of platforms here too

---

https://www-cnx--software-com.cdn.ampproject.org/v/s/www.cnx-software.com/2019/10/04/wio-lite-risc-v-wifi-board-with-esp8266-module/amp/?usqp=mq331AQCKAE%3D&amp_js_v=0.1#referrer=https%3A%2F%2Fwww.google.com&amp_tf=From%20%251%24s&ampshare=https%3A%2F%2Fwww.cnx-software.com%2F2019%2F10%2F04%2Fwio-lite-risc-v-wifi-board-with-esp8266-module%2F

Wio Lite RISC-V Wio Lite specifications:

    MCU – Gigadevice GD32VF103CBT6 RISC-V (rv32imac) microcontroller @ 108 MHz with 128KB Flash, 32KB SRAM
    Wireless Module – ESP8266 WiFi Wio Core with 802.11b/g/n/ WiFi 4 connectivity... Power Consumption – RISC-V core power consumption is only 1/3 of that of a traditional Cortex-M3. ... Wio Link RISC-V WiFi? board is up for pre-order for $6.90 with shipping scheduled to start at the end of November.

---

https://www.adafruit.com/product/3333

Circuit Playground Express ATSAMD21 ARM Cortex M0 Processor, running at 3.3V and 48MHz (ATSAMD21 datasheet shows 4/8/16/32KB SRAM Memory)

---

LoFive? R1 RISC-V SoC? evaluation kit. This new LoFive? R1 board features the latest SiFive? Freedom E310, 32-bit RV32IMAC processor, which operates up to 320 megahertz. The board also offers 16 kilobytes of RAM, 128-megabit SPI flash storage, and two 14-pin headers with JTAG, GPIO, PWM, SPI, I2C, and UART, plus power and ground

---

https://hackaday.com/2018/04/11/zephyr-adds-features-platforms-and-windows/ "Another common RTOS is FreeRTOS? which has an extra C++ class wrapper. Or, for something smaller have a look at ChibioS?."

https://hackaday.com/2016/09/22/arduino-sketch-the-next-generation/

 ChibiOS.

---

http://www.chrisfenton.com/the-zedripper-part-1/

hobby project; 16 parallel Z80 processors, each with 64k local memory

with 2k to 4k local memory (i think?)

" sitkack 22 hours ago [-]

If you are interested in Transputers you should purchase an XMOS dev kit https://en.wikipedia.org/wiki/XMOS https://www.xmos.com/

https://www.digikey.com/products/en/development-boards-kits-...

Confiks 17 hours ago [-]

Could you be a bit more specific in how the two companies and / or products are related, apart from the nameplay? I noticed that the then chief architect of Inmos also co-founded XMOS.

There seems to be a pretty active forum here: https://www.xcore.com

joshu 17 hours ago [-]

“ The name XMOS is a loose reference to Inmos. Some concepts found in XMOS technology (such as channels and threads) are part of the Transputer legacy.”

sitkack 17 hours ago [-]

They were both started by the same person, https://en.m.wikipedia.org/wiki/David_May_(computer_scientis...

The XMOS processors share many of the same architectural properties as Transputers.

...

You have to use their XC dialect, but it is mostly C anyway... Their "xc" is C with language extensions to support CSP style parallel processing.

https://en.wikipedia.org/wiki/XC_(programming_language)

the CPU currently in one of XMOS's general-purpose dev boards:

https://www.xmos.com/download/XE216-512-TQ128-Datasheet(1.16).pdf

" SRAMEach xCORE Tile integrates a single 256KB SRAM bank for both instructions anddata. All internal memory is 32 bits wide, and instructions are either 16-bit or32-bit. Byte (8-bit), half-word (16-bit) or word (32-bit) accesses are supported andare executed within one tile clock cycle "

---

 missosoup 1 day ago [-]

This won't replace HAM radio in a disaster scenario. This might provide a long-distance digital comms capability post-disaster. Even then, LoRa? throughput is so low that I don't see how it would be useful unless an entire region goes full zombie apocalypse and this network replaces... I don't know, messenger pigeons?

In a long range configuration, LoRa? has a theoretical max throughput of about 140bps (bits) and that's assuming only one device is transmitting and no packet loss.

LoRa? is built for low power, long range, low throughput sensor networks. It's not really suitable for a chatty mesh network. To get an idea of the intended use for it, take a look at LoRaWAN? limits:

An average of 30 seconds uplink time on air, per day, per device.
At most 10 downlink messages per day, including the ACKs for confirmed uplinks.
A good goal is to keep the application payload under 12 bytes, and the interval between messages at least several minutes.

---

https://www.embeddedrelated.com/showthread/comp.arch.embedded/14362-1.php

http://www.righto.com/2013/09/intel-x86-documentation-has-more-pages.html

the 6502 had between ~3-6k transistors. Modern (2004) 32-bit mcus have above ~50-100k gates, not including memory, with about 4 transistors per gate. "AVR core is 12,000 gates, and the megaAVR core is 20,000 gates...The ATmega128 is probably somewhere between 600k-1M transistors.".

---

STM32l0 used in https://craigjb.com/2019/11/26/gameslab-overview/

he actually used the stm32l073

https://www.st.com/en/microcontrollers-microprocessors/stm32l073rz.html has 20k RAM

https://www.st.com/content/ccc/resource/sales_and_marketing/promotional_material/brochure/a5/a6/6b/c0/0f/32/40/2e/brstm32l0.pdf/files/brstm32l0.pdf/jcr:content/translations/en.brstm32l0.pdf up to 20k RAM

https://www.digikey.com/en/product-highlight/s/stmicroelectronics/stm32-l0 8k RAM

his system also has a "giant Zynq FPGA subsystem (GZF) contains the Zynq part with 512 MB of DDR3 DRAM"

---

jacquesm 1 day ago [-]

The 6502 was an enabler if there ever was one. Yes, memory was expensive. But we used only a very small fraction of what we think of as normal today. A typical system would have 2K ram; maybe 4K and if you were rich you'd splurge for the 16K upgrade (but what would you do with so much memory?).

In comparision, the 6800 sold for a large multiple of the 6502, and Motorola wouldn't have slashed their prices if not for the 6502. Peddle & Co showed that a CPU did not have to be expensive and that was a game changer.

A whole generation of 8 bit machines depended on it: VIC-20, CBM 64, Atari, Acorn Atom, BBC Micro, Apple I, Apple II.

None of those would have happened without an affordable CPU to power them.

Very few chips can claim such a distinguished legacy, and without showing the market potential for this 'personal computer' thing IBM might not have entered the game at all. Visicalc (which required 32K!) was the breakthrough moment.

---

 lazyjones 9 hours ago [-]

The author guessed a few things wrong:

fmt.Println pulling in 300KB isn't proof that Go's standard library isn't "well modularized". It's the wonders of Unicode and other code that is actually used.
900K for the runtime isn't surprising when you have complex garbage collection and goroutine scheduling among other things

---

STM32F407 used in http://cliffle.com/blog/introducing-glitch/

i think it has 192k ram, 512k flash. Cortex M4 core (with floating point unit) running at 168 MHz.

---

https://www.academia.edu/26447075/A_real-time_virtual_machine_implementation_for_small_microcontrollers

"These small microcontrollers are usually8- or 16-bit devices with about 128–256K of ﬂash memory,and 128K of internal or external RAM while their perfor-mance usually ranges between 1 and 20 DMIPS"

" At the lowestend of the spectrum there are devices with code and dataspace measured in hundreds of bytes and often a perfor-mance rating below 1 DMIPS. These microcontrollers areused in very special devices that typically perform one task using a simple state machine. They are hereby designatedas “tiny” microcontrollers. Smartcard processors also gen-erally fall into this category. Smartcards can be low powerdevices such as the STMicroelectronics ST23YL80 whichhas a large code capacity in ROM but very little RAM [48].Smartcardscanalsobeveryhighpowereddevices.Althoughthe SC300 ARM Cortex M3 that is also found in smart-cards has a powerful processor and a large code space, itcan address only 30K of RAM and is thus listed in the tinycategory [47].At the other end of the embedded microcontroller spec-trum are products like the Texas Instruments Stellaris ver-sion of the ARM Cortex-M3 [49] which is rated at about100 DMIPS and can address large amounts of memory. Assuch,itmightbeconsidereda“large”microcontroller.Thesehigh end microcontrollers are commonly found in medicalequipment and industrial control systems.In the middle is a selection of “small” microcontrollers.These are commonly 8- or 16-bit devices with about 128–256K of ﬂash memory. They usually have a fair amountof internal RAM such as 64K or 128K, and they often canaddress a large amount of RAM externally. Their perfor-mance is generally rated between 1 and 20 DMIPS. Mostmicrocontrollers in the tiny and small range also have modi-ﬁed-Harvard architectures. "

---

https://www.academia.edu/26447075/A_real-time_virtual_machine_implementation_for_small_microcontrollers

" Examples of limited virtual machines on smallmicrocontrollersThere are some examples where limited virtual machineshave been made to execute on tiny and small microcon-trollers. In each case significant compromises were madeto the VM so that it would ﬁt in the limited space avail-able. Researchers at Berkeley developed the Mate’ VM thatruns on tiny sensor node platforms with only 8K to 128Kof instruction memory and 512 bytes to 4K of RAM [25].PICOBIT is a VM that runs the dynamic language Schemeon a Microchip PIC18 microcontroller platform [1,46]. There have also been many implementations of Java onsmall platforms. In each case, features were removed fromthe Java VM and many times post-processing of the exe-cutable was required to remove unnecessary functionalityresulting in proprietary Java implementations. One of theseis the Darjeeling virtual machine [7]. Similarly, TakaTukapost? processes the Java instructions to remove unused partsoftheinterpreterandtooptimizetheapplicationforeachspe-ciﬁc application, making it incompatible with standard Java[3]. Another implementation which is also a small subset of Java is NanoVM? [20,42]. This implementation is a greatly reduced Java subset to ﬁt on an AVR8 which has only 8K of ﬂash code memory. The primary purpose is to providethe capability to control a robot, which is controlled by themicrocontroller, and has support for I/O, string processing,andlowlevelcontrol.ParticleVMisanotherJavasubsetforasmallPICprocessor?[44].TheAmbiCompvirtualmachineisan? implementation that requires post-processing [14]. Sim- ilarly, Jelatine is a J2ME-CDLC Java implementation thatrequires post-processing [2]. The Squawk JVM was devel-oped to overcome the size and resource problems of embed-dedtargets[45].Javaclassesarepost-processedinto“suites” which compacts the byte codes and pre-links the classes forposition-independentexecutionfromread-onlymemory.Thefootprint of the interpreter without the extra library featureswas reduced to 149K bytes. "

" Java SE 6 68 MB Cited in literature [38]Java SE Embedded 6 29.5 MB Cited in literature [38]Java J2ME—CDLC? 1.1 128 KB Cited in literature [51]Java Card 3 256 KB Cited in literature [39].NET Compact Framework 2.0 5.5 MB Cited in literature [32].NET CLI (Mono) 1.1 4 MB Cited in literature [37].NET Micro Framework 4.0 200 KB Cited in literature [30]Squawk (Java derivative) 1.1 149K no libraries Cited in literature [45]Dis (Inferno) 4.0 311 KB Estimate, SLOC

28915 [13]Parrot 2.2.0 322 KB Estimate, SLOC

30019 [40]LLVM 2.6 1,336 KB Estimate, SLOC

124458 [50]LUA 5.1.4 109 KB Estimate, SLOC

10152 [28]eLUA 0.7 219 KB (includes BSP) Estimate, SLOC

20386 [16]Squirrel 3 106 KB Estimate, SLOC

9866 [12]PICOBIT (Scheme) N/A 15.6 KB Cited in literature [46]P-Code (Standard Pascal) P5 28 KB Estimate, SLOC

2602 [36]M-Code (Modula-2) M2 16 KB Measured from DOS exe [ "

" Choice of VM Dozens of VMs have been developed in recent years witha broad range of characteristics and there are a number of criteria for deciding on a VM to use. A major issue is ﬁndinga VM that will ﬁt into the memory footprint of the targetdevice. For small embedded platforms the target footprintturns out to be the most important factor because most VMsare far too large. If the target platform has 128K availablefor code, it is reasonable to expect that the VM should onlyoccupy a fraction of that space. This constraint is becauseroom is needed for underlying board support code as wellas a significant amount of room for the byte codes of the application itself. Some platforms have external storage thatcould be used to store byte codes and then the applicationcould be loaded at boot up into RAM. But this cannot beassumed for many small platforms where the code space inﬂash memory is all that is available. A secondary goal wasto ﬁnd a candidate VM that could be used on the broadestrange of small of microcontrollers, such as those with evenless code space available than 128K.An analysis of some existing candidate virtual machineswas made to determine the target code footprint size. SomeVMs? advertise an actual footprint size while others do notsupply this information. The method of estimating theexpectedfootprintwastocompileanexistingvirtualmachineusing the target platform compiler to evaluate how manybytes of object code were generated per source line of code.This was found to be between 7 and 14 bytes of code perline depending on the density of macros with about 11 bytesbeingcommon.Thusforthevirtualmachineswherenofoot-print size was cited, the SLOC was measured and multipliedby 11 bytes to obtain a rough order of magnitude of the sizeof the virtual machine if it were ported to the target platform.Many popular application VMs and their estimated targetfootprint are listed in Table 2.Whenthesmallembeddedtargetisdeﬁnedtohave128KBandatmost256Kofcodespace,mostexistingVMsareelim-inated. With the additional goal of ﬁnding a general purposeVM that can run on a tiny platform, even more are elimi-nated. The Java 2 Micro Edition (J2ME) Connected DeviceLimitedConﬁguration?(CDLC)isadvertisedtorequireamin-imum of 128K ROM and 32K RAM to contain the bare VM[51]. It recommends 512K total memory available for the VM. On the surface this seems like it might work. But whenone considers that there needs to be room for an underly-ing real-time native system, even this candidate exceeds thelimits of a small embedded microcontroller platform. Simi-larly, .NET Micro Framework requires a minimum of 200Kof code space and has been ported to run on many mid-rangeembedded platforms in such implementations as TinyCLR?[17]. But in the small microcontroller with ≤ 256K of codespace there is very little room left for application space. TheSquawk? VM advertises that the interpreter only needs 149Kof space if the libraries are excluded but this also exceeds thefootprint size [45]. FromtheabovelistonlyLUA?,eLUA,Squirrel,PICOBIT,Modula-2,andPascalvariantsareleftaspossiblecandidates.Eliminating the dynamic scripting languages leaves only thevenerable Pascal and Modula-2 virtual machines. After sig-nificant effort, Modula-2 was also eliminated based on thelackofpopularsupportforthelanguage,andadearthoftoolsavailable that could actually compile the virtual machine. Anumber of lesser known virtual machines might also be con-sidered,manyofthemfrompriorresearchefforts[10].These generallyarelimitedbythedemonstrationlanguageprovidedwith them and are not ready for general use. It was not thepurpose of this research to exhaust all possible candidates,but rather to ﬁnd a representative VM that meets minimumcriteria and evaluate its performance on the selected target.Basedonopensourcetoolavailabilityandlanguagecapa-bilities, for this research the Standard Pascal P5 VM wasselected. Despite its age, Pascal has been maintained andupdated over the years and exists in a modern form withISO standardization and a test suite to insure compliance "

" .1 Optimizing the virtual machineOptimizations to speed up the execution of the VM weremade incrementally, and with each one the Dhrystone testwas rerun. The results are added to Table 4 which shows theincremental improvement measurements.The ﬁrst optimization was to convert some critical codeto be inline, trading code size for speed of execution. ThePascal? VM instruction set has very few virtual registers. Theprimary ones are OP, P, and Q. A secondary Q, called Q1was also added to the instruction set for array operations.The original interpreter code made a function call to obtaineachofthesevaluesoutofthebytecodesforeachinstruction that used the registers. The optimization was to replace eachofthesecallswithinlinecodeasshowninFig.6.Thissimple change resulted in about a 10 % speed improvement but costover 20 % more code space.The next optimization involved the routines that get andput addresses on the stack. These two routines are called fre-quentlyintheinterpreterandtheoriginalPascalcodethatwastranslatedintoCwasnotwrittentotakeadvantageofimplicitcompiler optimization. The optimization as shown in Fig. 7resulted in nearly an additional 14 % speed improvementwith no additional code space required. The improvement islargely because the compiler is able to use the native instruc-tion set to build and use pointer values more efﬁciently thancan be done manually in C.Theroutinesthatgetandputinteger,Boolean,andcharac-tervariableswerealsosubjecttooptimization.Thetechniquefor optimizing these routines is similar to the address get andput routines. As shown in Table 4, this optimization resulted innearly18%morespeedimprovementandactuallyresultedin slightly smaller code.An optimization that failed was an attempt to speed upthe main case statement that processes the virtual opcodes.Theoverheadofmakingasubroutinecallforeachinstructionturned out to cost more than the switch statement. As shownin Table 4 this resulted in 38 % slower operation. As a resultthe jump table was taken back out of the code and the switchstatement was restored

...

Research has shown that even a highly optimized VMcan run magnitudes slower than its native counterpart [24].NanoVM? ran an average of 191 times slower than the nativeequivalent [20]. The Pascal VM for this research ran 187 times slower after the optimizations were made. The onlyway for a VM to be effective is if it adds value to the plat-form. Besides security and robustness, the primary advanta-gesarecodecompactionandhighlevelabstractionthroughapowerful instruction set. ...

5.2 Using virtual machine abstraction to supportconcurrency

One abstract extension to a VM that was found to providegreatoverallvaluetothesystemisnonpreemptive,user-levelconcurrency. As shown in Fig.11, a VM that maintains itsown concurrency has advantages over a real-time operatingsystem.SwitchingcontextsinaVMcanbedoneinonevirtualinstruction? whereas a native RTOS-driven system requiressaving and restoring contexts, which may involve dozens of variables and the saving of a task state. Also, a concurrentVM is more controlled than a real-time operating systemtask switch, which can occur at almost any time requiringcritical regions of code to protect against race conditionsand errors. ...

.2.1 Implementing a task instruction to support concurrency New instructions were added to the Pascal Virtual Machineto support multitasking and message passing as high levelabstract instructions. A new task instruction was added tothe interpreter to support multitasking along with the neces-sary Pascal language extension as follows:

status := task(function); function: TASKINIT=0: status: 1=parent, 2=child TASKPEND=1: status: is new task ID TASKWHO=2: status: is my task ID TASKKILL=3: No status

With this new special function call, new tasks could becreated using the subfunction TASKINIT which performs asimilar function as the UNIX fork() command within a sin-glePCodeinstruction.ModelingtaskcreationaftertheUNIXforkcommandwasamatterofconvenienceforthisresearch?,although itdoes have some advantages.

...

Theinterpreter was modiﬁed to support up to ﬁve tasks (an arbi-trary choice that depends on the memory available). When anew task is created, an exact duplicate of the working stack of the parent task is created and a new entry is made in aTaskState table for the new child task. The return value fromthis function call indicates whether the currently executingtask is still the parent task (1) or the child task (0) that is anexact copy.Thereturnvaluecanthenbeusedtocauseeachofthetasksto follow their own execution path depending on whether itis the parent or the child task. As shown in Fig. 12, when a new task is created only four values need to be rememberedto save and restore the task; namely the PC, SP, MP, andEP registers. PC is the program counter, which is saved toremember what instruction this task was executing when itwas suspended so that it can pick up where it left off whenit starts running again. SP is the stack pointer which indi-cates where the current top of the stack is for this task. Sinceeach new task has its own copy of the stack, each task mustremember where that is in memory for its own stack. MP isthe mark pointer which indicates where the base of the cur-rentstackframeis.EPistheextremepointerwhichindicatesthetopofthecurrentstackframesothattheVMknowswhereto? put the next stack frame if another procedure or functionis called from the current context. There is no need to saveany working registers because the PMachine is stack basedand nothing else is stored in registers between instructionsbesideswhatisonthestack,makingtaskswitchingrelativelyfast and efﬁcient.Another subfunction, TASKPEND, is to allow the cur-rentlyexecutingtasktovoluntarilygiveupthevirtualproces-sorandallowadifferenttasktorun. ...

TASKKILL can becalledfor a task to permanently kill itself and allow other tasks torun instead

5.3 Implementing a mail instruction to support concurrencyA second instruction was implemented to support messagepassing between tasks in the VM. The mail() instruction wascreated to handle this functionality:

Status := mail(function, parameter1, parameter2, parameter3); Function: MAILSEND=0 parameter1: destination task ID parameter2: length of message to send parameter3: buffer address of message to send

   MAILRCV=1
     parameter1: my task ID
     parameter2: size of message buffer for receiving
     parameter3: buffer address where to put received message

... In this implementation, a callby a task to receive mail will cause the task to pend if nomail is available. The task then becomes ready to run againwhen mail arrives. The task that sent the mail continues torun until it voluntarily gives up the processor or pends onits own mailbox. This implementation was chosen becausethe native operating system used in the timing comparisonperforms this way allowing the timing comparison to be fair.Another faster implementation might be to allow the receiv-ing task to wake up immediately which would make the sys-tem appear to switch tasks faster but is not commonly howembedded operating systems work.

...

Others showed that VMs are capable of doing things thatnative machine hardware is not effective at doing by imple-menting VMs that do not emulate real machines. The Esterel and the Harrison VMs are good examples of this [10,41]. Additionally,other researchshowed thatabstractionisawayto increase performance in a VM as shown by the Carbayo-nia and Hume VMs [18,19].

10. Craig ID (2005) Virtual Machines. Springer, New York 41. Plummer B, Khajanchi M, Edwards SA (2006) An Esterel VirtualMachine? for Embedded Systems. In: Proceedings of synchronouslanguages, applications, and programming (SLAP 2006) 18. Gutierrez DA, Soler FO (2008) Applying lightweight ﬂexible vir-tual machines to extensible embedded systems. In: Proceedings of the1stworkshoponisolationandintegrationinembeddedsystems.ACM, Glasgow, pp 23–28 19. Hammond K, Michaelson G (2003) Hume: a domain-speciﬁc lan-guage for real-time embedded systems. In: Proceedings of the 2ndinternational conference on generative programming and compo-nent engineering. Springer, Erfurt, pp 37–56 "

---

https://blog.benjojo.co.uk/post/why-is-ethernet-mtu-1500

---

https://spectrum.ieee.org/geek-life/hands-on/build-this-8bit-home-computer-with-just-5-chips

The result was the Amethyst. Just like a classic home computer, it has an integrated keyboard and can generate audio and video. It also has a built-in high-level programming language for users to write their own programs. And it uses just six chips—an ATMEGA1284P CPU, a USB interface, and four simple integrated circuits.

The ATMEGA1284P (or 1284P), introduced around 2008, has 128 kilobytes of flash memory for program storage and 16 kB of RAM. It can run at up to 20 megahertz, comes with built-in serial-interface controllers, and has 32 digital input/output pins.

 Compact Computer: The Amethyst is a single-board computer. It uses just six integrated circuits—a CPU, USB interface, and four 7400 chips, which are used to make 215-color video graphics possible. Keyboard switches are soldered directly to the board, which also supports audio and four serial I/O connections for peripherals like game controllers or storage devices. A built-in Forth virtual machine provides a programming environment.

Consequently, I needed a lightweight programming environment for users, which led me to choose Forth over the traditional Basic. Forth is an old language for embedded systems, and it has the nice feature of being both interactive and capable of efficient compilation of code. You can do a lot in a very small amount of space. Because the 1284P does not allow compiled machine code to be executed directly from its RAM, a user’s code is instead compiled to an intermediate bytecode. This bytecode is then fed as data to a virtual machine running from the 1284P’s flash memory. The virtual machine’s code was written in assembly code and hand-tuned to make it as fast as possible.

---

glouwbug 5 hours ago [-]

And for those wondering how much power resides in an 8 bit 20 MHZ AVR with 3 or 4 colors and sound over RCA checkout Craft by IFT:

https://youtu.be/sNCqrylNY-0

It is as good as domain specific mastery gets

fortran77 3 hours ago [-]

Try to reproduce that with a modern stack, and you'd need hundreds of megabytes of browser engines, npm libraries, etc. and a GHz clock speed.

---

https://blog.exodusorbitals.com/2020/01/28/splice-the-programming-language-for-extraterrestrial-applications/

---

"The entire Turbo Pascal 3.02 executable--the compiler and IDE--was 39,731 bytes."

geophile 1 day ago [-]

Turbo Pascal was an IDE and a blazingly fast compiler. In 1986, I ran it on my 64k IBM PC (8088 4MHz). It was like nothing else I had experienced previously. Prior to that, I had developed software in grad school on punch cards. I occasionally had access to a PDP-11. I don't remember what editor I used, but it certainly wasn't an IDE. It may have been line-oriented. At work, we had a VMS system, and I had access to a C compiler and my first emacs. Not too bad, but nothing as immediate and responsive as Turbo Pascal.

Viewed another way, Turbo Pascal recaptured the incredibly rapid edit/run cycle that I first experienced with a PDP-8M running BASIC. Only now, I had 64k instead of my shared of 12k shared among a maximum of four users, an IDE instead of a line editor, and a far better language.

Turbo Pascal also blew away anything else available on the PC. I had a C compiler from Microsoft that was far more expensive and far slower.

Turbo Pascal was a truly magical piece of software. I remember that I had some initial skepticism about it. As I recall, it took some liberties with the language. But once I finally used it, I was instantly converted.

IMHO, nothing has quite captured that experience of development until IntelliJ?, starting with 4.x or 5.x.

" The minified version of jquery 1.6 (90,518 bytes).

...

zlib.h in the Mac OS X Lion SDK (80,504 bytes).

The touch command under OS X Lion (44,016 bytes). "

---

https://copetti.org/projects/consoles/playstation/

" 4 KB instruction cache and 1 KB of data cache (the original CoreWare? CW33300 contained 2 KB of data cache): The data cache is actually Scratchpad RAM, meaning that it can have other uses apart from behaving as L1 cache.

Like other MIPS R3000-based CPUs, it supported configurations with up to four coprocessors, Sony customised it with two:

    System Control Coprocessor or ‘COP0’: A MMU that provides virtual memory by using a Translation Lookaside Buffer or ‘TLB’. Not all the features that come with virtual memory are available though, for example, memory protection is disabled since games are programmed for bare-metal (without running from an OS). On the other hand, this MMU still gives interrupt support, exception handling and breakpoints, these are used for debugging."

louthy 5 hours ago [-]

I remember developing for the PS back in 1996-1999. The first title I worked on I was building the graphics engine and animation systems. Originally in C, then in MIPS assembler to get as much perf as possible: with a fixed target the difference for your title would be mostly down to the performance of the graphics engine.

I got to the point where I'd fitted the entire graphics engine and animation system into 4K, so it would fit in the instruction cache, and moved as much regularly used data into the 1K scratchpad as I could fit (Yes, an L1 cache that you decided manually what to put in it!). Access to the scratchpad would take 1 cycle.

---

https://technical.city/en/cpu/Core-i7-6650U-vs-Ryzen-9-3900X

note that the better desktop chip has only 96k l1 per core; i suspect the 6650u has 64k l1 per core. Just more evidence for my assertion that you should try to squeeze stuff into 64k whenever possible

Intel Core i7-6650U vs AMD Ryzen 9 3900X Number of cores 2 12 Place in performance rating 904 16 Type Laptop Desktop Release date 1 September 2015 7 July 2019 Maximum frequency 3.4 GHz 4.6 GHz L1 cache 128 KB 96K (per core) L2 cache 512 KB 512K (per core) L3 cache 4 MB 64 MB

---

"Specifically for Wi-Fi, LWIP is -the- stack to use for embedded TCP/IP. It is under 30K with full TCP/IP and offers everything you need. Then there is COAP or MQTT, which are miniscule compared to HTTP. mbedTLS adds another 80K for full TLS compatibility on 40MHz processors. On top of this every major wifi vendor has an SDK that fits nicely into LWIP (some even ship it), some are small (CC5000) some are larger (like the WF200), but the APIs are pretty straightforward. Now, leaving TCP/IP and 802.11 will add headaches (BLE, ZigBee?, Zwave) due to additional learning curves, but only because most people in embedded networking already understand TCP/IP and 802.11 phy."

---

https://rc2014.co.uk/

" RC2014 is a simple 8 bit Z80 based modular computer originally built to run Microsoft BASIC. It is inspired by the home built computers of the late 70s and computer revolution of the early 80s. It is not a clone of anything specific, but there are suggestions of the ZX81, UK101, S100, Superboard II and Apple I in here. It nominally has 8K ROM, 32K RAM, runs at 7.3728MHz and communicates over serial at 115,200 baud. ... Development of the RC2014 has lead to a more powerful machine with pageable ROM, 64k RAM, compact flash storage and a whole range of expansion peripherals. With the right modules, it’s now possible to run CP/M, which opens the RC2014 up to a wide range of software. "

https://groups.google.com/forum/#!forum/rc2014-z80

---

" The Z80 asm version of Collapse OS self-hosts on a RC2014 with a 5K shell on ROM, a 5K assembler binary loaded in RAM from SD card (but that could be in ROM, that's why I count it as ROM in my project's feature highlights) and 8K of RAM. That is, it can assemble itself from source within those resources. ... a Forth Collapse OS achieves self-hosting with as much resources than its Z80 counterpart. "

---

onetom 1 day ago [–]

I'm surprised that no one has mentioned https://flashforth.com/

There is no intriguing backstory for it, like for CollapseOS?, but it's a ~6 kiloword, practical 4th environment for Microchip PIC microcontrollers, which are a lot simpler than Z80, btw... The source code is trivial to understand too. My father is still using it daily to replace/substitute Caterpillar machine electronics or build custom instruments for biological research projects.

We started with Mary(Forth) back then, when the first, very constrained PIC models came out, with 8 deep stack and ~200 bytes of RAM. Later we used the https://rfc1149.net/devel/picforth.html compiler for those, which doesn't provide an interactive environment.

I made a MIDI "flute" with that for example, which was fabricated from sawing out a row of keys from a keyboard and used a pen house as a blow pipe and a bent razor with a photo-gate as the blow-pressure detector...

There are more minimal Forth OSes, which might be more accessible than a Z80-based one.

I would think those are more convenient for learning, how can you have video, keyboard and disk IO, an interactive REPL and compiler in less than 10KB

I remember, I played a lot with https://wiki.c2.com/?EnthForth

But if you really want to see something mind-bending, then you should study Moore's ColorForth?! I found it completely unusable, BUT I've learnt immense amount of stuff from it: https://colorforth.github.io/

There are more usable variants of it, btw. Also worth looking into Low Fat computing: http://www.ultratechnology.com/lowfat.htm I think it's still relevant today.

---

" I have always said that I will believe that 'C' is as small as Forth when I see a 1K source and object 'C' compiler. Both Chuck and I have done Forths with 1K of object code and Chuck has done Color Forth with with less than 1K of source. The source I was using was about 25K and only about 1000 times smaller than the source to a 'C' like GCC. "

---

https://colorforth.github.io/cf.htm

"Compact! 2K bytes for core software."

---

camelforth for z80

"About 6K of PROM and 640 bytes of RAM are used by CamelForth?, plus whatever additional PROM and RAM is needed by your program"

---

https://github.com/gioblu/PJON/blob/master/src/strategies/SoftwareBitBang/specification/PJDL-specification-v4.1.md

transmits at about 2kB-4kB/sec

compare to neurons, which have a refractory period of about 2ms:

2048 bytes/sec / 500 = 4 bytes per 2 ms

of course it's hard to directly say what one spike every 2 ms means; are we talking phase-coding or frequency-coding? If phase-coding, then we need to know the phase precision.

but 4 bytes is a 32-bit quantity, which seems like comfortably more than 1 spike every 2 ms will get you

---

" Notice anything odd in the previous list? The 6502 Forth -- a 16-bit model -- uses 8-bit stack pointers!

It is possible to make PSP, RSP, and UP smaller than the cell size of the Forth. This is because the stacks and user area are both relatively small areas of memory. Each stack may be as small as 64 cells in length, and the user area rarely exceeds 128 cells. ... (ANS Forth specifies a minimum of 32 cells of Parameter Stack and 24 cells of Return Stack; I prefer 64 cells of each.) " -- https://www.bradrodriguez.com/papers/moving1.htm

---

" Oftentimes the very limitations of a technological device are what allow for its expressive capacities to surface. Antonić’s microcomputer contained only 4K bytes of program memory—a veritable drop-in-the-bucket compared to any laptop today. Owing to this restriction, the system could only display three splendidly playful one-word error messages: users received a “WHAT?” if their BASIC code had a syntax error, a “HOW?” if their requested input was unrecognizable, and a “SORRY” if the machine exceeded its memory capacity. The 4K EPROM—erasable programmable read-only memory—was packed so tight that some bytes had to be used for multiple purposes; through this hack, Antonić says, his firmware now stands as proof that it is possible to use more than 100% of program memory. " -- https://tribunemag.co.uk/2020/07/make-your-own-self-managed-socialist-microcomputer

---

" About 25 years ago, an interactive text editor could be designed with as little as 8,000 bytes of storage. (Modern program editors request 100 times that much!) An operating system had to manage with 8,000 bytes, and a compiler had to fit into 32 Kbytes, whereas their modern descendants require megabytes. Has all this inflated software become any faster? On the contrary. Were it not for a thousand times faster hardware, modern software would be utterly unusable. -- Niklaus Wirth – A Plea for Lean Software"

---

"MicroPython? is packed full of advanced features such as an interactive prompt, arbitrary precision integers, closures, list comprehension, generators, exception handling and more. Yet it is compact enough to fit and run within just 256k of code space and 16k of RAM. "

---

albertzeyer 12 hours ago [–]

What is lightweight for you?

I bundled the official Python (CPython 2.7, that was some years ago; but I don't think CPython 3 would be so much different) in an application (http://albertz.github.io/music-player/), and the whole MacOSX? app zip file ended up at 11 MB. This includes my whole app + whole CPython 2.7 (including all stdlib) + extra Python libs (all the ObjC?/Cocoa bindings, which I use for the GUI). I find this is still pretty ok.

---

" While MicroPython? was designed to be "bare iron", it wasn't overly difficult to port it to run as a separate task inside the XBee firmware, and connect it up with internals for I/O and file system access. These are ARM Cortex products with firmware sizes in the area of 500KB to 700KB. "

---

teddyh 12 hours ago [–]

There also exists a Python-like language for when you only have a few kb of memory, and can’t even run MicroPython?. It’s called “Snek”:

https://sneklang.org/

cozzyd 11 hours ago [–]

There was a good LWN article about Snek a while ago: https://lwn.net/Articles/810201/

But, my impression is that Snek is mostly for "learning" while MicroPython? is intended to do "real work" (obviously those are not disjoint categories).

SV_BubbleTime? 6 hours ago [–]

That’s cool. Hadn’t seen that one.

What is a “few kb”? Because last I looked Micropython was like 256kB and the smallest JS interpreter that had foreign function interface was 53kB.

teddyh 2 hours ago [–]

> What is a “few kb”?

From the manual:

> […] could fit on our existing Arduino Duemilanove compatible hardware. This machine has:

> 32kB of Flash

> 2kB of RAM

[…]

---

" Processor (SAMD21G18A): ARM Cortex M0 with 256 kB of flash and 32 kB of RAM. The same chip found in many Arduino boards, including some from Adafruit. " -- https://www.crowdsupply.com/keith-packard/snekboard https://sneklang.org/snekboard/

---

" RTOS memory footprints range in size from as little as 8K up to 2MB — that is, about 8192 to 2,097,152* bytes, or equivalent to approximately 3 to 700 pages of plain printed text. Whether this is small or not depends on your perspective. A modern PC’s RAM is thousands of times larger, but the Sengled model E11-G13 smart lightbulb has only 12K of RAM. ..

Just as robot-control is different from WiFi?-routing, so there are different RTOSes built with features and design compromises to suit each ecosystem. There is no free lunch when coding, every instruction and variable consumes RAM. For example, the printf (print formatted) function from the standard library of the popular C programming language, alone is 8K. For comparison, Zephyr is a small open source RTOS, its entire minimum configuration is 8K. For that you get threading, interrupts and memory allocation, but if you need Bluetooth communication, that doubles the footprint to 16K. This is perfect for tiny Internet of Things (IoT?) devices that Zephyr is aimed at. General purpose RTOSes such as Integrity, LynxOS?, QNX and VxWorks? are larger. By comparison, the default configuration of LynxOS?-178® is 1.4MB. For this you get a POSIX RTOS with thread and process support, floating point, a filesystem, USB, networking, optional bash shell, and of course printf.

In short, how big your RTOS should be depends on your requirements. Expect a general purpose RTOS with lots of features to be about 1.5MB, whereas a minimal specialist RTOS like Zephyr would be around 16KB. Yes, RTOSes can be tiny, but this is not necessarily better; each RTOS is built as small as possible with the features it needs to satisfy its intended purpose. The inclusion—or not—of these features has a far bigger impact on RTOS size than clever optimizations, for e.g. "

---

https://www.copetti.org/projects/consoles/nintendo-ds/

32 KB of WRAM (Work RAM) using a 32-bit bus: To hold fast data shared between the ARM7 and ARM9.

    Bear in mind that only one CPU can access the same address at a time.

64 KB of WRAM using a 32-bit bus: For fast data as well, but only accessible from the ARM7, like the GBA had. 4 MB of PSRAM using a 16-bit bus: A slower type, available from either CPU and it’s controlled by a memory interface unit.

Entry point

At some point, the ARM7 and ARM9 will need to initialise the hardware and to do this, NTR-CPU includes two different small ROM chips:

    A 4 KB BIOS connected to the ARM9’s bus.
    A 16 KB BIOS connected to the ARM7’s bus.

---

a neuron's refractory period is about 2 ms. That's two 1000ths of a second. If we're trying to model things, we'd better be at least twice as fast as the thing we're modeling. Which means we need our processors to execute instructions at at least 1 kilohertz per second. Probably more, because we need to do a bunch of calculations in between each spike. But i can't imagine we'd even need 1 mhz (assuming we had 1 processor per neuron).

so, for my ideal massively parallel machine, we probably want:

about 2^16 processors
each processor running very slowly (i think even lowly PIC processors run at like 75 MhZ?, we need less than that)
each processor running async (no shared clock signal) and each processor can individually 'go dark' and not consume any power

2^16 is much less than the number of neurons in the human brain. But presumably our processors won't quite be as slow as neurons, so maybe they can simulate whole neuronal assemblies at once (although the necessary amount of incoming/outgoing routes/traffic may make this infeasible; if there are 1000 neurons and each neuron has 1000 dendritic synapses, half of which are from outside the assembly... that's 500000 synapses right there (even if there is overlap in the presynaptic neurons, so that set of presynaptic neurons overlaps is smaller, it's probably still pretty big)

---

Some digital pregnancy tests use Holtek HT48C06: "That's an 8-bit microcontroller. 64 bytes of RAM, 1024 words of ROM, 13 GPIO pins, running at either 4mhz or 8mhz. Given that they're only doing a 3v battery here and I don't see any voltage conversion circuitry, I think it's running at 4mhz." -- https://twitter.com/Foone/status/1301711498289405952

---

CUDA:

"L1 transactions are 128 bytes, and L2 and texture transactions are 32 bytes" -- https://docs.nvidia.com/gameworks/content/developertools/desktop/analysis/report/cudaexperiments/kernellevel/memorystatisticscaches.htm

"Note: GPUs typically do not have an L3 cache and the typical current L1 and L2 caches are rather small. For example, on the Pascal P100, the L1 cache is 24KB per SM and L2 cache is 4 MB per GPU ... Constant memory is also a read-only cache for global device memory. Constant memory is allocated by host on device. It is not particularly large - for example, it has 64KB on Pascal P100 GPU. Constant memory is cached to constant cache and it can be accessed by each thread. Constant memory is fast as long as threads in the same half-warp read the same address, otherwise accesses are serialized. Constant memory is managed by the compiler.

Shared memory is shared by all threads in a threadblock. The maximum size is 64KB per SM but only 48KB can be assigned to a single block of threads (on Pascal-class GPU). Again: shared memory can be accessed by all threads in the same block. Shared memory is explicitly managed by the programmer but it is allocated by device on device.

...

Register memory is the fastest GPU memory. Each SM has its own register file - it is not cached. While every thread can use up to 255 registers (on Pascal), the size of register file is 256K per SM, and the total size is 14,336 KB. ...

threads are executed in warps of 32 threads each " -- https://www.paranumal.com/single-post/2018/02/26/Basic-GPU-optimization-strategies

"CUDA Streaming Multiprocessor executes threads in warps (32 threads) There is a maximum of 1024 threads per block (for our GPU)" -- http://www.bu.edu/tech/files/2013/10/CUDA_1.pptx

Duktape is the (full, not subset) JS interpreter used by TIC-80.

" Code and RAM footprint

For a "Hello world" example: Config Code footprint (kB) Startup RAM (kB) thumb default 148 78 thumb lowmem 96 27 thumb full lowmem 119 1.5 x86 default 187 78 x86 lowmem 124 27 x86 full lowmem 148 1.5 " -- [1]

---

here's something cheap enough that you could actually have 64k of them for a reasonable price (if assembly and interconnection and everything else were free!):

https://jaycarlson.net/2019/09/06/whats-up-with-these-3-cent-microcontrollers/

however, this is 8-bit stuff, folks. 4k ROM, 256 bytes of RAM. 4 registers: 1 PC, 1 flag, 1 stack pointer, 1 accumulator. Not sure if even a Boot interpreter would fit in ROM, certaintly not in RAM. 4k of ROM is only 1k 32-bit words, and 256 bytes of RAM is only 64 32-bit words.

64*64k = 4194304 = 4MiB? words of RAM total across all 64k processors (before accounting for RAM taken up by the Boot implementation).

---

http://micropython.org/

" MicroPython? is packed full of advanced features such as an interactive prompt, arbitrary precision integers, closures, list comprehension, generators, exception handling and more. Yet it is compact enough to fit and run within just 256k of code space and 16k of RAM. "

"The pyboard is the official MicroPython? microcontroller board with full support for software features. The hardware has:

    STM32F405RG microcontroller
    168 MHz Cortex M4 CPU with hardware floating point
    1024KiB flash ROM and 192KiB RAM
    "

---

https://www.bunniestudios.com/blog/?p=5921 Introducing Precursor https://www.crowdsupply.com/sutajio-kosagi/precursor

    User-customizable CPUs
        Xilinx XC7S50 primary System on Chip (SoC) FPGA
            -L1 speed grade for longer battery life
            Tested with 100 MHz VexRISC-V, RV32IMAC + MMU, 4k L1 I/D cache
        iCE40UP5K secondary Embedded Controller (EC) FPGA
            Manages power, standby, and charging functions
            Tested with 18 MHz VexRISC-V, RV32I, no cache
    16 MB external SRAM
    128 MB FLASH
        100 MHz DDR 8-bit wide bus for fast XIP code performance

my note: RV32IMAC means: RISC-V, 32-bit integer, with standard extensions: Integer Multiplication and Division, Atomic Instructions, Compressed Instructions

compare to the 'G' standard bundle of extensions: G has floating point (both single and double) but does not have Compressed Instructions.

---

https://arstechnica-com.cdn.ampproject.org/v/s/arstechnica.com/features/2020/10/the-space-operating-systems-booting-up-where-no-one-has-gone-before/?amp=1&usqp=mq331AQFKAGwASA%3D&amp_js_v=0.1#referrer=https%3A%2F%2Fwww.google.com&amp_tf=From%20%251%24s&ampshare=https%3A%2F%2Farstechnica.com%2Ffeatures%2F2020%2F10%2Fthe-space-operating-systems-booting-up-where-no-one-has-gone-before%2F

https://www.reddit.com/r/Qtum/comments/7qxq5l/spacechain_os_optimized_operating_system_for/

mentions VxWorks?, RTEMS (FOSS), Sylix (FOSS) (not much english docs for Sylix)

---

128 core GPU for $60

The Nvidia Jetson Nano 2GB is similar to a Raspberry Pi — it is a Linux computer on a single board. But it has a 128-core Nvidia GPU for accelerating deep learning models and it supports CUDA.

(so 64k cores would be 512 of these, which would cost $30k)

---

"Meanwhile, having been generally designed around modern machines with seemingly limitless resources, higher-level languages and environments are simply too full-featured to fit into (say) tens of kilobytes or into the (highly) constrained environment of a microcontroller. And even where one could cajole these other languages into the embedded use case, it has generally been as a reimplementation, leaving developers on a fork that isn’t necessarily benefiting from development in the underlying language." -- http://dtrace.org/blogs/bmc/2020/10/11/rust-after-the-honeymoon/

---

"By marking themselves as no_std, programs confine themselves to the functionality found in libcore. This functionality, in turn, makes no system assumptions — and in particular, performs no heap allocations. This is not easy for a system to do; it requires extraordinary discipline by those developing it (who must constantly differentiate between core functionality and standard functionality) and a broad empathy with the constraints of embedded software. Rust is blessed with both, and the upshot is remarkable: a safe, powerful language that can operate in the highly constrained environment of a microcontroller — with binaries every bit as small as those generated by C. " -- http://dtrace.org/blogs/bmc/2020/10/11/rust-after-the-honeymoon/

---

"We had a maximum call stack depth of 7 in our GSM handsets, and with C++ (and everything else object oriented) it is very hard to make that guarantee. "

---

CamperBob?2 1 hour ago [–]

Those questions are all too open-ended to lead to specific answers, but in general, if you don't actually need a serious OS in your project, I'd suggest getting your feet wet with Arduino and then moving on to ARM-based platforms such as Teensy when you need more power (which you eventually will.)

The advantage of Arduino is that it will give you a good introduction to embedded development and I/O principles. Working with Arduino, you'll find that the CPU is reasonably capable but that the programming abstractions are clunky, and that'll give you the excuse you need to get comfortable with direct I/O port access. What you learn along the way will apply on more sophisticated platforms.

As for the temperature sensor, it's most likely digital (I2C or similar interface). Without the part number it may be difficult to get it working. If you don't have the part number, it's best to chuck it and order one from Adafruit or Sparkfun that comes with the necessary documentation and support code. Reusing salvaged parts isn't the cost/productivity win that it used to be.

If you need more horsepower for your project than an Arduino or Teensy-class board can provide, then you would want to look at Raspberry Pi or Beaglebone Black. If you aren't already comfortable with *nix development, you have a massive learning curve ahead, for better or worse, and you're going to be spending a lot of time in Professor Google's classroom.

reply "

---

" Systems on a Module

A “system on a module” (SoM?) is a device that packages all the complex parts onto a single module that is ready to integrate with your project. You will still need to build a carrier PCB that the SoM? mounts to, but this is comparatively straightforward.

I’m grouping these boards separately from the ones above, for a couple reasons. First, they don’t have all the connectors present on the bigger boards. In turn, they are much smaller, and they are typically designed to be easy to embed into your own project. This makes them a sweet spot for many hobbyists who want to build a board without dealing with ultra-fine-pitch soldering. You can do pretty well with a price cap of $20.

My favorite in this price range is the Onion Omega2S. It’s MIPS-based MT7688 with built-in Wi-Fi plus 64MB of RAM and 16MB of NOR flash, all on a nice package with castellated edges. Even better, it’s available on Mouser. It also has a single-board cousin, the Omega2. " -- https://www.thirtythreeforty.net/posts/2019/12/mastering-embedded-linux-part-2-hardware/

MediaTek? MT7688 Datasheet says:

Embedded MIPS24KEc (575/580 MHz) with 64 KB I-Cache and 32 KB D-Cache

so, yeah, 32k again

---

" Buy it—single-board computers

The first angle of attack is to look for computers that are explicitly marketed as running Linux all in a single package — a Single Board Computer (SBC). There is an abundance of these, and the numbers only continue to grow.

The iconic Raspberry Pi is a great option for getting started. It has storage, memory, and connectivity in spades. If you are just getting started and would like to follow along with my software tutorial a little later, you cannot go wrong with a Pi. However, the Pi is somewhat bulky compared to some other less well-known boards, and for many embedded applications, it’s a lot of overkill. Also, if you need to rely on the board, the SD card is often considered a single point of failure because of its comparatively poor reliability.

The smaller $5 Pi Zero gets you an entry-level Pi that has 512MB of RAM. The Pi Zero W adds Wi-Fi and Bluetooth for $5 more. Both also offer a neat trick that isn’t found on larger models: they have a USB On-the-Go (OTG) port, which lets it emulate all kinds of functionality when it’s plugged into a computer, like a virtual Ethernet adapter or a flash drive.

via Wikimedia Foundation

Because it’s such a great platform, for much of this series, we’ll use a Raspberry Pi Zero W as our hardware.

There are other popular boards from various companies. OrangePi? makes a bewildering array of SBCs. In my mind, the key feature of most of these is their superior interconnection — even the Pi 4 cannot beat some of the features of these boards, including 4G LTE, PCIe, and mSATA adapters. Most of them also include eMMC, which is flash memory that’s designed for embedded, so it is more reliable than an SD card. High-end OrangePi? boards can be even pricier than Raspberry Pis, but they make some $10 SBCs too! The Orange Pi Zero Plus2 is $13 shipped

The Orange Pi Zero Plus2 is $13 shipped

Finally, there is a very good database of SBCs called Hackerboards. If you want to watch for new ones, keep tabs on sites like Linux Gizmos and CNXSoftware, which offer embedded news and frequently announce new Linux SBCs. " -- https://www.thirtythreeforty.net/posts/2019/12/mastering-embedded-linux-part-2-hardware/

---

i think the RPi Zero uses the same processors as the RPi 1.

" The Broadcom BCM2835 SoC? used in the first generation Raspberry Pi[33] includes a 700 MHz ARM1176JZF-S processor, VideoCore? IV graphics processing unit (GPU),[34] and RAM. It has a level 1 (L1) cache of 16 KiB? and a level 2 (L2) cache of 128 KiB? " -- https://en.wikipedia.org/wiki/Raspberry_Pi

so 16k again? although possible the icache is half that? nope, there is 16k icache PLUS 16k dcache actually, see table 2 in this paper:

http://web.eece.maine.edu/~vweaver/projects/vmwos/2018_memsys_os.pdf

that table also notes that the Pi 3B has 32k icache and 32k dcache.

---

" the Lichee Nano uses a very small Allwinner part called the F1C100s. Here’s the Nano next to an SD card. Pretty small.

Here’s the Nano next to an SD card. Pretty small.

This is an ARM9 design Allwinner apparently makes for dashcams. "

-- https://www.thirtythreeforty.net/posts/2019/12/designing-my-linux-business-card/

"The F1C100s and F1C200s are identical ARM9 SIP processors with either 32 MB (F1C100s) or 64 MB (F1C200s) SDRAM built-in. They nominally run at 400 MHz but will run reliably at 600 MHz or more." -- https://jaycarlson.net/embedded-linux/#f1c200s

"ARM926-EJS16KB I-Cache/16KB D-Cache" -- http://dl.linux-sunxi.org/F1C100/Allwinner_F1C100_datasheet_20110331.pdf

(so, yeah, 16k again)

---

so the previous few entries suggests again that we should try to make stuff (like the runtime for Oot) fit within 16k, although 32k is not too bad either.

recall that micropython can run in 16k RAM and 256k ROM [2]

---

https://jaycarlson.net/embedded-linux/ cleared up something for me:

the meaning of 'application processor' instead of MCU is that an application processor has an MMU (and therefore, it can run Linux, which requires an MMU (and presumably various other OSs like that)). The ARM Cortex A series are application processors, the ARM Cortex M series are MCUs. (interesting, Carlson says that in most cases, you should buy an A series over a beefier M series because a beefier M series is actually more expensive and less performant on many metrics than a corresponding A series part:

" The biggest difference between these application processors and a microcontroller is quite simple: microprocessors have a memory management unit (MMU), and microcontrollers don’t. Yes, you can run Linux without an MMU, but you usually shouldn’t: Cortex-M7 parts that can barely hit 500 MHz routinely go for double or quadruple the price of faster Cortex-A7s. They’re power-hungry: microcontrollers are built on larger processes than application processors to reduce their leakage current. And without an MMU and generally-low clock speeds, they’re downright slow. "

ARM9 is an older type that many things can't run on; "popular runtimes like Node.js and .NET Core simply will not run on an ARM9 processor".

The Cortex A series is ARMv7-A and ARMv8-A (and ARMv8*-A such as ARMv8.1-A, ARMv8.2-A, ARM8.3-A).

---

so although i did spend a lot of time looking at smaller parts, realistically i think no one is going to be running much Oot on less than a Cortex A series (although i do think it's useful to keep the even lesser applications in mind). So we can assume that the target platform has at least the resources of the worst Cortex A, and maybe a little more. That doesn't tell us much new however:

https://en.wikipedia.org/wiki/Comparison_of_ARMv7-A_cores

https://en.wikipedia.org/wiki/Comparison_of_ARMv8-A_cores

the lowest end ARMv7-A cores are lower-end than what we've been considering, with some 4k L1 cache. However, the mid-end ARMv7-A may provide a better target:

32-bit
32k + 32k L1 cache
FPU, with 32 64-bit FPU registers
64-bit or 128 bit SIMD (NEON)
at least 256k L2 cache

compare to the lower end (mb not lowEST tho, eg there is one that is 32-bit) ARMv8*-A cores:

64-bit
at least 8k/8k L1 cache
they all have FPU (32 regs)
they all have SIMD (NEON), 32 × 128-bit
at least 128 KiB? L2 cache

putting these together, i think we can assume at least:

32-bit
16k + 16k L1 cache
FPU, with 32 64-bit FPU registers
64-bit SIMD (NEON)
at least 128k L2 cache

so we might say that we want the runtime to fit in 16k and everything to fit in 128k (e.g. we assume that even when we are not in ROM we want to fit in L2)