proj-oot-lowEndTargets-mcuComparisons

---

comparison between cortex M0 and MSP430:

http://deltas.blog.com/2013/03/13/arm-cortex-m0-vs-msp430-or-are-m0-based-devices-really-16-bit-mcu-replacements-2/ https://web.archive.org/web/20160525065504/http://deltas.blog.com/2013/03/13/arm-cortex-m0-vs-msp430-or-are-m0-based-devices-really-16-bit-mcu-replacements-2/

---

https://jaycarlson.net/microcontrollers/

the following quotes are out of order:

" The Amazing $1 Microcontroller A new series that explores 21 different microcontrollers — all less than $1 — to help familiarize you with all the major ecosystems out there.

...

microcontrollers — i.e., processors with completely self-contained RAM, flash, and peripherals

...

Microcontrollers continue to divide into two camps — those with vendor-specific core architectures, and those who use a third-party core design. Out of the 21 microcontrollers reviewed here, eight of them use a 32-bit ARM core, which is becoming ubiquitous in the industry — even at this price point. Three of the microcontrollers use an 8-bit 8051-compatible ISA. The remaining ten use the vendor’s proprietary core design: six are 8-bit parts, three are 16-bit parts, and the PIC32MM is the sole 32-bit part that doesn’t use an ARM core.

AVR

The AVR core is a famous RISC design known for its clock-cycle efficiency ... The specific AVR instruction set and timing for both parts I reviewed is known as “AVRe” — this instruction set includes a two-cycle multiply and many single-cycle operations. Note that tinyAVR parts prior to the tinyAVR 1-Series are essentially completely different MCUs with a less-capable AVR core that has no multiplier.

The AVR core has a 16-bit instruction fetch width; most instructions are 16 bits wide; some are 32. Still, this is a RISC architecture, so the instruction set is anything but orthogonal; while there are 32 registers you can operate with, there are very few instructions for working directly with RAM; and of those 32 registers, I’d say that only 16 of them are true “general purpose” registers, as R0-R15 can’t be used with all register operations (load-immediate probably being the most important). ... It was also designed for C compilers, too — with 32 registers available at all times, compilers can efficiently juggle around many operands concurrently; the 8051, by comparison, has four banks of eight registers that are only easily switched between within interrupt contexts (which is actually quite useful).

And interrupts are one of the weak points of the AVR core: there’s only one interrupt priority, and depending on the ISR, many registers will have to be pushed to the stack and restored upon exit. In my testing, this often added 10 PUSH instructions or more — each taking 2 cycles.

Another issue with AVR is the generally slow clock speed ...

Microchip PIC16

There’s something fundamentally goofy about almost all aspects of the PIC16 that make it seem, at first glance, completely bizarre that it is as popular as it is.

PIC16 uses an odd-ball 14-bit-wide program memory, yet it’s an 8-bit machine. This dramatically simplifies the core architecture: a 14-bit word can hold just enough data to specify every CPU instruction — with enough free space left in the word to address up to 128 registers or 2K of program memory (for the two jump/call routines). ... Since real MCUs have more than 128 bytes of registers and 2K of program memory, this PIC has a bank selection register (BSR), which is written to whenever you need to swap banks (which happens a lot). ...

PIC16

The PIC16 is a single-register machine, and that register is named W. Everything you do will essentially be moving something into W, doing something with it, and then moving it back to somewhere. Consequently, programming it in assembly is easy, and downright fun.

Because this part can store 8192 14-bit program words, Microchip will tell you this part has 14 KB of flash (close to 16 KB, right?), but users will tell you that it has 8K of program memory — 8192 words of memory — since storing an 8192-element byte array will occupy all 14 KB of its flash memory. Keep this in mind when comparing memory.

Microchip PIC24

While the PIC10, 12, 16, and 18 are all 8-bit cores with 12-16 bit program memory, the PIC24 moves up to 16-bit data operated through 24-bit instructions (are you starting to catch onto the numbering system?)

...

The PIC24 has new indirect addressing modes that allow incrementing/decrementing and register-offset addressing, has a few more other instructions, and has three — instead of two — hardware breakpoints; but otherwise, the core is very much in the spirit of the PIC16.

The PIC24 carries the excellent power consumption figures that the PIC16 has, but many of the parts lack the clocking and oscillator options the MSP430 has (and apples-to-apples, the MSP430 is lower-power).

The dsPIC versions of these parts — which add DSP-friendly instructions — are popular for motor drivers,

...

Microchip PIC32

While everyone was migrating their 8-bit proprietary cores to Arm, Microchip was gleefully popping out PIC parts. But in 2007, they finally decided to add a new microcontroller — the PIC32 — which uses a third-party, industry-standard 32-bit core. Instead of following everyone to the Arm ecosystem, they took a different turn: PIC32 parts use the MIPS architecture — specifically the M4K core.

MIPS built this core for single-chip MCU applications. M4K has 32 registers, a 5-stage pipeline, vectored interrupts and exceptions, bit-manipulation, and 16-bit instruction encoding support.

It is not the same as an Arm processor, but at the C application level, they are similar enough that any Arm developer should have no problems (other than the usual manufacturer-to-manufacturer peripheral differences).

...

Arm Cortex-M0

The Arm Cortex-M0 is a 32-bit RISC architecture that serves as the entry-level Arm architecture available to silicon vendors for microcontroller applications. Arm cores are designed by Arm Holdings and licensed to semiconductor manufacturers for integration into their products.

It’s important to understand the history of Arm because it explains a serious feature of Arm microcontrollers that differs substantially from the 8051 (the other multi-vendor architecture that dominates the field): Unlike the 8051, Arm is just a core, not a complete microcontroller. The ARM7TDMI-S didn’t come with any GPIO designs, or provisions for UARTs or ADCs or timers — it was designed as a microprocessor....Since many microcontroller projects spend 90% or more of the code base manipulating peripherals, this is a serious consideration when switching from one Arm MCU vendor to another: there’s absolutely zero peripheral compatibility between vendors, and even within a single vendor, their Arm parts can have wildly different peripherals.

Unlike other Arm parts, the M0 series only supports a subset of the 16-bit Thumb instruction set, which allows it to be about 1/3 the size of a Cortex-M3 core. Still, there’s a full 32-bit ALU, with a 32-bit hardware multiplier supporting a 32-bit result. Arm provides the option of either a single-cycle multiply, or a 32-cycle multiply instruction, but in my browsing, it seems as though most vendors use the single-cycle multiply option.

In addition to the normal CPU registers, Arm cores have 13 general-purpose working registers, which is roughly the sweet spot. The core has a nested vector interrupt controller, with up to 32 interrupt vectors and 4 interrupt priorities — plenty when compared to the 8-bit competition, but a far cry from the 240 interrupts at 256 interrupt priorities that the larger Arm parts support. The core also has full support for runtime exceptions, which isn’t a feature found on 8-bit architectures.

The M0+ is an improved version of the M0 that supports faster two-cycle branches (due to the pipeline going from three-stage to two-stage), and lower power consumption.

...

One of the biggest problems with ARM microcontrollers is their low code density for anything other than 16- and 32-bit math — even those that use the 16-bit Thumb instruction set. This means normal microcontroller type routines — shoving bytes out a communication port, wiggling bits around, performing software ADC conversions, and updating timers — can take a lot of code space on these parts. Exacerbating this problem is the peripherals, which tend to be more complex — I mean “flexible” — than 8-bit parts, often necessitating run-time peripheral libraries and tons of register manipulation.

Another problem with ARM processors is the severe 12-cycle interrupt latency. When coupled with the large number of registers that are saved and restored in the prologue and epilogue of the ISR handlers, these cycles start to add up. ISR latency is one area where a 16 MHz 8-bit part can easily beat a 72 MHz 32-bit Arm microcontroller.

8051

...

The 8-bit modified Harvard core has a fully-orthogonal variable-length CISC instruction set, hardware multiplier and hardware divider, bit-addressable RAM and specific bit-manipulation instructions, four switchable banks of eight registers each, two-priority interrupt controller with automatic register bank-switching, 64 KB of both program and extended RAM addressability, with 128 bytes of “scratch pad” RAM accessible with fast instructions.

...

The original had 4K of ROM, 128 bytes of RAM, four full 8-bit GPIO ports (32 I/O total), a UART, two or three timers, and a two-priority interrupt system.

The 8051 has a fully orthogonal CISC instruction set, which means you can do nearly any operation with immediate, direct, or indirect operands, and you can do these operations in RAM, registers, or the A accumulator.

...

Because of its small core and fast interrupt architecture, the 8051 architecture is extremely popular for managing peripherals used in real-time high-bandwidth systems, such as USB web cameras and audio DSPs, and is commonly deployed as a house-keeping processor in FPGAs used in audio/video processing and DSP work.

...

STM8

The STM8 core has six CPU registers: a single accumulator, two index registers, a 24-bit program counter, a 16-bit stack pointer, and a condition register. The STM8 has a Harvard architecture, but uses a unified address space. There’s a 32-bit-wide program memory bus which can fetch most instructions in a single cycle — and pipelined fetch/decode/execute operations permit many instructions to execute in a single cycle.

The claim to fame of the core is its comprehensive list of 20 addressing modes, including indexed indirect addressing and stack-pointer-relative modes. There’s three “reaches” for addressing — short (one-byte), long (two-byte), and extended (three-byte) — trading off memory area with performance.

This is the only architecture in this round-up that has this level of granularity — all the other chips are either RISC-style processors that have lots of general-purpose registers they do their work in, or 8051-style CISC parts that manipulate RAM directly — but pay a severe penalty when hitting 16-bit address space. The STM8 manages these trade-offs in an efficient manner.

...

In 2017, we saw several new MCUs hit the market, as well as general trends continuing in the industry: the migration to open-source, cross-platform development environments and toolchains; new code-generator tools that integrate seamlessly (or not so seamlessly…) into IDEs; and, most notably, the continued invasion of ARM Cortex-M0+ parts into the 8-bit space. ... I wanted to explore the $1 pricing zone specifically because it’s the least amount of money you can spend on an MCU that’s still general-purpose enough to be widely useful in a diverse array of projects.

Any cheaper, and you end up with 6- or 8-pin parts with only a few dozen bytes of RAM, no ADC, nor any peripherals other than a single timer and some GPIO.

Any more expensive, and the field completely opens up to an overwhelming number of parts — all with heavily-specialized peripherals and connectivity options.

These MCUs were selected to represent their entire families — or sub-families, depending on the architecture — and in my analysis, I’ll offer some information about the family as a whole.

If you want to scroll down and find out who the winner is, don’t bother — there’s really no sense in trying to declare the “king of $1 MCUs” as everyone knows the best microcontroller is the one that best matches your application needs. I mean, everyone knows the best microcontroller is the one you already know how to use. No, wait — the best microcontroller is definitely the one that is easiest to prototype with. Or maybe that has the lowest impact on BOM pricing?

I can’t even decide on the criteria for the best microcontroller — let alone crown a winner.

...

Compilers

The biggest change in the last 10 years is the democratization of tools — even proprietary, expensive compilers tend to have generous code-size limitations (64 KB or more in some cases — plenty for a quick evaluation or hobbyist projects).

...

The fastest IDE flash load times came from the Infineon XMC1100, running the J-Link firmware, which could fill its entire 8 KB of flash and run to main() in 2.47 seconds. That’s impressive, coming from an Eclipse-based IDE not known for its debugging kick-off abilities.

...

PIC18 devices can reach up to the PIC18F97J60 — a 100-pin beast with 128 KB of flash (64 K words), and almost 4K of RAM. While most of these 8-bit parts have similar peripherals across the board, I must note the Ethernet MAC and PHY present in the PIC18F97J60. While many higher-end microcontrollers have an Ethernet MAC, this low-end PIC18 part is one of the only microcontrollers — at any price — to also integrate a PHY (7).

7: The only other mainstream MCU that has an integrated Ethernet PHY is the $14 Tiva-C TM4C129x, a giant 128-pin 120 MHz Arm Cortex-M4 from Texas Instruments. There are a few other (albeit odd) choices out there: Freescale’s legacy ColdFire microcontrollers include the MCF5223X, which has an integrated Ethernet PHY. Fabless designer ASIX manufactures the AX11015, a 100 MHz 8051 with an integrated Ethernet PHY.

...

One other thing to note is that GCC uses the normal convention for function calls: any call-saved registers the function needs will be pushed to the stack by the function and restored before returning. But there’s also a bunch of call-used registers available for user functions to clobber, which makes it easier to write assembly routines, and gives the compiler plenty of room for handling function locals.

This is normal if you come from PC or ARM development, but many MCU architectures and compilers don’t PUSH or POP registers at all; instead, specific registers (or RAM addresses) are set aside for specific functions. The advantage of GCC’s standard calling approach is simplicity, flexibility and the ability to support large projects efficiently — you also get reentrancy for free, which compilers like Keil’s C-51 require you to explicitly request when declaring the function. "
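a quick check of the PIC16 numbers quoted above (the 14-bit word, 128 registers, 2K jump range, and "14 KB" of flash). This is only an arithmetic sketch; the variable names are mine, and the "bits left for the opcode" figures are derived by subtraction from the quoted numbers rather than taken from the article:

```python
# Arithmetic behind the quoted PIC16 figures; names here are illustrative only.
WORD_BITS = 14         # program word width, from the quote
PROGRAM_WORDS = 8192   # program words on the reviewed part, from the quote

# Marketing view: total flash bits expressed in bytes.
flash_bytes = PROGRAM_WORDS * WORD_BITS // 8   # 14336
flash_kib = flash_bytes / 1024                 # 14.0

# Address fields implied by "128 registers" and "2K of program memory".
reg_addr_bits = (128).bit_length() - 1         # 7
jump_addr_bits = (2048).bit_length() - 1       # 11

# Whatever is left of the 14-bit word is available for the opcode (derived).
opcode_bits_reg_op = WORD_BITS - reg_addr_bits    # 7
opcode_bits_jump = WORD_BITS - jump_addr_bits     # 3

# A byte table stores one byte per 14-bit word, so only 8 of 14 bits carry data.
byte_table_capacity = PROGRAM_WORDS               # 8192 bytes, not 14336
byte_table_efficiency = 8 / WORD_BITS

print(flash_bytes, flash_kib, byte_table_capacity)
print(reg_addr_bits, jump_addr_bits, opcode_bits_reg_op, opcode_bits_jump)
```

so "14 KB of flash" and "8K of program memory" describe the same array, and for byte data the usable capacity is the 8K figure.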

" These days, $1 buys you a mid-range, general-purpose basic microcontroller that's got dozens of I/O, half a dozen or more PWM channels, 10 or 12 bit ADC, decent sets of timers, and enough flash and RAM to cover most general-purpose entry-level needs. " -- the author, on [1]

"

Early observations (not finalized!):

    Silicon Labs' 8051s are cycle-for-cycle similar to the AVRs in performance, but run at much higher clock rates.
    One of the biggest determiners for power consumption is whether the core is supplied by an internal 1.8V LDO or not; this is not as well advertised in the datasheet or on distributor web sites as it should be.
    The Renesas RL-78 is one of the best MCUs in this price range (considering performance and power consumption), has fantastic free dev tools, and it's virtually unheard of in the U.S.
    Nuvoton's Cortex-M0 parts are probably the simplest ARM MCUs to use, as they have much simpler power topology. They feel like an 8-bit MCU, which would be great for beginners looking to move up to ARM. They have $10-20 dev boards and free dev tools. But you pay heavily for all this with terrible power consumption numbers
    Different M0 vendors seem to have quite different interrupt structures for their peripherals, which can affect latency hugely
    ARM-GCC produced much faster (but much larger) code in all my tests when compared to MDK using comparable optimization settings. No flame war until final testing is finished, please." [2]

"

Re: $1 MCU review — looking for part suggestions! « Reply #11 on: July 31, 2017, 05:50:12 AM » If you are kickstarting, and plan to offshore your production, my suggestion is to try STM8, STM32 and STC. These are widely used in China due to the firmware protection. If you release a product in China that makes money that has weak code protection, your competitor will get the firmware and even decompiled source code the next day.

STM8 and STM32 have very good reputation in code protection, so does the new AVR (ATXMEGA, ATTINY-A, etc.). Chinese MCU companies, by definition, take this very seriously. For instance, STC pays bounty for people finding a way to crack their MCU.

If the MCU you chose simply uses a fuse or simple software bit to protect its firmware, then forget about it. There are lots of companies in China that do intrusive MCU decrypting with focused ion beam and scanning electron microscopy technology. Universities rent their FIB machines for a very low price, and they don't ask what you're using it for. You pay $20 per hour, you get the FIB and a technician, no questions asked. " [3]

" Or even if you're a hobbyist working here. The STC, dollar-for-dollar, looks fantastic when compared to other 8-bit MCUs. They have decent English datasheets, and their parts are readily available on Ali Express, which many U.S. shoppers are becoming comfortable using, and Taobao (which hardcore shoppers are fine with). All STC parts have a UART bootloader and UART debugging (via a monitor program), so if you've got a USB-to-UART dongle laying around, that's all you need. Developing in Keil is much grosser than the Silicon Labs' 8051 parts, though, which use their free, Eclipse-based IDE — while still calling the (excellent) Keil C51 compiler under the hood.

The STM8 is a fantastic microcontroller that ships with a great peripheral library, and has many features found only on Cortex MCUs (slew rate / drive strength, lots of clocking options, and a nice core design). Biggest problem for me is that STVD feels like it belongs in Windows 98.

STM32F0 is pretty average when compared to other Cortex-M0 parts, but it's really cheap in China, and, like the STM8, supports $5 ST-Link debugging. Lots of good, free, Eclipse-based tools. " [4]

" While what you said was true of compilers when the AVR was designed, it's not necessarily true in 2017. In fact, the fact that the 8051 has fine-grained balance between data space and access time (MOVs and operations can take 1, 2, or 3 cycles to execute -- basically the number of bytes long the instruction is -- based on where the data is) gives you control over performance, in many scenarios. In my testing, an ATTiny and an EFM8 were both tested with identical code to perform high-pass filtering using a 16-bit direct form I biquad implementation. They have almost identical performance numbers. This largely comes down to memory access, with Silicon Labs' pipelined core structure helping edge it out over the ATTiny. To add insult to injury, the EFM8 at the same price point runs at 72 MHz from an internal oscillator, while the ATTiny can only hit 8 MHz from an internal oscillator. And the EFM8 @ 72 MHz uses less power than an ATTiny @ 8 MHz. Obviously, this was *one test* and the results are preliminary.

Of course, Cortex-M0 parts that cost the same have come in and wrecked both of these in terms of raw processing performance and nJ-per-sample power consumption figures (but not power consumption in wait/sleep modes!) " [5]
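for reference, a rough sketch of what a "16-bit direct form I biquad" high-pass might look like. This is not the benchmark code from the quoted post, which isn't shown; the Q14 coefficient format, the cutoff/sample-rate values, and all function names are my assumptions, and the coefficient formulas are the standard RBJ "Audio EQ Cookbook" high-pass ones:

```python
import math

Q = 14  # assumed coefficient format: Q14, so coefficients in [-2, 2) fit in int16


def to_q(x):
    """Quantize a float coefficient to a Q14 integer."""
    return int(round(x * (1 << Q)))


def sat16(v):
    """Saturate to the signed 16-bit range, as a 16-bit MCU routine would."""
    return max(-32768, min(32767, v))


def highpass_coeffs(fc, fs, q=0.7071):
    """RBJ-cookbook high-pass coefficients, normalized by a0 and quantized."""
    w0 = 2 * math.pi * fc / fs
    alpha = math.sin(w0) / (2 * q)
    cosw = math.cos(w0)
    a0 = 1 + alpha
    b0 = to_q((1 + cosw) / 2 / a0)
    b1 = -2 * b0  # forces b0 + b1 + b2 == 0 exactly after quantization
    a1 = to_q(-2 * cosw / a0)
    a2 = to_q((1 - alpha) / a0)
    return (b0, b1, b0), (a1, a2)


def biquad_df1(samples, b, a):
    """Direct form I: two delayed inputs and two delayed outputs, one accumulator."""
    x1 = x2 = y1 = y2 = 0
    out = []
    for x0 in samples:
        # On an MCU this accumulator would be 32 bits wide; Python ints don't overflow.
        acc = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        y0 = sat16(acc >> Q)  # truncate back to 16-bit sample scale
        x2, x1 = x1, x0
        y2, y1 = y1, y0
        out.append(y0)
    return out


b, a = highpass_coeffs(fc=1000, fs=8000)  # illustrative cutoff and sample rate
print(b, a)
print(biquad_df1([1000, 1000, 1000, 1000], b, a))
```

the inner loop is five 16x16 multiplies plus memory shuffling per sample, which fits the quote's point that the result "largely comes down to memory access".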

" The whole comparison thing might be more interesting if done at more price levels (perhaps with fewer chips at each price level.) In general, I feel like I have a much better idea about what a "typical" $1 microcontroller will do than "do you get 5x or 10x the capabilities with a $5 or $10 microcontroller?" (and a comparison at the $15 level (near the top-of-the-line for AVRs) is a lot more dramatic in some ways. The 8bit chips are about 256k flash and 8k of RAM, while the 32bit chips are up at 1MB flash and 256k of RAM!) " [6]

note that a key thing in embedded is peripheral availability -- which i'll be utterly neglecting because my interest is in (slow, non-systems-level) HLL programming language design. The actual article talks a lot about peripherals, compilation toolchains, power consumption, and speed (both latency and throughput), all of which i'm less interested in. So if you are actually looking to compare these MCUs, go look at the actual article, not my notes here, as my notes here will purposefully leave out almost everything you care about.

MCUs and their cores in the review (noted as 'custom' unless they have an entry in the CORES section of the main article):

MCUs and their memory (these are mostly quotes):

These are most famous for use by the Arduino Diecimila, Duemelanove, and Uno.

(historically) popular application notes:

and notes on the article's application suggestions:

Still, the modern world is run by chips that do their low-power work through duty-cycling — waking up every five seconds, ramping up the clock to full-speed, processing data, and then going back to sleep....if you need to wake up and process some data as quickly as possible to get back to sleep, the PIC is not for you. It’s just too slow....In general, this used to be an easy ecosystem to recommend to students and hobbyists — but I think there’s considerably less value here than there used to be — especially up against other ecosystems these days...Having said all that, this is a unique controller that has some plausibly useful (though heavily application-specific) peripherals that you should keep in the back of your mind. The NCO, digital modulator, and configurable logic are especially useful for things like modulating IR, driving NeoPixels, or doing spread-spectrum generation for quiet stepper motor drive."

Otherwise, you’ll be using the free evaluation version of C51, which is limited to 2 KB of flash (though you still get optimization and all that jazz). Having said that, for a lot of applications, 2 KB of flash is plenty (and 2 KB flash parts are common for production — especially for 8051 cores, which are very space-efficient).

There’s something fun about playing with an oddball chip you’re not going to find on DigiKey...But for more normal people looking for a solid MCU platform, and can afford to spend an entire U.S. dollar on a part, there’s probably better options out there."

the ordering of section 'discussion', in case it matters; also, the taglines from 'discussion' are collected here:

other and custom ISA notes:

PIC (PIC16)

" PIC originated from the 1976 General Instrument Programmable Interface Controller, a peripheral controller built to boost the speed of the CP1600 CPU (which struggled with poor I/O performance). General Instrument sold the PIC1650A, with customer’s own microcode, separate from the CP1600 shortly thereafter.

The PIC1650 had a 12-bit-wide program ROM word size — each instruction fit into a single word. There were three types of instructions: standard byte-oriented file register operations, bit-oriented file register operations, and literal/control operations.

Instructions were able to be so short because there was only a single working register, W.

For byte-oriented file register operations, the 12-bit word held a 6-bit opcode, a 1-bit “destination” flag (determining if the CPU should place the result in W, or in the original register), and a 5-bit “file” (register) value. These would be operations like “Move W to f” or “Subtract W from f.”

The architecture had dedicated bit-set, bit-clear, and bit-test support for all register files, too. The bit instructions append the three-bit index of the bit to the 4-bit instruction — leaving room for the 5-bit file register address.

Literal operations were also accomplished with a 4-bit instruction, leaving room for an 8-bit value. An interesting literal instruction is RETLW — return with literal in W — which allows the core to access ROM look-up tables in three instructions.

Absent from the instruction set are conditional-branch statements; rather, there is a “skip” instruction — often combined with “goto” — to control program execution. ... I haven’t mentioned the stack because there isn’t one — at least not something that’s user-accessible. There’s a two-deep stack used only to store subroutine return addresses. ... You may be wondering why I’m discussing such an ancient architecture that surely has no relevance today. Yet, as it turns out, Microchip has made only minor tweaks over the years, and even the latest, greatest PIC16F15688 reviewed here has only modest changes from this original architecture.

In fact, the PIC1650A lives on in the “Baseline 12-bit” series of PIC parts, which encompass PIC10, PIC12, and even some PIC16 devices (like the PIC16F54).

Why? Because for lots of basic embedded projects, this is more than plenty — and while you could reach for a more-powerful microcontroller, it most likely won’t hit the 1.4 mA @ 16 MHz mark this PIC16 does.

Most of the PIC12 devices add a bit deeper stack and interrupt support (the “Enhanced 12-bit” series), but the first real change is the 14-bit and Enhanced 14-bit series — made up mostly of PIC16 parts, but also some PIC12 devices.

Here, the program word size has increased to 14-bit, allowing more RAM and program space access. Enhanced 14-bit devices have additional instructions plus indirect address modes through a new set of File Select Register (FSR) — of which there are two.

The PIC18 devices bump up the word size again to 16-bit, while adding much better indirect addressing support, and a few more instructions — some of which allow using the FSR2 as an emulated stack pointer. "
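the three PIC1650 instruction layouts described in the quote above can be sketched as bit-field splits of a 12-bit word. The field widths come from the quote; placing the opcode in the high bits and the file/literal field in the low bits is my assumption (it follows the order the quote lists them in), and the function names and example words are arbitrary, not real opcodes:

```python
def _check12(word):
    """Reject anything that is not a 12-bit program word."""
    if not 0 <= word < (1 << 12):
        raise ValueError("expected a 12-bit program word")


def split_byte_op(word):
    """Byte-oriented: 6-bit opcode, 1-bit destination flag d, 5-bit file f."""
    _check12(word)
    return {"opcode": word >> 6, "d": (word >> 5) & 0x1, "f": word & 0x1F}


def split_bit_op(word):
    """Bit-oriented: 4-bit opcode, 3-bit bit index b, 5-bit file f."""
    _check12(word)
    return {"opcode": word >> 8, "b": (word >> 5) & 0x7, "f": word & 0x1F}


def split_literal_op(word):
    """Literal: 4-bit opcode, 8-bit literal k."""
    _check12(word)
    return {"opcode": word >> 8, "k": word & 0xFF}


# Arbitrary example words, chosen only to exercise the field boundaries.
print(split_byte_op(0b000111_1_01010))
print(split_bit_op(0b0101_011_00110))
print(split_literal_op(0b1000_10100101))
```

with only 5 bits for f, a single instruction can reach just 32 file registers, which is consistent with the quote's point that the short word works only because there is a single working register, W.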

PIC24

" The PIC24 is a modified Harvard architecture 16-bit CPU with two-cycle instruction timing. It has a 17×17 single-cycle multiplier as well as a 32-by-16-bit hardware divider. There are 16 general-purpose registers, and its RISC architecture made the architecture more suitable for C development. The PIC24 has close lineage to the dsPIC DSP-endowed microcontrollers.

The PIC24 is named for the 24-bit instruction width used by the architecture; like the other PIC parts, the PIC24 does an instruction fetch and execution in a single cycle, regardless of instruction (except for branches and double-word moves).

The PIC24 has a vectored exception system similar to ARM microcontrollers; there’s also a seven-priority interrupt controller with up to 118 interrupt sources. "

HT66

" The HT66 feels quite similar in design to a Microchip PIC16: a 4-cycle single-accumulator RISC architecture, with an 8-level stack for saving the PC address, plus a banking arrangement used to address more than 256 bytes of memory. Unlike the PIC16, there’s a single 128-byte SFR set placed at the bottom of RAM. The remaining 128 bytes of addressable space are split into two banks to cover the 256-byte capacity this part has. The 63-instruction ISA is similar to the PIC16, but also includes bit manipulation instructions. "

" Quote from: Yansi on August 01, 2017, 03:52:49 AM

    Quote from: wraper on July 31, 2017, 08:05:18 AM
        Quote from: funkathustra on July 31, 2017, 06:10:04 AM
            Of course, Cortex-M0 parts that cost the same have come in and wrecked both of these in terms of raw processing performance and nJ-per-sample power consumption figures (but not power consumption in wait/sleep modes!)
        Also cortex M at such price don't have such interesting peripherals. For example, with EFM8UB1 I can get MCU with crystal-less USB, 5V tolerant GPIO, internal 3.3V voltage regulator which can supply up to 100 mA to external devices and 12 bit ADC for around $0.60 @100pcs. That is pretty amazing.
    Look at STM32F042.  ;)

Well, I looked on it briefly :(. Costs more than twice as much. No internal 5V -> 3.3V regulator, ADC capable inputs are not 5V tolerant. To use it with USB, instead of just a few decoupling caps I would need to use external VREG. ADC is much less versatile, no analog gain adjustment (0.5/1). Only single 1.25V reference voltage (1.65V and 2.4V for UB1). No analog comparator, UB1 has 2 of them, no internal DAC (used with comparators, both inverting and non-inverting inputs). UB1 comparator also has programmable hysteresis. " [7]

" Excluding the Propeller and XMOS and a few other weird architectures, most every family has a $1 entry on my list, so once you compare the base members, you're good. " [8]

" Look at STM32F042. ;)

    Well, I looked on it briefly  :(. Costs more than twice as much. No internal 5V -> 3.3V regulator, ADC capable inputs are not 5V tolerant. To use it with USB, instead of just a few decoupling caps I would need to use external VREG. ADC is much less versatile, no analog gain adjustment (0.5/1). Only single 1.25V reference voltage (1.65V and 2.4V for UB1). No analog comparator, UB1 has 2 of them, no internal DAC (used with comparators, both inverting and non-inverting inputs). UB1 comparator also has programmable hysteresis.

On the bang-for-your-buck metric, you ain't gonna beat an EFM8UB1. For typical USB full-speed devices, it's nearly perfect. " [9]

" Reply #47 on: August 02, 2017, 11:38:57 AM » Quote

    Quote
        and a comparison at the $15 level is a lot more dramatic in some ways.
    ... once you compare the base members, you're good.

I think I strongly disagree. One of the important factors, even when considered a $1 chip, is the potential for transitioning to a more powerful chip without having to start over. This leads to some depressing facts where you're NOT good if you understand the low end. Consider RAM. None of the 8-bit PIC architecture chips can address more than 4k of RAM. Few 8bit microcontrollers have as much as 16k of on-chip RAM, and even if they're "capable", you start running into restrictions - For example, "such-and-such banking scheme isn't supported transparently by anything except xyz compiler", or isn't supported at all... (This applies to program memory as well. I like both the AVR and MSP430s, but neither is something I'd use if I thought I might someday need more than 64k...) Also, between chip complexity and vendor libraries, a lot of the 32bit chips being offered as 8bit replacements (ie the ones that fall in your price range) are a bit disappointing as well. I don't think I believe that a 16k SAMD10 is going to be an adequate replacement for a 16k AVR, for example. The usual answer for ARM chips is "just use a chip with more memory; they're still cheap." That's moderately true, but ... there are no samd10-pinout chips with more than 16k. " [10]

" brucehoult

Quote from: funkathustra on August 01, 2017, 05:33:55 AM

        I know code density is worse on Cortex-M0 chips, but how much worse? What happens when you factor in all the manufacturer-supplied peripheral libraries? 

Worse than what? Other Thumb2 chips? AVR?

Over the years, with programming many different architectures, I've come to the conclusion that 16 bit *instructions* are very much in a sweet spot, either fixed size, or with a way to escape to the occasional 32 bit instruction. That's regardless of whether the registers and data are 8 bit (AVR), 16 bit (PDP11 [1]), 32 bit (Thumb/Thumb2, RISC-V32C), or 64 bit (RISC-V64C).

32 bit instructions just have too much generality and wasted space. Sometimes you can use all the power (and aarch64 does a pretty good job), but sometimes you only want to negate a register and 32 bits is a waste for that.

Instructions based on 8 bit units seem like they should be better, especially as they can be better Huffman coded than instructions with larger granularity, but it just doesn't work out that way in practice, at least for x86, VAX, z80, 6502 etc. Thumb fits a 3-address add or subtract (destination register, and two source registers) into a two byte instruction. Both VAX and x86 need four bytes to do that. If the destination is the same as one operand then x86 can also do it in two bytes, but VAX needs three. 6502 needs 6 bytes (assuming 8 bit data and sources and destination in Zero Page). 8080/z80 need three bytes, assuming both operands and the destination are registers other than A (mov a,src1; add src2; mov dst,a).

[1] the actual instructions are all 16 bit, but some addressing modes have a side effect of loading an immediate or offset and advancing the PC " [11]
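
my note: the byte counts in the quote above can be tallied with a small sketch. The per-instruction sizes are my assumptions about the usual encodings (worth checking against each ISA manual), the mnemonics are only illustrative, and the 6502 row follows the quote in leaving out a CLC before the ADC.

```python
# Bytes needed for a three-address add (dst = src1 + src2) on several ISAs.
# Each entry is the shortest sequence described in the quote, as a list of
# (illustrative mnemonic, assumed size in bytes).
SEQUENCES = {
    "Thumb": [("adds rd, rn, rm", 2)],
    "x86": [("mov dst, src1", 2), ("add dst, src2", 2)],
    "VAX": [("addl3 src1, src2, dst", 4)],  # opcode + three register specifiers
    "8080/Z80": [("mov a, src1", 1), ("add src2", 1), ("mov dst, a", 1)],
    "6502 (zero page)": [("lda src1", 2), ("adc src2", 2), ("sta dst", 2)],
}


def total_bytes(sequence):
    """Sum the assumed sizes of every instruction in a sequence."""
    return sum(size for _mnemonic, size in sequence)


SIZES = {isa: total_bytes(seq) for isa, seq in SEQUENCES.items()}

if __name__ == "__main__":
    for isa, size in sorted(SIZES.items(), key=lambda item: item[1]):
        print(f"{isa:18s} {size} bytes")
```

Under these assumptions the totals match the figures quoted: 2 bytes for Thumb, 3 for 8080/Z80, 4 for x86 and VAX, and 6 for the 6502.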

" westfw

Quote

    I've come to the conclusion that 16 bit *instructions* are very much in a sweet spot, either fixed size, or with a way to escape to the occasional 32 bit instruction.

Somewhat agree... " [12]

" a modern, general-purpose, 16-32 pin MCU that you would drop in entry-range, current-generation products. If you look at my list of other controllers I've selected thus far, that means 3.3V, internal oscillator, low active current, lots of timers, 10-14 bit ADC with tons of channels, 6+ PWM channels, 8-16 K of flash, 1K+ of RAM, one or more UART/SPI/I2c modules, and maybe some secret sauce — programmable logic, waveform generators, DACs, etc. " [13]

" But the other thing is this: if you're a student, hobbyist, or a professional looking to evaluate an MCU, I think you'll be fine with the free "limited" compilers — no matter which platform you're using (ok, except Microchip if you want any semblance of performance). I know you're going to criticize me for calling expensive, proprietary, limited compilers "free" but effectively, they are. (We're talking free-as-in-beer, not open-source).

Take CC-RL, Renesas's in-house compiler for the RL-78, which I've been playing around with all weekend for the review. It produces absolutely fantastic code. Yes, after 60 days, it goes into code-size-limited mode — but the limit is 128 KB of linked code. That's almost comically high. I don't know about you, but I've never used that much code on an 8-bit MCU before in my entire life, and probably never would.

A lot of these ARM MCUs on my list are pretty weak in terms of flash (the PSoC 4000S has 16K, but everything else is running 8). Need a good, efficient compiler for an 8 KB part? Go download a free version of MDK, which, by many measures is the absolute best ARM compiler for microcontrollers. It's code-size-limited to 32K, but who cares? If you're buying a 64K or a 128K part, you can probably switch back over to ARM-GCC and have enough headroom to lose a bit of efficiency. " -- [14]

"

Elliot Williams says: November 7, 2017 at 5:48 am

Throwing in my two cents: I’ve used six’ish of these chips, and every one of Jay’s evaluations rings true with my experience. "

list of MCU families:

list of MCUs:


discussion on https://jaycarlson.net/microcontrollers/

" Consequently, the megaAVR remains the most open-source 8-bit microcontroller on the market — by a long shot. "

" There's a reason why nearly every hobby electronics project you see has an Arduino in it. Assuming the Arduino IDE is installed and permissions issues have been resolved, you can go from unboxing your Arduino to making it blink in under 60 seconds.

In addition, places like Adafruit [1] and Sparkfun [2] build their own boards and make most of their code free/libre/open-source [3] [4], with a large focus on Arduino.

...

radix07 2 hours ago [-]

Open source hardware in the microcontroller world isn't quite as big of a deal as it is in the software world. While it would be great to see the processor architecture and all, I would much rather have an ARM that I know works and won't have to look at. I believe modern Arduinos are going to Cortex processors which are ARM as well, so I don't think the open source claim is what keeps Arduino hardware going. It is most likely going to be ease of use, accessibility, community and the tools.

...

jventura 51 minutes ago [-]

Maybe is not the best topic to ask, but does anyone know good sources (websites, books, kits) for playing with and learning a bit more about microcontrollers and electronics?

...

 deeg 48 minutes ago [-]

adafruit.com and sparkfun.com are two of my go-tos. Pretty strong community support.

"

VLM 3 hours ago [-]

No love for the PIC10 series? Perhaps too many PIC already in the review would make it look like a MicroCHIP(TM) press release. I do see some justification in that the 10F series is more of the twenty five cents class not the $1 class.

...

Yaggo 8 hours ago [-]

Didn't read all the author's criteria, but esp8266 is worth mentioning in the context of cheap microcontrollers.

https://en.wikipedia.org/wiki/ESP8266

lucaspiller 7 hours ago [-]

Yeah if anyone is just looking to get started tinkering with embedded hardware I'd say this is the way to go. The chips were made popular by NodeMCU which uses Lua, but you can easily run Arduino or write plain C on the same hardware.

I've tried a few breakout boards and my favourite is the WEMOS D1 Mini. All you need to supply is a MicroUSB cable. You can get them shipped from China for under $3: https://wiki.wemos.cc/products:d1:d1_ ((( my note: i assume e meant https://wiki.wemos.cc/products:d1:d1_mini )))

StavrosK 5 hours ago [-]

I can vouch for the D1 mini also. It's the one you want, by far. It has an on board programmer so you can power and program it from USB, while being the smallest board in size out of the popular ones.

The actual smallest I've seen is one I made: https://github.com/skorokithakis/tiny-ESP8266-breakout

carbocation 4 hours ago [-]

Have you tried the NodeMCU? (Not suggesting it, as I actually killed mine when trying to program it, but more curious about whether you’ve compared it to the Wemos D1.)

StavrosK 4 hours ago [-]

Yes, many of them, and a few others. The D1 is much smaller and has no redundant pins. It's also sexier.

neya 5 hours ago [-]

I've had the best overall experience using an ATTINY85 [1]. It almost serves my purpose for every use case I've had in the past couple of years - Running steppers, basic home automation, wireless, etc.

If you buy them in bulk, you can get it down to less than $1 per MC. [2]

You can even install Arduino libs on it and the Arduino interface has quite good support for it as well.

Some of my projects with it include a DSLR camera slider [3], wireless home automation (turn off lights when I'm not in the room, communicate with my air-conditioner to maintain the room temperature using IR, etc.)

[1] http://www.microchip.com/wwwproducts/en/ATtiny85

[2] https://www.ebay.com/itm/5PCS-ATTINY85-20PU-IC-MCU-8BIT-8KB-...

[3] https://www.instagram.com/p/1AscllHDQ0/?taken-by=discovery.d...

kirillkh 5 hours ago [-]

Do you just use development boards with these? And which ones?

neya 4 hours ago [-]

The AT Tiny can be programmed with an Arduino Uno, which is what I used. It's pretty easy. But there are also cheaper boards specifically for the AT Tiny.

cjsuk 8 hours ago [-]

That’s a rather good article. I will digest that today.

I’m currently using Cypress’s PSoC line (4200M boards) for a couple of personal projects. The reconfigurable hardware and analogue parts actually kill a lot of external hardware. It’s pretty amazing and ridiculously cheap. Visual Studio is the IDE for this.

https://uk.rs-online.com/mobile/p/processor-microcontroller-...

Note: don’t just jump into these if you think it’s just a better Arduino as the learning curve is extreme. You really have to know your stuff before you open the box.

joezydeco 1 hour ago [-]

PSoC is an underrated line of parts. Now that they've dropped the wacky 8051-alike used in the older models and switched to ARM, there's a lot more you can do with these things.

" The whole comparison thing might be more interesting if done at more price levels (perhaps with fewer chips at each price level.) In general, I feel like I have a much better idea about what a "typical" $1 microcontroller will do than "do you get 5x or 10x the capabilities with a $5 or $10 microcontroller?" (and a comparison at the $15 level (near the top-of-the-line for AVRs) is a lot more dramatic in some ways. The 8bit chips are about 256k flash and 8k of RAM, while the 32bit chips are up at 1MB flash and 256k of RAM!) " [15]

---

A quick overview of some things Motorola has worked on in the past:

https://www.eejournal.com/article/20140416-kinetisv/

---

"

The 6502 is nearly a RISC machine in number of machine cycles per instruction (about 2 average) yet has powerful addressing modes for table look-up-driven real-time software. The indirect, indexed addressing mode has yet to be beat by any RISC machine, which takes too many instructions to do the same thing. " -- [16]

" Indirect-indexed addressing

In this commonly used Addressing mode, the Y Index Register is used as an offset from the given zero page vector. The effective address is calculated as the vector plus the value in Y.

Indirect-indexed addressing is written as follows:

     LDY #$04
     LDA ($02),Y

In the above case, Y is loaded with four (4), and the vector is given as ($02). If zero page memory $02-$03 contains 00 80, then the effective address from the vector ($02) plus the offset (Y) would be $8004.

This addressing mode is commonly used in array addressing, such that the array index is placed in Y and the array base address is stored in zero page as the vector. Typically, the value in Y is calculated as the array element size multiplied by the array index. For single byte-sized array elements (such as character strings), the value in Y is the array index without modification. " [17]
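
my note: a minimal sketch of the effective-address calculation described in the quote above, using the same example values ($02-$03 holding 00 80, Y = 4). The function name and the flat `bytearray` memory model are my own assumptions for illustration, not part of any 6502 toolchain.

```python
def indirect_indexed_address(memory, zp_vector, y):
    """Effective address for the 6502 `LDA (zp),Y` addressing mode."""
    lo = memory[zp_vector & 0xFF]
    hi = memory[(zp_vector + 1) & 0xFF]  # the pointer fetch wraps within zero page
    base = (hi << 8) | lo                # the vector is stored little-endian
    return (base + (y & 0xFF)) & 0xFFFF  # result stays in the 16-bit address space


# Example from the text: zero page $02-$03 contains 00 80, Y holds 4.
memory = bytearray(0x10000)
memory[0x02] = 0x00
memory[0x03] = 0x80
memory[0x8004] = 0x41  # arbitrary byte to load, chosen for the example

addr = indirect_indexed_address(memory, 0x02, 0x04)
value = memory[addr]

if __name__ == "__main__":
    print(f"effective address = ${addr:04X}, loaded value = ${value:02X}")
```

For an array of single bytes, Y is simply the index; for wider elements Y would be the index times the element size, as the quote says.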

---

"Some cores have significantly higher performance -- for example, the ARM Cortex-M4 has DSP instructions and usually floating-point, and the Cortex-M7 has cache IIRC. ... I wouldn't worry about ARM assembly language. They are almost always programmed in C, and you usually don't get enough performance improvement to justify going to ASM. OTOH, a generic C compiler won't know about special ARM instructions for DSP, etc. In that case using ASM or linking in an ASM library can buy you a lot. "

"

Re: Choosing a Microcontroller Brand, michaelkellett, Jun 19, 2015 3:06 AM (in response to screamingtiger)

Inevitably we all bring our own prejudices so apologies for mine in advance, please you can take what you can from it.

I design micro based systems for a living and almost always end up using an ST ARM based micro controller. I used to use AVRs and Silabs 8051 based parts but the ARMs have left them both way behind except for one or two very special applications. "

" I'd stick to NXP or ST. These seem to be the main players in ARM country judging from the ARM based projects mentioned on this blog. " -- [18]

" I have had a look at STM32 and I just find it rather complex and a lot of code to do simple things. Not much useful tutorial/ documentation either.

They are all quite "complex".

I would summarize it this way:

ST: OK chip at cheap prices. Nothing stands out so an all-round winner. Dominates Cortex-M3.

NXP: pricy chips with limited features. Proven design. Worst software support of all vendors.

TI: premium chips with lots of features. Buggy. CM4 focused.

Freescale: premium chips with great software support. CM0/CM4 focused.

Others: more or less specialty vendors.

So if you are poor, go with ST; If have lots of money, go with Freescale. "

" My observations:

ST: Cheap and OK chip. No vendor-provided dev environment, but multiple free options available in addition to paid options. There is ST-provided driver library, but it is limited, its code quality is low and its footprint is huge. For commercial quality projects, prepare to write whole bare-metal code yourself. Some parts of datasheets are not so easy to read; some information is quite well hidden or not there at all. Things are not consistent at all, expect mix of different styles and quality. USB stack is terrible if you want to use USB.

NXP: More expensive chips. Quite cheap LPCXpresso boards. The vendor-provided libraries work well, but lately they have added too many options and everything starts to get to a mess; you have to read documentation to understand how to use things. Still, everything works well out-of-the-box, just prepare to read some documentation and be prepared for some wtf moments. Datasheets are OK.

TI: Haven't used, but when looking at the history and errata sheets, everything seems very buggy. It will take TI a huge effort to come out of this hole they have dug themselves into.

Freescale: Haven't used Cortex-M.

Silabs Energy Micro EFM32: Good chips, a little on expensive side. Good datasheets. Their vendor-provided tools work well and are easy to use.

Cypress PSoC: a different view on hardware. FPGA/HW guys will like the configurable blocks. Tools work well (windows-only). Chips have been on pricier side for long time; prices have been dropping in recent times. The debugger was more expensive than others. Getting documentation from web site requires registration and you'll get some e-mails now and then with questions like "how did you like that file you downloaded recently".

Spansion / Fujitsu FM4: haven't tried

Atmel: Won't use. Too many disappointments on all product ranges (AVR, SAM3S, SAM9). A lot of marketing bullshit, real world is different. Datasheets are very well written. "

" I know that Atmel and Freescale have longevity programs, for what that is worth. TI dropped a whole range of Stellaris devices recently... that caught a lot of people out. "

-- [19]

---

" Atmel: Won't use. Too many disappointments on all product ranges (AVR, SAM3S, SAM9). A lot of marketing bullshit, real world is different. Datasheets are very well written. "

" Read Atmel's datasheets very carefully. I have blacklisted Atmel because they have proven to be way too optimistic in their specifications. If they specify a device for a minimum voltage of 3.3V it can't run at 3.3V in real world applications. "

" I know that Atmel and Freescale have longevity programs, for what that is worth. TI dropped a whole range of Stellaris devices recently... that caught a lot of people out. "

-- [20]

---

further analysis of https://jaycarlson.net/microcontrollers/#discussion

MCUs by ISA:

8-bit:

8051 8-bit

AVR 8-bit

PIC16 8-bit

STM8

other 8-bit

16-bit:

PIC24 16-bit

Other 16-bit

32-bit

MIPS 32-bit

Arm Cortex-M0 32-bit

MCUs he seemed to particularly like:

8-bit: 8051

STM8:

16-bit

other

32-bit

Arm Cortex-M0

ones that he only kinda likes but other people like (eg i've seen comments on HN etc):

ones that he kinda liked:

8-bit

8051

AVR

32-bit

MIPS

ARM Cortex-M0

ones that he likes except for ecosystem issues:

32-bit

ARM Cortex-M0

he neither liked nor hated; also, disliked but it has its place; also, niche-ISAs that i probably don't need to further investigate:

8-bit

8051

PIC16

other

32-bit

ARM Cortex-M0

MCUs he seemed to dislike:

---

further analysis:

shorter list of the ones he 'liked' thru 'kinda liked' and 'liked with ecosystem issues':

MCUs he seemed to particularly like:

8-bit: 8051

STM8:

16-bit

other

32-bit

Arm Cortex-M0

ones that he only kinda likes but other people like (eg i've seen comments on HN etc):

ones that he kinda liked:

8-bit

8051

AVR

32-bit

MIPS

ARM Cortex-M0

ones that he likes except for ecosystem issues:

32-bit

ARM Cortex-M0

---

putting it together again (except i'm dropping the Nuvoton M051, and the STC8). Within each bitwidth class, trying to list classes and parts in descending order of what he likes:

Note: we are talking about $1 MCUs so we're talking about price-sensitive projects here. Also he is very concerned with low-power. A lot of the applications are industrial (or hobbyist) control (eg lighting, quadcopters, etc) or low-power consumer stuff. He's also sensitive to how easy it is for hobbyists to use these products, eg are the dev tools cheap and easy to learn.

8-bit:

8051 (his favorite 8-bit arch)

STM8:

AVR

 note: some toolchain ppl also mention platformio and VS Code

PIC16

16-bit

32-bit

ARM Cortex-M0

Tensilica Xtensa

MIPS

---

lessons for oot from the $1 microcontroller review article and discussions about it: https://jaycarlson.net/microcontrollers/

ISAs to think about:

memory in the $1 MCU review in the systems selected in the previous section (not e.g. the esp's): ranges from 8k of flash and 1k of RAM (and one of them had only 768 bytes of RAM) to 16k flash (one of them had 32k) and 8k RAM (one of them had 16k RAM, and the STC8, which i dropped, has 64k of flash; but 4k was much more common).

to get memory in the 'range', i first glanced through the reviews noted above to see what he mentioned, then i went to company websites and looked, and then i checked digikey by searching for the product line identifiers inside 'Integrated Circuits (ICs) > Embedded - Microcontrollers' and looking at max RAM and max Program Memory Size. Usually, if the highest value looks very high to me, and only one part (and its variants), or maybe even only one 'series', is at that high value, then i put that value in parens and find the highest value with a second part or series.
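
a rough sketch of that parenthesizing rule, under a simplifying assumption: the judgement call "looks very high to me" is replaced by "only one series offers the top value". The function name and the example data are hypothetical; only the 66k outlier in the MSP430F5xx series and the 32k headline figure come from these notes, and 'series-A' and 'series-B' are made-up placeholders.

```python
from collections import defaultdict


def headline_and_outlier(parts):
    """Return (headline_max, outlier) for an iterable of (series, size_k).

    If the largest size is offered by a single series, it is reported as the
    outlier (the value that goes in parens) and the next-largest size becomes
    the headline. Otherwise the largest size is the headline and there is no
    outlier. Assumes `parts` is non-empty.
    """
    series_by_size = defaultdict(set)
    for series, size_k in parts:
        series_by_size[size_k].add(series)
    sizes = sorted(series_by_size, reverse=True)
    top = sizes[0]
    if len(sizes) > 1 and len(series_by_size[top]) == 1:
        return sizes[1], top
    return top, None


# Hypothetical RAM sizes in k; series names other than MSP430F5xx are invented.
EXAMPLE_PARTS = [
    ("MSP430F5xx", 66),
    ("MSP430F5xx", 32),
    ("series-A", 32),
    ("series-B", 8),
]

headline, outlier = headline_and_outlier(EXAMPLE_PARTS)

if __name__ == "__main__":
    print(f"RAM up to {headline}k (also one series at {outlier}k)")
```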

his mentions: " These parts top out in the $6+ range, with 128 KB of FRAM and 2 KB of RAM (though you can get MSP430s, more broadly, with better specs yet). ... other parts in the PSOC 4000S family run up to 48 MHz, with 32 KB of flash and 4 KB of RAM, and come in larger 48-pin packages (in the case of the CY8C4045AZI-S413). ... Having said all that, I know why you’re here: the most famous set of ATmegas is the ATmega8 line — including the original ATmega8, plus the widened family: the ATmega48, ATmega88, Atmega168, and ATmega328. These are most famous for use by the Arduino Diecimila, Duemelanove, and Uno. ... These range from 4K to 32K of flash, and vary from 512 to 4 KB of RAM — but otherwise have essentially the same peripherals and features, including: "

digikey for:

so filtering the list of MCUs above for at least 16k RAM, and at least 32k program memory available in range:

if we wanted at least 32k RAM, filter further (only ATmega is eliminated, because it doesn't go to 32k RAM):

(and which others from the above digikey search have we left out that satisfy these memory requirements? The RL78 and the SAM. Let's not bother with the RL78, because it has yet another ISA; and we already have a bunch of more popular Cortex M0s so let's forget about the SAM. We lost all of our 8051s, all of our AVRs, and all of our PICs (PIC ISAs, that is; we still have the PIC32); in fact we lost all of our 8-bits. If we search Digikey for core size smaller than 16 bits (including 8/16 bits), with Part Status Active, and RAM size at least 32k, the most notable results are the AVR XMEGAs and the Zilog Z8 Super8s; the Zilog Z8 Super8s have a lowest minimum quantity of 48. So let's add in the XMegas so that we have an 8-bit, and an AVR, representative)

(and what if we only wanted 16k RAM, as indicated in ootBrainNotes1? Above it was noted that only ATmega was eliminated by the jump to 32k RAM, and we already dealt with that by adding in XMega, but let's see if the Digikey 8-bit and 8/16 bit for >=16k RAM, <32k RAM turns up anything else. OK, we get 152 AVR Xmegas, 21 AVR ATmegas, 17 eZ80s and ZGATEs (both from Zilog), a handful of LC87s, a handful of 8051s; so not that much different).

Also he mentioned the Propeller and XMOS although he didn't include them. I wrote down elsewhere that these seem 'interesting'. Let's add them in:

He likes the Infineon XMC1100 due to its peripherals but for our purposes that isn't as important. So let's keep it in mind if we want to learn about that stuff later, but for now drop it; the PSoC and STM32 are very popular so no need to look further for Cortex M0 representatives. MSP430 is very popular (see https://www.embedded.com/electronics-blogs/other/4420311/MCU-popularity--Engineer-vs--provider-perceptions ; however i have my doubts about the relevance of that survey, it seems to mix historical popularity and current, while leaving out ARM) and ESP8266 is very popular (although i dunno if Xtensa is popular in general). I dunno if PIC32 is popular, but MIPS is (or was) popular, and is still used as an educational example of RISC. So we're left with the following chip families of note to consider:

MSP430. 'msp430': program memory up to 512k words, RAM up to 32k (also one 66k entry, MSP430F5659I in series MSP430F5xx, and also one entry '2MB (Internal), 128MB (External), 64MB (FPGA)', which is MSP430FR5989IRGCR-ND in series MSP430™ FRAM). For 'msp432' you go up to program memory 256k, 64k RAM.

32-bit Cortex-M0: Cypress PSoC. program memory up to 256k words, RAM up to 64k

32-bit Cortex-M0: STM32 STM32F042. program memory up to 2M words, 384k RAM (plus up to 512k for the STM32F7 series)

32-bit Tensilica Xtensa: ESP8266, ESP32. esp8266 has 64 KiB of instruction RAM, 96 KiB of data RAM. up to 16 MiB external QSPI Flash is supported (512 KiB to 4 MiB typically included)

MIPS (ISA, not a chip; but a chip would be some Microchip PIC32). program memory up to 1M words (plus up to 2MB in the PIC 32MZ series), RAM up to 256k (plus up to 640k in the PIC 32MZ DA series). With 'pic32mm', we have up to 256k program memory and 32k RAM

over these, it seems like except for the Propeller, every range has program memory of at least 256k. XMega and Propeller have RAM limits of 32k; otherwise, MSP430, PSoC, ESP8266, XMOS have 64k RAM limits (instruction RAM, in ESP8266's case). The next limit is 256k. So I think we'd better stick with 64k at the most, and probably 32k. So here's what's left:

Note: the diversity seen in this list understates the dominance of ARM.

Note: apparently 32-bit Cortex-M0 parts handily beat 8-bit parts such as the AVR and 8051 in terms of power efficiency (energy per unit of work), which is probably more important for Oot than power consumption in wait/sleep modes:

"Of course, Cortex-M0 parts that cost the same have come in and wrecked both of (an AVR and an 8051) in terms of raw processing performance and nJ-per-sample power consumption figures (but not power consumption in wait/sleep modes!)" [31]

as far as oot is concerned, the advantage of the Propeller is nullified: its advantage is a unified memory space shared by a small number of parallel processors, whereas what we need is an unbounded asynchronous memory spread across a very large number of processors. The XMOS is better because you can connect them together. Epiphany also has a PGAS, so maybe it's not for us (can you connect them together? yes, via 'e-link', but with a maximum grid size of 64: [32] [33]).

possibly relevant Google searches:

https://www.google.com/search?q=avr+psoc+stm32+esp8266+propeller+xmos

https://www.google.com/search?q=avr+psoc+stm32+esp8266+xmos

https://www.google.com/search?q=avr+psoc+stm32+esp8266

https://www.google.com/search?q=propeller+xmos

if i had to prioritize some of these i'd say:

note: at one point the author says that some criteria for his choices were:

" These days, $1 buys you a mid-range, general-purpose basic microcontroller that's got dozens of I/O, half a dozen or more PWM channels, 10 or 12 bit ADC, decent sets of timers, and enough flash and RAM to cover most general-purpose entry-level needs. " -- the author, on [34]

" a modern, general-purpose, 16-32 pin MCU that you would drop in entry-range, current-generation products. If you look at my list of other controllers I've selected thus far, that means 3.3V, internal oscillator, low active current, lots of timers, 10-14 bit ADC with tons of channels, 6+ PWM channels, 8-16 K of flash, 1K+ of RAM, one or more UART/SPI/I2c modules, and maybe some secret sauce — programmable logic, waveform generators, DACs, etc. " [35]

some related links if i get one are:

mb:

---

" I'm using stock Lua 5.1.4 (not eLua) on an ARM, using armv6, I'm seeing about 189k of code, including most of the libraries (no OS or IO libraries) "

" Stock lua 5.1 on Cortex M3 is 70k. When taking any desktop-y code onto embedded systems you usually find out it drags in a substantial part of libc and libm, in this case giving a 190k executable (with newlib as libc). You *probably* don't want doubles, but once you've killed off the last few references to doubles in Lua itself, fprintf still drags in the softfp library. At which point you would rebuild newlib not to have them either.

RAM is the real problem for Lua. Looking at the STM32 chips, I see 256k flash devices with 32-64k of RAM. The Arduino-killer STM32F103RB6T is 128k/20k. Yeah, quantity 1000 it's only like $1 more for the 64k devices, but we're talking about a $4 chip. If you just want to play with cheap Lua, economics in the West shove you at wireless routers running Linux. I see the D-Link DIR-301 for $30 retail, and that's 32M RAM, 4M flash. Comparable non-branded direct-from-China are in the $25 range including a USB port or two (for you to plug your direct-from-China Arduino clone into of course). "

---

" The STC15W is the monster — 40-pin PDIP (with 61K of flash! $0.88!) "

---

"MCUs that can do everything (popular example is the STM32F4) are complicated to configure, use, and lay down on a board." [36]

---

"

        speed compared between 8051 and AVR, I would say that (single clk)8051 is in general faster(at same clk speed) if you can stay inside the 256 byte RAM, where the AVR don't have any penalty for more RAM, and there it's normally faster.
    Yup, you got it. With Silicon Labs' pipelined cores, the number of clock cycles an instruction takes is simply equal to the number of bytes long the instruction is (minus conditional branches). You have essentially three levels of granularity on the 8051 -- registers, "scratchpad" RAM, and XRAM, so MOV and math operations can take 1, 2 or 3 clock cycles, depending what you're operating on (gross simplification, but useful way of thinking about things, in my opinion).

There is some spread in the 'faster' bands.

8051 has boolean opcodes, interrupt priority and register bank switching, and can DJNZ on any DATA memory location - code that uses those features, benefits

AVR has some 16b-data opcodes and better pointer operations, so code that uses those can look better.

The biggest difference is AVR tops out at 16-20MHz at 5V, but lower MHz at lower Vcc. 8051 top out at 72MHz(LB1) at 3v, or 25~33MHz at 2.2~5.5V for other vendors.

The SiLabs series have what is effectively a fractional baud UART, even on the smallest parts, so peripherals can make a difference.

"
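
my note: the rule of thumb in the quote above (on the pipelined Silicon Labs cores, cycles roughly equal instruction length in bytes, conditional branches aside) can be sketched as below. The byte lengths are my assumption about the standard 8051 encodings, and the whole thing inherits the quote's own caveat of being a gross simplification.

```python
# Assumed standard 8051 instruction lengths in bytes. Under the quoted rule of
# thumb these double as approximate cycle counts on a pipelined core.
INSTRUCTION_BYTES = {
    "MOV A, Rn": 1,         # register operand
    "ADD A, Rn": 1,
    "MOV A, direct": 2,     # direct ("scratchpad") RAM operand
    "ADD A, #imm": 2,
    "MOV direct, #imm": 3,
}


def estimate_cycles(program):
    """Estimate cycles as the sum of instruction lengths (no branches)."""
    return sum(INSTRUCTION_BYTES[op] for op in program)


example_cycles = estimate_cycles(["MOV A, Rn", "ADD A, #imm", "MOV direct, #imm"])

if __name__ == "__main__":
    print(f"estimated cycles: {example_cycles}")
```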

---

" but it's really just that the developer needs to have more thorough understanding of the memory model of the platform, which you don't need for AVR. That's what made AVR look very elegant when it was introduced. For what it's worth, I've had to use reentrant functions precisely once in the three or four commercial projects I've done on 8051s, and it's easily accomplished by adding the "reentrant" keyword to your function. Keil will throw a warning (though not an error, oddly!) if you forget this. "

---

some $3.50 MCUs with 128MB RAM in the package:

jonsmirl says: November 7, 2017 at 5:47 am

HI3518E, GM8135, Allwinner V3S — all have 128MB RAM in the package. There are two dies internally.

    jonsmirl says:	
    November 7, 2017 at 5:49 am
    All of those chips cost in the $3.50 range.

---

i noticed that some people in Reddit and Hackaday mentioned, among other things:

---

another way to assess memory might be to build up a histogram of #of MCU parts on Digikey by memory size:

this appears to be the microcontroller product category on Digikey:

https://www.digikey.com/products/en/integrated-circuits-ics/embedded-microcontrollers/685

you can filter by 'Program Memory Size' and 'RAM size'

The 'Program Memory Size' options are: -, External, 384B (256 x 12), ..., 1KB (1K x 8), ..., 8KB (8K x 8), ..., 256KB (256K x 8), ..., 1MB (1M x 8), ..., 4MB (4M x 8), ..., 8MB (8M x 8), ..., 2GB (2G x 8)

The RAM size options are: -, 16 x 8, 23 x 8, ... 254 x 8, 256 x 16, 256 x 4, 256 x 8, ..., 1K x 8, ..., 4K x 8, ..., 8K x 8, ..., 256K x 8, ..., 1M x 8, ..., 10M x 8

No filtering: 69,668 results

Ram size == '-': 357 results

Ram size <1k: 17,427 Remaining (>=1k: 47,110 Remaining)

Ram size <2k: 24,158 Remaining (>=2k: 40,379 Remaining)

Ram size <4k: 31,896 Remaining (>=4k: 32,641 Remaining)

Ram size <8k: 39,017 Remaining (>=8k: 25,520 Remaining (about 2/5))

Ram size <16k: 44,738 Remaining (>=16k: 19,799 Remaining (about 1/3))

Ram size <32k: 50,886 Remaining (>=32k: 13,651 Remaining (about 1/5))

Ram size <64k: 55,916 Remaining (>=64k: 8,621 Remaining (about 1/7.5))

Ram size <128k: 59,330 Remaining (>=128k: 5,207 Remaining (about 1/12))

Ram size <256k: 62,262 Remaining (>=256k: 2,275 Remaining)

Ram size <512k: 63,573 Remaining (>=512k: 964 Remaining)

let's also count some <=s:

Ram size <=4k: 36,810 Remaining (>4k: 27,727 Remaining (about 1/2))

Ram size <=8k: 43,295 Remaining (>8k: 21,242 Remaining (about 1/3))

Ram size <=16k: 49,187 Remaining (>16k: 15,350 Remaining (about 1/4))

Ram size <=32k: 54,829 Remaining (>32k: 9,708 Remaining (about 1/7))

(note: i'm counting words, not bytes, so 8k x 8, 8k x 4, and 8k x 16 are all counted as 8k).

so the median is around 4k, 2/3 is around 8k, 1/4 or 1/5 is around 16k, and 1/7 is around 32k.
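
a quick way to double-check those rough fractions from the '<=' counts. The total of 64,537 is not a number Digikey reported; i'm assuming it as the sum of each '<=' count and its complement, which is the same for all four rows.

```python
# Digikey part counts by RAM size, copied from the '<=' rows above.
RAM_AT_MOST = {"4k": 36_810, "8k": 43_295, "16k": 49_187, "32k": 54_829}
RAM_ABOVE = {"4k": 27_727, "8k": 21_242, "16k": 15_350, "32k": 9_708}

# Assumption: total parts with a listed RAM size, derived as at_most + above.
TOTAL = RAM_AT_MOST["4k"] + RAM_ABOVE["4k"]


def fraction_above(threshold):
    """Fraction of parts whose RAM is strictly larger than the threshold."""
    return RAM_ABOVE[threshold] / TOTAL


if __name__ == "__main__":
    for threshold in RAM_AT_MOST:
        print(f">{threshold}: {fraction_above(threshold):.2f} of parts")
```

The fractions come out at roughly 0.43, 0.33, 0.24 and 0.15, which is in line with the "about 1/2, 1/3, 1/4, 1/7" estimates.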

Now do the same for Program Memory Size and EEPROM size:

Prog Mem <=8k: 16,776 Remaining (>8k: 47,761 Remaining)

Prog Mem <=16k: 24,422 Remaining

Prog Mem <=32k: 32,412 Remaining (>32k: 32,125 Remaining (about 1/2))

Prog Mem <=64k: 41,201 Remaining (>64k: 23,336 Remaining)

Prog Mem <=128k: 49,499 Remaining (>128k: 15,038 Remaining)

Prog Mem <=256k: 55,896 Remaining (>256k: 8,641 Remaining)

so Program Memory Size is like RAM but about 8x; median at 32k, 1/7.5 at 256k
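
the "about 8x" claim can be checked by finding the smallest listed threshold that covers half the parts, for both RAM and program memory. This is only a sketch: the thresholds are the coarse buckets i counted, the helper name is made up, and the total of 64,537 is an assumption derived by adding each '<=' count to its complement.

```python
# (threshold in k, parts at or below that threshold), copied from the counts above.
RAM_AT_MOST = [(4, 36_810), (8, 43_295), (16, 49_187), (32, 54_829)]
PROG_AT_MOST = [(8, 16_776), (16, 24_422), (32, 32_412),
                (64, 41_201), (128, 49_499), (256, 55_896)]
TOTAL = 64_537  # assumption: e.g. 32,412 + 32,125


def smallest_threshold_covering(counts, total, fraction):
    """Smallest listed threshold at or below which `fraction` of parts fall."""
    for size_k, at_most in counts:
        if at_most >= fraction * total:
            return size_k
    return None  # none of the listed thresholds reaches that fraction


ram_median = smallest_threshold_covering(RAM_AT_MOST, TOTAL, 0.5)
prog_median = smallest_threshold_covering(PROG_AT_MOST, TOTAL, 0.5)
ratio = prog_median / ram_median

if __name__ == "__main__":
    print(f"median RAM ~{ram_median}k, median program memory ~{prog_median}k, "
          f"ratio ~{ratio:.0f}x")
```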

as for EEPROM Size, '-' has 47,052 Remaining, so most of the MCUs don't have EEPROM.

so my analysis is:

recall also that:

and i think the upshot is:

---

this forum thread https://www.embeddedrelated.com/showthread/comp.arch.embedded/45923-1.php#tabs1-chronological has some opinions on ColdFire vs PowerPC vs ARM:

" Overall, the ColdFire is a very, very nice architecture (where the Arm is merely "nice"), and these are good devices. "

" Arm7 core seems to have serious limitations: ... interrupts handling is poor and vectorisation should be done in software (there are proprietary solutions to reduce this limit) ... (((but someone else says))) Nearly all modern ARM cores include a VIC (Vectored Interrupt Controller) of some sort. "

" The real choice you should be considering is Power PC vs. the rest. Years ago I had to switch from CPU32 (((my note: a 68k derivative))) to something newer and I considered the Coldfire, V4 was on its way to become soon available. ... I bit the bullet and switched to the Power PC - and I have not regretted that for a minute....My advice would be in favour of the PPC - more than a single manufacturer, by far the best architecture from a technical point of view I know of (not that ARM or Coldfire are bad, PPC is just ages ahead), "

" The PPC architecture has its strong points - being easy to understand, easy to use, and suitable for a small device as a step up from 8-bit are not among them. If the OP is used to the simplicity of devices such as the AVR or the 8051, but needs a bit more processing power, then a PPC-based micro would come as a very big shock. In itself, the PPC architecture is nice if you are thinking big - for example, the condition code system is much more scalable than in the ColdFire ISA. If you are thinking small to medium, however, the PPC is overly complex, especially at an assembly level. Migration to 64 bits is possible, but hardly "clear" - an address bus that runs from A31 "up" to A-8 is not exactly pretty, and it's hardly in the OP's sights as a step up from 8-bit. "

" Having written many megabytes of sources for both CPU32 and PPC, I can say that PPC is beyond reach for competing architectures nowadays. Using its native assembly makes no sense at all, if you don't have VPA you are stuck with C or other HLL. "

---

for Oot Assembly, out of the ARMs, i suggest we target the Cortex-M23, since it seems to be an evolved Cortex-M0 (Cortex-M0 is the first, Cortex-M0+ is the second, and Cortex-M23 seems to be the third). Except that we should assume the more restrictive Harvard architecture, which the M3, M4, M7, M33 have. The M23 has 4 instructions that the M0+ doesn't; i guess Oot Assembly should have instructions for these, and implementations on cores that lack them would have to do them in software. Although actually i assume most Oot stuff will be on M33s, for the better power efficiency.
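as a rough illustration of "doing them in software": a minimal sketch of unsigned 32-bit division by shift-and-subtract, using only operations a core without a hardware divider has (shift, compare, subtract, or). This assumes unsigned divide is one of the instructions in question (ARMv8-M Baseline does add UDIV/SDIV, but the note above doesn't say which 4 are meant). The function name `udiv32` is made up for this sketch, and raising on division by zero is my choice here, not necessarily what hardware does.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit registers with Python ints


def udiv32(n: int, d: int):
    """Restoring (shift-and-subtract) division on 32-bit unsigned values.

    Returns (quotient, remainder). Hypothetical helper, not an Oot or ARM API.
    """
    n &= MASK32
    d &= MASK32
    if d == 0:
        # Assumption for this sketch: treat divide-by-zero as an error.
        raise ZeroDivisionError("udiv32: division by zero")
    q = 0
    r = 0
    # Walk the dividend from the most significant bit down.
    for bit in range(31, -1, -1):
        r = (r << 1) | ((n >> bit) & 1)  # bring down the next dividend bit
        if r >= d:                        # does the divisor fit?
            r -= d
            q |= 1 << bit                 # record a 1 in the quotient
    return q, r


if __name__ == "__main__":
    print(udiv32(100, 7))
```

this costs a fixed 32 loop iterations per divide, which is the kind of price an implementation pays for emulating an instruction the core lacks.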

---

each core on the GA144 (the F18A) only has 128 words of memory (plus 20 words of stacks and registers).

as a guy says below, "Think of the chip as being a step up from FPGAs (in some ways) rather than a small CPU.". The Epiphany/Parallella/Adapteva stuff seems more like what i'm aiming for; each Epiphany core in the Parallella has 32kb local memory [37]

also this guy had a difficult time putting the GA144 onto a development board that is sold with it:

http://www.forth.org/svfig/kk/10-2013-Ruffer.pdf

so, i think i should remove the GA144 from consideration as a target architecture. Although i should still learn about it.

https://www.reddit.com/r/programming/comments/89ocp/colorforthcom_update_seaforth_is_dead_chuck_moore/ has this to say about the "S40" (is this a similar thing?):

" [–]kragensitaker 2 points 8 years ago

Nobody has demonstrated running high-level languages on the chip, but of course it's computationally universal so you can. But you probably aren't going to run Lisp usefully on a single core with 128 words of memory. You probably won't run anything that involves dynamic memory allocation, in fact. Think of the chip as being a step up from FPGAs (in some ways) rather than a small CPU.

"

---

here's what an Adapteva guy once said about the competition [38]:

" The idea (maybe its naive?) is that if we put this platform in a lot of different universities for close to nothing, then at least it could be used as a tool for quickly teaching all the current methods. We don’t see this happening without access to cheap and orthogonal hardware.

I am familiar with the examples you mentioned, and I do think there are some differences:

GreenArrays GA144–>people didn’t want forth

XMOS–>great effort but not well known, no floating point, not high enough performance. (please correct me if I am wrong)

Parallax–>not modern enough.

GPUs–>not general purpose enough, not really ANSI-C programmable. Constrains programming model too much.

FPGAs–>not really software programmable

We do feel that the Epiphany would serve as a better experimentation platform and teaching platform for parallel programming. We already support C/C++/OpenCL and we have people interested in porting openMP and MPI(lite). Halmstad U in Sweden is even playing around with Occam. "

i have somewhat different opinions, for purposes of Oot:

---

searched on Digikey for 'Active' status MCU chips with DIP packaging with RAM >=8k, program memory >=16k:

https://www.digikey.com/products/en/integrated-circuits-ics/embedded-microcontrollers/685?k=&pkeyword=&pv156=212&pv156=193&pv156=53&pv156=187&pv156=100&pv156=221&pv156=74&pv156=75&FV=402c67%2C402c8e%2C402ec1%2C402f59%2C403b8c%2C400034%2C40196e%2C401970%2C401975%2C40197c%2C401985%2C40198a%2C401993%2C401a9e%2C401c01%2C401c35%2C400050%2C4021de%2C4021e5%2C402415%2Cffe002ad%2C26c00ab%2C26c00ac%2C26c00ae%2C26c00af%2C26c00b3%2C26c00b7%2C26c00b8%2C26c00c1%2C26c00c3%2C26c00c5%2C26c00c8%2C26c00d4%2C26c00d6%2C26c00dc%2C26c00df%2C26c00e1%2C26c00f0%2C26c00f2%2C26c00f4%2C26c00fd%2C26c0103%2C26c0105%2C26c0106%2C26c010d%2C26c0114%2C26c0115%2C26c0117%2C26c0118%2C26c0119%2C26c011f%2C26c0121%2C26c0122%2C26c0133%2C26c0139%2C26c013d%2C26c0148%2C26c016e%2C2700064%2C2700095%2C27000bb%2C27000c1%2C27000d4%2C27000dd%2C2700028%2C2700035%2C270004a%2C270004b%2C270004c&mnonly=0&ColumnSort=0&page=1&quantity=0&ptm=0&fid=0&pageSize=25

only 125 results. Only manufacturers are Microchip Technology and Parallax. Only processors are Propeller, AVR, dsPIC, MIPS, PIC. No ARM Cortex! I think this says more about the decline of DIP (and about which manufacturers are 'hobbyist-friendly') than about which parts we should focus on though.

---

here's what ppl say are the easiest SMD (surface-mount) packaging for hobbyists:

" Use large parts, avoid BGAs and fine pitch packages and you will be fine ... use larger components, most people I know are fine with anything larger than an 0805 ... SOICs are easy to solder. SSOP and TSSOP with finer pitch are moderately difficult without a good soldering iron. ... I've never had any issue soldering any SMD package (I've avoided BGAs though). ... From my experience SSOP is also okay, it really depends on the pin pitch, the larger the better ... 0805 is fine. 0603 is still possible, but at least I have a less than 100% success rate with those. TQFP's the same issue: with some practice they are a definite possibility but don't expect to get 100% right. ... Going beyond 0805 doesn't make it easier anymore, but more difficult IMO. Thermal mass increases. The easy part in soldering SMD is that the solder flows so quickly, due to small thermal masses, that it's not easy to fail in the way that many people fail through-hole components - not heating enough, or not heating all parts (pad + component lead) at the same time.

When designing PCB, don't connect pads directly to planes but use thin and long enough thermal reliefs, and a 0603 or 0805 SMD is actually easier to solder than through hole. The reason is that even without proper connection between iron, component and traces, any solder that melts will touch the trace and component, transfer heat and heat up parts in < 0.5 seconds, way before flux has burned away, providing proper wetting and solder in less than a second.

Going bigger increases thermal masses and also increases the risk of ceramic caps stress cracking when not soldered just right.

IMO, 0603 is a very good size, even for a beginner. 0402 is too small for most beginners. 0805 is generous, and some beginners do prefer it over 0603.

0805 is really the sweet spot from the ease of soldering viewpoint, given properly designed PCB. Easiest component type to solder, ever. Beats through-hole. ... Personally, I prefer SSOP over SOIC. But I agree, that might turn off a lot of hobbyists. ... it isn't, until smaller than 0603 for passives, qfn and smaller pitch for ics ... SOIC and 0805 are easy. With a normal THT soldering iron (big chisel) and 0.5mm sn60pb40, add a flux pen and make it look better.

0603, TQFP and (T)SSOP require more patience. People avoid these because they are small, but the trick is to let the solder do its job. Surface tension of solder helps you. I have a big chisel soldering iron and can do TQFP and (T)SOP without issues. But I use paste and air method because this is much faster. I've also seen people using hot plates (for cooking) modded with a PID controller to reflow boards.

With QFN you will need paste flux and hot air. These are really difficult because you cannot inspect them without optics. ... I personally wouldn't touch BGAs even though some people remove/re-ball/resolder them. ... For Hobbyist stuff, SSOP, TSSOP, TQFP is fine. For passives I generally shoot for 0805 but 0603 is certainly doable - providing you have spares on hand in case you breathe. ... I would invest in a hot-air station too for DFN/QFN or rework/desoldering. ... I think soldering SOIC or even SSOP, TSSOP and TQFP parts, is easier than DIP, once you've learned how to drag solder, and it is much faster. But if you are targeting hobbyists who solder once a year, chances are great that they think it is hard and don't buy it. And they might be right, because with a cheap soldering iron, no flux, wrong tip, and too much solder, it can be indeed hard. ... SMD soldering is not that difficult. For the beginning I would start with the not so small parts: e.g. 0805 size, SOP and SOT23. Especially the passives are really easy to solder as SMD even in size 0603 - usually easier than old TH style. ... All ICs that have leads are same as easy, just beware of 0.4mm pitch QFPs - those can be a bitch even with a good quality pcb and lots of flux. QFNs and DFNs are doable with some practice and appropriately designed pcb and a hot air gun.

BGAs are a bitch. Small ball count ones are doable if one is forced to though... ... It depends on what kind of smd you are talking about. SOIC, LQFP, SOT and TSSOP are no problem. 1206 is easy. 0805 is doable.

On the flip side, QFN, BGA, or 0402 are way beyond my capabilities, :) ...

I work with 0603 components, TSSOP and fine-line TQFP fairly regularly. I use a 936 clone with a wedge tip, and I have 5.3/5.2 vision. "

" SOIC is best for the beginner since they are still pretty large. 0805 resistors / caps are the smallest I'd go for beginner hobbyist ... I'd stay away from parts with exposed pads, BGA's. CGA's, TSSOP, Lead-less parts ... BGA parts, or anything with hidden pads, are generally not suitable for hobbyists. ... if there is anything finer than a 50-mil lead pitch, I would not buy a bare board and try to assemble it ... 1.27mm (0.05 inch) pitch is easy by hand. ... If you don't have #4 above, keep the chips to 0603 size (0.030"x0.060") and the IC pitch to 0.05 "
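the quotes above mix imperial size codes (0402, 0603, 0805, 1206), mils, and millimetres, so here is a small sketch for converting between them. It assumes the usual convention that a 4-digit imperial chip code is nominal length and width in hundredths of an inch (consistent with the 0.030"x0.060" given for 0603 above). The function names are made up, and `ok_for_cautious_beginner` only encodes the one quoted opinion about 50-mil pitch, not any standard.

```python
MM_PER_INCH = 25.4


def imperial_chip_size_mm(code: str):
    """'0805' -> nominal (length_mm, width_mm), rounded to 0.01 mm.

    Assumes the first two digits are length and the last two are width,
    both in hundredths of an inch (nominal values, not datasheet dimensions).
    """
    if len(code) != 4 or not code.isdigit():
        raise ValueError("expected a 4-digit imperial size code like '0805'")
    length_in = int(code[:2]) / 100
    width_in = int(code[2:]) / 100
    return (round(length_in * MM_PER_INCH, 2), round(width_in * MM_PER_INCH, 2))


def mil_to_mm(mil: float) -> float:
    """Convert a lead pitch in mils (thousandths of an inch) to millimetres."""
    return round(mil * MM_PER_INCH / 1000, 3)


def ok_for_cautious_beginner(pitch_mm: float) -> bool:
    """One quoted rule of thumb: nothing finer than a 50-mil (1.27 mm) pitch."""
    return pitch_mm >= mil_to_mm(50)


if __name__ == "__main__":
    for code in ("0402", "0603", "0805", "1206"):
        print(code, imperial_chip_size_mm(code))
    for pitch in (1.27, 0.65, 0.5, 0.4):
        print(pitch, ok_for_cautious_beginner(pitch))
```

by that strictest rule only 1.27mm-pitch parts pass; the other quoted opinions are more relaxed about finer-pitch leaded packages.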

" the 38-pin TSSOP chips will be substantially easier to solder than the 0.5mm QFNs and QFPs you usually end up with in these pin counts. "

in summary sounds like for hobbyists:

---

in Digikey, selecting 'active' status MCUs with *DIP or SOIC packaging (no exposed pads) and RAM >=8k, program memory >=16k yields 366 results. Only manufacturers are Microchip Technology and Parallax. Only processors are Propeller, AVR, dsPIC, MIPS, PIC. Still no ARM Cortex!

Selecting 'active' status MCUs with ARM Cortex M* processors and RAM >=8k, program memory >=16k, and looking at the packaging, excluding the 'bad's above, i see a lot of:

---

https://learn.sparkfun.com/tutorials/integrated-circuits/ic-packages https://blog.octopart.com/archives/2017/03/octopart-guide-ic-packages http://shannonstrutz.com/component-packages http://skywired.net/blog/tutorials/how-to-solder-qfp-tssop-soic-surface-mount/

sounds like *FP and *SOP are both rather hard. https://learn.sparkfun.com/tutorials/integrated-circuits/ic-packages implicitly suggests that *QFP might be a little harder than *SOP, because they claim they are listing in descending order of hand-solderability and they list *SOP first.

---

but wait what about the sam d10, which has SOIC? oh, it only has 4k of RAM (it does have 16k of program memory, though).

if we select 'active' Cortex-M* MCUs with program memory >= 16k and SOIC or DIP, we get 45, all Cortex M0 or Cortex M0+, they are LPC1100L, LPC81xM, PSOC 4 CY8C4000, SAMD(10|11)(C|D). They have 2k or 4k of RAM.

if we select TSSOPs instead, we get 131 Cortex-M0 or M0+ parts with RAM from 2k to 16k, in series LPC1100L, LPC81xM, LPC82x, PSOC 4 CY8C4000, PSOC 4 CY8C41xx, PSOC 4 CY8C42xx, STM32F0, STM32L0, XMC1000 (manufacturers Cypress, Infineon, NXP, ST).

The 24 STM32s in this list go from 2k to 8k RAM.

If we demand 8k or 16k RAM, we narrow it to series LPC82x, STM32L0, XMC1000. If we demand 16k RAM, we narrow it to series Infineon XMC1000.
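the searches above are all the same kind of filter (package family, minimum RAM, minimum program memory), so here is a sketch of that filter over a local list. The `Part` records are made-up placeholders for illustration only, not real Digikey data or real part numbers, and `select` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    name: str       # placeholder name, not a real part number
    core: str
    package: str
    ram_kb: int
    flash_kb: int


# Hypothetical example records, NOT taken from any distributor listing.
PARTS = [
    Part("example-a", "Cortex-M0+", "SOIC", 4, 16),
    Part("example-b", "Cortex-M0", "TSSOP", 8, 32),
    Part("example-c", "Cortex-M0", "TSSOP", 16, 64),
    Part("example-d", "Cortex-M4", "QFN", 32, 128),
]


def select(parts, packages, min_ram_kb, min_flash_kb):
    """Keep parts in an allowed package with at least the given memory sizes."""
    return [
        p for p in parts
        if p.package in packages
        and p.ram_kb >= min_ram_kb
        and p.flash_kb >= min_flash_kb
    ]


if __name__ == "__main__":
    # Same shape as the searches above: tighten RAM and watch the list shrink.
    for min_ram in (2, 8, 16):
        hits = select(PARTS, {"TSSOP"}, min_ram, 16)
        print(min_ram, [p.name for p in hits])
```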

---

in Digikey, selecting 'active' status ARM Cortex M* MCUs with *SOP or *QFP packaging (no exposed pads) and RAM >=8k, program memory >=16k yields 2003 results from many manufacturers. So i guess that's what's available.

https://www.digikey.com/products/en/integrated-circuits-ics/embedded-microcontrollers/685?FV=4003ec%2C4003ed%2C400455%2C400457%2C40045d%2C400460%2C402c9f%2C402e6e%2C402ea2%2C402ed0%2C402f18%2C402f69%2C4004c7%2C4004c9%2C4032eb%2C4032ec%2C40346b%2C40062e%2C4006d3%2C400ec3%2C4002c9%2C402178%2C26c00ab%2C26c00ac%2C26c00b7%2C26c00b8%2C26c00c1%2C26c00c3%2C26c00c5%2C26c00c8%2C26c00d4%2C26c00d6%2C26c00dc%2C26c00df%2C26c00e1%2C26c00f0%2C26c00f2%2C26c00f4%2C26c00fd%2C26c0105%2C26c0106%2C26c010d%2C26c0114%2C26c0115%2C26c0118%2C26c0119%2C26c011f%2C26c0121%2C26c0122%2C26c0133%2C26c013d%2C26c0148%2C26c016e%2C2700064%2C27000bb%2C27000c1%2C27000d4%2C27000dd%2C2700035%2C270004a%2C270004b%2C270004c%2C7e8007d%2C7e80098%2C7e800c6%2C7e8010f%2C7e8005d%2C1f140000%2Cffe002ad&mnonly=0&ColumnSort=-1000011&page=1&stock=0&pbfree=0&rohs=0&cad=0&datasheet=0&nstock=0&photo=0&nonrohs=0&newproducts=0&quantity=&ptm=0&fid=0&pageSize=25

interesting, no XMOSs are included in the results! The XMOSs are all *QFP Exposed Pad, *QPF, *BGA. No wonder hobbyists hate them.. sounds like you can pay companies to assemble these for you... [39]

well, as unlikely as it was that i'd ever progress to actually buying a bunch of those things to run Oot on, it's even more unlikely now. Probably the most i could ever handle would be to get a dev board, or better, just use a simulator. Still useful to think about, though.

---

there was also Tilera with the TILE64, can you still buy this?

http://meseec.ce.rit.edu/551-projects/fall2015/3-6.pdf

---

also see: http://vcl.ece.ucdavis.edu/misc/many-core.html

KiloCore; PEZY-SC Processor, PEZY Computing https://en.wikichip.org/wiki/Special:WhatLinksHere/many-core_microprocessor


kilocore:

"massively parallel processor array" MPPA

[40] " Each core contains 128x40-bit local instruction memory. Data memory is also stored in each as 2 banks of 128x16-bit each (for a total of 256x16-bit). ... Per core

    640 bytes (128x40-bit) local instruction memory
    512 bytes (256x16-bit) local data memory"
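a quick check of the arithmetic in that quote, plus a comparison with the other per-core memories mentioned in these notes. The KiloCore and Epiphany numbers come from the text above; the 18-bit word size for the GA144's F18A is an assumption not stated in these notes, and `mem_bytes` is a made-up helper.

```python
def mem_bytes(words: int, bits_per_word: int) -> int:
    """Size in bytes of a memory of `words` entries, each `bits_per_word` wide."""
    total_bits = words * bits_per_word
    if total_bits % 8 != 0:
        raise ValueError("total size is not a whole number of bytes")
    return total_bits // 8


# KiloCore, per core (figures from the quote above).
kilocore_instr = mem_bytes(128, 40)      # 128 x 40-bit instruction memory
kilocore_data = mem_bytes(2 * 128, 16)   # 2 banks of 128 x 16-bit data memory

# GA144 F18A: 128 words per core (from the notes); 18-bit words is an assumption.
ga144_core = mem_bytes(128, 18)

# Epiphany: 32kb local memory per core (from the notes), taken here as 32 KiB.
epiphany_core = 32 * 1024

if __name__ == "__main__":
    print("KiloCore instruction bytes:", kilocore_instr)
    print("KiloCore data bytes:", kilocore_data)
    print("GA144 core bytes (assuming 18-bit words):", ga144_core)
    print("Epiphany / KiloCore total ratio:",
          epiphany_core / (kilocore_instr + kilocore_data))
```

so under these assumptions a KiloCore core is in the same tiny-memory class as a GA144 core, and well over an order of magnitude below an Epiphany core.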

" Each core supports 72 general instructions supporting signed and unsigned operations. The processor operates on 16-bit data word size with the exception of the multiply-accumulator which has a 40-bit output. Larger word size operations such as 32-bit may be emulated via software. "
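to make "32-bit may be emulated via software" concrete, here is a sketch of a 32-bit add built from 16-bit halves with an explicit carry. This is the generic multi-word technique, not KiloCore's actual instruction sequence (the notes don't describe its carry mechanism); all names are made up, and Python ints stand in for 16-bit registers via masking.

```python
MASK16 = 0xFFFF


def split32(x: int):
    """Split a 32-bit value into (low16, high16)."""
    return x & MASK16, (x >> 16) & MASK16


def join32(lo: int, hi: int) -> int:
    """Rebuild a 32-bit value from its 16-bit halves."""
    return ((hi & MASK16) << 16) | (lo & MASK16)


def add32_via_16(a: int, b: int) -> int:
    """32-bit wrapping add using only 16-bit-wide results plus a carry bit."""
    a_lo, a_hi = split32(a)
    b_lo, b_hi = split32(b)
    lo_sum = a_lo + b_lo
    lo = lo_sum & MASK16             # what a 16-bit register would keep
    carry = lo_sum >> 16             # 0 or 1, like a hardware carry flag
    hi = (a_hi + b_hi + carry) & MASK16  # carry out of the high half is dropped
    return join32(lo, hi)


if __name__ == "__main__":
    print(hex(add32_via_16(0x0001FFFF, 0x00000001)))
```

the point is that every 32-bit operation turns into two or more 16-bit operations plus carry handling, so wider arithmetic costs both cycles and some of that 128-entry instruction memory.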

---

PEZY-SC

https://en.wikichip.org/wiki/pezy/pezy-scx/pezy-sc

"The PEZY-SC is used in a number of TOP500 & Green500 supercomputers as the world's most efficient supercomputers. "

Each core has:

"A Quora post says they're doing something like the Adapteva guys creating a bunch of tiny cores with colocated memory but not much else is known other than they use a subset of OpenCL" [41]

" [–]dylan522p_ 11 points 1 year ago

    https://imgtec.com/news/press-release/pezy-and-imagination-team-up-to-develop-next-generation-hpc-systems/

This from the op shows they are going to a 64bit mips cpu developed by imagination for future generations. Current is arm though "

" [–]Exist50 3 points 1 year ago

How are these chips superior to GPU compute with OpenCL?

[–]dylan522p_ 10 points 1 year ago*

    Like ClearSpeed, Adapteva, and Kalray, their chips are very close cousins of GPU chips but are not optimized for video gaming the way chips from AMD and Nvidia are. You won't find "shaders," tesselation engines, texture maps, or support for DirectX libraries on the PEZY products, but more like a set of tools and libraries for developing compute-intensive, highly parallel applications that need lots of floating-point arithmetic but relatively little data motion.

From the second to last link he gave. Absence of ROPs is probably another way they have area and efficiency savings on.

[–]TheImmortalLS 2 points 1 year ago

Sounds similar to how nvidia optimized Maxwell for gaming by cutting everything that wasn't somewhat needed out (less power), but for opencl

[–]non_clever_name 2 points 1 year ago

Maxwell's amazing for machine learning too though. Usually you don't need double-precision.

"

" Tom's Hardware reports that PEZY's next generation of chips will boost the core count to 4,096 and integrate Imagination's 64-bit MIPS Warrior CPU onto a system-on-a-chip:...announced that it will integrate Imagination's highly efficient 64-bit I6400 CPUs into its many-core architecture.

...

Very doubtful. Warrior is a MIPS64r6 core, which breaks backwards compat with all previous MIPS releases. For example, all of the branch-likely opcodes are reused for compact branch instructions, which don't have a delay slot (branch likely has a weird delay slot that is only executed in the branch-taken case, which is horrible to try to reason about in the compiler).

For a supercomputer, this is probably fine. MIPS64r6 is a much nicer ISA than any previous MIPS and people who are spending tens of millions of dollars on a computer can probably be expected to recompile their code. "

" It supports OpenCL!

When new accelerators support OpenCL, it gets accepted more easily. So it is very interesting the PEZY-SC runs on OpenCL. "

" Architecture

The basic architecture of all the PEZY-SCx chips is fairly similar. At the heart is the Processing Element. Depending on the model, 1000s of those PEs are then integrated on a single die.

The PEZY-SCx are designed as accelerators, that is, a host processor (typically an Intel Xeon E5) off-loads the PEZY-SC code to execute. Those chips support OpenCL-like programming called PZCL.

Processing Element (PE)

The cores are called the processing elements (PE). The PEs are designed to be very simple RISC cores that are confused as MIMD although in principle each PE can run different workloads. ... The instruction set architecture implemented is a proprietary one designed by PEZY. The instruction set supports various operations such as data flashing, synchronization, acquisition of IDs, and thread switching. Each PE has an ID which is used by the code to track processes. The PEs do not maintain cache-coherency and there is no per-PE data cache. Complex instructions are processed by the Special Function Units (SFU) located in each city. A fair amount of sacrifices were made in order to ensure the cores remain small enough so that a large amount of them can be packed into a small area " [42]

---