proj-oot-lowEndTargets-lowEndTargetsUnsorted6

https://blog.cloudflare.com/branch-predictor/

" Some time ago I was looking at a hot section in our code and I saw this:

    if (debug) {
        log("...");
    }
    

This got me thinking. This code is in a performance critical loop and it looks like a waste - we never run with the "debug" flag enabled[1]. Is it ok to have if clauses that will basically never be run? Surely, there must be some performance cost to that... ...

Top tip 1. On this CPU a branch instruction that is taken but not predicted, costs ~7 cycles more than one that is taken and predicted. Even if the branch was unconditional. ...

Top tip 2: conditional branches never-taken are basically free - at least on this CPU. ...

Top tip 3. In the hot code you want to have less than 2K function calls - on this CPU. ...

Top tip 4. On this CPU it's possible to get <1 cycle per predicted jmp when the hot loop fits in ~32KiB. ...

Top tip 5. On Intel avoid placing your jmp/call/ret instructions at regular 64-byte intervals. ...

Top tip 6. on M1 the predicted-taken branch generally takes 3 cycles and unpredicted but taken has varying cost, depending on jmp length. BTB is likely linked with L1 cache. ...

On x86 the hot code needs to split the BTB budget between function calls and taken branches. The BTB has only a size of 4096 entries. There are strong benefits in keeping the hot code under 16KiB.

On the other hand on M1 the BTB seems to be limited by L1 instruction cache. If you're writing super hot code, ideally it should fit 4KiB.

Finally, can you add this one more if statement? If it's never-taken, it's probably ok. I found no evidence that such branches incur any extra cost. But do avoid always-taken branches and function calls. "

so, we're seeing numbers like 4k, 16k, 32k memory, and 2k number of function calls.

the conclusion is also interesting for unrelated reasons: "Finally, can you add this one more if statement? If it's never-taken, it's probably ok. I found no evidence that such branches incur any extra cost. But do avoid always-taken branches and function calls."

---

"Winterbloom's primary microcontroller is the Microchip SAM D series- specifically, the SAM D21, SAM D51, and SAM D11.

...

                   SAM D11      SAM D21      SAM D51
    CPU            Cortex-M0+   Cortex-M0+   Cortex-M4F
    Clock speed    48 MHz       48 MHz       120 MHz
    Max flash      16 kB        256 kB       1024 kB
    Max RAM        4 kB         32 kB        256 kB
"

---

http://www.mynor.org/ http://www.mynor.org/tranor

" MyNOR is a single board computer that does not have a CPU. Instead, the CPU consists of discrete logic gates from the 74HC series. This computer also has no ALU. Only a single NOR gate is used to perform all computations such as addition, subtraction, AND, OR and XOR. This computer is not fast, it is rather slow. MyNOR can only perform 2600 8-bit additions per second, although it is clocked at 4 MHz. This is because everything is done in software. MyNOR has only a 32 kB ROM for program storage, but this is more than enough. The very slim microcode occupies only 9 kB, the remaining 23 kB are used for the application program. "
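
The claim that one NOR gate is enough for addition, AND, OR and XOR can be checked with a small sketch. This is not MyNOR's microcode: the helper names (`nor`, `not_`, `or_`, `and_`, `xor_`, `add8`, `NOR_CALLS`) are made up for illustration, and the bit shifting is only plumbing to feed one bit at a time into the gate.

```python
NOR_CALLS = [0]  # counter, only here to show how much work one addition is


def nor(a: int, b: int) -> int:
    """1-bit NOR, the only logic primitive used below."""
    NOR_CALLS[0] += 1
    return 0 if (a or b) else 1


def not_(a: int) -> int:
    return nor(a, a)


def or_(a: int, b: int) -> int:
    return not_(nor(a, b))


def and_(a: int, b: int) -> int:
    return nor(not_(a), not_(b))


def xor_(a: int, b: int) -> int:
    # (a OR b) AND NOT (a AND b), expressed as a NOR of the two complements
    return nor(nor(a, b), and_(a, b))


def add8(a: int, b: int, carry: int = 0) -> tuple[int, int]:
    """Ripple-carry add of two 8-bit values; returns (sum, carry_out)."""
    total = 0
    for i in range(8):
        x, y = (a >> i) & 1, (b >> i) & 1  # bit selection is plumbing, not logic
        p = xor_(x, y)
        total |= xor_(p, carry) << i
        carry = or_(and_(x, y), and_(p, carry))
    return total, carry


total, carry = add8(200, 100)
print(total, carry, NOR_CALLS[0])  # 44 1 144
```

In this naive form a single 8-bit addition takes 144 sequential NOR evaluations, which at least makes the "2600 additions per second at 4 MHz" figure feel plausible, though the real microcode surely differs.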

" TraNOR - A computer built with transistors

This is not a transistor computer. In fact, it is a computer with a CPU made up of discrete transistors, and the CPU instructions are composed by microcode stored in the same memory that also contains the application program.

This computer consists of 1897 transistors for the CPU, 598 transistors for the four 8-bit I/O-ports and additional 124 transistors for the LCD I/O board. Furthermore, three integrated memory chips were used. Of course it would be possible to build the SRAM chip with transistors and the ROM chip with transistors and diodes as well, but this would have required much more transistors.

The complexity of my design is somewhere between the Intel i4004 CPU (2250 transistors) and the 6502 (3218 transistors). The i8080 already has 4500 transistors and the Zilog Z80 even has 8500 transistors, so you get an idea how small my design still is. The most complex component that I use in the computer is the EEPROM memory chip. And even the LCD is also more complex than the TraNOR CPU.

My design goal was to build a transistorized computer that is 100% compatible with MyNOR. I have reached this goal, software written for MyNOR runs on TraNOR with the same speed. And TraNOR doesn't need a special EPROM image either, it also works with MyNOR ROM v1.0 and later versions. "

" Note: I could have built the computer without any memory chips. The 32 kB EPROM (ROM) could be replaced by a large diode matrix or a core rope memory, and the 8 kB SRAM could be replaced with many discrete flip-flops or discrete DRAM memory cells. And the 64 kB EEPROM is not needed at all. But for the sake of simplicity, I decided to use these chips anyway. "

"
    No CPU or MCU, no ALU, only one discrete NOR gate for computations
    The 8-bit CPU is made of 15 CMOS logic chips, 2 transistors, a ROM and a RAM chip
    additional 4 CMOS logic chips are used to provide digital I/O
    8 kB SRAM for CPU registers, program code and data
    32 kB ROM (OTP EPROM) for microcode and program storage
    64 kB EEPROM for 8 user programs, with auto-boot after power on
    ...
    Hardware interrupts (except the non-maskable hardware reset) are not supported
    A stack memory of 256 byte enables nested subroutine calls
    Up to 24 digital outputs and 8 digital inputs with integrated pull-up's
    ...
    Slim microcode architecture with 28 instructions
    The microcode occupies only 9 kB of the ROM, 23 kB are free for the OS
    The Operating System provides lots of useful API functions
    The OS contains a calculator program that can do floating point calculations
    The OS contains a monitor program which allows directly programming MyNOR in assembly

"

"

Instruction Set

MyNOR is a CISC (complex instruction set) CPU with von-Neumann architecture. Program code and data are stored together in the same RAM. Furthermore the RAM is used to store the stack memory and also the CPU registers. Because CPU registers are stored in RAM, MyNOR is capable of dealing with up to 256 8-bit registers.

    Instruction  Function
    LD reg,#     Load register with immediate value
    LD reg,reg   Load register with other register
    LDA #        Load ACCU with immediate value
    LDA reg      Load ACCU from register
    STA reg      Store ACCU to register
    LAP          Load ACCU through pointer
    SAP          Store ACCU through pointer
    ADD reg      Add register to ACCU (with carry)
    AND reg      Perform AND operation on ACCU and register
    DEC reg      Decrement register
    INC reg      Increment register
    OR reg       Perform OR operation on ACCU and register
    ROL reg      Rotate register left (with carry)
    ROR reg      Rotate register right (with carry)
    SUB reg      Subtract register from ACCU (with carry)
    XOR reg      Perform XOR operation on ACCU and register
    CMP reg      Compare ACCU with register and set FLAG
    CMP #        Compare ACCU with immediate value and set FLAG
    TST reg      Test register for zero and set FLAG
    JMP abs      Unconditional jump to absolute memory address
    JNF abs      Jump to absolute memory address if FLAG = 0
    JPF abs      Jump to absolute memory address if FLAG = 1
    JSR abs      Call subroutine
    RET          Return from subroutine
    RST          Reset the CPU
    IO port      Input or Output ACCU on port
    PSH reg      Push register to stack
    POP reg      Pull register from stack

I have optimized the instruction set a lot, so programming becomes convenient and efficient. The Cross Assembler "myca" provides some special macro instructions to make programming even more convenient:

    Instruction  Function
    ADD #        Add immediate value to ACCU
    AND #        Perform AND operation on ACCU and immediate value
    OR #         Perform OR operation on ACCU and immediate value
    SUB #        Subtract immediate value from ACCU (with carry)
    XOR #        Perform XOR operation on ACCU and immediate value
    CLC          Clear (carry) FLAG
    SEC          Set (carry) FLAG

If you are interested in a full description of the registers and the instruction set, please read the MyNOR-Instruction-Set documentation. "
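
To get a feel for "registers live in RAM, with one accumulator and one FLAG", here is a toy model of a handful of the instructions listed above. It is a sketch under assumptions, not the real machine: the class name `TinyAccuMachine`, the 256-byte register array, and the exact carry behaviour (ADD consumes FLAG as carry-in and writes carry-out back to FLAG) are guesses from the one-line descriptions and are not taken from the MyNOR-Instruction-Set documentation.

```python
class TinyAccuMachine:
    """Hypothetical accumulator machine loosely modelled on the table above."""

    def __init__(self) -> None:
        self.ram = bytearray(256)  # assumption: 256 one-byte "registers" held in RAM
        self.accu = 0
        self.flag = 0              # assumption: FLAG doubles as the carry bit

    def lda_imm(self, value: int) -> None:   # LDA #
        self.accu = value & 0xFF

    def lda(self, reg: int) -> None:         # LDA reg
        self.accu = self.ram[reg]

    def sta(self, reg: int) -> None:         # STA reg
        self.ram[reg] = self.accu

    def clc(self) -> None:                   # CLC (assembler macro)
        self.flag = 0

    def add(self, reg: int) -> None:         # ADD reg (with carry)
        total = self.accu + self.ram[reg] + self.flag
        self.accu = total & 0xFF
        self.flag = total >> 8


m = TinyAccuMachine()
# A = 0x01FF in registers 0 (low) and 1 (high); B = 0x0001 in registers 2 and 3
for reg, value in enumerate([0xFF, 0x01, 0x01, 0x00]):
    m.lda_imm(value)
    m.sta(reg)

m.clc()
m.lda(0); m.add(2); m.sta(4)  # low byte, leaves the carry in FLAG
m.lda(1); m.add(3); m.sta(5)  # high byte picks the carry up
result = m.ram[4] | (m.ram[5] << 8)
print(hex(result))  # 0x200
```

The point of the sketch is only that "register" here means "a byte of RAM addressed by an 8-bit number", so wider arithmetic is a chain of accumulator operations over such bytes.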

---

"A GC sweep on a phone with 128k of heap is a very different thing than a desktop with a multi-GB heap." [1]

Boris Chuprin @noop_dev, replying to @Simon_Fe1: I believe Java was a part of the early IoT vision - a language for creating safe downloadable code for cheap 32-bit single-CPU appliances and thin clients. https://en.m.wikipedia.org/wiki/Green_threads I personally believe they should have left 1 OS thread/VM limitation and made VMs communicate via MP

---

https://www.youtube.com/watch?v=7ybybf4tJWw Doom on an IKEA TRÅDFRI lamp! IKEA TRÅDFRI RGB GU10 lamp (IKEA model: LED1923R5). https://next-hack.com/index.php/2021/06/12/lets-port-doom-to-an-ikea-tradfri-lamp/

    Cortex M33, 80 MHz
    108 kB RAM
    1 MB internal flash
    8 MB external dual-SPI flash
    Silicon Labs' MGM210L RF module

---

dbcurtis on Aug 13, 2019 [–]

These have their uses.

A friend who frequently does contract development in the toy space has (or at least used to have) a favorite go-to MCU that costs under $0.06 in bare die. It is essentially a 6502 with 100 bytes of RAM and a metric butt-load of mask-programmable ROM. It was originally designed for greeting cards. He has designed it into toys.

It is hard to use, you need a dev kit and a good relationship with the distributor to get the documentation. It only makes sense in high-volume products, since it comes as passivated bare die so assembly requires a die-bonder and epoxy encapsulation depositor.

Not for everyday use. But as my friend says: “You haven’t lived until you have spent an entire afternoon arguing over $0.05 on the BOM.”

jerryr on Aug 13, 2019 [–]

That sounds very similar to my experience with toy development. For a toy that played a bunch of pre-recorded sounds, we used a 4-bit Winbond MCU (their MCU division is now Nuvoton) that had a tiny bit of RAM and a ton of mask ROM. Firmware development was done in assembly and targeted a huge (physically large) emulator for test/debug. When we were satisfied with the firmware, we'd send it off to our CM, who would then order the parts with our FW in ROM. They'd get back bare die parts, which were wire bonded to the PCB and then epoxied over (that miserable "glop top" packaging, which is the bane of many teardowns). Development was a bit painful, but high volume production was extremely cheap.

Edit: Oops. I conflated projects. The toy project actually used a SunPlus MCU, not a Winbond MCU. It was an 8-bit RISC CPU running at 5MHz with 128 bytes RAM and 256KB mask ROM. The ROM held both the program and audio samples. I don't recall what encoding was used for the audio.

nickpsecurity on Aug 13, 2019 [–]

Well, I'm curious which project used a 4-bitter. Jack Ganssle and Robert Cravotta did a survey a while back:

http://www.embeddedinsights.com/channels/2010/12/10/consider...

http://www.ganssle.com/rants/is4bitsdead.htm

The two examples were timepiece designs and Gillette Fusion ProGlide. On top of getting yours, I'm curious if any of these cheap MCU's in the article could today have met whatever your requirements were for a 4-bitter?

jerryr on Aug 13, 2019 [–]

It was also for a low-cost audio application, but it wasn't a toy. This was back in 2001 or so. The MCUs in this article all only have ~1KB ROM, which wouldn't have been enough for our audio samples. We needed >256KB. The "4-bitness" was just incidentally what Winbond offered with a large ROM at the time. However, the SunPlus that we later used in the toy also offered a large ROM with an 8-bit CPU for a similar cost. So, while I can't authoritatively say that 4-bit is dead, it does seem like there are a lot of alternatives in similar price ranges now.

andrehacker on Aug 13, 2019 [–]

Ah, yes, there was an article here a year back about the original Furby using that same configuration. The article actually had the annotated 6502 source code.

https://news.ycombinator.com/item?id=1775159

somesortofsystm on Aug 13, 2019 [–]

>6502

Without question, one of the nicest platforms to have in multitudes of thousands, at low energy and cost ..

kragen on Aug 14, 2019 [–]

Honestly I'd prefer an ARM or an 8086 or an AVR. I imagine I'd prefer a J1A too but haven't tried. The 6502 makes it a pain to do anything in any language higher-level than assembly, even C, and being 8-bit means you're constantly facing tradeoffs between making things fast or making them correct for more than 128 or 256 items.

mastax on Aug 13, 2019 [–]

Some Philips sonicare toothbrushes use(d) a 4-bit microcontroller from an obscure Swiss company. (From memory, since I can't find the EEVBlog video teardown) 52 bytes of RAM, custom size mask ROM, ? Kilohertz clock speed. It makes sense, they just needed a timer for the "2 minutes of brushing is up" feature, and maybe some battery management. It still surprised me that it was worth the hassle to save a few cents, even if they sell millions of the things. They must make insane margins: $80 for a vibration motor and $25 brush refills.

kens on Aug 14, 2019 [–]

I did a teardown of a Sonicare toothbrush that used an 8-bit PIC 16F1516 microcontroller. There's a lot more going on in the toothbrush than I expected. I expected a simple motor, but there's a mechanically-complex resonant coil mechanism, driven by an H-bridge. There's some expensive manufacturing in there. Another interesting thing was the toothbrush has a "pressure sensor" to tell if you're brushing too hard, but it's really a Hall-effect sensor.

http://www.righto.com/2016/09/sonicare-toothbrush-teardown.html

kken on Aug 13, 2019 [–]

Maybe one of these?

https://www.emmicroelectronic.com/catalog?title=&term_node_tid_depth=29

EM Microelectronics is actually not so obscure. They belong to the Swatch group and are specialized in ultra lower power analog and mixed signal circuit. Obviously, first for watches.

fpgaminer on Aug 13, 2019 [–]

Are you thinking of this Braun teardown? https://www.youtube.com/watch?v=JJgKfTW53uo That indeed uses a 4-bit micro from a Swiss company (The only sonicare teardown I found was a forum post)

mastax on Aug 13, 2019 [–]

Yep, that must be it. Thanks!

kragen on Aug 13, 2019 [–]

Designs like the GreenArrays F18A core show that you can do dramatically more than the 6502 or Z80 did in a similar number of transistors. The J1 and J1A are free descendants. The MOStek and Zilog hackers were wizards but they were working under serious time and, in the Zilog case, compatibility constraints. We know enough to do better now.

(https://news.ycombinator.com/item?id=20688443 is on the same topic.)

(As others commented, those are transistor counts.)

petra on Aug 13, 2019 [–]

Smallest risc-v core I know: https://devhub.io/repos/kammoh-picorv32

~1000 LUTs. 1 LUT = 6-24 gates on average. A bit more but still pretty close.

vardump on Aug 13, 2019 [–]

A LUT can be 10 gates and it can be 100+ gates. You just can't compare FPGA LUTs to gates like that.

FPGAs have things like block RAMs and multipliers. Those require a ton of gates, but don't increase required FPGA LUT count by much.

nabla9 on Aug 13, 2019 [–]

Chips inspired by Chuck Moore's designs would fit this niche.

joezydeco on Aug 13, 2019 [–]

I would think a lot of manufacturers are looking to RISC-V for that kind of eventuality. But in the meantime, and looking at that chart, it seems a lot more convenient to just rip off the PIC.

monocasa on Aug 13, 2019 [–]

I don't think a RISC-V would fit in this gate count niche. Just the register file for an RV32-C core is half these chips' RAM.

ywfmotc on Aug 14, 2019 [–]

Yes, the PICs don't have a register file at all. They also have quite a simple instruction set, again providing gate savings, but then the size of program ROM may be a bit larger as a result.

The ALU can be a reasonable chunk of the processor size, and an 8-bit ALU is going to be much smaller than a RISCV ALU. Although I read somewhere that some of the Z80s, although an 8-bit processor, had a 4-bit ALU, and also I read somewhere that the 32-bit NIOS processor has a 16-bit ALU. But whether that's true or not ...

I designed a size optimised 16-bit MSP430 clone for small size low cost machxo3 FPGAs that used an 8-bit ALU. It's a good way of keeping the number of LUTs down when optimising for size over speed.

For something like the low cost ice40 FPGAs, a PIC would probably be a very good match for those too compared to RISCV, because ice40 doesn't have distributed memory, which is what you'd like for register files (otherwise I expect one of the block RAMs would be used for the register file, and ideally you wouldn't want that to happen).

zokier on Aug 13, 2019 [–]

The top end padauk ("PFS173") actually seems pretty reasonable chip, comparable to bottom end attinys with only quarter of the price.

howard941 on Aug 14, 2019 [–]

After I got over being spoiled by ARMs I fell for Padauk's stuff which I now think are awesome and makes almost everything that leaves the building a candidate for a micro. It breaks the old rules where you had to have a certain level of complexity in the product to justify using a processor. Their stuff are economical replacements for discrete logic and a no brainer for the ubiquitous ADC->serial use cases.

My hardware guy came on really heavy against the Padauk stuff because it was put to me that the temperature range wasn't wide enough particularly at the high end where supposedly we had to operate at 125C. I actually think attinys made it in the design instead of the Padauks just because of personal preference and an unwillingness to share the project rather than unavailability in automotive temp range.

So the tubes of Padauks and a pair of ICEs I brought in so it would be very easy to play with sit unused and I am now at a different place that places more emphasis on what all of the engineers think rather than just whoever happened to draw the long straw for the project.

kabdib on Aug 14, 2019 [–]

I wrote code for a MCU that did power sequencing on a consumer product. 4K of code, 2K of RAM, some EEPROM, a 16 MHz clock . . . 18 cents.

It was a fun little project: http://www.dadhacker.com/blog/?p=1911

---

the previous section's comments were on this:

https://cpldcpu.wordpress.com/2019/08/12/the-terrible-3-cent-mcu/

" There are some obvious commonalities: All devices are designed around an accumulator based architecture, undeniably inspired from the Microchip PIC12 series. "

Interestingly, with only MDT as an exception, all vendors extended and modified their designs from the original...Some of the major shortcomings are being addressed, such as lack of interrupts, addressable space of JMP/CALL, banking of memory/IO and severe lack of periphery.

---

"To keep costs low for schools, these devices typically employ 32 bit ARM Cortex-M microcontrollers (MCUs) with 16-256kB of RAM and are programmed using an external computer (usually a laptop or desktop)" [2]

---

Basys 3 Artix-7 FPGA Trainer Board: Recommended for Introductory Users $149.00

"The Basys 3 is one of the best boards on the market for getting started with FPGA. It is an entry-level development board built around a Xilinx Artix-7 FPGA."

Used in the WRAMP project. WRAMP offers 16k (32-bit?) words of RAM and 32k words of ROM.

---

https://www.copetti.org/writings/consoles/nintendo-64/ https://www.copetti.org/writings/consoles/

---

https://www.pjrc.com/store/teensy41.html

    ARM Cortex-M7 at 600 MHz
    Float point math unit, 64 & 32 bits
    7936K Flash, 1024K RAM (512K tightly coupled), 4K EEPROM (emulated)

https://news.ycombinator.com/item?id=23143888

---

"One big thing: Apple packs an incredible amount of L1D/L1I and L2 cache into their ARM CPUs. Modern x86 CPUs also have beefy caches, but Apple takes it to the next level. For comparison: the current Ryzen family has 32KB L1I and L1D caches for each core; Apple’s M1 has 192KB of L1I and 128KB of L1D. Each Ryzen core also gets 512KB of L2; Apple’s M1 has 12MB of L2 shared across the 4 “performance” cores and another 4MB shared across the 4 “efficiency” cores." [3]

"This is somewhat incomplete. The 512KiB L2 on Ryzen is per core. Ryzen CPUs also have L3 cache that is shared by cores. E.g. the Ryzen 3700X has 16MiB L3 cache per core complex (32 MiB in total) and the 3900X has 64MiB in total (also 16MiB per core complex)."

---

(comment was about Zig and Rust on embedded)

travisgriggs 7 days ago [–]

I write/maintain multiple code bases for XMega, SAMD, and STM chips. The kind where 32K ram is cool and 256K flash is lots. I'm curious if either of these is ready for normal humans to cross compile for these kinds of targets (M0 and the ilk). Zig looks very interesting to me.

reply

---

"until 2015 one could buy the MARC4 4bit processor from Atmel (may be lurking in your car key fob remote)"

"c12 1 year ago

link

The MARC4 rings a bell. I can’t find the website now but I had bookmarked a decade ago someone's project where they had made a remote temperature sensor that radioed to a base station temp/humidity and could last for years on a single 9v battery being built of less than six components that fit on a board that was barely bigger than the contacts of the 9V it plugged into. "

---

knightOS on calculators

https://drewdevault.com//2020/01/27/KnightOS-was-interesting.html

"a rather nice Unix-like environment, with a filesystem, preemptive multiprocessing and multithreading, assembly and C programming environments, and more. The entire system was written in handwritten z80 assembly, almost 50,000 lines of it, on a compiler toolchain we built from scratch.

There was only 64 KiB of usable RAM. The kernel stored all of its state in 1024 bytes of statically allocated RAM. "

---

https://www.hackster.io/news/onio-zero-offers-up-to-24mhz-of-risc-v-microcontroller-performance-on-nothing-but-harvested-energy-70285321d50d

"The microcontroller itself is based on the free and open source RISC-V instruction set architecture — specifically, RV32EMC — running at up to 24MHz when fed 1.8V. The controller also operates at lower voltages, when required: 1V gets you 6MHz, 0.8V gets you 1MHz, and the chip will continue to run - albeit at ever-decreasing speeds - as low as 450mV, the company claims. There's 1kB of mask ROM and 2kB of RAM included, along with 8-32kB of ultra-low-power flash storage capable of 100,000 write cycles and readable down to 850mV.

The ONiO.zero also includes a crystal-free Bluetooth Low Energy (BLE) transmitter capable of operating at voltages as low as 850mV, an IEEE 802.15.4 ultra-wide-band (UWB) transmitter operating in the 3.5-10GHz band, and an optional 433MHz MICS radio transmitter for industrial, scientific, and medical (ISM) band use.

The chip's energy comes courtesy of an internal radio-frequency rectifier, harvesting power from the 800/900/1800 and 1900/2400MHz bands (ISM and GSM). For environments without enough radio-frequency energy to reliably power the chip, the "internal power factory" supports photovoltaic cells down to 400mV, piezoelectric, and thermal sources from 1.8V to 3.6V. "

---

" The VEGA chip isn’t your typical commercial computing platform; it’s clearly purpose-designed to allow folks to evaluate RISC-V, with another rather well-known company’s cores in there as a baseline. There is a total of four cores: a pair for applications and a pair to handle the communications. Each pair consists of one RISC-V core and one ARM core. For applications, there’s a RISC-V core (more on it in a minute) and an ARM Cortex-M4F (along with a meg of flash, 256K of SRAM, and 48K of ROM). For comms, there’s a different RISC-V core and an ARM Cortex M0+ (with 256K of flash and 128K of ROM). " https://www.eejournal.com/article/priming-the-risc-v-pump/

---

https://www.eejournal.com/article/microchip-polarfire-takes-a-risc-v/ PolarFire is a mid-range FPGA family that brings 100K to 500K logic elements with 12.7 Gbps SerDes transceivers, which the company claims deliver significantly lower power consumption, better security, and higher reliability than mid-range devices from other vendors. The power consumption advantage, in particular, has won the original PolarFire numerous sockets where other mid-range FPGAs couldn’t fit the power budget.

Now, the company has announced a new “SoC” version of PolarFire that adds an abundance of processor power to the mix, in the form of a five-core RISC-V processing subsystem. ...This is possible because the PolarFire RISC-V implementation (developed in collaboration with SiFive) has a flexible 2 MB L2 memory subsystem that can be configured as a cache, a scratchpad, or a direct access memory ...The FPGA part of the new SoC FPGA is basically what the company already offered in the PolarFire family. This includes devices ranging from 23K to 461K logic elements, 68 to 1,420 18×18 multiplier MAC (DSP) blocks, 1.8 to 31.6 Mbits on-chip RAM, 4 to 10 12.5 Gbps SerDes lanes, and 2 PCIe gen2 endpoints. In other words – a very capable FPGA to go with a very capable processor subsystem.

---

https://www.cnx-software.com/2019/12/16/trion-t20-bga256-fpga-development-kit-pulserain-reindeer-risc-v-rv32im-soft-cpu/ But while looking for information about the board on the net, I discovered “PulseRain Reindeer for Efinix Trion T20 BGA256 Development Kit” on Github. PulseRain Reindeer is a soft CPU of Von Neumann architecture leveraging RISC-V RV32I[M] instruction set, and featuring a 2 x 2 pipeline. Configuration tested on Trion T20:

    RV32I processor core, Von Neumann Architecture
    48KB Block RAM for code and data
    1x UART TX
    32-bit GPIO

---

" Here is a table of common x86_64 platforms and their default stack sizes for the main thread (process) and child threads:

    OS                          Process Stack Size   Thread Stack Size
    Darwin (macOS, iOS, etc)    8 MiB                512 KiB
    FreeBSD                     8 MiB                2 MiB
    OpenBSD (before 4.6)        8 MiB                64 KiB
    OpenBSD (4.6 and later)     8 MiB                512 KiB
    Windows                     1 MiB                1 MiB
    Alpine 3.10 and older       8 MiB                80 KiB
    Alpine 3.11 and newer       8 MiB                128 KiB
    GNU/Linux                   8 MiB                8 MiB

" [4]

" The article highlights “OpenBSD before 4.6” as having the smallest stack space, but OpenBSD 4.6 is from 2009 and essentially not really worth considering IMHO. I certainly wouldn’t call a 12-year old system a “common x86_64 platform”.

So Alpine actually has the smallest stack space with 128K in that table, followed by OpenBSD and Darwin/macOS with 512K.

I also looked up the stack size for some other platforms (quick search, may be outdated for some):

    Solaris: 2M (64-bit) or 1M (32-bit)
    z/OS: 1M
    QNX: 256K (amd64), 512K (arm64) or 128K (32bit)
    HP-UX: 256K (but can be changed easily with PTHREAD_DEFAULT_STACK_SIZE env var).
    Minix: 132K
    AIX: 96K

"

---

MQTT

---

" on GPU register file is bigger than cache! Let’s use an example of RTX 2060 from Nvidia. Per every SM there is a register of size 256KB, meanwhile L1/shared memory per this exact SM is only 96KB. "-- https://vksegfault.github.io/posts/gentle-intro-gpu-inner-workings/

---

https://www.tomshardware.com/reviews/seeed-xiao-rp2040-review-dollar5-brain-food

    Price: $5
    CPU: Dual-core ARM Cortex M0+ processor up to 133MHz
    Flash Memory: 2MB
    SRAM: 264KB

---

https://www.eejournal.com/article/machine-learning-esperanto-coaxes-1092-risc-v-processors-to-dance-on-the-head-of-a-pin-er-chip/

1092...customized, 64-bit RISC-V microprocessor cores, 160 Mbytes of on-chip SRAM, and assorted I/O ports – all on one 7nm die.

---

https://www.hackster.io/news/hands-on-with-the-rp2040-and-pico-the-first-in-house-silicon-and-microcontroller-from-raspberry-pi-effc452fc25d

264kB of static RAM (SRAM), Arm Cortex-M0+. An external chip provides 2MB of flash storage, while 26 general-purpose input/output (GPIO) pins on the RP2040 are brought out to the Pico's headers. These GPIO pins include hardware interrupts, 16 pulse-width modulation (PWM) pins, three 12-bit analog-to-digital converter (ADC) pins, two UARTs, two I2C, and two SPI buses. Price: $4

" The majority of the RP2040's functionality is exposed in the MicroPython port: You can use hardware interrupts, PWM, the non-volatile storage, the ADC channels, the internal temperature sensor, SPI and I2C buses, create a thread to run on the second CPU core — and even use the PIO blocks, which allow you to copy and paste or write your own interface definitions directly in REPL or the Python IDE of your choice. ... Conclusion

It's hard not to be impressed by the Pico, especially for a first microcontroller product — albeit one from a company which has spent the last nine years all-but dominating the single-board computer scene. While it's easy to focus on a handful of missing features, in particular the lack of Bluetooth or Wi-Fi support, there's a lot of power there — especially given the board's bargain-basement $4 price point and module-ready yet breadboard-friendly format." https://techcrunch-com.cdn.ampproject.org/v/s/techcrunch.com/2021/01/21/raspberry-pi-foundation-launches-4-microcontroller-with-custom-chip/amp/?amp_js_v=a6&amp_gsa=1&usqp=mq331AQFKAGwASA%3D#aoh=16112548745968&csi=0&referrer=https%3A%2F%2Fwww.google.com&amp_tf=From%20%251%24s&ampshare=https%3A%2F%2Ftechcrunch.com%2F2021%2F01%2F21%2Fraspberry-pi-foundation-launches-4-microcontroller-with-custom-chip%2F

https://blog-kalinoff-com.cdn.ampproject.org/v/s/blog.kalinoff.com/stablecoins-introduction-and-overview/amp/?amp_js_v=a6&amp_gsa=1#referrer=https%3A%2F%2Fwww.google.com&amp_tf=From%20%251%24s&ampshare=https%3A%2F%2Fblog.kalinoff.com%2Fstablecoins-introduction-and-overview%2F stablecoin taxonomy intro toread

https://www.theblockresearch.com/defi-governance-games-gasless-voting-and-snapshot-90887 https://twitter.com/frankresearcher/status/1351530381590339586

https://docs.snapshot.org/ "Multiple voting systems - Single choice, Approval voting, Quadratic voting, and more" https://aragon.org/blog/snapshot

https://docs.snapshot.org/proposals/voting-types

https://www.theblockresearch.com/a-brief-overview-of-different-types-of-stablecoins-91454

https://blog-celer-network.cdn.ampproject.org/v/s/blog.celer.network/2021/01/19/celer-network-adds-polkadot-support-with-state-channel-substrate-modules-released/amp/?amp_js_v=a6&amp_gsa=1&usqp=mq331AQFKAGwASA%3D#aoh=16111909582619&csi=0&referrer=https%3A%2F%2Fwww.google.com&amp_tf=From%20%251%24s&ampshare=https%3A%2F%2Fblog.celer.network%2F2021%2F01%2F19%2Fceler-network-adds-polkadot-support-with-state-channel-substrate-modules-released%2F

https://www.coindesk.com/policy/2021/01/19/enjin-coin-becomes-first-gaming-cryptocurrency-whitelisted-for-use-in-japan/ https://cointelegraph-com.cdn.ampproject.org/v/s/cointelegraph.com/news/multiple-defi-projects-unveil-plan-new-user-interface-upgrades/amp?amp_js_v=a6&amp_gsa=1&usqp=mq331AQFKAGwASA%3D#aoh=16111696347989&csi=0&referrer=https%3A%2F%2Fwww.google.com&amp_tf=From%20%251%24s&ampshare=https%3A%2F%2Fcointelegraph.com%2Fnews%2Fmultiple-defi-projects-unveil-plan-new-user-interface-upgrades


---

CHUNGUS 2 - A very powerful 1Hz Minecraft CPU https://www.youtube.com/watch?v=FDiapbD0Xfg https://www.pcgamer.com/minecraftception-redstone-pc-chungus/

Other hardware used in video

3 flags, carry/overflow, zero, even

ISA: https://docs.google.com/spreadsheets/d/10_ZERVmsKr0uqQXXbHxMQW-aBpHn6tl5L6Mn-zm57O4/edit#gid=1803754650

Columns: mnemonic, description, operands (encoded across instruction bytes 1 and 2), then the control lines: Halt, Update flags (Flags), ALU, Writeback (WB), A byte 1 (A1), A byte 2 (A2), B, Immediate (Imm).

    Mnem. Description                Operands                                    Halt      Flags ALU WB  A1  A2  B   Imm
    NOP   No-op                      -                                           No        No    No  No  No  No  No  No
    HLT   Halt                       -                                           Yes       No    No  No  No  No  No  No
    STS   Settings                   Setting, Immediate                          No        No    No  No  No  No  No  No
    CLI   Conditional load Immediate Destination, Condition, Sign ext. immediate No        No    Yes Yes No  No  No  Yes
    JMP   Jump                       Location                                    Yes       No    No  No  No  No  No  No
    CAL   Call                       Location                                    Yes       No    No  No  No  No  No  No
    RET   Return                     -                                           Yes       No    No  No  No  No  No  No
    BRH   Branch                     Condition, T, Location in page              No        No    No  No  No  No  No  No
    POI   Pointer                    E, Source B                                 No        Yes   Yes No  Yes No  Yes No
    SLD   Special register load      Destination, Register                       No        Yes   Yes Yes No  No  No  No
    PST   Port store                 Source, Port address                        No        Yes   Yes No  Yes No  No  No
    PLD   Port load                  Destination, Port address                   Yes       Yes   Yes Yes No  No  No  No
    PSH   Push stack                 Source, U, W, Sign extend offset            Sometimes Yes   Yes No  Yes No  No  No
    POP   Pop stack                  Destination, U, W, Sign extend offset       Sometimes Yes   Yes Yes No  No  No  No
    MST   Memory store               Source, Address                             Sometimes Yes   Yes No  Yes No  No  No
    MLD   Memory load                Destination, Address                        Sometimes Yes   Yes Yes No  No  No  No
    LIM   Load immediate             Destination, Immediate                      No        No    Yes Yes No  No  No  Yes
    AIM   AND immediate              Source/dest, Immediate                      No        Yes   Yes Yes Yes No  No  Yes
    CMP   Compare                    Source, Immediate                           No        Yes   Yes No  Yes No  No  Yes
    CMA   Compare AND                Source, Immediate                           No        Yes   Yes No  Yes No  No  Yes
    MOV   Move                       Destination, Source A                       No        No    Yes Yes No  Yes Yes No
    ADD   Add                        Destination, Source A, Type, Source B       No        Yes   Yes Yes No  Yes Yes No
    SUB   Subtract                   Destination, Source A, Type, Source B       No        Yes   Yes Yes No  Yes Yes No
    ADI   Add immediate              Destination, Source A, Sign ext. immediate  No        Yes   Yes Yes No  Yes No  Yes
    BIT   Bitwise logic              Destination, Source A, Type, Source B       No        Yes   Yes Yes No  Yes Yes No
    BNT   Inverse bitwise logic      Destination, Source A, Type, Source B       No        Yes   Yes Yes No  Yes Yes No
    SHF   Barrel shift               Destination, Source A, Type, Source B       No        Yes   Yes Yes No  Yes Yes No
    SFI   Barrel shift immediate     Destination, Source A, Type, Immediate      No        Yes   Yes Yes No  Yes No  Yes
    MUL   Multiply/divide            Destination, Source A, Type, Source B       Yes       Yes   No  Yes No  Yes Yes No
    UDH   User defined hardware      Destination, Source A, Type, Source B       Yes       Yes   No  Yes No  Yes Yes No
    UDH   User defined hardware      Destination, Source A, Type, Source B       Yes       Yes   No  Yes No  Yes Yes No
    BCT   Bit count                  Destination, Source A, Type, Immediate      Yes       Yes   No  Yes No  Yes No  Yes

---

https://www.righto.com/2021/12/reverse-engineering-tiny-1980s-chip.html Reverse-engineering a tiny 1980s chip that plays Christmas tunes

---

https://ploum.net/the-computer-built-to-last-50-years/

C64: 8bit 6510 (6502 family), 64k memory

Amiga 1200: 68020 but with 24-bit address bus instead of 32-bit, 256 byte L1 icache, 2MB memory

PowerBook G4 (from https://lobste.rs/s/rasume/amiga_as_computer_built_last_50_years): PowerPC, L2 cache >=256k, >= 128 MB memory

---

"While the 386 design heavily leveraged the logic design of the 286, the 486 was a more radical departure with the move to a fully pipelined design, the integration of a large floating point unit, and the introduction of the first on-chip cache – a whopping 8K byte cache which was a write through cache used for both code and data" -- https://claytonchristensen.com/books/the-innovators-solution/

---

https://bbenchoff.github.io/pages/LinuxDevice.html

---

"Altair BASIC, a 4 KB BASIC interpreter for the Intel 8080-based MITS Altair 8800, which, despite all its other limitations, included a 32 bit floating point library.

.. In 1976, MOS Technology launched the KIM-1, an evaluation board based around the new 6502 CPU from the same company. Microsoft converted their BASIC for the Intel 8080 to run on the 6502, keeping both the architecture of the interpreter and its data structures the same, and created two versions: an 8 KB version with a 32 bit floating point library (6 digits), and a 9 KB system with 40 bit floating point support (9 digits).

Some sources claim that, while BASIC for the 8080 was 8 KB in size, Microsoft just couldn’t fit BASIC 6502 into 8 KB, while other sources claim there was an 8KB version for the 6502. The truth is somewhere in the middle. The BASIC ROMs of the Ohio Scientific Model 500/600 (KIM-like microcomputer kits from 1977/1978) and the Compukit UK101 were indeed 8 KB in size, but unlike the 8080 version, it didn’t leave enough room for the machine-specific I/O code that had to be added by the OEM, so these machines required an extra ROM chip containing this I/O code.

In 1977, Microsoft changed the 6 digit floating point code to support 9 digits and included actual error strings instead of two-character codes, while leaving everything else unchanged. A 6502 machine with BASIC in ROM needed more than 8 KB anyway, why not make it a little bigger to add extra features. The 6 digit math code was still an assembly time option; the 1981 Atari Microsoft BASIC used that code.

... The 1976 Apple I was the first system besides the KIM to use the MOS 6502 CPU, but Steve Wozniak wrote his own 4KB BASIC interpreter instead of licensing Microsoft’s. An enhanced version of Woz’ “Integer BASIC” came in the ROM of the Apple II in 1977; Microsoft BASIC (called “AppleSoft”) was available as an option on tape. On the Apple II Plus (1978), AppleSoft II replaced Integer BASIC.

" -- [5]

---

" There is very little code in Another World. The original Amiga version was reportedly 6,000 lines of assembly[2]. The PC DOS executable is only 20 KiB. Surprising for such a vast game which shipped on a single 1.44 MiB floppy. That is because most of the business logic is implemented via bytecode. The Another World executable is in fact a virtual machine host which reads and executes uint8_t opcodes.

Another World VM defines 256 variables, 64 threads, 29 opcodes, and four framebuffers[3]. ... The virtual machine's graphic system uses a coordinate system of 320x200 with 16 palette-based colors. The color limitation may be surprising given that the development platform, the Amiga 500, supported up to 32 colors. This choice was a sweet spot allowing the graphics to be compatible with the other big platform of the era, the Atari ST which supports only 16 colors. " -- [6]
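a minimal sketch of what a bytecode VM host loop of this kind could look like (Python for brevity). Only the 256-variable bank and the one-byte opcodes come from the quote above; the three opcodes, their encodings, and the 16-bit wraparound are made up for illustration and are not the actual Another World instruction set:

```python
# Minimal bytecode-VM sketch. Only the 256-variable bank and the use of
# one-byte opcodes come from the quoted description; the three opcodes
# below (and their operand encodings) are hypothetical.
NUM_VARIABLES = 256

OP_SET = 0x00   # hypothetical: SET var, imm8  -> variables[var] = imm8
OP_ADD = 0x01   # hypothetical: ADD dst, src   -> variables[dst] += variables[src]
OP_HALT = 0x02  # hypothetical: stop executing


def run(bytecode):
    """Execute the bytecode and return the final variable bank."""
    variables = [0] * NUM_VARIABLES
    pc = 0
    while pc < len(bytecode):
        opcode = bytecode[pc]
        pc += 1
        if opcode == OP_SET:
            var, imm = bytecode[pc], bytecode[pc + 1]
            pc += 2
            variables[var] = imm
        elif opcode == OP_ADD:
            dst, src = bytecode[pc], bytecode[pc + 1]
            pc += 2
            # Wrap to 16 bits; the variable width is an assumption of this sketch.
            variables[dst] = (variables[dst] + variables[src]) & 0xFFFF
        elif opcode == OP_HALT:
            break
        else:
            raise ValueError(f"unknown opcode {opcode:#04x} at offset {pc - 1}")
    return variables


program = bytes([OP_SET, 0, 40, OP_SET, 1, 2, OP_ADD, 0, 1, OP_HALT])
result = run(program)
print(result[0])  # 42
```

the point for low-end targets: the host is a small fetch/decode/dispatch loop plus a fixed-size state block, and everything else ships as data.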

---

https://www.lynx.com/embedded-systems-learning-center/most-popular-real-time-operating-systems-rtos

---

https://www.embedded.com/wp-content/uploads/2019/11/EETimes_Embedded_2019_Embedded_Markets_Study.pdf

in this 2019 survey:

32-bit main processors were the most popular

most popular 32-bit ISA was ARM

most popular 16-bit ISAs were TI MSP430, Microchip PIC24 / dsPIC

most popular 8-bit ISAs were Atmel/Microchip AVR, Microchip PIC

Texas Instruments/TI, STMicroelectronics, Microchip/Atmel, NXP/Freescale were the most popular vendors

Linux and FreeRTOS were the most popular embedded OSs

---

"MSP430™ has a 20-bit address space; FSF MSP430 GCC has a 20b pointer type." -- https://www.avrfreaks.net/forum/split-avr-vs-ti-msp430

"The MSP430F1611 as used in the current products has the following spec:

This is why I was looking at say the AVR Mega128 or XMega128. "

---

https://www.hackster.io/news/lilygo-launches-sub-8-risc-v-based-esp32-c3-t-zigbee-wireless-development-board-2d3574a46a31

" Chinese embedded electronics specialist LILYGO has launched a low-cost development board boasting Wi-Fi, Bluetooth 5.0 Low Energy, and Zigbee 3.0 connectivity — and has priced the T-Zigbee at under $8.

The new board, brought to our attention by CNX Software, is based on a gumstick form-factor with breadboard-friendly headers and a USB Type-C connector for power and data. Its primary processor is an Espressif ESP32-C3 system-on-chip, which includes a 32-bit RISC-V core running at up to 160MHz alongside 400kB of static RAM (SRAM) plus Wi-Fi 4 and Bluetooth 5.0 Low Energy (BLE) radios.

That's not the only controller on the board, however: Alongside the ESP32-C3 is a Telink TLSR8258 multi-protocol radio controller, which offers support for Zigbee 3.0 connectivity along with RF4CE, Thread, 6LoWPAN, HomeKit, and ANT, plus Bluetooth Mesh capabilities. The two radios are brought out to two separate PCB antennas, with U.FL connectors for optional external antennas.

In total, the board offers 21 general-purpose input/output (GPIO) pins split between the two processors, with four analog inputs, three UART, two I2C, two SPI, and I2S buses. Three user-controllable LEDs are included in the board, along with reset and a single user-addressable button — while a hardware DIP switch flips the USB UART connection between the ESP32-C3 and the TLSR8258. "

---

https://www.ksl.com/article/50411881/tiny-robotic-crabs-engineers-invent-the-worlds-smallest-remote-controlled-walking-robots

---

cortex m0+ 264 KB SRAM in six independent banks

---

https://en.wikipedia.org/wiki/RP2040

---

that lego-sized screen and driver:

"Lots of people have asked, so: processor is a STM32F030F4P6 (Cortex M0, 16K flash, 4K ram), screen is a QT1306P82 (0.42" OLED)." -- https://twitter.com/ancient_james/status/1534002794726031360

---

"A large proportion of microcontroller models available on the market still have less than 64 kB of flash and less than 2 kB of RAM...All the microcontrollers I’ve worked with in my career as a firmware engineer have had ≤ 16kB RAM...Microvium compiled with the same settings compiles to just 8.5kB....Microvium takes 54 bytes for the kernel state." -- [7]

---

"$14.95 smart lamp from Ikea that features an ARM-based Cortex M33 processor with “96 + 12 kB of RAM,” or just enough to run the first level of Doom" -- https://www.pcmag.com/news/you-can-run-doom-on-a-chip-from-a-15-ikea-smart-lamp

---

https://www.raspberrypi.com/news/raspberry-pi-pico-w-your-6-iot-platform/

---

https://dmitry.gr/?r=05.Projects&proj=33.%20LinuxCard

He used an "ATSAMD21 series chip, specifically the ATSAMDA1E16"

this seems to have 64k of program memory and 8k of data memory

---

6 rpi cluster

https://www.jeffgeerling.com/blog/2022/6-raspberry-pis-6-ssds-on-mini-itx-motherboard $200

---

georgeecollins 4 months ago

I thought it was really interesting that they fixated on 6502 NES and then Raspberry PI. If you were going to make something that hoped would work anywhere anytime including far into an unknown future, I think Raspberry PI is the platform you would work on. Nxn looks really cool.

pandakar 4 months ago

Interesting talk by Hundred Rabbits where they discuss how their lives and development practices have changed since living on a sailboat.

about:

https://framatube.org/w/417a094e-bb7f-4000-9df1-30859c2fda67

This keynote talk is titled "Software Doldrums," as presented by Rek and Devine of Hundred Rabbits.

Hundred Rabbits is a small artist collective consisting of Rek, a writer and cartoonist, and Devine, a programmer, artist, and musician. They travel the globe together with their sailboat named "Pino" while creating and adapting, among other things, software to fit their needs.

This talk, which was featured as a keynote at LibrePlanet 2022, is about the dangers and shortcomings of relying on always-online proprietary platforms. Hundred Rabbits will share how they reimagined their software to encourage the reuse, repair and maintenance of existing hardware.

https://git.sr.ht/~rabbits/libreplanet2022 https://git.sr.ht/~rabbits/libreplanet2022/tree/master/item/slides

---

https://news.ycombinator.com/item?id=30615959 Pockit: A tiny, powerful, modular computer [video]

---

the J1a, which is an open-source 16-bit chip that runs on an open-source FPGA (Lattice iCE40HX-1k; about $10?) with 8k and runs SwapForth (which takes 5k of the 8k; see https://excamera.com/sphinx/article-j1a-swapforth.html ), is at: https://github.com/jamesbowman/swapforth/blob/master/j1a/verilog/j1.v

it appears to have a PC, a data stack, and a return stack; each stack appears to have roughly 16 elements

---

https://www.tomshardware.com/reviews/pinecil-v2 Smart Soldering Iron Powered by RISC-V CPU

CPU: 32-bit RV32IMAFC RISC-V “SiFive E24 Core” @ 144 MHz

RAM: 132KB SRAM

Storage: 192KB

"A RISC-V CPU in a Soldering Iron?

Pinecil V1 introduced a 32-bit RV32IMAC RISC-V “Bumblebee Core” CPU running at 108 MHz. This CPU was the brains that worked with IronOS (the soldering iron’s operating system) to coordinate reading temperature, voltage and motion sensors, all of which are necessary for using the iron. For Pinecil V2, we get a spec bump to a 32-bit RV32IMAFC RISC-V “SiFive E24 Core” running at 144 MHz.

An optional breakout board, which plugs into the USB C port and passes power through it, provides GPIO pins which can be used to test, debug and create projects using the iron as a controller.

The RISC-V CPU doesn’t really come into play for most users. We turn on the iron, heat it up, and start soldering. "

---

"The NavSpark Mini is still $36 for a 6-pack, making it one of the cheapest uCs with a floating-point unit. (The ESP32 technically has one, but its performance is inexplicably poor.)" -- [8]

https://www.cnx-software.com/2016/01/04/navspark-mini-is-a-6-arduino-compatible-gps-board/ "MCU – Skytraq Venus828F 32bit LEON3 Sparc-V8 MCU @ 100MHz with IEEE-754 Compliant Floating Point Unit, 1024KB Flash Memory, and 212KB RAM"

---

https://www.thebyteattic.com/p/agon.html

Agon light™... instant-on, BASIC-programmed microcontroller...fully open-source 8-bit microcomputer...in one small, low-cost board,...

eZ80F92, 24-bit addr bus, 512kb SRAM, 8MB video memory, 640 x 480 pixels, 64 colors simultaneously

it says it should be possible to do at about $50, but currently you can get one for about $200

https://lobste.rs/s/qxtbfw/agon_light_low_cost_open_source_8_bit

---

Pebble Original & Steel

Processor: ARM Cortex-M3, up to 80 MHz with 512 KB of on-chip storage all in STMicroelectronics' STM32F205RE6 SoC (System on a Chip)

Memory (RAM): 128 kB (provided on the SoC mentioned above). The RAM is split between the system (84 kB), background worker (12 kB), and the currently running app (32 kB). Only 24 kB of the 32 kB is directly usable by the app developer, the other 8 kB is for things like the app framebuffer that the app has access to but doesn't use directly.

System storage (micro flash ROM): 512 kB (provided on the SoC mentioned above). This is used for storing the bootloader and firmware. It is directly memory mapped, allowing code to be executed in place, without requiring it to be copied first to RAM.
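a quick sanity check of the Pebble RAM split, using only the figures quoted above (the variable names are mine):

```python
# RAM budget from the Pebble figures quoted above (all values in kB).
TOTAL_RAM_KB = 128
regions_kb = {"system": 84, "background worker": 12, "current app": 32}
APP_OVERHEAD_KB = 8  # app framebuffer etc., not directly usable by the app

# The three regions should account for all of the on-chip RAM.
assert sum(regions_kb.values()) == TOTAL_RAM_KB

app_usable_kb = regions_kb["current app"] - APP_OVERHEAD_KB
print(app_usable_kb)  # 24
```

so an app author's working set here is 24 kB out of 128 kB.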

---

https://dercuano.github.io/notes/stm32.html

---

https://derctuo.github.io/notes/minimal-cost-computer.html

8 KiB RAM

---

https://tomu.im/

https://tomu.im/qomu.html I’m Qomu, an Arm MCU + eFPGA SoC in your USB port! I have an Arm Cortex M4F MCU with 512 kilobytes of RAM, and ~1K LUT4 FPGA logic cells for a soft USB core, offloading CPU-intensive functions from the Arm core, and even a couple of small RISC-V cores. I have four contact pads that can easily be used to make two buttons. And just like my siblings, I have an RGB LED, because everyone loves blinking an LED any color of the rainbow!

Qomu Zephyr OS


keewee7 4 days ago

It's interesting how the ESP32 has become the de facto IoT MCU used in almost all new IoT products.

Other companies like TI and STMicro had their own cheap WiFi/BLE MCUs but their devkits used to be too expensive for hobbyists and students. But the Chinese startup behind the ESPxx chips made sure that their devkits were cheap enough for hobbyists and students and through that influence they now also dominate the professional industry.

ilyt 3 days ago

It's not just devkits being cheap (there have been plenty of knockoff cheap ST boards) its

Even now in middle of chipaggedon you can get WIFI/BT breakout board for like $3-5. That's not "just" "cheap enough for hobbyist breadboard", that's cheap enough to get the whole product built around.

I think partly due to how cheap WiFi/BT has become the likes of Zigbee didn't got as popular as they probably should, between licensing costs it was just cheaper for IoT companies to not touch it, even if it would make more sense in places.

phpisthebest 3 days ago

in IoT wifi became commercially successful over zigbee for 2 reasons, the cost was not one of them IMO

1. Vendors want Cloud control over the product, they want it to phone home. Zigbee is built for Local Control

2. Consumers prefer "hub free" products that just "work", They are willing to give up privacy, security and possible performance issues for that convenience. They do not think or care about what will happen when their shitty comcast service breaks for the 100th time this year, or if the IoT vendor is hacked, they want to pull out of the box, and have an app connect it in 2 seconds via a QR Code so they can blink the light quickly and not have a fuss about with a hub.

WiFi is better than Zigbee at both of these things. Zigbee is better for home automation where you want Local Control, and security. Local control and security has been losing that battle though the tide seems to be turning lately (I hope)

---

rpi

https://www.jeffgeerling.com/blog/2023/hyperscale-your-homelab-compute-blade-arrives

---

https://jaycarlson.net/2023/02/04/the-cheapest-flash-microcontroller-you-can-buy-is-actually-an-arm-cortex-m0/

"I was interested in finding the cheapest flash microcontroller LCSC sells, and it turns out it’s not a crusty old 8051 or a PIC16 clone — and it’s not the WCH CH32V003 that the Internet is freaking out about — it’s actually an Arm Cortex-M0+ made by Puya, it’s less than 10 cents in moderate quantities, and it’s awesome."

"Puya makes a range of ultra-low-cost Arm Cortex-M0+ parts in the PY32 series"

"The F002A is the base model; it clocks in at 24 MHz, has 20K of flash, 3K of SRAM"

"The PY32F003 comes in 16K and 32K flash versions (with 2K and 4K of RAM respectively)."

---

https://hackaday.com/2023/04/19/risc-v-supercluster-for-very-low-cost/

---

https://chipsandcheese.com/2023/05/28/arms-cortex-a53-tiny-but-important/

32kb l1 icache + 32kb l1 dcache

"The A53 Technical Reference Manual states that the branch predictor has a global history table with 3072 entries...A53 struggles in our testing to recognize a repeating pattern with a period longer than 8 or 12...

Return Prediction: For returns, the A53 has an eight-deep return stack.

Indirect branches can go to multiple targets, adding another level of difficulty to branch prediction. According to ARM’s Technical Reference Manual, the A53 has a 256 entry indirect target array. From testing, the A53 can reliably track two targets per branch for up to 64 branches.

... A lot of high performance CPUs employ branch target buffers to cache branch targets, but A53 doesn’t do that. Most of the time, it has to fetch the branch from L1i, decode it, and calculate the target before it knows where the branch goes. This process takes three cycles.

For very tiny loops, A53 has a single entry Branch Target Instruction Cache (BTIC) that holds instruction bytes for two fetch windows. I imagine each fetch window is 8 bytes, because that would correspond to two ARM instructions. The BTIC would probably be a 16 byte buffer in the branch predictor. From testing taken branch latency, we do see BTIC benefits fall off once branches are spaced by 16 bytes or more.

...

To make things even worse, there’s no L3 cache, and 256 KB of L2 is not a lot of capacity for a last level cache " -- https://chipsandcheese.com/2023/05/28/arms-cortex-a53-tiny-but-important/

"To further reduce power and frontend latency, the L1i stores instructions in an intermediate format, moving some decode work to predecode stages before the L1i is filled. That lets ARM simplify the main decode stages, at the cost of using more storage per instruction." "The 36b/40b opcodes stored in the I$ bear a close relation to the 16/32b Thumb encoding." -- https://developer.arm.com/documentation/ka001493/latest
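to make the "eight-deep return stack" figure concrete, here is a toy model of a fixed-depth return-address predictor. This is a generic textbook-style sketch, not the A53's actual implementation; the class and function names are hypothetical, and the overflow policy (drop the oldest entry) is an assumption:

```python
class ReturnStack:
    """Toy fixed-depth return-address predictor (generic model, not the A53's design)."""

    def __init__(self, depth=8):
        self.depth = depth
        self.entries = []

    def on_call(self, return_address):
        # Assumed overflow policy: when full, the oldest entry is lost.
        if len(self.entries) == self.depth:
            self.entries.pop(0)
        self.entries.append(return_address)

    def predict_return(self):
        # Returns None when the stack is empty (no prediction available).
        return self.entries.pop() if self.entries else None


def count_mispredicts(call_depth, stack_depth=8):
    """Nest `call_depth` calls, then unwind, counting wrong return predictions."""
    predictor = ReturnStack(stack_depth)
    for address in range(call_depth):
        predictor.on_call(address)
    mispredicts = 0
    for address in reversed(range(call_depth)):
        if predictor.predict_return() != address:
            mispredicts += 1
    return mispredicts


print(count_mispredicts(8))   # 0
print(count_mispredicts(12))  # 4
```

in this model, call chains nested deeper than the stack lose one return prediction per extra level.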

---

https://hackaday.com/2023/08/03/its-snake-in-a-qr-code-but-smaller/

---

apple m1's l1 cache (instruction + data): 192+128 KB per core (performance cores), 128+64 KB per core (efficiency cores)

---

https://dottedmag.net/blog/libvirt-14-pcie-devices/

---

https://jiristepanovsky.cz/project.php?p=23cpu

---

" This is the Timex m851. It uses an 8-bit Seiko SC188 CPU, has 48KB of ROM, 2KB of RAM and a 42x11 dot matrix main display.

The cpu is designed for ultra-low power operation - a single battery can last 3 years! " -- https://lock.cmpxchg8b.com/timex.html

other mentions in the discussion on that:

adhesive_wombat 10 hours ago

There's an ultra-low power mode in newer TMS430s, and you can use an external ultra-low-power RTC.

From memory, it's under 100nA on paper, including things like the power switch leakage. Crazy stuff.

MrBuddyCasino 7 hours ago

Do you mean the TI MSP430?

WithinReason 7 hours ago

https://en.wikipedia.org/wiki/TI_MSP430

adhesive_wombat 7 hours ago

Ugh yes! Don't know why I always get that wrong!

zenolove 4 hours ago

The [E0C6S46](https://download.epson-europe.com/pub/electronics-de/asmic/4...) still powers the 1st and 2nd gen Tamagotchi!

4-bit (!) 32KHz MCU with 6,144 words of 12-bit (‼) ROM, 640 words of internal 4-bit RAM, and a 160-word 4-bit frame buffer for the integrated LCD driver (enough for double-buffering)!

The thing is a beauty! I wrote a Typescript emulator for it, a year ago or so, though for whatever reason I haven’t pushed it to GH yet (but I will if anyone’s interested!) It can run unmodified Tamagotchi firmware.

triyambakam 3 hours ago

Wow, I'd love to see the emulator
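the E0C6S46 sizes quoted above are given in words of odd widths; converting them to bytes makes them easier to compare with the other parts in these notes. A small sketch (the helper name is mine):

```python
# Memory sizes from the E0C6S46 figures quoted above, converted to bytes.
def size_in_bytes(words, bits_per_word):
    """Return the storage size in bytes for `words` words of the given width."""
    total_bits = words * bits_per_word
    assert total_bits % 8 == 0, "not a whole number of bytes"
    return total_bits // 8


rom_bytes = size_in_bytes(6144, 12)        # program ROM, 12-bit words
ram_bytes = size_in_bytes(640, 4)          # internal RAM, 4-bit words
framebuffer_bytes = size_in_bytes(160, 4)  # LCD frame buffer, 4-bit words

print(rom_bytes, ram_bytes, framebuffer_bytes)  # 9216 320 80
```

so: 9 KiB of ROM, 320 bytes of RAM, and an 80-byte frame buffer.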

---

https://www.copetti.org/writings/consoles/nintendo-3ds/

mentions a 32k l1

---

http://www.technoblogy.com/show?3Z2Y

ram: 2800 Lisp cells (11200 bytes). flash: 16k
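the two RAM figures above imply the size of one Lisp cell. A quick worked check; the 2 KiB budget in the second half is a hypothetical example, and reading a 4-byte cell as two 16-bit fields is my assumption, not something stated on the page:

```python
# Figures from the note above.
RAM_BYTES = 11200
LISP_CELLS = 2800

bytes_per_cell = RAM_BYTES // LISP_CELLS
print(bytes_per_cell)  # 4  (consistent with two 16-bit fields per cell; an assumption)


def cells_for(ram_bytes, cell_size=bytes_per_cell):
    """How many cells of `cell_size` bytes fit in `ram_bytes` of free RAM."""
    return ram_bytes // cell_size


# Hypothetical budget: a part with 2 KiB of RAM, all of it given to cells.
print(cells_for(2 * 1024))  # 512
```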

---

https://olimex.wordpress.com/2023/04/21/neo6502-open-source-hardware-modern-retro-computer-project/ 65k ram https://hackaday.com/2023/08/31/the-neo6502-is-a-credit-card-sized-retro-computer/

---

https://www.gabotronics.com/products/102/oscilloscope-watch-details.html https://www.tomshardware.com/news/raspberry-pi-pico-oscilloscope ATXMEGA256A3U 256KB Flash, 16KB SRAM, 4KB EEPROM

---

" STMicroelectronics STM8S005K6T6C Microcontroller

No secrets here: the datasheet is readily available.

This member of the STM8 series has an 8-bit CPU, 32KB of flash memory, 2KB of RAM, and the usual assortment of timers, UART, I2C, and SPI interfaces. It’s just what you need for some simple configuration and control of another chip. " -- https://tomverbeure.github.io/2023/11/26/Monoprice-Blackbird-4K-Pro-HDCP-Converter.html

---