proj-oot-lowEndTargets-lowEndTargetsUnsorted

---

https://www.spark.io/

    STM32F103 microcontroller
    ARM Cortex M3 architecture
    32-bit 72MHz processor
    128KB of Flash, 20KB of RAM

I think that has either no instruction cache or a 4KB or 8KB one, but I'm not at all sure.

The take-home for us is probably the amounts of flash and RAM. Again, it would be nice to fit the main interpreter in 16K or less, with about 64K as the upper limit.

---

The E64G401 Epiphany-IV 64-core 28nm Microprocessor has 32KB local (but shared) memory per core (so 32KB x 64 = 2MB total).

http://en.wikipedia.org/wiki/Adapteva

http://www.adapteva.com/wp-content/uploads/2013/06/e64g401_datasheet_4.13.6.14.pdf

---

woah these are cheap:

https://en.wikipedia.org/wiki/Odroid

I think the Exynos 4412 has a 32KB/32KB L1 cache -- http://malideveloper.arm.com/develop-for-mali/development-platforms/hardkernel-odroid-u2-development-platform/

---

http://linuxgizmos.com/intel-unveils-tiny-x86-minnowboard-max-open-sbc/

Raspberry Pi: $25/$35
BeagleBone Black: $45
MinnowBoard SBC: $99

" tdicola 13 hours ago

This looks neat for people that want a cheap board to hack on embedded Linux. However for serious control of signal generation, acquisition, PWM, servos, etc. you really don't want to be running a multitasking OS. Something like the Beaglebone Black, with its dedicated 200MHz programmable units in addition to embedded Linux, is much more interesting for hackers and makers IMHO.

reply "

" stonemetal 6 hours ago

PRU -> programmable real time unit

BBB -> BeagleBone Black

The BBB has an extra dual core processor that runs at 200MHz. It is interesting because it is like the processor they teach you about in your intro to computer architecture classes: every instruction is a single cycle instruction. Since it is a co-processor (not running an OS but controllable from the BBB's OS) and execution of instructions is deterministic, it is a good choice for running hard real time code. "

" ah- 13 hours ago

I wouldn't call the minnowboard a microcontroller, it's more similar to other single board computers like the Pandaboard and the odroid boards. And 2GB are already common for such boards, so 4GB are really not far off.

reply "

"

outside1234 6 hours ago

Does anyone know how the performance on something like this stacks up to something like the Raspberry Pi?

wmf 5 hours ago

A 1.4 GHz Silvermont must be many times faster than a 700 MHz ARM11.

reply "

"

kqr2 14 hours ago

Intel also has the Galileo board which is hardware and software pin-compatible with shields designed for the Arduino Uno* R3.

http://www.intel.com/content/www/us/en/intelligent-systems/g...

makomk 11 hours ago

The Galileo's one of those boards where it's very important to pay attention to the fine print. For example, the GPIO controller is hanging off a relatively slow I2C port, so access to GPIO is much, much slower than even the lowest-end Arduino. Also, it's a modified 486 which takes multiple clock cycles to carry out many instructions that are single-cycle on modern ARM, so it's not as fast at arithmetic as the clock speed would suggest.

tdicola 14 hours ago

Be careful though, the Galileo emulates AVR code and is orders of magnitude slower than a real Arduino. Don't expect to pick up any shield and make it work, unfortunately.

jpwright 3 hours ago

The Galileo actually only emulates a subset of the Arduino libraries. The AVR libraries themselves are, for the most part, not supported. This makes many popular libraries unusable even when hardware is not an issue.

reply "

" elnate 14 hours ago

How does this (note: the MinnowBoard SBC) compare to a Raspberry Pi?

vonmoltke 9 hours ago

Comparing the $99 version to the B ($35):

Overall, probably worth the extra cost if you need the power and features. The question is, who does? I'm considering this for no other reason than I want a board in this form factor and power class that has SATA and PCIe.

nullc 6 hours ago

The RPI is really obscenely slow, far slower than the clock rate would suggest even for an ARM. The RPI is pretty exciting as a microcontroller, though its power usage is very high, but as a computer it's a real disappointment.

The real comparison should be with the odroid boards: http://hardkernel.com/main/products/prdt_info.php?g_code=G13... a quad arm (cortex-a9) at 1.7GHz with 2GB ram for ~$60.

reply "

---

(I already read this): http://www.digikey.com/en/articles/techzone/2012/jun/low-power-16-bit-mcus-expand-the-application-space-between-8--and-32-bit-options

---

a picture shown while discussing the L4 cache in Crystalwell's eDRAM:

http://www.anandtech.com/show/6993/intel-iris-pro-5200-graphics-review-core-i74950hq-tested/3

so the memory hierarchy jumps after 32K, 256K, and 4M. Also, the text notes that both Intel and Microsoft found that 32M was a good amount of eDRAM to have.

---

https://en.wikipedia.org/wiki/Calxeda apparently had this manycore building-block product:

"In March 2011 Calxeda announced a 480-core server in development, consisting of 120 quad-core ARM Cortex-A9 CPUs.[3][4][5] .. EnergyCore? ECX-1000, featuring four 32-bit ARMv7 Cortex-A9 CPU cores operating at 1.1–1.4 GHz, 32 KB L1 I-cache and 32 KB L1 D-cache per core, 4 MB shared L2 cache, 1.5 W per processor, 5 W per server node including 4 GB of DDR3 DRAM, 0.5 W when idle.[8][9] Each chip included five 10 gigabit Ethernet ports. Four chips are carried on each EnergyCard?.[8] "

Tilera's TILE-Gx8072 with 72 processors has

" Seventy-two cores operating at frequencies up to 1.2 GHz • 64-bit architecture (datapath and address)

...

32 KB L1 instruction cache and 32 KB L1 data cache per core • 256 KB L2 cache per core

"

---

http://www.realworldtech.com/haswell-cpu/2/ says the Haswell and Sandy Bridge front end includes a 1.5K "L0" uop cache, in front of the 32k L1 icache.

I guess that's about the same as a 32k icache, if you assume that there's about 1 uop per 16 bytes? But it's probably more like 1 uop per 6 bytes.
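rough arithmetic on that (mine): taking "1.5K" to mean ~1536 uop entries, at 16 bytes of x86 code per uop the uop cache would cover 1536 x 16 ≈ 24KB, which is indeed roughly comparable to the 32k L1 icache; at 6 bytes per uop it only covers 1536 x 6 ≈ 9KB.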

--

some 'matchbox pcs':

http://matchboxpc.thydzik.com/

https://en.wikipedia.org/wiki/Geode_%28processor%29#Geode_GXLV -- x86 processor, 16k unified L1 cache

Tiqit (Pratt's): used one of these (they call it a '486sx'): http://www.cpu-world.com/CPUs/ElanSC400/ and they reference the expired page http://www.amd.com/products/lpd/techdocs/e86/21030.pdf , which is probably http://support.amd.com/TechDocs/21030.pdf . Section 3.4 of that reference manual says the system has an 8k unified L1 cache (and no L2 cache) and uses a 486 instruction set at 100MHz with no floating point unit.

http://www.pcworld.com/article/2044279/16-small-but-powerful-matchbox-pcs.html "16 small but powerful matchbox PCs" by Serdar Yegulalp, Computerworld, Jul 13, 2013 -- tiny PCs that even have their own keyboards, like the Qi Ben NanoNote

..." education-oriented Raspberry Pi ($35)

hobbyist-and-manufacturing-oriented Gumstix Overo series (from $99-$229).

hacker-friendly BeagleBone Black ($44.95). These are three of the most popular devices in this category.

Other devices have surfaced in the wake of the success of the Raspberry Pi and its peers, each a variant on the theme.

Clockwise, starting at top left: The Gooseberry (about $62) is a repurposed printed circuit board assembly originally developed for tablets rather than an original design like the Pi, but no less useful for that.

The Rascal Micro ($199) eschews video connectivity in favor of networking, so it can be used as a miniature headless system for controlling other devices.

And the PandaBoard (and PandaBoard ES, its successor), at around $175, is pricier than the Pi; it sports a few more connectors and slightly more expandability "

(by '..."' I mean 'contains mostly quotes but with much ellipsis and perhaps even paraphrasing')

$89 Korean-made Odroid U2 packs in an Exynos4412 Prime ARM Cortex-A9 quad-core processor, much faster than the Pi's ARM-powered Broadcom SoC.

Another board that's used widely in automation projects is the Arduino, now available in a whole cornucopia of editions.

The emphasis here isn't on power or speed, though: The Arduino Uno ($55 for the bare board, $60 for a retail box version) sports an 8-bit RISC processor running at a mere 16MHz (an Intel Core i7 runs around 3GHz).

Boxed up and ready to go

Many matchbox systems come as a bare board, for which you have to supply your own case. These units, on the other hand, come packaged in a case of some kind, courtesy of the manufacturer. They are often used as mini-media centers.

Clockwise, starting at top left: The Cotton Candy ($199) and Rikomagic (about $86) both run Android, while the CuBox ($119) has additional hobbyist-friendly features, such as a recovery mode that prevents it from being bricked by mistake.

Almost a full PC

These built-up matchbox systems offer a little more breathing room.

Clockwise from top left: The Trim-Slice H packs not only an ARM Cortex-A9 processor and an NVIDIA Tegra 2 chipset but a 2.5-inch SATA hard disk into a fanless case. Prices start at $279, with developer kits available at $175.

The folks at Cappuccino PC build full-blown Intel systems (Atom or Core, your choice); the fanless SlimPro SP675FP measures 10 in. on its longest side and sells for $685.

CompuLab's fit-PC3, which starts at $275 with minimal configuration, uses a dual-core 64-bit AMD processor with a 2.5-in. hard disk and a Radeon HD 6250 or 6320 GPU.

Keyboard included

Some even come with a keyboard.

Clockwise, from top left: The Ben NanoNote runs its own custom build of OpenWrt, the Jlime distribution or anything else you can get to run on its 336MHz MIPS processor. Only 1,500 pre-manufactured units were made, but the hardware design is available as an open project.

Next up in size, the OpenPandora (starting at $479) is billed as a mixture of PC and gaming console and is only a little larger than the Nintendo DS.

The Gecko Surfboard ($119) packs an Intel-powered system into a standard-sized keyboard but only uses 5 watts -- hearkening back to the everything-in-the-keyboard design of the Commodore 64/128.


education-oriented Raspberry Pi ($35): 32k unified L1 cache (e.g. like a 16k icache)

hobbyist-and-manufacturing-oriented Gumstix Overo series (from $99-$229): 16k unified cache? ( https://pixhawk.ethz.ch/omap/start )

hacker-friendly BeagleBone Black ($44.95): 32K/32K L1 cache

The Gooseberry (about $62): Allwinner A10 ARM Cortex-A8 (32+32k L1 cache, 512k L2 cache), Mali 400 graphics

The Rascal Micro ($199): AT91SAM9G20B-CU (?), 400 MHz ARM (ARM926EJ-S), 32+32k L1 cache

PandaBoard ($175): TI OMAP4430 dual-core ARM Cortex-A9 CPU, with two ARM Cortex-M3 cores, 32+32k L1 cache

$89 Odroid U2: Exynos4412 Prime ARM Cortex-A9 quad-core processor, 32KB/32KB L1 cache

Arduino Uno ($55): "The Arduino UNO has only 32K bytes of Flash memory and 2K bytes of SRAM" (and no cache?) https://learn.adafruit.com/memories-of-an-arduino/arduino-memory-architecture

Cotton Candy ($199): 1.2 GHz Exynos 4210 (ARM Cortex-A9, 32+32k L1 cache, 1MB L2 cache), Mali 400 graphics

Rikomagic (about $86): 32k+32k L1 cache ( http://complete-concrete-concise.com/blog/raspberry-pi-and-the-mk802-a-side-by-side-comparison )

CuBox ($119): Marvell Armada 510 (88AP510) SoC with ARM v6/v7 (32/32 L1 cache?)

Gecko Surfboard ($119): Vortex86 ( https://en.wikipedia.org/wiki/Vortex86 ), 16+16k L1 cache

Intel Galileo: 16 KB L1 cache ( http://www.mouser.com/applications/open-source-hardware-galileo-pi/ ). "Arduino says it's 400MHz 32-bit Intel® Pentium instruction set architecture (ISA)-compatible processor o 16 KBytes on-die L1 cache", which does not tell us much: the 80486 and Pentium have very little difference from an ISA POV, and later models of both had a 16 KByte cache, thus the 80486 looks plausible, too.

---

http://iqjar.com/jar/an-overview-and-comparison-of-todays-single-board-micro-computers/

---


in this blog post is a list of popular embedded systems:

    ARM Cortex-M
    AVR
    AVR32
    ColdFire
    HC12
    MSP430
    PIC18
    PIC24/dsPIC
    PIC32 (MIPS)
    PowerPC
    RL78
    RX100/600
    SH
    V850
    x86

" Third parties offered a wide range of upgrades, for both SX and DX systems. The most popular ones were based on the Cyrix 486DLC/SLC core, which typically offered a substantial speed improvement due to its more efficient instruction pipeline and internal L1 SRAM cache. The cache was usually 1 kB, or sometimes 8 kB in the TI variant. " -- http://en.wikipedia.org/wiki/Intel_80386

---

intel Quark (which is the CPU of Intel Galileo) is said to be an updated 486 (google intel+quark+486 to find remarks like this). It says it uses the Pentium instruction set, but apparently this is similar to the 486 instruction set:

http://en.wikipedia.org/wiki/P5_%28microarchitecture%29#Major_improvements_over_i486_microarchitecture

---

The intel 80286 had a 24-bit address bus and was able to address up to 16 MB of RAM, compared to 1 MB for its predecessor.


Misc

http://www.cs.arizona.edu/~arvind/papers/lctes03.pdf

Table 1: AX Instructions (AX Instruction / Description)

    setpred     support for predication in 16-bit code
    setsbit     sets the 'S' bit to avoid explicit cmp instructions
    setsource   sets the source register for the next instruction
    setdest     sets the destination register for the next instruction
    setthird    sets the third operand (support 3-address format)
    setimm      sets the immediate value for the next instruction
    setshift    sets the shift type and amount for the next instruction
    setallhigh  indicates the next instruction uses all high registers


another libc:

https://github.com/lpsantil/rt0 https://news.ycombinator.com/item?id=8974024

really tiny (claims to be the smallest in the world):

michigan micro mote:

"8-bit CPU, a 52x40-bit DMEM, a 64x10-bit IMEM, a 64x10-bit IROM" [1]

---

	A reimplementation of NetBSD using a Microkernel [video] (youtube.com)
	agumonkey 2 days ago

Youtube video description:

Based on the MINIX 3 microkernel, we have constructed a system that to the user looks a great deal like NetBSD. It uses pkgsrc, NetBSD headers and libraries, and passes over 80% of the KYUA tests. However, inside, the system is completely different. At the bottom is a small (about 13,000 lines of code) microkernel that handles interrupts, message passing, low-level scheduling, and hardware related details. Nearly all of the actual operating system, including memory management, the file system(s), paging, and all the device drivers run as user-mode processes protected by the MMU. As a consequence, failures or security issues in one component cannot spread to other ones. In some cases a failed component can be replaced automatically and on the fly, while the system is running, and without user processes noticing it. The talk will discuss the history, goals, technology, and status of the project.

The latest work has been adding live update, making it possible to upgrade to a new version of the operating system WITHOUT a reboot and without running processes even noticing. No other operating system can do this.

The system is built on MINIX 3, a derivative of the original MINIX system, which was intended for education. However, after the original author, Andrew Tanenbaum, received a 2 million euro grant from the Royal Netherlands Academy of Arts and Sciences and a 2.5 million euro grant from the European Research Council, the focus changed to building a highly reliable, secure, fault tolerant operating system, with an emphasis on embedded systems. The code is open source and can be downloaded from www.minix3.org. It runs on the x86 and ARM Cortex V8 (e.g., BeagleBones). Since 2007, the Website has been visited over 3 million times and the bootable image file has been downloaded over 600,000 times. The talk will discuss the history, goals, technology, and status of the project.

Animats 2 days ago

That's nice, but late. QNX had that 10-15 years ago. With hard real time scheduling, too.

All you really need in a practical microkernel is process management, memory management, timer management, and message passing. (It's possible to have even less in the kernel; L4 moved the copying of messages out of the kernel. Then you have to have shared memory between processes to pass messages, which means the kernel is safe but processes aren't.)

The amusing thing is that Linux, after several decades, now has support for all that. But it also has all the legacy stuff which doesn't use those features. That's why the Linux kernel is insanely huge. The big advantage of a microkernel is that, if you do it right, you don't change it much, if at all. It can even be in ROM. That's quite common with QNX embedded systems.

(If QNX, the company, weren't such a pain... They went from closed source to partially open source (not free, but you could look at some code) to closed source to open source (you could look at the kernel) to closed source. Most of the developers got fed up and quit using it. It's still used; Boston Dynamics' robots use it. If you need hard real time and the problem is too big for something like VxWorks, QNX is still the way to go.)

vezzy-fnord 2 days ago

QNX is fascinating on its own, but MINIX 3 is still a different project in that its full adoption of a NetBSD userland will probably make it more useful for generic servers and workstations as well. They also seem to be going much deeper with checkpointing and dynamic upgrades/hot code reloading.

If you need hard real time and the problem is too big for something like VxWorks, QNX is still the way to go.

There's all sorts of much tinier RTOS like FreeRTOS, MicroC/OS and Contiki that are used out there for particularly critical and/or constrained environments.

 nchelluri 1 day ago

When is VxWorks inappropriate, but QNX appropriate?

EDIT: http://www.embeddedrelated.com/showthread/comp.arch.embedded... says:

> the most fundamental difference between VxWorks and QNX is as you have described, QNX lends itself to a message passing architecture while VxWorks lends itself to a shared memory architecture.

>

> My personal opinion is that a message passing architecture is easier to get to grips with and as such is potentially easier to understand and debug.

> However, the majority of software engineers with experience of an embedded RTOS will be very well informed about the Shared Memory architecture.

gte525u 1 day ago

I think it's less of an issue now than say 10 years ago. VxWorks 6.x added support for protection domains (MPU/MMU support) and RTPs (real time processes). In VxWorks 5, everything operated in the kernel. Even with 6.x, very little typically runs by default in user space on a VxWorks setup.

With respect to the message passing - both support messaging. VxWorks has several types of message queues - the VxWorks-proprietary msgQLib API, the POSIX API, etc. QNX has much the same: MsgSend/MsgRecv, which is the microkernel API, and POSIX. QNX has an add-on PubSub middleware that the OP of the usenet group may be thinking of.

saosebastiao 1 day ago

Would this imply no support for mmap (or similar) in QNX? Or just not very optimal to use it?

gte525u 1 day ago

Both support mmap and shared memory - that's why I found the "shared memory" usenet post a little puzzling.

unethical_ban 2 days ago

I'm watching the video now, but are you suggesting that QNX, which is not Free and Open Source, has already accomplished MINIX's stated goals of OS reliability?

I would like to hear Mr. Tanenbaum's answer to the less provocative form of the sentiment: "What design decisions were made with MINIX3 that other RTOS with microkernels didn't consider?"

nickpsecurity 2 days ago

Tanenbaum cited QNX in Round 2 of the microkernel debate between him and Linus. It's had all sorts of great traits for a long time. It also had plenty of development time and a rip-off open source model to give it capabilities. Like Tanenbaum said in his paper, Minix 3 has had a small amount of core developers working on it for a relatively short amount of time. There's no way Minix 3 will trump QNX with such small resources and I doubt they planned to. It's more a start on building something using better engineering principles that might eventually become a great alternative to other UNIX's and Linux.

jacquesm 2 days ago

QnX achieved Minix's stated goals of OS reliability 20 years ago.

And Minix isn't a micro kernel in the same way that QnX is.

 jacquesm 1 day ago

You have a much smaller number of bugs because (a) each component is much simpler (b) runs as a separate process and so can be debugged and worked on by mere mortals and (c) works using a well defined interface (message passing) which makes testing and debugging a much simpler affair.

stox 2 days ago

I think UNIX-RTR has met those goals.

pjmlp 1 day ago

Thanks for pointing it out. I wasn't aware of it.

carussell 2 days ago

I took a serious look at MINIX over the winter, and digested several of Tanenbaum's talks around that time. (For anyone wondering if this talk contains anything substantially different from past ones, the answer is no.)

Here are some things to add:

For anyone looking in to maybe starting to work with MINIX, I'd suggest assessing whether or not you would be comfortable striking out and doing things on your own, and then being prepared to do so. With MINIX, you aren't going to find a thriving community that you can just add your piece to, so as to contribute to the effort. You might run into a certain level of that sort of old-guard, paralyzing stop energy, so in a way it's got a lot of the downsides of a greenfield project except with few, if any, of the upsides.

---

Wirth's RISC0 had 8K 32-bit words of program memory and 8K words of data memory (Harvard architecture, i.e. code separate from data).

Wirth's RISC5 had 1MB (von Neumann architecture, i.e. code and data in the same memory space).

-- http://www.inf.ethz.ch/personal/wirth/FPGA-relatedWork/RISC-Arch.pdf

---

IBM's TrueNorth chip family, not yet commercially available, is a (weakly) neuromorphic, deterministic integrate-leak-fire artificial neural network simulator. Its claim to fame is its low-power operation.

https://en.wikipedia.org/wiki/TrueNorth http://research.ibm.com/cognitive-computing/neurosynaptic-chips.shtml

4096 cores of 256 neurons each; 256 synapses per neuron; I think no memory except that contained in the states of the synapses and the states of the neuronal cell bodies; 70mW power consumption

Paper on their 'corelet' programming language. Afaict the only thing it really says is that neural networks can be used as library modules inside other neural networks. Duh. Really I think this is just one of those papers describing a boring but important implementation step, because you have to do that to get credit. Annoyingly afaict their implementation is not actually available online, so there's nothing of interest here for us until they put it up.

http://www.research.ibm.com/software/IBMResearch/multimedia/IJCNN2013.corelet-language.pdf

(the most interesting part is when they describe what's in their (non-available) library repository:

" The corelets currently in the Corelet Library include scalar functions, algebraic, logical, and temporal functions, splitters, aggregators, multiplexers, linear filters, kernel convolution (1D, 2D and 3D data), finite-state machines, non-linear filters, recursive spatio-temporal filters, motion detection, optical flow, saliency detectors and attention circuits, color segmentation, a Discrete Fourier Transform, linear and non-linear classifiers, a Restricted Boltzmann Machine, a Liquid State Machine, and more. The corelet abstraction and unified interfaces enable developers to easily replace a library corelet with an alternative implementation without disrupting the rest of the system. " )

paper on their simulator:

http://www.modha.org/blog/SC12/SC2012_Compass.pdf

summary/related work: http://www.artificialbrains.com/darpa-synapse-program#truenorth-compass

article: http://www.extremetech.com/extreme/187612-ibm-cracks-open-a-new-era-of-computing-with-brain-like-chip-4096-cores-1-million-neurons-5-4-billion-transistors

---

ok this is high end but:

http://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/high-performance-xeon-phi-coprocessor-brief.pdf http://ark.intel.com/products/family/71840/Intel-Xeon-Phi-Coprocessors#@Server

intel xeon phi parallel coprocessor, ~8-16GB, ~1GHz freq, ~60 cores, ~244 threads (total, not per core; 4 threads per core), ~300 watts TDP

---

in Haskell each thread requires 1KB, so 1 million threads per GB of memory -- https://github.com/Gabriel439/post-rfc/blob/master/sotu.md

"Threads in Haskell are very cheap, and many people won't care about one additional thread. However, each thread comes with a stack, which takes memory. The stack starts off small (1Kb) and grows/shrinks in 32Kb chunks, but if it ever exceeds 1Kb, it never goes below 32Kb. For certain tasks (e.g. Shake build rules) often some operation will take a little over 1Kb in stack. Since each active rule (started but not finished) needs to maintain a stack, and for huge build systems there can be 30K active rules, you can get over 1Gb of stack memory. While stacks and threads are cheap, they aren't free." -- http://neilmitchell.blogspot.com/2014_06_01_archive.html

"Go 1.3 will have a minimum stack size back down at 4 kB. We hope that Go 1.4 will be able to ratchet the minimum stack size down to 1 or 2 kB." – Russ Cox Mar 11 '14 at 21:49 http://stackoverflow.com/questions/22326765/go-memory-consumption-with-many-goroutines#comment33947609_22333024

see also https://github.com/golang/go/issues/7514 , looks like it ended up at 2kb in Go 1.4 (see also https://github.com/golang/go/commit/6c934238c93f8f60775409f1ab410ce9c9ea2357 ): " A consequence is that stacks are no longer segmented, eliminating the "hot split" problem. When a stack limit is reached, a new, larger stack is allocated, all active frames for the goroutine are copied there, and any pointers into the stack are updated. Performance can be noticeably better in some cases and is always more predictable. Details are available in the design document.

The use of contiguous stacks means that stacks can start smaller without triggering performance issues, so the default starting size for a goroutine's stack in 1.4 has been reduced from 8192 bytes to 2048 bytes. " -- https://golang.org/doc/go1.4#runtime
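to make the per-thread-stack arithmetic concrete, here's a minimal C sketch (mine, not from the quoted posts) of the same tradeoff at the OS-thread level; it assumes a POSIX system with glibc, compiled with -pthread:

    /* Per-thread stacks bound thread counts: at the common 8MB default
       (ulimit -s), 1GB of stack memory allows only ~128 threads; shrinking
       to 16KB (PTHREAD_STACK_MIN on glibc/x86-64) allows ~65,000. Green
       threads (Haskell ~1KB, Go 1.4 ~2KB) push this toward ~10^6 per GB. */
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static void *worker(void *arg) {
        (void)arg;
        sleep(1);   /* just hold the stack for a moment */
        return NULL;
    }

    int main(void) {
        pthread_attr_t attr;
        pthread_t t;
        pthread_attr_init(&attr);
        pthread_attr_setstacksize(&attr, 16 * 1024);  /* 16KB instead of 8MB */
        if (pthread_create(&t, &attr, worker, NULL) != 0) {
            perror("pthread_create");
            return 1;
        }
        pthread_join(t, NULL);
        printf("1GB / 16KB stacks = ~65536 threads; 1GB / 2KB = ~524288\n");
        return 0;
    }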

---

https://en.wikipedia.org/wiki/Scratchpad_memory

" Sony's PS2 Emotion Engine employed a 16 KB scratchpad, to and from which DMA transfers could be issued to its GS, and main memory.

... NVIDIA's 8800 GPU running under CUDA provides 16 KB of scratchpad (NVIDIA calls it Shared Memory) per thread-bundle when being used for GPGPU tasks. Scratchpad also was used in the later Fermi GPU (GeForce 400 Series).[5] ... Cache control vs scratchpads

Many architectures such as PowerPC attempt to avoid the need for cacheline locking or scratchpads through the use of cache control instructions. Marking an area of memory with "Data Cache Block: Zero" (allocating a line but setting its contents to zero instead of loading from main memory) and discarding it after use ('Data Cache Block: Invalidate', signaling that main memory didn't receive any updated data) the cache is made to behave as a scratchpad. ... Shared L2 vs Cell local stores

Regarding interprocessor communication in a multicore setup, there are similarities between the Cell's inter-localstore DMA and a Shared L2 cache setup as in the Intel Core 2 Duo or the Xbox 360's custom powerPC: the L2 cache allows processors to share results without those results having to be committed to main memory. This can be an advantage where the working set for an algorithm encompasses the entirety of the L2 cache. However, when a program is written to take advantage of inter-localstore DMA, the Cell has the benefit of each-other-Local-Store serving the purpose of BOTH the private workspace for a single processor AND the point of sharing between processors; i.e., the other Local Stores are on a similar footing viewed from one processor as the shared L2 cache in a conventional chip. The tradeoff is that of memory wasted in buffering and programming complexity for synchronization, though this would be similar to precached pages in a conventional chip. ... Extending the working set, e.g., a sweet spot for a merge sort where the data fits within 8x256 KB "

https://en.wikipedia.org/wiki/Cell_%28microprocessor%29

" A DMA operation can transfer either a single block area of size up to 16KB, or a list of 2 to 2048 such blocks. ... The PPE contains a 64 KiB? level 1 cache (32 KiB? instruction and a 32 KiB? data) and a 512 KiB? Level 2 cache. ...

With the current generation of the Cell, each SPE contains a 256 KiB embedded SRAM for instruction and data, called "Local Storage" (not to be mistaken for "Local Memory" in Sony's documents that refer to the VRAM) which is visible to the PPE and can be addressed directly by software. Each SPE can support up to 4 GiB of local store memory. ... The SPEs contain a 128-bit, 128-entry register file and measures 14.5 mm2 on a 90 nm process. An SPE can operate on sixteen 8-bit integers, eight 16-bit integers, four 32-bit integers, or four single-precision floating-point numbers in a single clock cycle, as well as a memory operation. ... Compared to its personal computer contemporaries, the relatively high overall floating point performance of a Cell processor seemingly dwarfs the abilities of the SIMD unit in CPUs like the Pentium 4 and the Athlon 64. However, comparing only floating point abilities of a system is a one-dimensional and application-specific metric. Unlike a Cell processor, such desktop CPUs are more suited to the general purpose software usually run on personal computers. In addition to executing multiple instructions per clock, processors from Intel and AMD feature branch predictors. The Cell is designed to compensate for this with compiler assistance, in which prepare-to-branch instructions are created. For double-precision floating point operations, as sometimes used in personal computers and often used in scientific computing, Cell performance drops by an order of magnitude, but still reaches 20.8 GFLOPS (1.8 GFLOPS per SPE, 6.4 GFLOPS per PPE). The PowerXCell 8i variant, which was specifically designed for double-precision, reaches 102.4 GFLOPS in double-precision calculations.[36]

Tests by IBM show that the SPEs can reach 98% of their theoretical peak performance running optimized parallel matrix multiplication.[29] ... The EIB is a communication bus internal to the Cell processor which connects the various on-chip system elements: the PPE processor, the memory controller (MIC), the eight SPE coprocessors, and two off-chip I/O interfaces, for a total of 12 participants in the PS3 (the number of SPU can vary in industrial applications). The EIB also includes an arbitration unit which functions as a set of traffic lights. In some documents IBM refers to EIB participants as 'units'.

The EIB is presently implemented as a circular ring consisting of four 16 bytes wide unidirectional channels which counter-rotate in pairs. When traffic patterns permit, each channel can convey up to three transactions concurrently. As the EIB runs at half the system clock rate the effective channel rate is 16 bytes every two system clocks. At maximum concurrency, with three active transactions on each of the four rings, the peak instantaneous EIB bandwidth is 96 bytes per clock (12 concurrent transactions * 16 bytes wide / 2 system clocks per transfer). While this figure is often quoted in IBM literature it is unrealistic to simply scale this number by processor clock speed. The arbitration unit imposes additional constraints which are discussed in the Bandwidth Assessment section below. "

---

this is used for a flickering candle:

http://www.microchip.com/wwwproducts/Devices.aspx?product=PIC10F200

0.375K ROM, 16 bytes of RAM

and the same blog post also suggested one of these:

http://www.atmel.com/products/microcontrollers/avr/tinyavr.aspx

he didn't say which one, but the smallest one is 0.5K (and 8-bit of course)

---

the Apollo Guidance Computer had 12K of 16-bit words of ROM, and 1k of 16-bit words of RAM

---

"If you’re not familiar with Funcards, they’re basically standard AT90S8515 AVR microcontrollers in smartcard format." -- https://www.makomk.com/2010/02/04/arduino-based-funcard-programmer/

http://www.atmel.com/images/doc0841.pdf 8k ROM, 512 bytes of RAM

" AtMega? Card (Funcard) SmartCard? Programming & Fuse Setup

I recently got an Atmel AtMega?163-based smartcard" -- http://colinoflynn.com/2012/09/atmega-card-funcard-smartcard-programming-fuse-setup-2/

http://www.atmel.com/Images/doc1142.pdf 16k flash, 1k RAM

---

novena laptops ($2100 including 240G SSD) use the Freescale i.MX6, a SoC described at the links below:

https://en.wikipedia.org/wiki/I.MX#i.MX6x_series

https://en.wikipedia.org/wiki/Novena_%28computing_platform%29

http://www.freescale.com/products/arm-processors/i.mx-applications-processors-based-on-arm-cores/i.mx-6-processors/i.mx6qp/i.mx-6quad-processors-high-performance-3d-graphics-hd-video-arm-cortex-a9-core:i.MX6Q?tab=Documentation_Tab&pspll=1&SelectedAsset=Documentation&ProdMetaId=PID/DC/i.MX6Q&fromPSP=true&assetLockedForNavigation=true&componentId=2&leftNavCode=1&pageSize=25&Documentation=Documentation/00610Ksd1nd%60%60Data%20Sheets&fpsp=1&linkline=Data%20Sheets

http://spectrum.ieee.org/consumer-electronics/portable-devices/novena-a-laptop-with-no-secrets

http://cache.freescale.com/files/32bit/doc/data_sheet/IMX6DQIEC.pdf

https://www.crowdsupply.com/sutajio-kosagi/novena

the Novena also has a GPU and an FPGA and a CAAM:

http://www.cnx-software.com/2013/01/19/gpus-comparison-arm-mali-vs-vivante-gcxxx-vs-powervr-sgx-vs-nvidia-geforce-ulp/

(the link above discusses the computing power of Vivante's GC2000 GPU in the i.MX6Q)

---

mafuyu 4 hours ago

ARM isn't open sourcing anything anytime soon. Take a look at OpenRISC and RISC-V. They're aiming at a fully open source SoC implementation in silicon.

http://openrisc.io/

http://riscv.org/

agumonkey 4 hours ago

But IIRC (some blog benchmark) they are very very slow.

zhemao 2 hours ago

Also, the benchmarks we did a while ago on a taped-out chip with our in-order RV64 core, Rocket, showed that it compared quite favorably to an ARM Cortex A5.

http://riscv.org/download.html#tab_rocket_core

Unfortunately, it would be quite difficult for outside organizations to replicate these measurements unless they can pay TSMC for a fab run.

zhemao 2 hours ago

(Disclaimer: I am a PhD student in the Berkeley computer architecture group, which designs RISC-V)

An ISA can't be "fast" or "slow". It's just a specification. There's no reason you can't build a RISC-V core that's just as fast as an ARM or x86 core. The only reason we haven't done so is because we don't have access to the modern fabrication technologies that Intel and commercial ARM licensees use.

theresistor 1 hour ago

Instruction sets very much can be fast or slow, at least in the context of discussing specific use cases. Many of the inner loops that take up most of the active cycles on a CPU today (crypto, compression, imaging, signal processing, linear algebra) have very specific code patterns that can be targeted with specialized instructions that provide integer-multiple reductions in instruction count.

Let's assume that application-targeted instructions can reduce the size of the inner loops in these applications by 2x. Even the RISCiest cores do not, in practice, run at 2x the clock speed or 2x the issue rate of cores with application-targeted instructions. Thus, ISAs with baked in support for these use-case accelerating instructions will be more performant.

The RISCy core probably wins on mm^2, but a perf/mm^2 analysis will be highly dependent on how well designed the application-specific instructions are for area conservation.

---

nickpsecurity 4 hours ago

I know. I'm just assuming FPGA is trusted in TCB, esp with DMA, plus implying many people will want to use it. Your assessment of open HW is accurate. Far as FPGA's, there's progress on several fronts. Some for you to check out that your people might even consider using given the continual payoff of a FPGA w/out high unit costs.

Open-source FPGA architecture at 45nm http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-43...

Open-source bitstream generation for Xilinx w/out EULA violation or R.E. (!?) http://www.isi.edu/~nsteiner/publications/soni-2013-bitstrea...

Open tools for Lattice http://www.clifford.at/icestorm/

Open-source HW synthesis flow http://opencircuitdesign.com/qflow/

Note: Cliff is on a winning streak, eh?

I recently did a write-up showing what it could or would take at both ASIC and production levels here:

https://news.ycombinator.com/item?id=10468534

https://news.ycombinator.com/item?id=10468624

So, paths are clear and plenty of potential. Just no uptake. Will take government, corporate, or private sponsors working with academic (low NRE) and professional (experienced) HW designers that open-source stuff as they go. Proprietary, dual-licensed OSS is the only way to go for HW that I know of.

---

kawsper 4 hours ago

People are afraid of the Intel Management Engine, and you can read about it here: http://libreboot.org/faq/#intel

It is a dense and very detailed text, but basically your Intel CPU contains Active Management Technology (AMT) which lets remote users control your computer, which may or may not be what you want, and there might be backdoors hiding here.

It also includes Intel Boot Guard, which prevents users from installing their own firmware (such as libreboot and coreboot) because it needs to be signed with a key from Intel.

The page sums it up like this:

> In summary, the Intel Management Engine and its applications are a backdoor with total access to and control over the rest of the PC. The ME is a threat to freedom, security, and privacy, and the libreboot project strongly recommends avoiding it entirely. Since recent versions of it can't be removed, this means avoiding all recent generations of Intel hardware.

jakeogh 3 hours ago

Also see Joanna Rutkowska's recent paper: https://news.ycombinator.com/item?id=10458318 (fun reading).

---

groby_b 4 hours ago

Small, since that's how they make their money (design licenses)

But there's a completely open ARMv2 clone: http://opencores.org/project,amber

JoachimS 51 minutes ago

Which is probably the third open source attempt. ARM usually stomps hard on these projects. Anybody remember blackARM, nn ARM?

Clones are even worse off. TurboSilicon was attacked so hard they were thankful to hand all their assets to ARM.

---

nextos 5 hours ago

We need a cheap general purpose Novena-like laptop.

I'd argue a really good place to start from is a custom Rockchip machine (like Asus C201, which is now supported by Libreboot). Add a free GPU (or finish the Lima driver) and we are ready to go.

---

aij 4 hours ago

Why didn't they use an open source CPU? E.g. OpenSPARC

xobs 1 minute ago

Aside from the fact that we're familiar with Freescale and ARM, many other chips we "could have used" are unobtanium. While it's true that there is a source-level implementation, I can't find any T1 or T2 parts available for purchase. It's the same reason why we went with an A9 instead of an A15 or a 64-bit chip: You just can't buy them unless you're a big company. And if a small two-person company can't buy them, how can we claim it's open source hardware if you can't buy them either?

The other possibility would be to fab a chip ourselves, but that's a whole other order of magnitude in terms of cost and complexity, and the result isn't that great in terms of speed and available peripherals. Plus, when you fab a chip like this a lot of the hardware blocks are IP provided by the chip foundry, e.g. flash controllers and DRAM cells, and those are always closed-source. It just moves the whole thing one turtle down.

---

ASUS Chromebit CS10:

    SoC: Rockchip RK3288-C (4 x Cortex A17 + Mali T764)
    RAM: 2 GB LPDDR3
    NAND: 16GB
    Dimensions / Mass: 123 x 31 x 17mm, 75g
    OS: Chrome OS
    Other Connectivity: 2x2 802.11a/b/g/n/ac + BT 4.0, HDMI 1.4, USB 2.0, DC-in
    Price: $85

-- http://www.anandtech.com/show/9797/asus-launches-the-chromebit-cs10-hdmi-stick

---

" Continuing on from the focus on improving visual performance and responsiveness on Android 4.1 "Jelly Bean", the main objective of Android 4.4 was to optimize the platform for better performance on low-end devices, without compromising its overall capabilities and functionality. The initiative was codenamed "Project Svelte", which Android head of engineering Dave Burke joked was a weight loss plan after Jelly Bean's "Project Butter" added "weight" to the OS.[5] To simulate lower-spec devices, Android developers used Nexus 4 devices underclocked to run at a reduced CPU speed with only two cores active, 512 MB memory, and at qHD resolution—specifications meant to represent a "sweet spot" for entry-level devices.[5] "

---

Raspberry Pi Zero $5

" A Broadcom BCM2835 application processor 1GHz ARM11 core (40% faster than Raspberry Pi 1) 512MB of LPDDR2 SDRAM A micro-SD card slot A mini-HDMI socket for 1080p60 video output Micro-USB sockets for data and power An unpopulated 40-pin GPIO header Identical pinout to Model A+/B+/2B An unpopulated composite video header Our smallest ever form factor, at 65mm x 30mm x 5mm "

and zero W ($10, with wifi):

" 1GHz, single-core CPU 512MB RAM Mini-HDMI port Micro-USB On-The-Go port Micro-USB power HAT-compatible 40-pin header Composite video and reset headers CSI camera connector 802.11n wireless LAN Bluetooth 4.0 "

---

" The processor in the (Apple Macbook) charger is a MSP430F2003 ultra low power microcontroller with 1kB of flash and just 128 bytes of RAM....The 68000 microprocessor from the original Apple Macintosh and the 430 microcontroller in the charger aren't directly comparable as they have very different designs and instruction sets. But for a rough comparison, the 68000 is a 16/32 bit processor running at 7.8MHz, while the MSP430 is a 16 bit processor running at 16MHz. The Dhrystone benchmark measures 1.4 MIPS (million instructions per second) for the 68000 and much higher performance of 4.6 MIPS for the MSP430. The MSP430 is designed for low power consumption, using about 1% of the power of the 68000. "

---

$9 Computer Architecture: The Chips That Make C.H.I.P., C.H.I.P.

Speaker: Dave Rauchwerk Next Thing Co. [2]

About the talk:

Features: 1GHz ARM Cortex A8, 512MB RAM, 4GB NAND Flash, WiFi, Bluetooth.

Completely open source.

This talk will provide a technical overview of the hardware and software system architecture of the world's first $9 computer.

Uses an Allwinner R8 SoC, which has an ARM Cortex-A8 core with 32k icache and 32k dcache (L1) and 256k L2 cache. https://github.com/NextThingCo/CHIP-Hardware/blob/master/CHIP%5Bv1_0%5D/CHIPv1_0-BOM-Datasheets/Allwinner%20R8%20Datasheet%20V1.2.pdf

---

also pocketCHIP which has a CHIP and a keyboard and screen and runs PICO-8 for $50:

https://getchip.com/pages/pocketchip

---

also the CHIP Pro (is this different from the above talk? the above talk speaks of an ARM Cortex-A8, but this lists ARMv7-A; those are consistent, since the Cortex-A8 implements the ARMv7-A architecture):

$16 " 1GHz ARMv7-A 256MB/512MB DDR3/SLC NAND I2S Audio Dual Mics WiFi? B/G/N & BT4.2 Fully Certified Open Source HW, OS, No NDAs! " "powered by" $6 R8 SoC? + 256MB DDR3

[2]

Mali400 GPU [3]

interestingly, this is recommended over the Raspberry Pi Zero W (and maybe other models? not sure I understand) by multiple commentators b/c you can't get the Zero W in quantity [4]

---

ESP8266 "ESP-12E" ("ESP12"?)

https://www.adafruit.com/images/product-files/2471/0A-ESP8266__Datasheet__EN_v4.3.pdf a Wi-Fi TCP/IP stack, but also a 32-bit MCU ("Tensilica L106") with 16/24-bit instructions, ~36k of user-available RAM, and at least 0.5MB of flash

---

" With the dramatic success of the IBM PC in the early 1980's, it was obvious that there would someday be lisp implementations for personal computers. But the limitations of the early PC's 16 bit processor and its hobbled memory addressing scheme meant that a Lisp running on the PC would be little more than a toy. Lisp did not take off on the PC until the x386 computers with a 32 bit flat addressing space became plentiful in the late 1980’s . But the Macintosh, despite its well-known limitations, used the same Motorola 68000 CPU used in many engineering workstations. The original Macintosh had only 128k bytes of memory, but it this was more than most PC's at the time, and a number of third party memory expansion kits were available in almost immediately. Apple announced their own memory-enhanced Macintosh within a few months, and it was available in the fall of 1984. Seen this way, the Macintosh was not a more powerful PC, but rather a small inexpensive workstation. There was reason to think that it could be used as a Lisp platform. " -- http://basalgangster.macgui.com/RetroMacComputing/The_Long_View/Entries/2013/2/17_Macintosh_Common_Lisp.html

---

ESP8266-based devkit for the NodeMCU Lua system:

http://www.ebay.com/itm/ESP8266-ESP-12-NodeMCU-Lua-WiFi-Internet-Of-Things-Free-Shipping-Arr-1-10-BizDay-/271730851063?pt=LH_DefaultDomain_0&hash=item3f446bbcf7 $15

NodeMCU: http://nodemcu.com/index_en.html#fr_54745c8bd775ef4b99000011

Arduino-like hardware IO
Nodejs style network API
Less than $2 WI-FI MCU ESP8266 integrated and easy to prototype development kit

code examples at http://wayback.archive.org/web/20151231082750/http://www.nodemcu.com/index_en.html#fr_5475f7667976d8501100000f

copied to here:

Connect to the wireless network

    print(wifi.sta.getip())  -- nil
    wifi.setmode(wifi.STATION)
    wifi.sta.config("SSID","password")
    print(wifi.sta.getip())  -- 192.168.18.110

Arduino-like IO access

    pin = 1
    gpio.mode(pin,gpio.OUTPUT)
    gpio.write(pin,gpio.HIGH)
    gpio.mode(pin,gpio.INPUT)
    print(gpio.read(pin))

HTTP Client

    -- A simple http client
    conn=net.createConnection(net.TCP, false)
    conn:on("receive", function(conn, pl) print(pl) end)
    conn:connect(80,"121.41.33.127")
    conn:send("GET / HTTP/1.1\r\nHost: www.nodemcu.com\r\n"
      .."Connection: keep-alive\r\nAccept: */*\r\n\r\n")

HTTP Server

    -- a simple http server
    srv=net.createServer(net.TCP)
    srv:listen(80,function(conn)
      conn:on("receive",function(conn,payload)
        print(payload)
        conn:send("<h1> Hello, NodeMCU. </h1>")
      end)
    end)

PWM

    function led(r,g,b)
      pwm.setduty(1,r)
      pwm.setduty(2,g)
      pwm.setduty(3,b)
    end
    pwm.setup(1,500,512)
    pwm.setup(2,500,512)
    pwm.setup(3,500,512)
    pwm.start(1)
    pwm.start(2)
    pwm.start(3)
    led(512,0,0) -- red
    led(0,0,512) -- blue

Blinking Led

    lighton=0
    tmr.alarm(0,1000,1,function()
      if lighton==0 then
        lighton=1
        led(512,512,512) -- 512/1024, 50% duty cycle
      else
        lighton=0
        led(0,0,0)
      end
    end)

Bootstrap

    -- init.lua will be executed on boot
    file.open("init.lua","w")
    file.writeline([[print("Hello World!")]])
    file.close()
    node.restart() -- this will restart the module.

Use timer to repeat

    tmr.alarm(1,5000,1,function() print("alarm 1") end)
    tmr.alarm(0,1000,1,function() print("alarm 0") end)
    tmr.alarm(2,2000,1,function() print("alarm 2") end)
    -- after some time
    tmr.stop(0)

A pure lua telnet server

    -- a simple telnet server
    s=net.createServer(net.TCP,180)
    s:listen(2323,function(c)
      function s_output(str)
        if(c~=nil) then
          c:send(str)
        end
      end
      node.output(s_output, 0) -- redirect output to function s_output
      c:on("receive",function(c,l)
        node.input(l) -- like pcall(loadstring(l)), supports multiple separate lines
      end)
      c:on("disconnection",function(c)
        node.output(nil) -- unregister the redirect function; output goes to serial
      end)
      print("Welcome to NodeMCU world.")
    end)

Interfacing with sensor

    -- read temperature with DS18B20
    t=require("ds18b20")
    t.setup(9)
    addrs=t.addrs()
    -- total number of DS18B20s; assume it is 2
    print(table.getn(addrs))
    -- the first DS18B20
    print(t.read(addrs[1],t.C))
    print(t.read(addrs[1],t.F))
    print(t.read(addrs[1],t.K))
    -- the second DS18B20
    print(t.read(addrs[2],t.C))
    print(t.read(addrs[2],t.F))
    print(t.read(addrs[2],t.K))
    -- just read
    print(t.read())
    -- just read as centigrade
    print(t.read(nil,t.C))
    -- don't forget to release it after use
    t = nil
    ds18b20 = nil
    package.loaded["ds18b20"]=nil

---

"There are three main companies out there making microcontrollers that are neither ancient 8051 clones or ARM devices: TI’s MSP430 series, Microchip and Atmel. " -- http://hackaday.com/2016/01/20/microchip-to-acquire-atmel-for-3-56-billion/

"Together, Microchip and Atmel will be the #3 MCU company in the world (trailing Renesas (OTCPK:RNECY) and NXP Semicondcutors (NASDAQ:NXPI) after its deal for Freescale), and Microchip will have a very fertile opportunity to drive margin synergies." -- http://seekingalpha.com/article/3817736-microchip-technology-atmel-right-match

---

"For years Microchip top management was like mule on bridge not wanting to step ahead :) They were refusing to buy ARM licensee and bet on MIPS and they were missing a lot of sale opportunities with this odd decision. Whatever they do with PIC32 it’s not so successful like the STM32s and LPCs and they miss sales for millions $$$. This is not because MIPS architecture is bad, quite opposite it’s well developed in networking devices, but MIPS Soc from Mediatek running Linux at 400Mhz cost $2 while Microchip sells MIPS PIC32 with no MMU running at 80Mhz for $5-6." -- https://olimex.wordpress.com/2016/01/20/you-guys-will-buy-your-avrs-from-microchip-from-now-on/

" So where is the evidence that open and free tools matter – well, lets have a look at Arduino – you cannot help but notice that the solution to almost every project that needs a micro-controller these days seems to be solved with an Arduino! and that platform has been built around Atmel parts, not Microchip parts. What happened here? With the Microchip parts you have much more choice and the on-board peripherals are generally broader in scope with more options and capabilities, and for the kinds of things that Arduino’s get used for, Microchip parts should have been a more obvious choice, but Atmel parts were used instead – why was that? ... The success of the Arduino platform is undeniable – if you put Arduino in your latest development product name its pretty much a foregone conclusion that you are going to sell it – just look at the frenzy amongst the component distributors and the Chinese dev board makers who are all getting in on the Arduino act, and why is this? well the Arduino platform has made micro-controllers accessible to the masses, and I don’t mean made them easy to buy, I mean made them easy to use for people that would otherwise not be able to set up and use a complex development environment, toolset and language, and the Arduino designers also removed the need to have a special programmer/debugger tool, a simple USB port and a boot-loader means that with just a board and a USB cable and a simple development environment you are up and running which is really excellent. You are not going to do real-time data processing or high speed control systems with an Arduino because of its hardware abstraction but for many other things the Arduino is more than good enough ...

Now this is the part where the product team, executives and the board at Microchip should pay very close attention. I made contact with David Cuartielles who is Assistant Professor at Malmo University in Sweden, but more relevant here is that he is one of the Co-founders of the original Arduino project. I wrote David and asked him…

“I am curious to know what drove the adoption of the Atmel micro controllers for the Arduino platform? I ask that in the context of knowing PIC micro controllers and wondering with the rich on-board peripherals of the PIC family which would have been useful in the Arduino platform why you chose Atmel devices.”

David was very gracious and responded within a couple of hours. He responded with the following statement:

“The decision was simple, despite the fact that -back in 2005- there was no Atmel chip with USB on board unlike the 18F family from Microchip that had native USB on through hole chips, the Atmel compiler was not only free, but open source, what allowed migrating it to all main operating systems at no expense. Cross-platform cross-compilation was key for us, much more than on board peripherals.” ... I am clearly complaining about the crippling of Microchip provided compilers...why do they suppress their developer community with crippled compiler tool software unless you pay large $$$... " -- http://gerrysweeney.com/microchip-pic-chips-could-have-been-the-power-behind-arduino/

various comments in this thread say the same thing: https://www.facepunch.com/showthread.php?t=1502428&p=49577864#post49577864

"although Atmel have similar products like Microchip and even better open source software support, they sales are terrible hard to deal with. Many components prices go unexpected up and down as Atmel production capabilities are humble, once some big customer place large order for one chip they stop making others and this make impossible to use them for serious projects. Once you put AVR in your product it is not unlikely these chips suddenly to go on allocation due to the poor management and planning Altmel has, something which (almost) never happen to Microchip." -- https://olimex.wordpress.com/2016/01/20/you-guys-will-buy-your-avrs-from-microchip-from-now-on/

---

some comments in here suggest that hobbyists are moving away from 8bit PIC and AVR to ARM:

https://www.reddit.com/r/arduino/comments/422vvr/what_does_the_impending_acquisition_of_atmel_by/

also

---

a comment on AVR vs MSP430:

" tptacek 1 day ago

So first a couple caveats: (1) I'm sure AVR is a hell of a lot better than PIC, (2) I come at this from a really weird place (exclusively emulators and compilers), and (3) I'm not talking about the AVR parts themselves, which might be more cost-effective for a given project.

That said:

I could pick a bunch more nits that would only really be relevant to someone writing an emulator (complicated instruction decode, IO addressing, &c) but those are my big complaints.

I find MSP430 much more pleasant to work in.

reply "

---

here's a guide to choosing crypto key lengths (this is relevant here b/c I'm obsessed with choosing good native bit widths for computers; the EVM's use of 256 bits for partially crypto reasons makes me want to know what the bit width of various crypto systems is; I still like 16 bits though for a VM):

https://www.keylength.com/en/3/

---

the 'data' in each ipfs object is less than 256k

---

ARM Cortex-A32 (MMU-full but 32-bit)

http://arstechnica.com/gadgets/2016/02/arms-cortex-a32-is-a-tiny-cpu-for-wearables-and-raspberry-pi-like-boards/

---

Core manager (CM):

    Extended Xtensa LX4
    scheduling-specific instruction set
    32KB for code
    64KB for data

Processing Elements (PEs):

    Xtensa LX4 from Tensilica (now Cadence)
    32KB for code
    32KB for data

Application Core (App):

    570T core from Tensilica (now Cadence)
    16KB cache for code
    16KB cache for data

2 x 128MB DRAM

http://dsg.uwaterloo.ca/seminars/notes/2014-15/Lehner.pdf

--- " Accordingly, Raspberry Pi 3 is now on sale for $35 (the same price as the existing Raspberry Pi 2), featuring:

    A 1.2GHz 64-bit quad-core ARM Cortex-A53 CPU (~10x the performance of Raspberry Pi 1)
    Integrated 802.11n wireless LAN and Bluetooth 4.1
    Complete compatibility with Raspberry Pi 1 and 2... The 900MHz 32-bit quad-core ARM Cortex-A7 CPU complex has been replaced by a custom-hardened 1.2GHz 64-bit quad-core ARM Cortex-A53. Combining a 33% increase in clock speed with various architectural enhancements, this provides a 50-60% increase in performance in 32-bit mode versus Raspberry Pi 2, or roughly a factor of ten over the original Raspberry Pi. ... VideoCore IV 3D is the only publicly documented 3D graphics core for ARM-based SoCs, and we want to make Raspberry Pi more open over time, not less. BCM2837 runs most of the VideoCore IV subsystem at 400MHz and the 3D core at 300MHz (versus 250MHz for earlier devices). "

"

schappim 8 hours ago

What has changed:

What is the same:

---

"The PC world roughly began in 1975 with the introduction of the MITS Altair 8800, based on INTEL's 1MHz 8080 8-bit microprocessor. "

---

http://atomthreads.com/

"Atomthreads is a free, lightweight, portable, real-time scheduler for embedded systems."

---

https://www.technologyreview.com/s/601263/why-a-chip-thats-bad-at-math-can-help-computers-tackle-harder-problems/

Singular Computing's S1 chip "In a simulated test using software that tracks objects such as cars in video, Singular’s approach was capable of processing frames almost 100 times faster than a conventional processor restricted to doing correct math—while using less than 2 percent as much power...Ask it to add 1 and 1 and you will get answers like 2.01 or 1.98. The Pentagon research agency DARPA funded the creation of Singular’s chip..."
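
to make the idea concrete, here's a toy C model of what "incorrect math" might look like; the error model (uniform +/-1% relative noise per add) is invented for illustration and is not Singular's actual design:

    #include <stdio.h>
    #include <stdlib.h>

    /* toy model of approximate arithmetic: every add picks up a small
       random relative error, in the spirit of the "1 + 1 = 2.01" example
       above. purely illustrative; the real S1's error model isn't
       described in the article. */
    static double approx_add(double a, double b) {
        double noise = ((double)rand() / RAND_MAX - 0.5) * 0.02;  /* +/- 1% */
        return (a + b) * (1.0 + noise);
    }

    int main(void) {
        for (int i = 0; i < 3; i++)
            printf("1 + 1 = %.2f\n", approx_add(1.0, 1.0));  /* e.g. 2.01, 1.98 */
        return 0;
    }

the point of the tradeoff is that a hardware unit allowed to be sloppy like this can be much smaller and lower-power than an exact ALU, which is where the claimed speed and power numbers come from.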

---

in a discussion about radiation-resistant systems:

"What MCU are you using? Mainstream ARM are most likely not suitable at all. Better alternatives would be something like Freescale MPC56, TI TMS570 etc. These have lock-step cores, ECC and lots of error detection and redundancy implemented in hardware."

(tangentially, on designing systems for redundancy in this sort of environment:

https://stackoverflow.com/questions/36827659/compiling-an-application-for-use-in-highly-radioactive-environments

)

---

randyrand 2 days ago

Lots of misunderstanding in this comments section.

Movidius makes low-power neural network processors for mobile applications. The Myriad V1 is used in Google Tango and the V2 (which is what the USB stick has) is used in the new DJI Phantom 4.

http://www.theverge.com/2016/3/16/11242578/movidius-myriad-2...

The Myriad chips are interesting because they combine MIPI camera interface lanes on the same chip as a general-purpose NN/CV processor and an SDK suite of hardware-accelerated computer vision functions (edge detection, Gaussian blur, etc).

here's the white paper for the chip: http://uploads.movidius.com/1441734401-Myriad-2-product-brie...

Because programming these chips essentially requires having the hardware, and because the hardware was very hard to come by, programming these chips was mostly limited to Google, DJI, and other big partners.

With this release the everyday developer has access to these vision processing chips, and the barrier to development entry is considerably lower.

This is not meant to replace your titan X gpu.

reply

revelation 2 days ago

This is their own press release. What does that kind of hardware for CV primitives have to do with deep learning?

(Also, of course, this stick doesn't seem to have any kind of connectivity besides the USB to the host computer. How do I connect my camera? Having to shuffle the data from a camera to the stick passing the host computer somewhat defeats the point.)

reply

krasin 2 days ago

>What does that kind of hardware for CV primitives have to do with deep learning?

They have hardware convolutions on 12 SHAVE cores (kind of a DSP core). It means that the chip can run some useful subset of convolutional neural networks very fast and energy-efficiently.

They also have 2 general-purpose SPARC cores, which allows you to have a "normal" program running there. Not sure how locked down the USB stick is going to be, and if running your custom program would be an option.

>How do I connect my camera?

The chip itself has a couple of MIPI lanes. The USB stick likely does not expose that. And I agree, that's suboptimal.

reply

---

some stuff mentioned in https://news.ycombinator.com/item?id=11777607 :

"...ESP8266 modules. No golang yet, but Arduino-C, Lua, Micropython, ...

here's micropython on ESP8266: https://github.com/micropython/micropython/releases/tag/v1.8

The ESP8266 is a line of very small (2cm) boards with wifi. They run a 32-bit RISC CPU: a Tensilica Xtensa LX106 running at 80 MHz. They don't have an instruction cache [5] but they do have 64 KiB of instruction RAM (in two 32k banks, i think [http://www.danielcasner.org/guidelines-for-writing-code-for-the-esp8266/ ]?; this is a Harvard architecture, so the 96k of data RAM is separate).
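
since that instruction RAM is so scarce, ESP8266 code has to be explicitly placed; a minimal C sketch against the ESP8266 NONOS SDK, assuming its usual ICACHE_FLASH_ATTR placement macro and os_printf/system_get_free_heap_size calls:

    #include "osapi.h"
    #include "user_interface.h"

    /* functions tagged ICACHE_FLASH_ATTR are linked into flash (irom0)
       and fetched through the flash cache, leaving the small instruction
       RAM for hot paths and anything an ISR may touch. */
    void ICACHE_FLASH_ATTR report_heap(void) {
        os_printf("free heap: %u\n", system_get_free_heap_size());
    }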

WiPy: https://www.pycom.io/solutions/py-boards/wipy/ https://www.kickstarter.com/projects/wipy/the-wipy-the-internet-of-things-taken-to-the-next

The WiPy is a small (5cm) board that runs micropython and has wifi as well as various other interfaces. 256k RAM, CPU (MCU): Texas Instruments CC3200, Cortex-M4 @ 80MHz. Not sure what, if any, L1 cache it has.

http://www.acmesystems.it/arietta (5cm, ARM9 128M RAM; AT91SAM9G25 CPU, 16k icache, 16k dcache)

Arch Linux on ARM list of hardware platforms: https://archlinuxarm.org/platforms/ (lowest RAM in that list: 128MB)

https://wiki.openwrt.org/toh/unbranded/a5-v11 MIPS (MIPS 24KEc, 32-bit, i think MIPS32 release 2 [6] mb see [7]) with 4MB flash, 32MB RAM, 32k icache, 16k dcache

notes on ESP8266 and TI CC3200: https://blog.cesanta.com/esp8266-and-cc3200-how-we-made-them-work-on-our-iot-platform-presentation

---

the Transcend WiFi SD card apparently uses an ARM926EJ-S (ARMv5) with 32k RAM. This model of ARM can have caches from 4k to 128k [8].

---

" Fermi and Kepler GPUs split 64 KB RAM between L1 and SMEM – Fermi GPUs ( CC 2.x ): 16:48 , 48:16 – Kepler GPUs ( CC 3.x ): 16:48 , 48:16 , 32:32 • Programmer can choose the split: – Default: 16 KB L1, 48 KB SMEM ...

Read-Only Cache An alternative to L1 when accessing DRAM – Also known as texture cache: all texture accesses use this cache

Caching is at 32 B granularity (L1, when caching operates at 128 B granularity)

Aggregate 48 KB per SM: 4 12-KB caches

" -- http://on-demand.gputechconf.com/gtc/2013/presentations/S3466-Programming-Guidelines-GPU-Architecture.pdf

" Constant Cache 8KB cache on each SM " -- http://courses.cms.caltech.edu/cs179/2016_lectures/cs179_2016_lec04.pdf

---

https://davidgf.net/page/41/e-ink-wifi-display

STM32F103ZE: ARM Cortex-M3 SoC, 64k RAM?

---

TI-85 graphic calculator:

TI-81:

TI-84+:

TI-89:
* RAM: 256 KiB
* CPU: Motorola 68000 @ 10 MHz

TI-Nspire:

see also

https://en.wikipedia.org/wiki/Comparison_of_Texas_Instruments_graphing_calculators

which graphing calculator do students use today (mid 2016)?

(afaict students can't just use their phones etc b/c tests don't permit arbitrary electronics) (crazily, some tests use whether the device has a QWERTY keyboard as a criterion, so the TI-89 is allowed and the TI-92 is not [9])

https://www.quora.com/Which-graphing-calculator-is-the-best-for-the-AP-Calculus-BC https://www.quora.com/What-is-the-best-graphing-calculator-for-AP-Calculus-AP-Physics-C-and-SAT-Math-subject-Test http://www.veronaschools.org/Page/790

The following are sometimes recommended but are prohibited by some tests [10]:

all these are programmable with TI-BASIC. The TI-Nspire is also programmable with Lua. However, TI-BASIC is very different between the Z80 and 68k TI calculator models:

https://en.wikipedia.org/wiki/TI-BASIC

---

http://www.atmel.com/devices/attiny85.aspx
CPU: 8-bit AVR
RAM: 512 bytes
flash: 8k

recc. by a SENSORICA guy for a task that needed something small.
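
for scale, a minimal avr-gcc blink sketch for the ATtiny85 (the pin choice PB0 and the 1 MHz factory clock are assumptions); the whole program fits in a few dozen bytes of the 8k flash and uses essentially no RAM beyond the stack:

    #define F_CPU 1000000UL  /* factory default: 8 MHz internal RC / 8 */
    #include <avr/io.h>
    #include <util/delay.h>

    int main(void) {
        DDRB |= _BV(PB0);         /* PB0 as output */
        for (;;) {
            PORTB ^= _BV(PB0);    /* toggle the pin */
            _delay_ms(500);
        }
    }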

---

this emulator has 64k:

http://www.pcjs.org/devices/pcx86/machine/5150/cga/64kb/donkey/

---

https://en.m.wikipedia.org/wiki/Adapteva

"The Epiphany architecture could accommodate chips with up to 4,096 RISC out-of-order microprocessors, all sharing a single 32-bit flat memory space"

"Each RISC processor in the Epiphany architecture is superscalar with 64× 32-bit unified register file (integer or single precision)"

"Each RISC processor (in current implementations; not fixed in the architecture) has 32 KB of local memory. "

"The memory architecture is unusual in that it doesn't employ explicit hierarchy or hardware caches, similar to the Sony/Toshiba/IBM cell processor, but with the additional benefit of off-chip & inter-core loads & stores being supported - which simplifies porting software to the architecture. It is a hardware implementation of partitioned global address space."

---

OSs:

https://github.com/fuchsia-mirror/magenta/blob/master/docs/mg_and_lk.md https://github.com/littlekernel/lk/wiki

---

kragen 311 days ago [-]

The most memory I've seen on a machine with current hardware and constant latency is 512 kilobytes, like in some of the Atmel ARM SAM D1 chips. Above that, you start getting into off-chip buses, which generally have substantially higher latency, even before you get to the extra latencies added by DRAM chips.

---

erlang unikernel that fits on a MIPS PIC32 MCU: " To shrink the size of the LING virtual machine a few things were left out (e.g. regular expressions and cryptographic functions) which shrank the resulting image size to about 1M. This meant that LING consumed roughly 50% of Program RAM. The Data RAM was used for heaps of Erlang processes."

---

" Microsoft today revealed a first look at the inside of its Holographic Processing Unit (HPU) chip used in its virtual reality HoloLens? specs.

The secretive HPU is a custom-designed TSMC-fabricated 28nm coprocessor that has 24 Tensilica DSP cores arranged in 12 clusters. It has about 65 million logic gates, 8MB of SRAM, and a layer of 1GB of low-power DDR3 RAM on top, all in a 12mm-by-12mm BGA package. We understand it can perform a trillion calculations a second.

It handles all the environment sensing and other input and output necessary for the virtual-reality goggles. It aggregates data from sensors and processes the wearer's gesture movements, all in hardware so it's faster than the equivalent code running on a general purpose CPU. Each DSP core is given a particular task to focus on.

The unit sits alongside a 14nm Intel Atom x86 Cherry Trail system-on-chip, which has its own 1GB of RAM and runs Windows 10 and apps that take advantage of the immersive noggin-fitted display. "

---

 ericseppanen 22 hours ago [-]

For those who haven't had the pleasure: developing on Tensilica Xtensa cores generally means living within 128-256KB of directly-accessible memory; a windowed register file that makes writing your own exception handlers "interesting"; a 6-year-old GCC bolted to a proprietary backend; per-seat licensing fees to use the compiler; and a corporate owner that's only halfway interested in the ecosystem they now control.

So yeah, kind of wishing it would just die and let ARM take over the embedded space.

reply

_yosefk 18 hours ago [-]

I'm not personally opposed to Tensilica "dying", especially if it doesn't involve Cadence dying, since they are a (somewhat indirect) competitor from my point of view, but ARM is not a substitute for Tensilica. You can't extend its ISA unless you license the architecture for $30M. A DSP like Tensilica's is also much more efficient than ARM on a range of tasks, and in particular, having local memory instead of caches is done for a reason. (The least efficient accelerator of all and the favorite of academics who have easy access to it, GPUs, also have this.)

As to their compiler licensing - that's what happens when you develop for a small niche, you get more expensive tools which are worse than the free ones used by the majority. But it doesn't mean that the thing doesn't have its uses. I hear that a recent chip by AMD had 40 Tensilica (smallish, inaccessible to most software) cores.

The same is true about CEVA (which was mentioned in a sister thread), more or less.

reply

---

" International versions of the top-end Android mobiles, which went on sale in March, sport a 14nm FinFET? Exynos 8890 system-on-chip that has four standard 1.6GHz ARM Cortex-A53 cores and four M1 cores running at 2.3 to 2.6GHz. Only two M1 cores are allowed to kick it up to the maximum frequency at any one time to avoid draining batteries and overheating pockets. Each M1 typically consumes less than three watts.

The M1, codenamed Mongoose, was designed from scratch in three years by a team in the US, and it runs 32-bit and 64-bit ARMv8-A code. In benchmarks, the Exynos 8890 SoC? is behind Apple's iPhone 6S A9 chip in terms of single-core performance, but pushes ahead in multi-core tests.

...

A basic branch prediction system works by building an internal table that has a two-bit counter per recently seen branch instruction. If a branch is taken, you add one to its counter up to a maximum of three, and if it isn't, you subtract one until you reach zero. So if a branch's counter is stuck at three then it is regularly taken, and you can anticipate that. If it is sitting on zero, it is rarely taken, so you can ignore it and continue fetching and decoding the following instructions. ... the ((M1's)) branch predictor uses a neural network ...AMD's Jaguar and Bobcat predictors use similar technology...AMD's Zen architect Mike Clark confirmed to us his microarchitecture uses a hashed perceptron system in its branch prediction. "Maybe I should have called it a neural net," he added." "

the M1 has a:

-- http://www.theregister.co.uk/2016/08/22/samsung_m1_core/
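
the two-bit scheme described in the quote above is simple enough to sketch in a few lines of C (the table size and PC hashing here are made up; the usual rule is to predict taken when the counter is 2 or 3). the M1's perceptron predictor replaces this table of counters with learned weights:

    #include <stdbool.h>
    #include <stdint.h>

    #define SLOTS 1024            /* hypothetical table size */
    static uint8_t ctr[SLOTS];    /* 2-bit saturating counters, 0..3 */

    static uint32_t slot(uint32_t pc) { return (pc >> 2) % SLOTS; }

    /* predict: high half of the counter range means "taken" */
    bool predict_taken(uint32_t pc) { return ctr[slot(pc)] >= 2; }

    /* train on the actual outcome: saturate at 0 and 3 */
    void train(uint32_t pc, bool taken) {
        uint8_t *c = &ctr[slot(pc)];
        if (taken) { if (*c < 3) (*c)++; }
        else       { if (*c > 0) (*c)--; }
    }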

---

"By way of information an INTCODE BCPL2 with 16bit word sizes occupies about 20K of INTCODE plus data. CINTCODE is supposed to be more compact so a 16K ROM holding the CINTCODE and the rest in RAM seems completely in order. The interpreter code is much more compact than native assembler in most cases. 86.11.47.92 (talk) 16:43, 18 March 2016 (UTC)"