proj-oot-ootNumerology

powers of 2

---

what bit width for various things?

64-bit is a commonly used width

according to http://en.wikipedia.org/wiki/128-bit , "...there are currently no mainstream general-purpose processors built to operate on 128-bit integers or addresses...".

IPv6 has 128-bit addresses

following the common advice to take the current value of something and double it, that suggests that either 256-bit (doubling 128) or 16384-bit (128 = 2^7; doubling the exponent gives 2^14 = 16384) is a comfortable maximum.

the number of atoms in the universe is about 10^80, which is around 2^266, i.e. on the order of 2^256

this suggests that the max bitwidth should be between 2^8 and 2^256, or between 2^8 and 2^(2^8)

one bit per atom seems a bit much. halving the latter 8 (the exponent's exponent), we get 2^(2^4) = 65536 bits, which is 256^2 bits, or 8 KiB. That already seems like a very large upper bound for a bitwidth when you compare it to 64 bits (8 bytes): it's an increase from 64 bits by a factor of 1024 (2^10), whereas 64 = 2^6 was itself only a factor-of-8 increase over the 8-bit byte.

so 16384 seems better.

or we could even take 2^6 = 64 and double the 6 to get 2^12 = 4096 bits, or 0.5 KiB.

that seems safe; with a field of 2^12 bits, we could already represent numbers higher than the number of atoms in the universe
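a quick check of that in Python:

    atoms = 10**80                 # rough count of atoms in the universe
    field_bits = 2**12             # a 4096-bit field
    print(2**field_bits > atoms)   # True
    print(atoms.bit_length())      # 266: bits actually needed for 10^80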

if we never need numbers bigger than 2^(2^8), then we never need bit fields bigger than 2^8. so 256-bit (32 bytes) seems safe.

and if we never need numbers bigger than half of 2^(2^8), then 128-bit seems safe. So perhaps IPv6 is already crazy conservative.

so let's choose 128-bit as our max.

--

powers of 2 are special. are there special powers of two? i guess so:

2^2 = 4
2^2^2 = (2^2)^2 = 2^(2^2) = 2^4 = 16
((2^2)^2)^2 = 2^2^2^2 = 2^8 = 256
2^(2^(2^2)) = 2^16 = 65536
2^(2^(2^(2^2))) = 2^65536 = too big
(((2^2)^2)^2)^2 = 2^2^2^2^2 = 2^16 = 65536
((((2^2)^2)^2)^2)^2 = 2^2^2^2^2^2 = 2^32 = 4294967296
(((((2^2)^2)^2)^2)^2)^2 = 2^2^2^2^2^2^2 = 2^64 = 18446744073709551616
etc

so 1,2,4,16,65536 is special (iterated exponentiation: each term is 2 raised to the previous term) and 1,2,4,16,256,65536,2^32,2^64, etc is special (iterated squaring)
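to make the two constructions concrete, a quick Python sketch:

    # first sequence: a(0)=1, a(n+1) = 2^a(n): iterated exponentiation
    def iterated_exp(n):
        a, out = 1, []
        for _ in range(n):
            out.append(a)
            a = 2**a
        return out

    # second sequence: a(0)=2, a(n+1) = a(n)^2: iterated squaring
    def iterated_square(n):
        a, out = 2, []
        for _ in range(n):
            out.append(a)
            a = a*a
        return out

    print(iterated_exp(5))     # [1, 2, 4, 16, 65536]
    print(iterated_square(7))  # [2, 4, 16, 256, 65536, 4294967296, 18446744073709551616]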

and presumably 1,2,16 (elements 1,2,4 of the first) and 1,2,16,2^64,.. (elements 1,2,4,8 of the second) are extra special

and presumably 2^1, 2^2, 2^4, 2^16 and 2^1, 2^2, 2^4, 2^16, 2^256 (2 raised to each member of the two sequences) are also special

so the recurring special numbers are:

1,2,4,16

and less so: 65536 (2^16)

and less so: 2^8, 2^64, 2^256 (which implies that 8 and 64 and 256 are more special than 32, 128, that is, that 2^3, 2^6, 2^8 are more special than 2^5, 2^7)

searching for 1,2,4,16 on oeis, and ruling out all those that can be immediately seen by me to contain non-powers of 2, we get:

http://oeis.org/search?q=1%2C2%2C4%2C16&sort=&language=&go=Search

A014221 a(n+1) = 2^a(n) with a(0) = 0. This is the Ackermann function A_3(n+1). 0, 1, 2, 4, 16, 65536 (the next term has 19729 digits - Benoit Cloitre, Mar 28 2002)

A073924 Smallest power of 2 that is greater than the previous term such that every partial sum (n>1) is a prime. 1, 2, 4, 16, 128, 65536, 9007199254740992, 73786976294838206464, 205688069665150755269371147819668813122841983204197482918576128

A060656 a(n)=2a(n-1)*a(n-2)/a(n-3) with a(0)=a(1)=1. 1, 1, 2, 4, 16, 64, 512, 4096, 65536, 1048576, 33554432, 1073741824, 68719476736, 4398046511104, 562949953421312, 72057594037927936, 18446744073709551616, 4722366482869645213696, 2417851639229258349412352

A001901 Successive numerators of Wallis's approximation to pi/2 (reduced). 1, 2, 4, 16, 64, 128, 256, 2048, 16384, 32768, 65536, 262144, 1048576, 2097152, 4194304, 67108864, 1073741824, 2147483648, 4294967296, 17179869184, 68719476736, 137438953472, 274877906944, ...

A053038 The first (largest) power of 2 arising in the iteration-sequence when A051593 (cototient function) is repeatedly applied starting with n!. 1, 2, 4, 16, 32, 128, 512, 4096, 16384, 2048, 8192, 65536, 16384, 65536, 8388608, 134217728, 8388608, 8388608, 16777216, 2097152, 8388608, 268435456, 4398046511104, 70368744177664, 67108864, 67108864, 67108864, 68719476736

A073923 a(1)=1, a(n) is the smallest power of 2 not included earlier such that sum(k=1,n,a(n)) is prime. 1, 2, 4, 16, 8, 4096, 32, 16384, 134217728, 256, 33554432, 1073741824, 2147483648, 68719476736, 8192, 262144, 2722258935367507707706996859454145691648, 1152921504606846976, 10889035741470030830827987437816582766592

A171381 Numbers n such that (3^n + 1)/2 is prime. 1, 2, 4, 16, 32, 64

A174677 a(n) = 2* a(n-2) *a(n-1) with a(1)=1 and a(2)=2. 1, 2, 4, 16, 128, 4096, 1048576, 8589934592, 18014398509481984, 309485009821345068724781056, 11150372599265311570767859136324180752990208

A218148 a(n) = 2^((6+5*n+n^3)/6). 1, 2, 4, 16, 256, 32768, 67108864, 4398046511104, 18446744073709551616, 9903520314283042199192993792, 1361129467683753853853498429727072845824, 95780971304118053647396689196894323976171195136475136

A114641 a(n)=2^(a(n-2)+log_2(a(n-1))), a(0)=1, a(1)=2. 1, 2, 4, 16, 256, 16777216, 1942668892225729070919461906823518906642406839052139521251812409738904285205208498176

A179532 a(n)=2^ceiling(n(n+1)/3). 1, 2, 4, 16, 128, 1024, 16384, 524288, 16777216, 1073741824, 137438953472, 17592186044416, 4503599627370496, 2305843009213693952, 1180591620717411303424, 1208925819614629174706176, ...

A094384 Determinant of n X n partial Hadamard matrix with coefficient m(i,j) 1<=i,j<=n (see comment). 1, -2, 4, 16, -32, -128, -512, 4096, -8192, -32768, -131072, 1048576, 4194304, -33554432, 268435456, 4294967296, -8589934592, -34359738368, -137438953472, 1099511627776, 4398046511104, ...

so the next smallest terms after 16:

2^16, 2^7, 2^6, 2^6, 2^5, 2^3, 2^5, 2^7, 2^8, 2^8, 2^7, 2^5:

histogram: 2^3: 1, 2^5: 3, 2^6: 2, 2^7: 3, 2^8: 2, 2^16: 1

this seems to contradict the hypothesis that 2^3, 2^6, 2^8 are more special than 2^5, 2^7.

A014221, the Ackermann function, is the only entry in oeis that continues 1,2,4,16 with 65536

if we don't start with 1, we also get

A051285 a(n) = a(n-1)^a(n-2), n >= 2. 2, 2, 4, 16, 65536, 115792089237316195423570985008687907853269984665640564039457584007913129639936

if we search oeis for all sequences starting with 2,4,16, excluding those for which this sequence shows up far from the start, we also get:

A019279 Superperfect numbers: sigma(sigma(n)) = 2n where sigma is the sum-of-divisors function A000203. 2, 4, 16, 64, 4096, 65536, 262144, 1073741824, 1152921504606846976, 309485009821345068724781056, 81129638414606681695789005144064, 85070591730234615865843651857942052864

A061652 Even superperfect numbers: 2^(p-1) where 2^p-1 is a Mersenne prime (A000043 and A000668). 2, 4, 16, 64, 4096, 65536, 262144, 1073741824, 1152921504606846976, 309485009821345068724781056, 81129638414606681695789005144064, 85070591730234615865843651857942052864

A001146 2^(2^n). (Formerly M1297 N0497) 2, 4, 16, 256, 65536, 4294967296, 18446744073709551616, 340282366920938463463374607431768211456, 115792089237316195423570985008687907853269984665640564039457584007913129639936

A061286 Smallest integer for which the number of divisors is the n-th prime. 2, 4, 16, 64, 1024, 4096, 65536, 262144, 4194304, 268435456, 1073741824, 68719476736, 1099511627776, 4398046511104, 70368744177664, 4503599627370496, 288230376151711744, 1152921504606846976

A101926 2^A101925(n). 2, 4, 16, 32, 256, 512, 2048, 4096, 65536, 131072, 524288, 1048576, 8388608, 16777216, 67108864, 134217728, 4294967296, 8589934592, 34359738368, 68719476736, 549755813888, 1099511627776, 4398046511104, 8796093022208

A112535 Number of truth tables generated by 3CNF expressions of n variables. 2, 4, 16, 256, 43146, 120510132, 4977694100656

A063401 a(n)=a(n-1)*a(n-2)*a(n-3) with a(0)=1, a(1)=2, a(2)=2. 1, 2, 2, 4, 16, 128, 8192, 16777216, 17592186044416, 2417851639229258349412352, 713623846352979940529142984724747568191373312, 30354201441027016733116592294117482916287606860189680019559568902170379456331382784

A155543 a(n)=2*A081294(n). 2, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864, 268435456, 1073741824, 4294967296, 17179869184, 68719476736, 274877906944, 1099511627776, 4398046511104, 17592186044416, 70368744177664

A171381 Numbers n such that (3^n + 1)/2 is prime. 1, 2, 4, 16, 32, 64

A186108 Denominator of the frequency of the n-th dropping time in the Collatz iteration. 2, 4, 16, 16, 128, 256, 256, 2048, 8192, 32768, 16384, 262144, 262144, 2097152, 8388608, 16777216, 33554432, 134217728, 536870912, 1073741824, 4294967296, 2147483648, 34359738368, 137438953472, 274877906944, 274877906944, 2199023255552, 4398046511104, ...

A165420 a(1) = 1, a(2) = 2, a(n) = product of the previous terms for n >= 3. 1, 2, 2, 4, 16, 256, 65536, 4294967296, 18446744073709551616, 340282366920938463463374607431768211456, 115792089237316195423570985008687907853269984665640564039457584007913129639936

A186110 Denominator of the cumulative frequency of the dropping time in the Collatz iteration. 2, 4, 16, 8, 128, 256, 16, 2048, 8192, 32768, 32768, 262144, 16384, 2097152, 8388608, 16777216, 33554432, 134217728, 536870912, 1073741824, 4294967296, 4294967296, 34359738368, 137438953472, 274877906944, 137438953472, 2199023255552, 4398046511104, 4398046511104, ...

A051285 a(n) = a(n-1)^a(n-2), n >= 2. 2, 2, 4, 16, 65536, 115792089237316195423570985008687907853269984665640564039457584007913129639936

A079556 a(1) = a(2) = 2; for n>2, a(n) = a(n-2)^a(n-1). 2, 2, 4, 16, 4294967296

A135569 a(n) = S2(n)*2^n; where S2(n) is digit sum of n, n in binary notation. 0, 2, 4, 16, 16, 64, 128, 384, 256, 1024, 2048, 6144, 8192, 24576, 49152, 131072, 65536, 262144, 524288, 1572864, 2097152, 6291456, 12582912, 33554432, 33554432, 100663296, 201326592, 536870912, 805306368, 2147483648, 4294967296

A217994 a(n) = 2^((2+n+n^2)/2). 2, 4, 16, 128, 2048, 65536, 4194304, 536870912, 137438953472, 70368744177664, 72057594037927936, 147573952589676412928, 604462909807314587353088

A168088 2^tetranacci(n). 1, 1, 1, 2, 2, 4, 16, 256, 32768, 536870912, 72057594037927936, 324518553658426726783156020576256, 411376139330301510538742295639337626245683966408394965837152256

A168089 2^pentanacci(n). 0, 0, 0, 0, 2, 2, 4, 16, 256, 65536, 2147483648, 2305843009213693952, 1329227995784915872903807060280344576, 110427941548649020598956093796432407239217743554726184882600387580788736

A171163 Number of children at height n in a doubly logarithmic tree. Leaves are at height 0. 0, 2, 2, 4, 16, 256, 65536, 4294967296, 18446744073709551616

next terms after 16, as powers of 2 (exponents):

16, 6, 6, 8, 6, 5, 8, 7, 6, 5, 7, 8, 7, 16, 32, 6, 7, 8, 8, 8,

histogram: 2^5: 2, 2^6: 5, 2^7: 4, 2^8: 6, 2^16: 2, 2^32: 1

combined histogram

2^3: 1, 2^5: 5, 2^6: 7, 2^7: 7, 2^8: 8, 2^16: 3, 2^32: 1

this only partially supports the hypothesis that 2^3, 2^6, 2^8 are more special than 2^5, 2^7: 2^6 and 2^8 come out slightly ahead of 2^5 and 2^7, but 2^3 barely appears; in any case both lines of reasoning point to 2^8 as being somewhat special

so we are left with just 1,2,4,16 as the specialest, and maybe 2^8 = 16*16 = 256 and 2^16 = 65536 also.
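recomputing the two tallies and the combined histogram in Python:

    from collections import Counter

    batch1 = [16, 7, 6, 6, 5, 3, 5, 7, 8, 8, 7, 5]   # searches starting 1,2,4,16
    batch2 = [16, 6, 6, 8, 6, 5, 8, 7, 6, 5, 7, 8,
              7, 16, 32, 6, 7, 8, 8, 8]              # searches starting 2,4,16

    print(sorted(Counter(batch1).items()))
    # [(3, 1), (5, 3), (6, 2), (7, 3), (8, 2), (16, 1)]
    print(sorted(Counter(batch1 + batch2).items()))
    # [(3, 1), (5, 5), (6, 7), (7, 7), (8, 8), (16, 3), (32, 1)]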

not sure if this numerology is useful for anything, but perhaps it suggests bit widths for instruction components, memory sizes for massively parallel machines, number of registers, amounts of parallelism, word sizes, etc.

e.g. maybe instruction components should be 1,2,4, or 16 bits, and instruction sizes should be 4 or 16 bits, machines should have 4 or 16 registers, machines should have word sizes of 4 or 16, machines should address 64k of memory, machines should have 64k processors.

otoh in the real world we have to deal with ipv6 and 64-bit machines, so we have a 2^6 and 2^7 in there already. the closest 'maybe special' to 64 is either 16 or 256, and to 128 is 256. So maybe we should make use of 256 bit widths for some things.

e.g. maybe instruction components should be 1,2,4, or 16 bits, and instruction sizes should be 4 or 16 bits, machines should have 4 or 16 or 256 registers, machines should have word sizes of 4 or 16 bits, machines should address 64k bits (8k bytes) or possibly 2^256 bits of networked memory, machines should have 64k processors.

hmm.. 2^256 is a big number.. perhaps we should give up on addressing every bit on the network and only have 'learned routes' with no absolute network addresses? also i see no need for 256 registers in BYTES but if this is in BITS that's 32 bytes of registers which seems reasonable. Otoh 16 registers with word size 16 bits also seems reasonable.

so i'm thinking:

alternately with 1-bit 'words':

that seems more sensible actually; the 16-bit instructions could contain fields allowing operations to treat things as 4-bit or 16-bit or 256-bit words

so there's a massively parallel architecture for you. each node has 64k local processors and each processor has 8k memory, so each node has 2^29 bytes of memory, about half a gigabyte.

Each processor is like a neuron with 1024 afferents and 64 bits of information at each synapse.

that seems too much memory per neuron.. how about 4 bits per synapse? we're off by a factor of 2^4. But the next step down from 64k bits local memory is 256 bits, which is a step down of 2^8. Let's try it though:

now each node has about 2M bytes of memory (64k processors times 256 bits each). If each processor is a neuron with 256 afferents, it can store 1 bit per afferent. Not enough.

History shows that computing likes to increase bit widths, and also individual neurons have megabytes to gigabytes of DNA which is at least ROM and may also set the scale for an epigenetic RAM system, so let's err on the larger size, with 8k cache-like local memory per processor.

as for the instruction breakdown:

if we're doing register-to-register we have 4 bits for each of two operands and 4 bits for an opcode, that's 12 bits already so we only have 4 left. that's enough for 2 bits of addressing mode for each operand.

If we have 64k bits in memory then it would take all of the 16 bits in the instruction to address local memory at the level of a bit. But we could have a few bits for scale, plus the address. E.g. if we had 4 bits to address, and we used 2 bits to divide by 4, then 16/4 = 4 so we'd still be short 2 bits. Hmm..

maybe we can't have immediate mode addresses then? that seems bad..

also we need some bits for scaling (choosing a word size per-instruction) if we don't have a set word size. and we wanted to have stack-ish options in addition to registers. and it would be nice to have a third operand sometimes!

well that sucks. no wonder everyone uses words and accumulators and 32-bit+ variable length instructions. PIC has weird accumulator stuff. AVR has 16-bit instructions but also has some instructions which take an additional 16 bits. The 16-bit ARM Thumb instruction set has weird variable encodings and i think only 8 (easily accessible) registers: https://ece.uwaterloo.ca/~ece222/ARM/ARM7-TDMI-manual-pt3.pdf

ok for each 'scale bit' that we take, we can remove one bit from EACH operand, since the scale is set for the whole instruction. So with two scale bits we only need two bits for each operand, plus two addressing mode bits, and so we have two bits left over. The fourth addressing mode can be stack-like (immediate, register, register indirect, stack-like). But if registers are only 4 bits then we can't yet address 64k of memory, right? Unless scale applies twice.. then with both scale bits set, we divide by (1,2,4,8 = 8) and we have two registers with 1 byte each, and we have 64k bits/8 = 8192 bytes to address, so by using both registers as one 16-bit address we can get there. But in that case shouldn't the scale be 16?

What would be great would be if we could just dispense with registers and address memory directly. Can we get there?

We have 2^16 bits of memory so we need at least 3 scale bits, giving us scales of 1,2,4,8,16,32,64,256. If we had just 8 more bits to address, 2^16/256 = 2^8, and we could reach everything, although beyond 16k bits we could only address using 256-bit (32-byte) word sizes!

alternately with 2 scale bits we could have scales of 1,2,4,16. Then we'd need 12 bits to address the 4096 words in memory.

also note that we want to reserve half the opcodes for custom, so if we only have 4 bits we get 16 opcodes, of which we only get to define 8. And we want to reserve half the addressing modes (which i guess are also like types) for custom, so if we have 2 bits per operand we need to add another bit.

So with only one operand we already have 4 bits opcode + 3 bits addressing mode = 7 bits, plus 2 scale bits = 9 bits, and that's without the operand itself (which would then be at most 7 bits if we want to hit 16 total). If we had two operands that's 4 + 2*3 + 2 = 12 bits before the operands themselves (now 2 bits are left per operand). And the 3 bits for the addressing mode are odd, which doesn't fit our numerology of 1,2,4,16.

So we could go the variable length route, and have 16 bits be the length for the instruction, with all semantic modifiers (addressing modes, scales, and whatnot) but not the operands themselves, each of which could be another 16 bits. Now we can have 3 full-length operands again, and there is no need (from an instruction coding point of view) to have a few registers rather than just accessing memory.

So now we have 16 bits to play with, and we've spent 12. We have 4 bits left. We could add 2 more scale bits, and multiply their representation, giving us 1,2,4,8,16,32,64,256. And we could add another bit to each operand's addressing mode to make it even.

Or we could add a third operand. 4 + 3*3 + 2 = 15 bits, 1 bit left over.

if we have only 2 scale bits, with 1,2,4,16, and we have 64k bits of memory to address, we still need only 12 bits to address it in 16-bit-word mode (and only 1/16th, that is, 4096 bits, are available for single bit addressing). So if each of the operands is a 16-bit word, we have 4 bits left over in each operand. We can put the 2 bits of addressing mode plus the one bit custom addressing mode flag in there, and still have one bit per operand left.

So now the first word only needs 4 bits for the opcode plus 2 bits for the scale. We now have 10 bits left there!

Some of those bits could be a second scale to allow us to use less than 12 bits per operand. If we only have 2 operands and we only use 4 bits per operand plus 2 bits of addr mode plus 1 bit custom addr mode plus 1 bit reserved as of now, then two operands cost (4 + 4)*2 = 16 bits. So in this case we only need 2 16-bit words instead of 3. Choosing between 4 and 16 bits per operand costs at least 1 bit. If we have 2 scale bits, then the operands could be 1,2,4, or 16 bits each. If they are 2 bits ea. then we still need 12 bits for two operands: (4 + 2)*2 = 12. If they are 1 bit each then we still need 10 bits. So these options don't help as much. So let's just have 1 more scale bit.

So now we have 9 bits left in the first instruction word.

Otoh what if we have 3 operands? if they are 1 bit apiece that's (4 + 1)*3 = 15 bits. So in this case we can fit everything into just 32 bits! This is like only having two registers, but what the hey, i guess that's like using the top 2 items on the stack. So maybe we do want a second scale bit. Can we use the other option? 2 bits apiece? (4 + 2)*3 = 18. If we didn't use the other reserved operand bit that would be 15 again though and we'd be golden.

So now we have 8 bits left in the first instruction word, and the outstanding question of what to do with a 2 in the second scale reg.

Now what if we put some operands into those 8 bits? we'd have:

(4+4)*2 = 16; 16 - 8 = 8
(4+2)*2 = 12; 12 - 8 = 4
(4+1)*2 = 10; 10 - 8 = 2

(4+4)*3 = 24; 24 - 8 = 16
(4+2)*3 = 18; 18 - 8 = 10
(4+1)*3 = 15; 15 - 8 = 7

So if we use the last 8 bits for operands then we could fit 3 4-bit operands into two 16-bit words. And even if we don't, if we have two scale bits and use 1-bit operands, we can fit 3 1-bit operands into two 16-bit words.
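to make the bit budget concrete, here's a Python sketch of the two-operand variant above (4-bit opcode, 2 scale bits, and per-operand 12-bit address + 2 mode bits + custom flag + reserved bit); the exact field positions are my assumption, nothing above settles them:

    # word 0: opcode(4) | scale(2) | 10 spare bits
    # operand word: address(12) | mode(2) | custom-mode flag(1) | reserved(1)

    def pack_head(opcode, scale):
        assert 0 <= opcode < 16 and 0 <= scale < 4
        return (opcode << 12) | (scale << 10)

    def pack_operand(addr, mode, custom=0):
        assert 0 <= addr < 4096 and 0 <= mode < 4
        return (addr << 4) | (mode << 2) | (custom << 1)

    def unpack_operand(w):
        return w >> 4, (w >> 2) & 3, (w >> 1) & 1

    instr = [pack_head(opcode=3, scale=2),      # scale 2 might mean 4-bit words, say
             pack_operand(addr=100, mode=1),
             pack_operand(addr=200, mode=0)]
    print([hex(w) for w in instr])              # ['0x3800', '0x644', '0xc80']
    print(unpack_operand(instr[1]))             # (100, 1, 0)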

btw note that a hypercube, as used in the Connection Machine, might be a good network topology for 2^16 processors; each processor would have 16 immediate neighbors, and the graph diameter (max hops) would be 16.
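checking the hypercube numbers in Python (neighbors are found by flipping one address bit; hops are Hamming distance):

    def neighbors(node, dims=16):
        # flip each address bit in turn
        return [node ^ (1 << d) for d in range(dims)]

    def hops(a, b):
        return bin(a ^ b).count('1')   # Hamming distance = hops needed

    print(len(neighbors(0)))       # 16 immediate neighbors
    print(hops(0, 2**16 - 1))      # 16: the diameter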

this thing with the 4 bits suggests 16 memory-mapped registers..

also we can get rid of the useless '2' scale option by using variable length scale..

--

note: the 8 KiB of memory indicated above (via 2^16 bits) is already 32x less than the 256 KiB in a Cell processor SPE (there are 8 SPEs per Cell), and less than the L1 instruction cache on many processors (which is often at least 16 KiB)

but 4x more than the RAM in a Parallax Propeller Cog (there are 8 Cogs per Propeller), which is 512 32-bit words, or 2 KiB

and of course we are going to have 65536 processors, not 8, an increase by a factor of 8k

--

and what if you said, 65536 = 2^(2^(2^2)), but instead of having 65536 bits, since we know 16 is special (2^2^2), we'll have 65536 16-bit words? that's

(2^2^2) * 2^(2^(2^2))

to complete the pattern,

2 * (2^2) * (2^2^2) * 2^(2^(2^2))

2 * 4 * 16 * 65536

2^(1 + 2 + 4 + 16)

2^23

8388608

2^23 bits is 2^20 bytes, which is one mebibyte.

It's not clear to me that this makes things better, though. 23-bit addressing?

---

so, what are the 'most special' powers of 2?

as noted above, we have the sequence with initial condition 1 and inductive step 2^(previous member), which starts:

1, 2, 4, 16, 65536

after that, we have powers of 2 whose exponent is itself a power of two, which starts:

2, 4, 16, 256, 65536 (adding only 256 to the previous sequence)

next we have everything of the form 2^(n*k), where n is a power of 2 (ie of the form 2^a), and k is one of 0,1,2,3 (or maybe 1,2,3,4; you get the same answer either way, except for 1) (the point is that k ranges over numbers up to a power of two, in fact, up to the first power of two after 2, although one could imagine that in another, further step, we would add in numbers of the same form except where k ranged up to 2^3=8 or maybe 2^4=16). When k is 0 you get 1; when k is 1, 2, or 4, then n*k is a power of two (so we just get 2^n where n is a power of two, which we've covered previously: 2,4,16,256,65536); so the only new case is when k is 3. So we get 2^(2*3) = 2^6, 2^(2^2*3) = 2^12, 2^(2^3*3) = 2^24, 2^48, 2^96, with the "most special" of those being where n was a power of a power of two, that is, 2^6, 2^12, 2^48:

64, 4096, 281474976710656 (or about 281.5e12, or about 281.5T; note that 4096 = 64^2, and 281474976710656 = 4096^4)

in summary, the most special (thru 2^16) are:

2, 4, 16, 65536

and then 256

and then 64, 4096 (2^6 and 2^12)

Where is the 8 found in the ubiquitous 8-bit "byte"? We do see 8 in the above by starting with 256 and taking log_2(256) = 8. But this method also gives us log_2(64) = 6 and log_2(4096) = 12. 6, especially, does not seem very "2-ish". My opinion is that 8 is actually not that special as a power of two; the fact that it is a cubic number is of interest in itself, but for different reasons, and it has other interesting properties, such as being a bad size for committees: http://arxiv.org/pdf/0804.2202v1.pdf . In fact, old computers used to have other preferred bit-lengths. My guess is that 8 was settled on because (a) 256 values is about right for holding something like ASCII (all alphanumeric latin characters, then some punctuation), while leaving half the values unassigned; (b) 255 is bigger than most small integers you want to do arithmetic on, big enough to hold the size of small lists you might want, and big enough to fit instruction sets in comfortably.

4: 4 seems to be the number of assembly-language-computer-like stuff; 4 1-bit flags, etc.

12 is interesting as the result of 16 - 4; eg if you have a 16-bit bitfield, and you need to use 4 of those bits for something, then you have 12 bits left to use for something else.

In powers-of-two-land, 16 is an elaboration of 4 (2^4); also very powers-of-two-sy/computersy

4096 is often found in computers, eg in memory page sizes. 4096 (and sometimes, 256) seem to be regarded as 'the smallest chunk of memory in which you could put something useful'.

2^16 = 65536 is the smallest power of two bitwidth that can really hold a decent amount, the length of lists that aren't 'small' (that is to say, the smallest number of the form 2^(2^n) such that the resulting number of instructions can hold a decently-sized computer program). 65536 to me seems like the powers-of-two equivalent of how some human languages/cultures use specific largish numbers as placeholders for 'arbitrarily large' (but not necessarily infinite) (eg 10,000: http://www.merriam-webster.com/dictionary/myriad ; various examples of 10,000 in classical Chinese literature). 65536 is the first power of two bit-length that is a reasonable size for the number of memory locations in a general-purpose computer (although if you don't care if the bit-length is a power of 2, 2^12 = 4096 is a reasonable choice).

(this sort of reasoning makes you wonder why Unicode didn't make characters fit within 2 bytes; shouldn't 65536 be enough to express ourselves? I'm not very knowledgeable about this but my impression is that they initially tried; a Unicode encoding called UCS-2, which uses two bytes, used to be popular; but politically, Unicode had to take the union of all of the characters of all large or powerful human communities, which led to a demand for more than 65536 code points; note that this is different from saying that 65536 isn't enough to form an alphabet, this is saying that 65536 isn't enough to encode all alphabets that through historical accident have been created so far and still seem important today (note: some ideogram languages do have more than 65536 ideograms, but perhaps not much more (i heard 70,000 for Chinese once), and i bet many of these are not in active use outside of proper names). To this day, the "BMP" or Basic Multilingual Plane, which is the first 65536 unicode code points, is often talked about.)

It is interesting to consider what the corresponding numbers for some non-binary number systems would be. For 3: 3^(3^1) = 27 (so a trinary number in 3 trinary memory locations, or 'trits', can hold 27 distinct values); 3^(3^2) = 19683. 3^(3^3) = 7625597484987. 3^(3^(3^3)) = too large to work with. So, the equivalent of the formula for 65536 (2^(2^(2^2)) = 2^(2^4) = 2^16; the number of locations that can be addressed by (the number of bits that can be addressed by (the number of bits that can be addressed by (the number of bits that can be addressed by (one bit))))) is too large to work with. Even 3^(3^3) is in the trillions. 3^(3^2) = 3^9 is the one that is on the order of 65536; the smallest 'special number' for base 3 that denotes a number of instructions that could hold a decently-sized computer program; this is the number of locations that can be addressed by (the number of trits that can be addressed by (two trits)). For 5: 5^5 = 3125; 5^(5^2) = 298023223876953125, which is on the order of 298023 trillion (roughly on the order of 2^58). For 12, 12^12 = 8916100448256, roughly 9 trillion, or a little more than 2^43.
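computing the base-b analogues b^(b^k) in Python:

    for b in (2, 3, 5):
        print(b, [b ** (b ** k) for k in (1, 2)])
    # 2 [4, 16]                    (and 2^(2^4) = 65536)
    # 3 [27, 19683]                (3^(3^3) = 7625597484987 already unwieldy)
    # 5 [3125, 298023223876953125] (~2^58)
    print(12 ** 12)                # 8916100448256, a little more than 2^43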

---

could use sequence of 2^n bitwidths (= sequence '2^(2^n)' addressable locations) for different things; this seems to be close to how these are actually used:

shorter summary:

so what we might do (consolidating/leaving out every other step above; this lets us keep the all-important 4, 16, and also keep the commodity hardware 64, although we lose the 8 (the byte!)):

this could perhaps be continued in another pattern: 1, 2, 4, 16, 64, 256, ... = 2^0, 2^1, 2^2, 2^4, 2^6, 2^8, ...; the differences in the powers of 2 are 1, 1, 2, 2, 2, suggesting it follows with 3,3,3,3, 4,4,4,4,4, 5,...: so, 2^11, 2^14, 2^17, 2^20, 2^24, 2^28, 2^32, 2^36, 2^40, 2^45, ... but this seems messy.. so maybe follow with 4,4,4,4, 16,16,16,16,16: 2^12, 2^16, 2^20, 2^24, 2^40, 2^56, 2^72, ... but we've skipped 2^32 and 2^64, so maybe follow with 4,4,4,4, 8,8,8,8,8, 16, ...: 2^12, 2^16, 2^20, 2^24, 2^32, 2^40, 2^48, 2^56, 2^64, 2^80, ...; this corresponds to bit widths of 1,2,4,16,64,256,4096,65536,...

indeed the last series seems to catch most of the useful/most used numbers (8 is conspicuously absent, though): 1,2,4,16,64,256,4096,65536,

how does it proceed from there?

1 Mib, 16 Mib, 4 Gib, 1 Tib, 256 Tib, 64 Pib, 16 Eib, 1 Yib (bits, not bytes)

so the first 16 elements of this series are:

1,2,4,16,64,256,4096,65536, 1 Mib, 16 Mib, 4 Gib, 1 Tib, 256 Tib, 64 Pib, 16 Eib, 1 Yib

---

to summarize the previous segment:

a useful series is: 1,2,4,16,64,256,4096,65536, 1 Mib, 16 Mib, 4 Gib, 1 Tib, 256 Tib, 64 Pib, 16 Eib, 1 Yib, ... (computed as 2^0, 2^1, 2^2, 2^4, 2^6, 2^8, 2^12, 2^16, 2^20, 2^24, 2^32, 2^40, 2^48, 2^56, 2^64, 2^80, ...; the differences of the exponents are 1,1, 2,2,2, 4,4,4,4, 8,8,8,8,8, 16, ...)
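a Python generator for this series, where the exponent differences double while their run lengths grow 2, 3, 4, 5, ...:

    def bitwidth_exponents(n):
        out, e, step, runlen = [], 0, 1, 2
        while len(out) < n:
            for _ in range(runlen):
                if len(out) == n:
                    break
                out.append(e)
                e += step
            step *= 2      # differences: 1,1, 2,2,2, 4,4,4,4, 8,8,8,8,8, 16,...
            runlen += 1
        return out

    print(bitwidth_exponents(16))
    # [0, 1, 2, 4, 6, 8, 12, 16, 20, 24, 32, 40, 48, 56, 64, 80]
    print([2**e for e in bitwidth_exponents(8)])
    # [1, 2, 4, 16, 64, 256, 4096, 65536]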

a computer (or VM) could be built as:

---

could also use the above series as the bitstring for the initial prefix of arbitrary precision scales: eg

000001 means #6 on that series, 256; that then becomes the 'word size' (in words, or in bits? probably words) of the following data structure. Eg if the data structure is just one atomic (arbitrary-precision) word, then if the initial prefix is 000001, then the following quantity occupies 256 words (256 16-bit words). In practice though we would probably just use 16-bit words for the prefix, in which case the above gets too big too fast; just say that if the prefix is 0, keep going; so a 1-word prefix specifies any length from 1 to 64k-1, a 2-word prefix (where the first word is 0) from 64k to 4Gi, with a step size of 64k, etc.
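a Python sketch of that 'if the prefix is 0, keep going' scheme, limited to two levels; the rounding behavior is my assumption (lengths in the 2-word form round up to the 64k step):

    def encode_len(n):
        if 1 <= n < 0x10000:
            return [n]                    # 1-word prefix: lengths 1..64k-1
        units = -(-n // 0x10000)          # 2-word prefix: round up to 64k units
        assert units < 0x10000            # (a third level would extend this)
        return [0, units]

    def decode_len(words):
        return words[0] if words[0] else words[1] * 0x10000

    print(encode_len(300))       # [300]
    print(encode_len(1 << 20))   # [0, 16] -> decodes to 16 * 64k = 1M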

similarly, could use the above series in a barrel shifter for compressed representation of common constants. Eg:

s1 + s2*x, where x is the number actually represented in the main operand, and s1 and s2 are entries from the above series. If the main entry is 8 bits, and the other 2 are each 4 bits, for a total of 16 bits, then we could represent eg 2^16 + 64*200. In this case, however, there's not much need for the series entries under 8 bits for s1; so we could use 3 bits for the other two, have a 10 bit main entry, and have the series be: 1 (or zero, if used in the addition), 256, 4096, 65536, 2^20, 2^24, 2^32, 2^40. Or, have a 12-bit main entry, and have the series be: 1 (or zero, for s1), 4096, 65536, 2^20. This lets every number from 0 to 4096 be represented, plus many other numbers up to about 2^32 (2^12 * 2^20). As a refinement, change the series for s1 to be more useful: 0, -1, +1, (mode); and the series for s2 is 1, 4096, 65536, 2^20; this lets every number from -1 to 4096 be represented, and some numbers up to about 2^32. If we are trying to express a 32-bit value we can use -1 and 2^32 + 1 as two distinct sentinels; and we can use (mode) as a packed flag (for example, to represent sign). This is pretty complicated though, and all it allows us to do is to express many useful 32-bit quantities in 16 bits, while still being able to express 12 bits exactly; not sure if saving 16 bits (over just using a full 32-bit immediate) is worth the complexity there; otoh the mode and the sentinel are also very valuable. But maybe we can get similar things out of floating point half-precision (16-bit), which is more standard, although there we are given two 'modes': negativity and fractionness (x < 1), and a 10 bit significand rather than 12, and we can only go up to 65520? Eh, actually floating point sounds even more difficult to deal with.
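a Python sketch of the refined 16-bit format; the field layout (s1 in the top 2 bits, s2 in the next 2, x in the low 12) is my assumption:

    S1 = [0, -1, +1, 'mode']          # 'mode' is the app-defined packed flag
    S2 = [1, 4096, 65536, 2**20]

    def decode_const(word):
        s1 = S1[(word >> 14) & 3]
        s2 = S2[(word >> 12) & 3]
        x = word & 0xFFF
        if s1 == 'mode':
            return ('mode', s2 * x)
        return s1 + s2 * x

    print(decode_const((2 << 12) | 4095))   # 65536 * 4095 = 268369920
    print(decode_const((1 << 14) | 100))    # -1 + 100 = 99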

however, maybe an easier thing to do would be to just say:

---

so in summary we're at:

---

if we use the threadlocal memory for a stack, how do we spill the stack?

instead of doing it one-by-one as we call and return from functions, could spill/unspill the stack so that after each time, half of the 64k local stack memory is unused. Eg if we are calling, then spill the top 32k or so, swap the bottom 32k with the top, and now we have 32k free under us for more calls; and if returning, only load 32k, so that again we have 32k under us for calls. Note that this implies that it would be great to have a 'flip' operation that (effectively; we could do this by just taking the stack pointer mod n where n is the stack size, and keeping track of when we go over the boundary and the midpoint) swaps the bottom 32k of local memory with the top (or rather, copies the bottom into the top and ignores the top)
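a toy Python model of this half-spill policy, scaled down to an 8-entry 'local memory' (the 64k case is the same with bigger constants):

    LOCAL = 8            # stands in for the 64k local stack memory
    HALF = LOCAL // 2

    class SpillStack:
        def __init__(self):
            self.local, self.backing = [], []
        def push(self, v):
            if len(self.local) == LOCAL:          # full: spill the older half
                self.backing += self.local[:HALF]
                self.local = self.local[HALF:]
            self.local.append(v)
        def pop(self):                            # (assumes no underflow past empty)
            if not self.local:                    # empty: reload only half
                self.local = self.backing[-HALF:]
                del self.backing[-HALF:]
            return self.local.pop()

    s = SpillStack()
    for i in range(20):
        s.push(i)
    print([s.pop() for _ in range(20)])   # [19, 18, ..., 0]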

---

https://github.com/ethereum/wiki/wiki/RLP provides a variable-length binary encoding for semi-arbitrary-length byte arrays and lists (the items of the list can be lists, or byte arrays); by 'semi-arbitrary-length' i mean that it can't actually represent arbitrarily large objects, but it can represent very large objects. RLP can represent objects up to length 2^64-1, i think. It is optimized so that a single byte whose value is <128 can be represented in a single byte (no length header byte is needed); and so that a single length header byte can represent the length of byte arrays or lists of lengths up to 55 (inclusive); longer bytearrays and lists require a length-header-length byte followed by a length header (which may be up to 8 bytes long) followed by the payload.
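a minimal Python encoder sketch of the RLP rules as just described (byte strings and nested lists only; no enforcement of the 2^64-1 bound):

    def rlp_encode(item):
        if isinstance(item, bytes):
            if len(item) == 1 and item[0] < 0x80:
                return item                               # single small byte: as-is
            return length_prefix(len(item), 0x80) + item
        payload = b''.join(rlp_encode(x) for x in item)   # item is a list
        return length_prefix(len(payload), 0xc0) + payload

    def length_prefix(n, offset):
        if n <= 55:
            return bytes([offset + n])                    # one header byte
        nb = n.to_bytes((n.bit_length() + 7) // 8, 'big') # length of the length,
        return bytes([offset + 55 + len(nb)]) + nb        # then the length itself

    print(rlp_encode(b'dog').hex())              # 83646f67
    print(rlp_encode([b'cat', b'dog']).hex())    # c88363617483646f67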

furthermore, the representation of integers is defined as big endian binary form with no leading zeros (i don't quite get the 'no leading zeros' requirement; perhaps they are trying to ensure that there is only one encoding for a given number? this makes sense as one of the application requirements of RLP in Ethereum is to make sure that the same value is always encoded the same way, so that if two clients encode the same value they will have the same hash; https://github.com/ethereum/wiki/wiki/Design-Rationale mentions this).

furthermore, the representation of dictionaries is defined to be [[k1,v1],[k2,v2]...] with keys in lexicographic order (although the RLP page also mentions a Patricia Tree encoding, which i think is sloppy wording; i think what they mean is that there is also another data type, a Patricia Tree).

this is presented as a serialization format.

imo one potential problem here is that this serialization does not include type information, eg the same entry in a list might be interpreted as either an integer or a string, and some lists might be able to be alternately interpreted as dicts or as Patricia Trees. They are assuming that in their application the client already knows the data types, which is probably valid but imo may lead to confusion with security implications later if a protocol designer makes a mistake.

they mention bencode, protobuf, BSON as alternative serialization formats.

bencode: has byte strings, integers, lists, dicts. Does encode the type of items. Encodes integers as "integer encoded in base ten ASCII"! Has closing delimiters. There is only one possible encoding for each value. Makes some attempt to be human-readable, but since it allows byte strings, is not completely human-readable and also is not text editor-safe.

protobuf: scalar types: double, float, int32, int64, uint32, uint64, sint32, sint64, fixed32 (alternative to int32, more efficient for large values), fixed64, sfixed32, sfixed64, bool, string, bytes [2]. composite types: enum, message, repeated (array), maps (syntactic sugar for a 'message' header plus a 'repeated'), timestamp, duration, Struct, wrapped types (can be null), FieldMask (?), ListValue (?difference from repeated?), Value (?is this just a type that could be any scalar value?), NullValue, Any (dynamic type)

https://en.wikipedia.org/wiki/Protocol_Buffers https://developers.google.com/protocol-buffers/docs/overview?hl=en

bson: types: string, integer (32- or 64-bit), double, date (integer number of milliseconds since the Unix epoch), byte array (binary data), boolean, null, BSON object, BSON array

see also

https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats https://en.wikipedia.org/wiki/Serialization https://en.wikipedia.org/wiki/Category:Data_serialization_formats

note the column headers in https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats:

Name, Creator-maintainer, Based on, Standardized?, Specification, Binary?, Human-readable?, Supports references?, Schema-IDL?, Standard APIs

note: https://developers.google.com/protocol-buffers/docs/overview?hl=en provides a good explanation of when XML is good (beyond concerns of human readability) vs a binary format like protobuf: XML is good when interleaving structure with text ('markup') as opposed to a fixed structure.

---

2^4 = 2^(2^2) = 16, which seems to be a key number in binary-based systems

i guess the equivalent for trinary-based systems would be

3^9 = 3^(3^2) = 19683

much bigger! probably would play a similar role to 2^16 = 64k

or mb even

3^(3^3) = 3^27 = 7625597484987

what about 3^3? 3^3 = 27, that's closer to 16 i guess.

(what would it be for e? e^e = 15.2. e^(e^2) = 1618.1. e^(e^e) = 3814279.1)

so, one imagines that in trinary systems, 3 trits would be the equivalent of a 'nibble' and 9 trits might be the equivalent of a 'word'.

---

so which is better for ISA design etc? for design, the nibble size matters more. 16 is smaller (and is close to e^e!), so that's probably better. reinforces the idea of binary as simpler than trinary. Of course then there's base 12 (2*2*3), which suggests that 12 is even better.

---

the Rex Neo has 256 cores per chip, which makes sense. Each core has 128k (so 32M per chip) and it has DMA access by any core to any other core's memory, even between chips i think. I wonder how many bits those are addressed by? 128k x 256 = 32M = 2^25. So if there's 32-bit addressing, there's space for 2^7 = 128 chips. If 64-bit addressing, then space for 2^39 = 512G of chips. Note that IPv4 addressing has 32-bit addresses, so this would be able to address one chip for every IPv4 address, 128 times over (128 'IPv4 networks', or, 128 chips for every IPv4 address). That sounds pretty decent. Put another way, 64-bit addressing can address 16 exbibytes, which can be broken up into 2^39 chunks of 32M per chunk.

To make things a little more even, what if each core could address 2^24 bytes (16M) instead of 2^17? Now you could have exactly 2^32 chips; within each 64-bit address, you'd have 32 bits to specify the chip, 8 bits to specify the core, and 24 bits to specify the memory address. Or, what if each core could address only 2^16 bytes (64k)? Now each core could be addressed by 48 bits (40 bits to specify the chip, and 8 bits to specify the core). One might imagine that one might 'cluster' chips into clusters of 256 chips, in which case those 40 bits per chip could be broken into 32 bits to specify the cluster and 8 bits to specify the chip within the cluster. (or, if you had a 3d array of chips with a power of two in each direction, you might have 16 in each direction, yielding 4096 chips per cluster, or 2^12, so the 40 bits per chip might be broken into 28 bits to specify cluster and 12 bits to specify chip within cluster)
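a sketch of the 32/8/24 split in Python (the ordering of the fields within the 64-bit address is my assumption):

    def split_addr(addr):
        offset = addr & (2**24 - 1)     # byte within the core's 16M
        core = (addr >> 24) & 0xFF      # 256 cores per chip
        chip = addr >> 32               # 2^32 chips
        return chip, core, offset

    addr = (7 << 32) | (200 << 24) | 123456
    print(split_addr(addr))   # (7, 200, 123456)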

So, in summary, if you have 256 cores per chip, it's still reasonable to use 64-bit addressing to address any byte of memory within any chip. 32-bit addressing in that case would only work for at most 128 chips.

---

remember that many processors have an L1 icache of 32k. So it would be nice if the OVM core would fit in 32k (or even better, fit in there with some room to spare for some other stuff like frequently-used variables or low-level library routines).

---

on the prettiness of 12 as a bitwidth:

16 bits are pretty for the reasons given above.

2^12 = 4k, which is the page size on many contemporary systems, and is a common/notable RAM size on various MCUs

By the measure 'the smallest bit width that is good enough', 16 bits wins out of the powers-of-two (because 8 bits is too small; you need more than 256 RAM locations, and more than 256 program instructions, to accomplish a lot of things); but 4k might be just large enough for many of these things.

12 bits is the message size of binary Golay encoding (6 trits for ternary).

12 (and 6) are both SHCN and colossally abundant numbers and, i argue, good radixes (number bases). Although exactly 12 locations cannot be addressed by any number of bits, 2 bits and one trit do address 12.

12 is not a power of two, but it is right in the middle of two powers of 2 (8 and 16), and it is divisible by every smaller power of 2 except the one right before it (that is, it is divisible by 2 and 4, although not 8). But 2 and 4 are very important primitive '2-ish' numbers, 8 not so much.

---

i guess the previous section enhances my faith that 12-bit immediates are a good idea for MEDIUM mode, and 4k or 2k (b/c signed) limits on things like numbers of instructions, numbers of functions exported by a module etc are a good idea.

---

there are 26 sizes (possible strengths) of synapses; so ~4.7 bits of information [3]

---

how many words do ppl use?

"A more useful number from the Oxford English Dictionary would be the 171,476 words that are in current use." [4]

" Working vocabulary or extended vocabulary? Most adults have a working vocabulary of ten to twenty thousand words but you can get through an average day talking about ordinary things with fewer than a thousand words.

Most adults know or are somewhat familiar with around thirty to forty thousand words, they just don’t use all of them regularly - but they’d know them if they heard them, or read them in a book.

...

It is said that the average American has about 10,000 words in his vocabulary. An educated American may have as many as 80,000. By way of comparison, there are 750,000 entries in the Oxford English Dictionary.

...

The average adult knows 20,000 to 35,000 words.

The average 20 year old knows 42,000 words and phrases (known as lemmas— including idioms). http://www.sciencemag.org/news/sifter/average-20-year-old-american-knows-42000-words-depending-how-you-count-them

...

However, many people believe that Americans use an average number of 20,000 words (working vocabulary).

...

Based on an analysis of the literature and a large scale crowdsourcing experiment, we estimate that an average 20-year-old native speaker of American English knows 42,000 lemmas and 4,200 non-transparent multiword expressions, derived from 11,100 word families. The numbers range from 27,000 lemmas for the lowest 5% to 52,000 for the highest 5%. Between the ages of 20 and 60, the average person learns 6,000 extra lemmas or about one new lemma every 2 days. The knowledge of the words can be as shallow as knowing that the word exists. In addition, people learn tens of thousands of inflected forms and proper nouns (names), which account for the substantially high numbers of ‘words known’ mentioned in other publications.

" [5]

so it seems to me that although there are a lot of words 'in current use', practically speaking a given person tends to know only around 64k words. This means that if the brain was forced to use 16-bit 'words' to represent English words, it wouldn't do too badly. Otoh since the number does get pretty close to 64k, it would be very surprising if the brain actually had a hard limit of 64k, suggesting that its actual data representation of words has more than 16 bits.

otoh this does suggest that you could design an alien brain and society that only used 16 bits to represent words, without losing too much expressiveness. Which suggests that yes, 16-bits is a good enough foundation for an assembly language.

---

i guess the std bitwidths are reasonable after all; if 16 bits is the reasonable minimum, then 8-bit is one step below that (for less memory), 32 bits is one step above (a safe default), and 64 bits is two steps above (for big stuff)

note that altho ipv6 has 128bit addrs, some say these were sort of meant to be thought of as 64 bit network addrs and a 64 bit subaddr space for each host.


an argument for 32 bits:

the total number of DNA base pairs in an individual human genome is about 6.4 billion [6] [7].

That's 6400000000

to compare, 2^32 = 4294967296

so to address a single base pair, you'd actually need a little more than 32 bits.

Also, there are 4 possible base pairs, so each base pair is actually 2 bits of data. So to write that down in binary, you'd need twice as many bits, or 12800000000, and 12800000000/4294967296 is almost 3. Otoh, in bytes, 12800000000 bits is only 12800000000/8 = 1600000000.0, which is about 1/3 of 2^32.
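checking that arithmetic in Python:

    base_pairs = 6_400_000_000       # diploid human genome
    print(base_pairs > 2**32)        # True: >32 bits to address single base pairs
    bits = 2 * base_pairs            # 4 possible base pairs = 2 bits each
    print(bits / 2**32)              # ~2.98, the 'almost 3'
    print(bits // 8, 2**32 // 3)     # 1600000000 bytes, about a third of 2^32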

In any case, to write an individual human genome in computer memory, you'd need about as much memory as 32 bits can address. Now, i bet a cell doesn't need to be able to (uniformly) address (all) of its DNA with the precision of a single base pair. But i'd be surprised if it can't even address to the precision of a 64k sequence of base pairs. Which would mean that 16 bit addressing would not be enough.

And 32 is the next power of 2 after 16. So there's an argument that even if 64k of memory is enough to do something interesting, it's not quite enough to do what you probably want.

And more evidence for that is that there are even some MCU OSs where some ppl feel you need more than 64k; one guy here says that for "serious development" on NuttX "128 KiB is the minimal size I find comfortable" [8]. And there are some 'embedded' programming languages requiring more than 64k (e.g. https://micropython.org/ says "Yet it is compact enough to fit and run within just 256k of code space and 16k of RAM.")

So, although 64k is probably still the 'minimalist' choice such that you could in theory do whatever i need (for an Oot implementation), i think it might be very uncomfortable and involve a lot of optimized coding. 32-bit addressing might really be needed in order to do whatever i need in a straightforward way.

---

removed from boot_design.adoc

Why 16-bit?

8- and 16-bit processors appear to be unnecessarily spartan; even in the embedded space there are lots of 32-bit processors available. But, within the family of embedded processors, 16-bits appears to be squeezed by 8-bit and 32-bit; that is, there are many 32-bit processors, and several 8-bit processors, but only a few 16-bit processors. So, if we are not going spartan, why not use 32-bit; and if we are, why not use 8-bit?

We chose 16-bit rather than 8-bit because many 8-bit processors use pointers of size greater than 8 bits. 16 bits seems to be the smallest multiple of 8 bits that is wide enough to address a 'decent' amount of memory (for example, the CPUs in old Apple IIs and Commodore 64s used 16-bit addresses, addressing 64 KiB of memory). We are going for simplicity here, and it's simpler if we assume that a pointer can fit in one word of memory.

Why am I even interested in going as low as 16-bits? The ultimate purpose of Boot is a substrate for Oot, and we don't imagine that a high-level language like Oot will be the best choice for the manual, resource-constrained projects that often characterize embedded systems programming; so why should we care about a small bitwidth? The reason is that I'm interested in creating a programming language that might be good for making massively concurrent programs, because I think the brain works that way (and by massively concurrent, I don't mean hundreds of processors, I mean tens of thousands, or preferably billions). In order to achieve this amount of concurrency, I imagine that a hardware architecture would probably scale way back on per-processor complexity. I doubt that the elementary components of the brain (whether these turn out to be synapses, neurons, or clusters of neurons) compute with a high degree of numerical precision -- and in fact, I believe that the brain's components have a lot of noise/errors, which would further limit accuracy to even less than whatever precision it has. So, I suspect that hardware that supports massive concurrency may be composed of individual processors that are simpler (smaller bitwidth, etc) than we are accustomed to today. Of course, it's exceptionally unlikely that I can guess future architectural specifications to a degree that I can make Boot compatible with a class of machine that doesn't yet exist --- but by limiting ourselves in this way we at least make it more likely that it will be less troublesome to create a variant of Boot in the future, than it would be if we based everything on, say, 64-bit floating point.

Interestingly, there's some evidence that artificial neural networks can use 8-bit values for execution, and 16-bits for training (although stochastic rounding, which is non-standard, may be needed). [9] [10] [11]. This is not conclusive, but merely adds one more application to the list of applications that don't need more than 16-bits.

Why not some number of bits in between 8 and 16? Practically, there aren't many implementations that use something in between, and more subjectively, 16 is a power-of-two which seems aesthetically appropriate for systems based on binary logic.

Tangentially but interestingly, 16 is a nice number subjectively/aesthetically/'numerologically', and has lots of connections to the numbers 2 and 4, and also has some symmetries relating to exponentiation of 2s and 4s. 16 = 2*2*2*2 (4 2s multiplied together). 2^(2^2) = (2^2)^2 = 16; 16 is the last power of two that can be written that way 'associatively', because exponentiation is not associative and for larger series of 2s raised to each other, the order of operations matters, eg ((2^2)^2)^2 = 256 != 2^(2^(2^2)) = 65536. In fact, 16 is the only integer n such that there are distinct integers x,y such that x^y = y^x (eg 2^4 = 4^2). Because 16 = 2^(2^2), it is a member of the sequence 1,2,4,16,65536,..., which starts with 1 and then raises 2 to the previous sequence member. This sequence has an 'application' in the following sequence of bitwidths: 1-bit; the bitwidth whose bits can themselves be addressed by a 1-bit address (which is 2-bit); the bitwidth whose bits can themselves be addressed by a 2-bit address (which is 4-bit); the bitwidth whose bits can themselves be addressed by a 4-bit address (which is 16-bit); the bitwidth whose bits can themselves be addressed by a 16-bit address (which is 65536-bit) (also, 16 is the last widely available member of this sequence, because a 65536-bit bitwidth is obviously way bigger than found in present-day commodity implementations). Another way to think of this 'application' is: start with a bit as a primitive index; you can index 2 items. Now, you can use your two-item memory to store 2 1-bit indices; this allows you to build a memory that can index 2^2 = 4 items (you have 2 dimensions of indices, and the index in each dimension is 1 bit, so you can index 2*2 = 4 items); now, you can use your 4-item memory to store 4 1-bit indices; etc. And because 16 = (2^2)^2, it is a member of the sequence 2,4,16,256,..., which starts with 2 and then squares the previous sequence member (perhaps there is a similarly compelling computersy 'application' for this sequence?). Simplifying the numbers in parentheses of 16 = 2^(2^2) = (2^2)^2, we see that 16 = 2^4 = 4^2. 16 is a power of a prime (2^4); since 4 is the first composite number, 16 is the first power of a prime where the number that the prime was raised to is composite. 16 is also a square number (4^2). 16 is a 'practical' number (defined as a positive integer n such that all smaller positive integers can be represented as sums of distinct divisors of n) (all powers of 2 are practical). 16 has 4 identical prime factors (if you count multiplicity), and it is equal to 4*4. The 16-cell (sometimes called a '4-D hyperoctahedron', '4-dimensional cross-polytope' or '4-D orthoplex', or a 'hexadecachoron') is one of only 6 regular polytopes in 4-D, and it has 16 (tetrahedral) cells. The 16-cell honeycomb is one of only 3 regular tessellations of (Euclidean) 4-D space. Of course, all small integers have lots of 'special properties' like these, and 8 has a similar list.
However, 8 doesn't have the x^y = y^x property, nor can it be represented as a bunch of 2s exponentiated together, and although it's a power of 2, 8 is the third power of 2, and 3 is not a power of two; and its geometric relevance is mainly to 3-D, in contrast to 16's relevance to 4-D, and again 4 is a power of 2 and 3 is not; so in a way 16 seems to be more 'purely' connected to 2 and its powers, whereas 8 has a strong connection to 2 but also a connection to non-power-of-two 3 (in place of 16's connection to 4). Since usage of a number as a bitwidth is intimately connected both to exponentiation and to 2, aesthetically, 16 seems to have some qualities to recommend it for this purpose, even if 8, the next smaller power-of-two, is smaller and is therefore simpler. 32, the next larger power-of-two, lacks these qualities and is also larger, so aesthetically it's not as nice for this purpose. 4 is even more simple than 8 and unlike 8 is 'purely' connected to 2, but a bitwidth of 4 is clearly 'too small' in that it's not very available in commodity implementations. This 'numerological' stuff may not have any real practical bearing on the decision, but it shows that 16 is an aesthetically nice choice for a bitwidth, in a way that 8 and 32 are not.