powers of 2

---

what bit width for various things?

64-bit is a commonly used width

according to http://en.wikipedia.org/wiki/128-bit , "...there are currently no mainstream general-purpose processors built to operate on 128-bit integers or addresses...".

IPv6 has 128-bit addresses

following that advice to take the current value of something and double it, that suggests that either 256-bit or (2^7 = 128, 2^14 = 16384) 16384-bit is a comfortable maximum.

the number of atoms in the universe is about 10^80, which is around 2^256

this suggests that the max bitwidth should be between 2^8 and 2^256, or between 2^8 and 2^(2^8)

one bit per atom seems a bit much. halving the latter 8, we get 2^(2^4) = 65536 bits, which is 256^2 bits or 8 KiB?. That seems like a very large upper bound already for a bitwidth, when you compare it to 64 bits (8 bytes); it's an increase from 64 bits of a factor of 1024 (2^10); whereas 64 = 2^6, and the increase from 8-bit is only 8.

so 16384 seems better.

or we could even take 2^6 = 64 and double the 6 to get 2^12 = 4096 bits, or 0.5 KiB?.

that seems safe; with a field of 2^12 bits, we could already represent numbers higher than the number of atoms in the universe

if we never need numbers bigger than 2^(2^8), then we never need bit fields bigger than 2^8. so 256-bit (32 bytes) seems safe.

and if we never need numbers bigger than half of 2^(2^8), then 128-bit seems safe. So perhaps IPv6 is already crazy conservative.

so let's choose 128-bit as our max.

powers of 2 are special. are there special powers of two? i guess so:

2^2 = 4 2^2^2 = (2^2)^2 = 2^(2^2) = 2^4 = 16 ((2^2)^2)^2 = 2^2^2^2 = 2^8 = 256 2^(2^(2^2)) = 2^16 = 65536 2^(2^(2^(2^2))) = too big (((2^2)^2)^2)^2 = 2^2^2^2^2 = 2^16 = 65536 ((((2^2)^2)^2)^2)^2 = 2^2^2^2^2^2 = 2^32 = 4294967296 (((((2^2)^2)^2)^2)^2)^2 = 2^2^2^2^2^2^2 = 2^64 = 18446744073709551616 etc

so 1,2,4,16,65536 is special and 1,2,4,16,256,65536,2^32,2^64, etc is special

and presumably 1,2,16 (elements 1,2,4 of the first) and 1,2,16,2^64,.. (elements 1,2,4,8 of the second) are extra special

and presumably 2^1, 2^2, 2^4, 2^16 and 2^1, 2^2, 2^4, 2^16, 2^256

so recurring specialness are:

1,2,4,16

and less so: 65536 (2^16)

and less so: 2^8, 2^64, 2^256 (which implies that 8 and 64 and 256 are more special than 32, 128, that is, that 2^3, 2^6, 2^8 are more special than 2^5, 2^7)

searching for 1,2,4,16 on oeis, and ruling out all those that can be immediately seen by me to contain non-powers of 2, we get:

http://oeis.org/search?q=1%2C2%2C4%2C16&sort=&language=&go=Search

A014221 a(n+1) = 2^a(n) with a(0) = 0. This is the Ackermann function A_3(n+1) as defined in the Comments line. +20 52 0, 1, 2, 4, 16, 65536 (list; graph; refs; listen; history; text; internal format) OFFSET

0,3 COMMENTS

Next term has 19729 digits - Benoit Cloitre, Mar 28 2002

A073924 Smallest power of 2 that is greater than the previous term such that every partial sum (n>1) is a prime. +20 7 1, 2, 4, 16, 128, 65536, 9007199254740992, 73786976294838206464, 205688069665150755269371147819668813122841983204197482918576128

A060656 a(n)=2a(n-1)*a(n-2)/a(n-3) with a(0)=a(1)=1. +20 6 1, 1, 2, 4, 16, 64, 512, 4096, 65536, 1048576, 33554432, 1073741824, 68719476736, 4398046511104, 562949953421312, 72057594037927936, 18446744073709551616, 4722366482869645213696, 2417851639229258349412352

A001901 Successive numerators of Wallis's approximation to pi/2 (reduced). +20 4 1, 2, 4, 16, 64, 128, 256, 2048, 16384, 32768, 65536, 262144, 1048576, 2097152, 4194304, 67108864, 1073741824, 2147483648, 4294967296, 17179869184, 68719476736, 137438953472, 274877906944,

A053038 The first (largest) power of 2 arising in the iteration-sequence when A051593(cototient function) is repeatedly applied starting with n!. +20 3 1, 2, 4, 16, 32, 128, 512, 4096, 16384, 2048, 8192, 65536, 16384, 65536, 8388608, 134217728, 8388608, 8388608, 16777216, 2097152, 8388608, 268435456, 4398046511104, 70368744177664, 67108864, 67108864, 67108864, 68719476736 (list; graph; refs; listen; history; text; internal format)

A073923 a(1)=1, a(n) is the smallest power of 2 not included earlier such that sum(k=1,n,a(n)) is prime. +20 3 1, 2, 4, 16, 8, 4096, 32, 16384, 134217728, 256, 33554432, 1073741824, 2147483648, 68719476736, 8192, 262144, 2722258935367507707706996859454145691648, 1152921504606846976, 10889035741470030830827987437816582766592

A171381 Numbers n such that (3^n + 1)/2 is prime. +20 3 1, 2, 4, 16, 32, 64 (list; graph; refs; listen; history; text; internal format)

A174677 a(n) = 2* a(n-2) *a(n-1) with a(1)=1 and a(2)=2. +20 3 1, 2, 4, 16, 128, 4096, 1048576, 8589934592, 18014398509481984, 309485009821345068724781056, 11150372599265311570767859136324180752990208

A218148 a(n) = 2^((6+5*n+n^3)/6). +20 2 1, 2, 4, 16, 256, 32768, 67108864, 4398046511104, 18446744073709551616, 9903520314283042199192993792, 1361129467683753853853498429727072845824, 95780971304118053647396689196894323976171195136475136

A114641 a(n)=2^(a(n-2)+log_{2}(a(n-1))), a(0)=1, a(1)=2. +20 0 1, 2, 4, 16, 256, 16777216, 1942668892225729070919461906823518906642406839052139521251812409738904285205208498176

A179532 a(n)=2^ceiling(n(n+1)/3). +20 0 1, 2, 4, 16, 128, 1024, 16384, 524288, 16777216, 1073741824, 137438953472, 17592186044416, 4503599627370496, 2305843009213693952, 1180591620717411303424, 1208925819614629174706176,

A094384 Determinant of n X n partial Hadamard matrix with coefficient m(i,j) 1<=i,j<=n (see comment). +10 0 1, -2, 4, 16, -32, -128, -512, 4096, -8192, -32768, -131072, 1048576, 4194304, -33554432, 268435456, 4294967296, -8589934592, -34359738368, -137438953472, 1099511627776, 4398046511104,

so the next smallest terms after 16:

2^16, 2^7, 2^6, 2^7, 2^5, 2^3, 2^5, 2^7, 2^8, 2^8, 2^7, 2^5:

histogram: 2^3: 1 2^5: 3 2^6: 1 2^7: 4 2^8: 2 2^16: 1

this seems to contradict the hypothesis that 2^3, 2^6, 2^8 are more special than 2^5, 2^7.

A014221, the Ackermann function, is the only entry in oeis with 1,2,4,16

if we don't start with 1, we also get

A051285 a(n) = a(n-1)^a(n-2), n >= 2. +20 1 2, 2, 4, 16, 65536, 115792089237316195423570985008687907853269984665640564039457584007913129639936

if we search oeis for all sequences starting with 2,4,16, excluding those for which this sequence shows up far from the start, we also get:

A019279 Superperfect numbers: sigma(sigma(n)) = 2n where sigma is the sum-of-divisors function A000203. +20 60 2, 4, 16, 64, 4096, 65536, 262144, 1073741824, 1152921504606846976, 309485009821345068724781056, 81129638414606681695789005144064, 85070591730234615865843651857942052864

A061652 Even superperfect numbers: 2^(p-1) where 2^p-1 is a Mersenne prime (A000043 and A000668). +20 56 2, 4, 16, 64, 4096, 65536, 262144, 1073741824, 1152921504606846976, 309485009821345068724781056, 81129638414606681695789005144064, 85070591730234615865843651857942052864 (list; graph; refs; listen; history; text; internal format)

A001146 2^(2^n). (Formerly M1297 N0497) +20 39 2, 4, 16, 256, 65536, 4294967296, 18446744073709551616, 340282366920938463463374607431768211456, 115792089237316195423570985008687907853269984665640564039457584007913129639936

A061286 Smallest integer for which the number of divisors is the n-th prime. +20 26 2, 4, 16, 64, 1024, 4096, 65536, 262144, 4194304, 268435456, 1073741824, 68719476736, 1099511627776, 4398046511104, 70368744177664, 4503599627370496, 288230376151711744, 1152921504606846976

A101926 2^A101925(n). +20 7 2, 4, 16, 32, 256, 512, 2048, 4096, 65536, 131072, 524288, 1048576, 8388608, 16777216, 67108864, 134217728, 4294967296, 8589934592, 34359738368, 68719476736, 549755813888, 1099511627776, 4398046511104, 8796093022208 (

A112535 Number of truth tables generated by 3CNF expressions of n variables. +20 4 2, 4, 16, 256, 43146, 120510132, 4977694100656

A063401 a(n)=a(n-1)*a(n-2)*a(n-3) with a(0)=1, a(1)=2, a(2)=2. +20 3 1, 2, 2, 4, 16, 128, 8192, 16777216, 17592186044416, 2417851639229258349412352, 713623846352979940529142984724747568191373312, 30354201441027016733116592294117482916287606860189680019559568902170379456331382784

A155543 a(n)=2*A081294(n). +20 3 2, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864, 268435456, 1073741824, 4294967296, 17179869184, 68719476736, 274877906944, 1099511627776, 4398046511104, 17592186044416, 70368744177664

A171381 Numbers n such that (3^n + 1)/2 is prime. +20 3 1, 2, 4, 16, 32, 64

A186108 Denominator of the frequency of the n-th dropping time in the Collatz iteration. +20 3 2, 4, 16, 16, 128, 256, 256, 2048, 8192, 32768, 16384, 262144, 262144, 2097152, 8388608, 16777216, 33554432, 134217728, 536870912, 1073741824, 4294967296, 2147483648, 34359738368, 137438953472, 274877906944, 274877906944, 2199023255552, 4398046511104,

A165420 a(1) = 1, a(2) = 2, a(n) = product of the previous terms for n >= 3. +20 2 1, 2, 2, 4, 16, 256, 65536, 4294967296, 18446744073709551616, 340282366920938463463374607431768211456, 115792089237316195423570985008687907853269984665640564039457584007913129639936

A186110 Denominator of the cumulative frequency of the dropping time in the Collatz iteration. +20 2 2, 4, 16, 8, 128, 256, 16, 2048, 8192, 32768, 32768, 262144, 16384, 2097152, 8388608, 16777216, 33554432, 134217728, 536870912, 1073741824, 4294967296, 4294967296, 34359738368, 137438953472, 274877906944, 137438953472, 2199023255552, 4398046511104, 4398046511104,

A051285 a(n) = a(n-1)^a(n-2), n >= 2. +20 1 2, 2, 4, 16, 65536, 115792089237316195423570985008687907853269984665640564039457584007913129639936

A079556 a(1) = a(2) = 2; for n>2, a(n) = a(n-2)^a(n-1). +20 1 2, 2, 4, 16, 4294967296

A135569 a(n) = S2(n)*2^n; where S2(n) is digit sum of n, n in binary notation. +20 1 0, 2, 4, 16, 16, 64, 128, 384, 256, 1024, 2048, 6144, 8192, 24576, 49152, 131072, 65536, 262144, 524288, 1572864, 2097152, 6291456, 12582912, 33554432, 33554432, 100663296, 201326592, 536870912, 805306368, 2147483648, 4294967296

A217994 a(n) = 2^((2+n+n^2)/2). +20 1 2, 4, 16, 128, 2048, 65536, 4194304, 536870912, 137438953472, 70368744177664, 72057594037927936, 147573952589676412928, 604462909807314587353088

A168088 2^tetranacci(n) +20 0 1, 1, 1, 2, 2, 4, 16, 256, 32768, 536870912, 72057594037927936, 324518553658426726783156020576256, 411376139330301510538742295639337626245683966408394965837152256

A168089 2^pentanacci(n) +20 0 0, 0, 0, 0, 2, 2, 4, 16, 256, 65536, 2147483648, 2305843009213693952, 1329227995784915872903807060280344576, 110427941548649020598956093796432407239217743554726184882600387580788736

A171163 Number of children at height n in a doubly logarithmic tree. Leaves are at height 0. +20 0 0, 2, 2, 4, 16, 256, 65536, 4294967296, 18446744073709551616

next terms 2^:

16, 6, 6, 8, 6, 5, 8, 7, 6, 5, 7, 8, 7, 16, 32, 6, 7, 8, 8, 8,

histogram: 5: 2 6: 5 7: 4 8: 6 16: 2 32: 1

combined histogram

2^3: 1 2^5: 5 2^6: 6 2^7: 8 2^8: 8 2^16: 3 2^32: 1

this still seems to contradict the hypothesis that 2^3, 2^6, 2^8 are more special than 2^5, 2^7. although both lines of reasoning point to 2^8 as being somewhat special

so we are left with just 1,2,4,16 as the specialest, and maybe 2^8 = 16*16 = 256 and 2^16 = 65536 also.

not sure if this numerology is useful for anything, but perhaps it suggests bit widths for instruction components, memory sizes for massively parallel machines, number of registers, amounts of parallelism, word sizes, etc.

e.g. maybe instruction components should be 1,2,4, or 16 bits, and instruction sizes should be 4 or 16 bits, machines should have 4 or 16 registers, machines should have word sizes of 4 or 16, machines should address 64k of memory, machines should have 64k processors.

otoh in the real world we have to deal with ipv6 and 64-bit machines, so we have a 2^6 and 2^7 in there already. the closest 'maybe special' to 64 is either 16 or 256, and to 128 is 256. So maybe we should make use of 256 bit widths for some things.

e.g. maybe instruction components should be 1,2,4, or 16 bits, and instruction sizes should be 4 or 16 bits, machines should have 4 or 16 or 256 registers, machines should have word sizes of 4 or 16 bits, machines should address 64k bits (8k bytes) or possibly 2^256 bits of networked memory, machines should have 64k processors.

hmm.. 2^256 is a big number.. perhaps we should give up on addressing every bit on the network and only have 'learned routes' with no absolute network addresses? also i see no need for 256 registers in BYTES but if this is in BITS that's 32 bytes of registers which seems reasonable. Otoh 16 registers with word size 16 bits also seems reasonable.

so i'm thinking:

16 bit instruction size
instruction fields of 1,2,4 bits (note: so i guess we have 2^4 = 32 basic opcodes)
4-bit or 16-bit word size? i'm leaning towards 4 actually..
16 registers (if 4-bit word size, that's 8 bytes in the registers, or 4 16-bit words)
64k words in each processor's local memory (that's 32k bytes because we have 4-bit words, or 16k 16-bit words), addressed with 16 bits
64k local processors
256 bit addresses to combine network address with local addresses (so we have room for 128-bit IPv6 addrs + 64-bit addressed local addresses, with 64 bits to spare, so we can interoperate fine; in our system we only need 16 bits for local addresses, plus 16 bits to specify the processor, so we have 224 bits to spare. Assuming we use 128 bits for network addresses, we have 128-16 = 96 bits to spare)

alternately with 1-bit 'words':

16 bit instruction size
instruction fields of 1,2,4 bits (note: so i guess we have 2^4 = 32 basic opcodes)
16 1-bit registers (that's only 2 bytes in the registers, or 1 16-bit word)
64k bits in each processor's local memory (that's 8k bytes), addressed with 16 bits
64k local processors
256 bit addresses to combine network address with local addresses (so we have room for 128-bit IPv6 addrs + 64-bit addressed local addresses, with 64 bits to spare, so we can interoperate fine; in our system we only need 16 bits for local addresses, plus 16 bits to specify the processor, so we have 224 bits to spare. Assuming we use 128 bits for network addresses, we have 128-16 = 96 bits to spare)

that seems more sensible actually; the 16-bit instructions could contain fields allowing operations to treat things as 4-bit or 16-bit or 256-bit words

so there's a massively parallel architecture for you. each node has 64k local processors and each processor has 8k memory, so each node has 2^29 bytes of memory, about half a gigabyte.

Each processor is like a neuron with 1024 afferents and 64 bits of information at each synapse.

that seems too much memory per neuron.. how about 4 bits per synapse? we're off by a factor of 2^4. But the next step down from 64k bits local memory is 256 bits, which is a step down of 2^8. Let's try it though:

16 bit instruction size
instruction fields of 1,2,4 bits (note: so i guess we have 2^4 = 32 basic opcodes)
16 1-bit registers (that's only 2 bytes in the registers, or 1 16-bit word)
256 bits in each processor's local memory (that's 8 bytes), addressed with 8 bits
64k local processors
256 bit addresses to combine network address with local addresses (so we have room for 128-bit IPv6 addrs + 64-bit addressed local addresses, with 64 bits to spare, so we can interoperate fine; in our system we only need 8 bits for local addresses, plus 16 bits to specify the processor, so we have 232 bits to spare. Assuming we use 128 bits for network addresses, we have 128-16 = 104 bits to spare)

now each node has about 524k bytes of memory. If each processor is a neuron with 256 afferents, it can store 1 bit per afferent. Not enough.

History shows that computing likes to increase bit widths, and also individual neurons have megabytes to gigabytes of DNA which is at least ROM and may also set the scale for an epigenetic RAM system, so so let's err on the larger size, with 8k cache-like local memory per processor.

as for the instruction breakdown:

if we're doing register-to-register we have 4 bits for each of two operands and 4 bits for an opcode, that's 12 bits already so we only have 4 left. that's enough for 2 bits of addressing mode for each operand.

If we have 64k words in memory then it would take all of the 16 bits in the instruction to address local memory at the level of a bit. But we could have a few bits for scale, plus the address. E.g. if we had 4 bits to address, and we used 2 bits to divide by 4, then 16/4 = 4 so we'd still be short 2 bits. Hmm..

maybe we can't have immediate mode addresses then? that seems bad..

also we need some bits for scaling (choosing a word size per-instruction) if we don't have a set word size. and we wanted to have stack-ish options in addition to registers. and it would be nice to have a third operand sometimes!

well that sucks. no wonder everyone uses words and accumulators and 32-bit+ variable length instructions. PIC has weird accumulator stuff. AVR has 16-bit instruction but also has some instructions which take an additional 16 bits. The 16-bit ARM thumb instruction set has weird variable encodings and i think only 8 registers: https://ece.uwaterloo.ca/~ece222/ARM/ARM7-TDMI-manual-pt3.pdf

ok for each 'scale bit' that we take, we can remove one bit from EACH operand, since the scale is set for the whole instruction. So with two scale bits we only need two bits for each operand, plus two addressing mode bits, and so we have two bits left over. The fourth addressing mode can be stack-like (immediate, register, register indirect, stack-like). But if registers are only 4 bits then we can't yet address 64k of memory, right? Unless scale applies twice.. then with both scale bits set, we divide by (1,2,4,8 = 8) and we have two registers with 1 byte each, and we have 64k bits/8 = 8192 bytes to address, so by using both registers as one 16-bit address we can get there. But in that case shouldn't the scale be 16?

What would be great would be if we could just dispense with registers and address memory directly. Can we get there?

We have 16 bits of memory so we need at least 3 scale bits, giving us scales of 1,2,4,8,16,32,64,256. If we had just 8 more bits to address, 2^16/256 = 2^8, and we could reach everything, although beyond 16k we could only address using 256-bit (8-byte) word sizes!

alternately with 2 scale bits we could have scales of 1,2,4,16. Then we'd need 12 bits to address the 4096 words in memory.

also note that we want to reserve half the opcodes for custom, so if we only have 4 bits we only get to define 16 of them. And we want to reserve half the addressing modes (which i guess are also like types) for custom, so if we have 2 bits per operand we need to add another bit.

So with only one operand we already have 4 bits opcode + 3 bits addressing mode = 7 bits, plus 2 scale bits = 9 bits, and that's without the operand itself (which would then be at most 7 bits if we want to hit 16 total). If we had two operands that's 4 + 2*3 + 2 = 12 bits before the operands themselves (now 2 bits are left per operand). And the 3 bits for the addressing mode are odd, which doesn't fit our numerology of 1,2,4,16.

So we could go the variable length route, and have 16 bits be the length for the instruction, with all semantic modifiers (addressing modes, scales, and whatnot) but not the operands themselves, each of which could be another 16 bits. Now we can have 3 full-length operands again, and there is no need (from an instruction coding point of view) to have a few registers rather than just accessing memory.

So now we have 16 bits to play with, and we've spent 12. We have 4 bits left. We could add 2 more scale bits, and multiply their representation, giving us 1,2,4,16,32,64,256. And we could add another bit to each operand's addressing mode to make it even.

Or we could add a third operand. 4 + 3*3 + 2 = 15 bits, 1 bit left over.

if we have only 2 scale bits, with 1,2,4,16, and we have 64k bits of memory to address, we still need only 12 bits to address it in 16-word mode (and only 1/16th, that is, 4096 bits, are available for single bit addressing). So if each of the operands is a 16-bit word, we have 4 bits left over in each operand. We can put the 2 bits of addressing mode plus the one bit custom addressing mode flag in there, and still have one bit per operand left.

So now the first word only have 4 bits for the opcode plus 2 bits for the scale. We now have 10 bits left there!

Some of those bits could be a second scale to allow us to use less than 12 bits per operand. If we only have 2 operands and we only use 4 bits per operand plus 2 bit of addr mode plus 1 bit custom addr mode plus 1 bit reserved as of now, then two operands cost (4 + 4)*2 = 16 bits. So in this case we only need 2 16-bit words instead of 3. Choosing between 4 and 16 bits per operand costs at least 1 bit. If we have 2 scale bits, then the operands could be 1,2,4, or 16 bits each. If they are 2 bits ea. then we still need 12 bits for two operands (4 + 2)*2 = 12. If they are 1 bit each then we still need 10 bits. So these options don't help as much. So let's just have 1 more scale bit.

So now we have 9 bits left in the first instruction word.

Otoh what if we have 3 operands? if they are 1 bit apiece that's (4 + 1)*3 = 15 bits. So in this case we can fit everything into just 32 bits! This is like only having two registers, but what the hey, i guess that's like using the top 2 items on the stack. So maybe we do want a second scale bit. Can we use the other option? 2 bits apiece? (4 + 2)*3 = 18. If we didn't use the other reserved operand bit that would be 15 again though and we'd be golden.

So now we have 8 bits left in the first instruction word, and the outstanding question of what to do with a 2 in the second scale reg.

Now what if we put some operands into those 8 bits? we'd have:

(4+4)*2 = 16 - 8 = 8 (4+2)*2 = 12 - 8 = 4 (4+1)*2 = 10 - 8 = 2

(4+4)*3 = 24 - 8 = 16 (4+2)*3 = 18 - 8 = 10 (4+1)*3 = 15 - 8 = 7

So if we use the last 8 bits for operands then we could fit 3 4-bit operands into two 16-bit words. And even if we don't, if we have two scale bits and use 1-bit operands, we can fit 3 1-bit operands into two 16-bit words.

btw note that a hypercube, as noted in the Connection Machine, might be a good network topology for 2^16 processors; each processor would have 16 immediate neighbors, and the graph (diameter? max hops) would be 16.

this thing with the 4 bits suggests 16 memory-mapped registers..

also we can get rid of the useless '2' scale option by using variable length scale..

note: the 8 KiB? of memory indicated above (via 2^16 bits) is already 8x less than the 256 KiB? in a Cell processor SPE (there are 8 SPEs per cell), and less than the L1 instruction cache on many processors (which is often at least 16 KiB?)

but 4x more than the RAM in a Parallax Propeller Cog (there are 8 Cogs per propeller), which is 512 32-bit words, or 2 KiB?

and of course we are going to have 65536 processors, not 8, an increase by a factor of 8k

and what if you said, 65536 = 2^(2^(2^2)), but instead of having 65536 bits, since we know 16 is special (2^2^2), we'll have 65536 16-bit words? that's

(2^2^2) * 2^(2^(2^2))

to complete the pattern,

2 * (2^2) * (2^2^2) * 2^(2^(2^2))

2 * 4 * 16 * 65536

2^(1 + 2 + 4 + 16)

2^23

8388608

2^23 bits is 2^20 bytes, which is one mebibyte.

It's not clear to me that this makes things better, though. 23-bit addressing?

---

so, what are the 'most special' powers of 2?

as noted above, we have the sequence with initial condition 1 and inductive step 2^(previous member), which starts:

1, 2, 4, 16, 65536

after that, we have powers of 2 whose exponent is itself a power of two, which starts:

2, 4, 16, 256, 65536 (adding only 256 to the previous sequence)

next we have everything of the form 2^(n*k), where n is a power of 2 (ie of the form 2^a), and k is one of 0,1,2,3 (or maybe 1,2,3,4; you get the same answer either way, except for 1) (the point is that k ranges over numbers up to a power of two, in fact, up to the first power of two after 2, although one could imagine that in another, further step, we would add in numbers of the same form except where k ranged up to 2^3=8 or maybe 2^4=16). When k is 0 you get 1, when k is 1,2, or 4, then n*k a power of two (so we just get 2^n where n is a power of two, which we've covered previously; 2,4,16,256,65536); so the only new case is when k is 3. So we get 2^(2*3) = 2^6, 2^(2^2*3) = 2^12, 2^(2^3*3) = 2^24, 2^48, 2^96, with the "most special" of those being where n was a power of a power of two, that is, 2^6, 2^12, 2^48:

64, 4096, 281474976710656 (or about 281.5e12, or about 281.5T; note that 4096 = 64^2, and 281474976710656 = 4096^4)

in summary, the most special (thru 2^16) are:

2, 4, 16, 65536

and then 256

and then 64, 4096, 12

Where is the 8 found in the ubiquitous 8-bit "byte"? We do see 8 in the above by starting with 256 and taking log_2(256) = 8. But this method also gives us log_2(64) = 6 and log_2(4096) = 12. 6, especially, does not seem very "2-ish". My opinion is that 8 is actually not that special (as a power of two, that is; the fact that it is a cubic number is of interest in itself, but for different reasons. It has other interesting properties, such as being a bad size for committees: http://arxiv.org/pdf/0804.2202v1.pdf ; ). In fact, old computers used to have other preferred bit-lengths. My guess is that 8 was settled on because (a) 256 values is about right for holding something like ASCII (all alphanumeric latin characters, then some punctuation), plus leaving half the values unassigned; (b) 255 is bigger than most small integers you want to do arithmetic on, big enough to hold the size of small lists you might want, and big enough to fit instruction sets in comfortably.

4: 4 seems to the number of assembly-language-computer-like stuff; 4 1-bit flags, etc.

12 is interesting as the result of 16 - 4; eg if you have a 16-bit bitfield, and you need to use 4 of those bits for something, then you have 12 bits left to use for something else.

In powers-of-two-land, 16 is an elaboration of 4 (2^4); also very powers-of-two-sy/computersy

4096 is often found in computers, eg in memory page sizes. 4096 (and sometimes, 256) seem to be regarded as 'the smallest chunk of memory in which you could put something useful'.

2^16 = 65536 is the smallest power of two bitwidth that can really hold a decent amount, the length of lists that aren't 'small' (that is to say, the smallest number of the form 2^(2^n) such that the resulting number of instructions can hold a decently-sized computer program). 65536 to me seems like the powers-of-two equivalent of how some human languages/cultures use specific largish numbers as a placeholders for 'arbitrarily large' (but not necessarily infinite) (eg 10,000: http://www.merriam-webster.com/dictionary/myriad ; various examples of 10,000 in classical Chinese literature). 65536 is the first power of two bit-length that is a reasonable size for the number of memory locations in a general-purpose computer (although if you don't care if the bit-length is a power of 2, 2^12 = 4096 is a reasonable choice).

(this sort of reasoning makes you wonder why Unicode didn't make characters fit within 2 bytes; shouldn't 65536 be enough to express ourselves? I'm not very knowledgable about this but my impression is that they initially tried; a Unicode encoding called UCS-2, which uses two bytes, used to be popular; but politically, Unicode had to take the union of all of the characters of all large or powerful human communities, which led to a demand for more than 65536 code points; note that this is different from saying that 65536 isnt enought to form an alphabet, this is saying that 65536 isnt enough to encode all alphabets that through historical accident have been created so far and still seem important today (note: some ideogram languages do have more than 65536 ideograms, but perhaps not much more (i heard 70,000 for Chinese once), and i bet many of these are not in active use outside of proper names). To this day, the "BMP" or Basic Multilingual Plane, which is the first 65536 unicode code points, is often talked about.)

It is interesting to consider what the corresponding numbers for some non-binary number systems would be. For 3: 3^(3^1) = 27 (so a trinary number in 3 trinary memory locations, or 'trits', can hold 27 distinct values); 3^(3^2) = 19683. 3^(3^3) = 7625597484987. 3^(3^(3^3)) = too large to work with. So, the equivalent of the formula for 65536 ((2^(2^(2^2))) = 2^(2^(4)) = 2^16; the number of locations that can be addressed by (the number of bits that can be addressed by (the number of bits that can be addressed by (the number of bits that can be addressed by (one bit))))) is too large to work with. Even 3^(3^3) is in the trillions. 3^(3^2) = 3^9 is the one that is on the order of 65536; the smallest 'special number' for base 3 that denotes a number of instructions that could hold a decently-sized computer program; this is the number of locations that can be addressed by (the number of trits that can be addressed by (two trits)). For 5: 5^5 = 3125; 5^(5^2) = 298023223876953125, which is one the order of 298023 trillion (roughly on the order of 2^64). For 12, 12^12 = 8916100448256, roughly 9 trillion, or a little less than 2^48.

---

could use sequence of 2^n bitwidths (= sequence '2^(2^n)' addressable locations) for different things; this seems to be close to how these are actually used:

4 bits/16 locations: number of registers
8 bits/256 locs: size of a byte (smallest addressable memory location); sometimes # of registers
(note: not a 2^2^n but memory page sizes often tend to be 4k)
16 bits/64k: L1 cache sizes (eg in recent processor generations, although this is often split into separate icache and dcache)? ??processor-local memory? cache page/smallest unit of allocable memory/message size/width of shared memory channels (actually 4k seems to be used for that stuff)?
32 bits/4GiB?: processor-local memory (eg max in Cell processor SPEs; in a recent generation actual size was 256k for each of 8 SPEs, which is 2 GiB? total over all 8 processors)?
64 bits/16 EiB?: node-local 'main memory' (eg in commodity computers, which are now typically 64-bit); sometimes the memory access alignment size (eg align to 8 bytes; smallest aligned addressable memory location; 'word size')
(note: some cryptographic research appears to assume that no adversary can do more than ~2^80 steps: [1])
128 bits: ('local') network addresses (ipV6)
256 bits: word size for some applications (eg Ethereum; see https://github.com/ethereum/wiki/wiki/Design-Rationale section '32 byte word size')
denotes bitwidths that are in the sequence 2, 2^2, 2^(2^2), 2^(2^(2^2)), etc; so eg 1 bit can address each bit in 2 bits, 2 bits can address 4 bits, 4 bits can address each bit in 16 bits, 16 bits can address each bit in 64k bits (= 8k bytes)

shorter summary:

4 bits: # of bits to address 16 registers
8 bits: size of a byte (smallest addressable memory location; smallest word size); sometimes # of registers, esp. if there are register windows or other hidden register/register paging scheme
16 bits: some form of processor-local memory
32 bits: some form of processor-local memory
64 bits: node-local 'main memory'/'shared memory'; aligned/preferred word size
128 bits: network
256 bits: largest word size

so what we might do (consolidating/leaving out every other step above; this lets us keep the all-important 4, 16, and also keep the commodity hardware 64, although we lose the 8 (the byte!)):

4 bits: have 16 registers in low-level instruction encoding; also various instructions have 4-bit operands that address bits in 16-bit values, or that set 4 flags, or that choose 16 variants
16 bits: maximum size of (virtual) processor-local(/low-level threadlocal) memory; smallest unit of addressable memory, but with an instruction set that lets you treat a 16-bit word as a packed field of 1, 2-, 4- or 16- bit quantities; unit of instruction encoding
64 bits: maximum size of per-node 'main memory'/'shared memory'; medium word size
256 bits: largest word size; network addresses

this could perhaps be continued in another pattern: 1, 2, 4, 16, 64, 256, ... = 2^0, 2^1, 2^2, 2^4, 2^6, 2^8, ... the differences in the powers of 2 are 1, 1, 2, 2, 2; suggesting it follows with 3,3,3,3,4,4,4,4,4, 5,... so , 2^11, 2^14, 2^17, 2^20, 2^24, 2^28, 2^32, 2^36, 2^40, 2^45... but this seems messy.. so maybe follow with 4,4,4,4, 16,16,16,16,16: 2^12, 2^16, 2^20, 2^24, 2^40, 2^56, 2^72, ... but we've skipped 2^32 and 2^64 so maybe follow with 4,4,4,4, 8,8,8,8,8, 16, ...: 2^12, 2^16, 2^20, 2^24, 2^32, 2^40, 2^48, 2^56, 2^64, 2^80, ...; this corresponds to bit widths of 1,2,4,16,64,256,4096,65536,...

indeed the last series seems to catch most of the useful/most used numbers (8 is conspicuously absent, though): 1,2,4,16,64,256,4096,65536,

how does it proceed from there?

1 Mib, 16 MiB?, 4 GiB?, 1 TiB?, 256 TiB?, 64 PiB?, 16 EiB?, 1 YiB?

so the first 16 elements of this series are:

1,2,4,16,64,256,4096,65536, 1 Mib, 16 MiB?, 4 GiB?, 1 TiB?, 256 TiB?, 64 PiB?, 16 EiB?, 1 YiB?

---

to summarize the previous segment:

a useful series is: 1,2,4,16,64,256,4096,65536, 1 Mib, 16 MiB?, 4 GiB?, 1 TiB?, 256 TiB?, 64 PiB?, 16 EiB?, 1 YiB?, ... (computed as 2^0, 2^1, 2^2, 2^4, 2^6, 2^8, 2^12, 2^16, 2^20, 2^24, 2^32, 2^40, 2^48, 2^56, 2^64, 2^80, ...; the differences of powers of 2 are 1,1,2,2,2, 4,4,4,4,8,8,8,8,8, 16, ...)

a computer (or VM) could be built as:

word size 16 bits
16 registers
16 addressing modes
instructions to work in word sizes of one of: 1, 2, 4, 16, 64, 256, platform-specific pointer (guaranteed <= 256?), arbitrary precision arithmetic; and signed or unsigned (the 1,2,4 choices treat a 16-bit word as a packed array of 1-, 2-, 4-bit subwords)
??: memory page size/pool size of 4096 words?
(virtual) processor-local (threadlocal) memory of (max) 65536 words (16 bit addressing) (??:16 pages of 4k words each?)
(maximum) 'main memory' (per-node interprocessor shared memory) address space of 64 bits
(maximum) 'network' address space of 256 bits

---

could also use the above series as the bitstring for the initial prefix of arbitrary precision scales: eg

000001 means #6 on that series, 256; that then becomes the 'word size' (in words, or in bits? probably words) of the following data structure. Eg if the data structure is just one atomic (arbitrary-precision) word, then if initial prefix is 000001, then the following quantity occupies 256 words (256 16-bit words). In practice though would probably just use 16-bit words for the prefix, in which case the above gets too big too fast; just say that if the prefix is 0, keep going; so a 1-word prefix specifies any length from 1 to 64k-1, a 2-word prefix (so, the first word is 0) from 64k to 4GiB?, with a step size of 64k, etc.

similarly, could use the above series in a barrel shifter for compressed representation of common constants. Eg:

s1 + s2*x, where x is the number actually represented in the main operand, and s1 and s2 are entries from the above series. If the main entry is 8 bit, and the other 2 are each 4 bits, for a total of 16 bits, then could represent eg 2^16 + 64*200. In this case, however, there's not much need for the series entries under 8 bits for s1; so could use 3 bits for the other two, have a 10 bit main entry, and have the series be: 1 (or zero, if used in the addition), 256, 4096, 65536, 1MiB?, 16MiB?, 4GiB? (2^32), 1TiB?. Or, have a 12-bit main entry, and have the series be: 1 (or zero, for s1), 4096, 65536, 1MiB? (2^20). This lets every number from 0 to 4096 be represented, plus many other numbers up to 2^24 + 2^12*2^20 = ~2^32. As a refinement, change the series for s1 to be more useful: 0, -1, +1, (mode); and the series for s2 is 1, 4096, 65536, 1Mib (2^20); this lets every number from -1 to 4096 be represented, and some nubers up to 2^32 + 4096. If we are trying to express a 32-bit value we can use -1 and 2^32 + 1 as two distinct sentinels; and we can use (mode) as a packed flag (for example, to represent sign). This is pretty complicated though, and all it allows us to do is to express many useful 32-bit quantities in 16-bits, while still being able to express 12 bits exactly; not sure if saving 16-bits (over just using 16 bits) is worth the complexity there; otoh the mode and the sentinel are also very valuable. But maybe we can get similar things out of floating point half-precision (16-bit), which is more standard, although there we are given two 'modes': negativity, and fractionness (

x	< 1) and a 10 bit significand rather than 12, and we can only go up to 65520? Eh actually floating point sounds even more difficult to deal with.

however, maybe an easier thing to do would be to just say:

arithmetic types are signed or unsigned, integer or floating point, and bit-widths of one of: 1, 4, 16, 64. Arbitrary arithmetic has its own instructions.

---

so in summary we're at:

word size is 16 bits
arithmetic ops available in: bit-widths of one of: 1, 4, 16, 64, arbitrary (rational)
fixed-width arithmetic types are signed or unsigned, integer or floating point, and bit-widths of one of: 1, 4, 16, 64
(virtual) processor-local (threadlocal) memory of (max) 65536 words (16 bit addressing of 16-bit words)
(maximum) 'main memory' (per-node interprocessor shared memory) address space of 64 bits

---

if we use the threadlocal memory for a stack, how do we spill the stack?

instead of doing it one-by-one as we call and return from functions, could spill/unspill the stack so that after each time, half of the 64k local stack memory is unused. Eg if we are calling, then spill the top 32k or so, swap the bottom 32k with the top, and now we have 32k free under us for more calls; and if returning, only load 32k, so that again we have 32k under us for calls. Note that this implies that it would be great to have a 'flip' operation that (effectively; we could do this by just taking the stack pointer mod n where n is the stack size, and keeping track of when we go over the boundary and the midpoint) swaps the bottom 32k of local memory with the top (or rather, copies the bottom into the top and ignores the top)

---

https://github.com/ethereum/wiki/wiki/RLP provides a variable-length binary encoding for semi-arbitrary-length byte arrays and lists (the items of the list can be lists, or byte arrays); by 'semi-arbitrary-length' i mean that it can't actually represent arbitrarily large objects, but it can represent very large objects. RLP can represent objects up to length 2^64-1, i think. It is optimized so that a single byte whose value is <128 can be represented in a single byte (no length header byte is needed); and so that a single length header byte can represent the length of byte arrays or lists of lengths up to 55 (inclusive); longer bytearrays and lists require a length-header-length byte followed by a length header (which may be up to 8 bytes long) followed by the payload.

furthermore, the representation of integers is defined as big endian binary form with no leading zeros (i don't quite get the 'no leading zeros' requirement; perhaps they are trying to ensure that there is only one encoding for a given number? this makes sense as the one of the application requirements of RLP in Etherium is to make sure that the same value is always encoded the same way, so that if two clients encode the same value they will have the same hashe; https://github.com/ethereum/wiki/wiki/Design-Rationale mentions this).

furthermore, the representation of dictionary is defined to be [[k1,v1],[k2,v2]...] with keys in lexicographic order (although the RLP page also mentions a Patricia Tree encoding, which i think is sloppy wording; i think what they mean is that there is also another data type, a Patricia Tree).

this is presented as a serialization format.

imo one potential problem here is that this serialization does not include type information, eg the same entry in a list might be interpreted as either an integer or a string, and some lists might be able to be alternately interpreted as dicts or as Patricia Trees. They are assuming that in their application the client already knows the data types, which is probably valid but imo may lead to confusion with security implications later if a protocol designer makes a mistake.

they mention bencode, protobuf, BSON as alternative serialization formats.

bencode: has byte strings, integers, lists, dicts. Does encode the type of items. Encodes integers as "integer encoded in base ten ASCII"! Has closing delimiters. There is only one possible encoding for each value. Makes some attempt to be human-readable, but since it allows byte strings, is not completely human-readable and also is not text editor-safe.

protobuf: scalar types: double, float, int32, int64, uint32, uint64, sint32, sint64, fixed32 (alternative int32 more efficient for large values), fixed64, sfixed32, sfixed64, bool, string, bytes [2]. composite types: enum, message, repeated (array), maps (syntactic sugar for a 'message' header plus a 'repeated'), timestamp, duration, Struct, wrapped types (can be null), FieldMask? (?), ListValue? (?difference from repeated?), Value (?is this just a type that could be any scalar value?), NullValue?, Any (dynamic type)

https://en.wikipedia.org/wiki/Protocol_Buffers https://developers.google.com/protocol-buffers/docs/overview?hl=en

bson: types: string, integer (32- or 64-bit), double, date (integer number of milliseconds since the Unix epoch), byte array (binary data), boolean, null, BSON object, BSON array

Why 16-bit?

8- and 16-bit processors appear to be unnecessarily spartan; even in the embedded space there are lots of 32-bit processors available. But, within the family of embedded processors, 16-bits appears to be squeezed by 8-bit and 32-bit; that is, there are many 32-bit processors, and several 8-bit processors, but only a few 16-bit processors. So, if we are not going spartan, why not use 32-bit; and if we are, why not use 8-bit?

We chose 16-bit rather than 8-bit because many 8-bit processors use pointers of size greater than 8-bits. 16-bits seems to be the smallest multiple of 8-bits that is wide enough to address a 'decent' amount of memory (for example, the CPUs in old Apple IIs and Commodore 64s could address 16-bits of memory). We are going for simplicity here, and it's simpler if we assume that a pointer can fit in one word of memory.

Why am I even interested in going as low as 16-bits? The ultimate purpose of Boot is a substrate for Oot, and we don't imagine that a high-level language like Oot will be the best choice for the manual, resource-constrained projects that often characterize embedded systems programming; so why should we care about a small bitwidth? The reason is that I'm interested in creating a programming language that might be good for making massively concurrent programs, because I think the brain works that way (and by massively concurrent, I don't mean hundreds of processors, I mean tens of thousands, or preferably billions). In order to achieve this amount of concurrency, I imagine that a hardware architecture would probably scale way back on per-processor complexity. I doubt that the elementary components of the brain (whether these turn of to be synapses, neurons, or clusters of neurons) compute with a high degree of numerical precision -- and in fact, I believe that the brain's components have a lot of noise/errors, which would further limit accuracy to even less than whatever precision it has. So, I suspect that hardware that supports massive concurrency may be composed of individual processors that are simpler (smaller bitwidth, etc) than we are accustomed to today. Of course, it's exceptionally unlikely that I can guess future architectural specifications to a degree that I can make Boot compatible with a class of machine that doesn't yet exist --- but by limiting ourselves in this way we at least make it more likely that it will be less troublesome to create a variant of Boot in the future, than it would be if we based everything on, say, 64-bit floating point.

Interestingly, there's some evidence that artificial neural networks can use 8-bit values for execution, and 16-bits for training (although stochastic rounding, which is non-standard, may be needed). [9] [10] [11]. This is not conclusive, but merely adds one more application to the list of applications that don't need more than 16-bits.

Why not some number of bits in between 8 and 16? Practically, there aren't many implementations that use something in between, and more subjectively, 16 is a power-of-two which seems aesthetically appropriate for systems based on binary logic.

Tangentially but interestingly, 16 is a nice number subjectively/aesthetically/'numerologically', and has lots of connections to the numbers 2 and 4, and also has some symmetries relating to exponentiation of 2s and 4s. 16 = 2*2*2*2 (4 2s multiplied together). 2^(2^2) = (2^2)^2 = 16; 16 is the last power of two that can be written that way 'associatively' because exponentiation is not associative and for larger series of 2s raised to each other, the order of operations matters, eg (222)2 != 2(222). In fact, 16 is the only integer n such that there are distinct integers x,y such that: x^y = y^x (eg 2^4 = 4^2). Because 16 = 2^(2^2), it is a member of the sequence 1,2,4,16,65536,..., which starts with 1 and then raises 2 to the previous sequence member. This sequence has an 'application' in the following sequence of bitwidths: 1-bit; the bitwidth whose bits can themselves be addressed by a 1-bit address (which is 2-bit); the bitwidth whose bits can themselves be addressed by a 2-bit address (which is 4-bit); the bitwidth whose bits can themselves be addressed by a 1-bit address (which is 2-bit); the bitwidth whose bits can themselves be addressed by a 4-bit address (which is 16-bit); the bitwidth whose bits can themselves be addressed by a 16-bit address (which is 65536-bit); (also, 16 is the last widely available member of this sequence, because 65536-bit bitwidths is obviously way bigger than found in present-day commodity implementations). Another way to think of this 'application' is: start with a bit as an primitive index; you can index 2 items. Now, you can use your two-item memory to store 2 1-bit indices; this allows you to build a memory that can index 2^2 = 4 items (you have 2 dimensions of indices, and the index in each dimension is 1 bit, so you can index 2*2 = 4 items); now, you can use your 4-item memory to store 4 1-bit indices; etc. And because 16 = (2^2)^2, it is a member of the sequence 2,4,16,256,..., which starts with 2 and then squares the previous sequence member (perhaps there is a similarly compelling computersy 'application' for this sequence?). Simplifying the numbers in parentheses of 16 = 2^(2^2) = (2^2)^2, we see that 16 = 2^4 = 4^2. 16 is a power of a prime (2^4); since 4 is the first composite number, 16 is the first power of a prime where the number that the prime was raised to is composite. 16 is a also a square number (4^2). 16 is a 'practical' number (defined as a positive integer n such that all smaller positive integers can be represented as sums of distinct divisors of n) (all powers of 2 are practical). 16 has 4 identical prime factors (if you count multiplicity), and it is equal to 4*4. The 16-cell (sometimes called a '4-D hyperoctahedron', '4-dimensional cross-polytope' or '4-D orthoplex', or a 'hexadecachoron'), is one of only 6 regular polytopes in 4-D, and it has 16 faces. The 16-cell honeycomb is one of only 3 regular tesselations of (Euclidean) 4-D space. Of course, all small integers have lots of 'special properties' like these, and 8 has a similar list. However, 8 doesn't have the x^y = y^x property, nor can it be represented as a bunch of 2s exponentiated together, and although it's a power of 2, 8 is the third power of 2, and 3 is not a power of two; and its geometric relevance is mainly to 3-D, in contrast to 16's relevance to 4-D, and again 4 is a power of 2 and 3 is not; so in a way 16 seems to be more 'purely' connected to 2 and its powers, whereas 8 has a strong connection to 2 but also a connection to non-power-of-two 3 (in place of 16's connection to 4). Since usage of a number as a bitwidth is intimately connected both to exponentiation and to 2, aesthetically, 16 seems to have some qualities to recommend it for this purpose, even if 8, the next smaller power-of-two, is smaller and is therefore simpler. 32, the next larger power-of-two, lacks these qualities and is also larger, so aesthetically it's not as nice for this purpose. 4 is even more simple than 8 and unlike 8 is 'purely' connected to 2, but a bitwidth of 4 is clearly 'too small' in that it's not very available in commodity implementations. This 'numerological' stuff may not have any real practical bearing on the decision, but it shows that 16 is an aesthetically nice choice for a bitwidth, in a way that 8 and 32 are not.

proj-oot-ootNumerology