powers of 2

---

what bit width for various things?

64-bit is a commonly used width

according to http://en.wikipedia.org/wiki/128-bit , "...there are currently no mainstream general-purpose processors built to operate on 128-bit integers or addresses...".

IPv6 has 128-bit addresses

following that advice to take the current value of something and double it, that suggests that either 256-bit or (2^7 = 128, 2^14 = 16384) 16384-bit is a comfortable maximum.

the number of atoms in the universe is about 10^80, which is around 2^256

this suggests that the max bitwidth should be between 2^8 and 2^256, or between 2^8 and 2^(2^8)

one bit per atom seems a bit much. halving the latter 8, we get 2^(2^4) = 65536 bits, which is 256^2 bits or 8 KiB?. That seems like a very large upper bound already for a bitwidth, when you compare it to 64 bits (8 bytes); it's an increase from 64 bits of a factor of 1024 (2^10); whereas 64 = 2^6, and the increase from 8-bit is only 8.

so 16384 seems better.

or we could even take 2^6 = 64 and double the 6 to get 2^12 = 4096 bits, or 0.5 KiB?.

that seems safe; with a field of 2^12 bits, we could already represent numbers higher than the number of atoms in the universe

if we never need numbers bigger than 2^(2^8), then we never need bit fields bigger than 2^8. so 256-bit (32 bytes) seems safe.

and if we never need numbers bigger than half of 2^(2^8), then 128-bit seems safe. So perhaps IPv6 is already crazy conservative.

so let's choose 128-bit as our max.

powers of 2 are special. are there special powers of two? i guess so:

2^2 = 4 2^2^2 = (2^2)^2 = 2^(2^2) = 2^4 = 16 ((2^2)^2)^2 = 2^2^2^2 = 2^8 = 256 2^(2^(2^2)) = 2^16 = 65536 2^(2^(2^(2^2))) = too big (((2^2)^2)^2)^2 = 2^2^2^2^2 = 2^16 = 65536 ((((2^2)^2)^2)^2)^2 = 2^2^2^2^2^2 = 2^32 = 4294967296 (((((2^2)^2)^2)^2)^2)^2 = 2^2^2^2^2^2^2 = 2^64 = 18446744073709551616 etc

so 1,2,4,16,65536 is special and 1,2,4,16,256,65536,2^32,2^64, etc is special

and presumably 1,2,16 (elements 1,2,4 of the first) and 1,2,16,2^64,.. (elements 1,2,4,8 of the second) are extra special

and presumably 2^1, 2^2, 2^4, 2^16 and 2^1, 2^2, 2^4, 2^16, 2^256

so recurring specialness are:

1,2,4,16

and less so: 65536 (2^16)

and less so: 2^8, 2^64, 2^256 (which implies that 8 and 64 and 256 are more special than 32, 128, that is, that 2^3, 2^6, 2^8 are more special than 2^5, 2^7)

searching for 1,2,4,16 on oeis, and ruling out all those that can be immediately seen by me to contain non-powers of 2, we get:

http://oeis.org/search?q=1%2C2%2C4%2C16&sort=&language=&go=Search

A014221 a(n+1) = 2^a(n) with a(0) = 0. This is the Ackermann function A_3(n+1) as defined in the Comments line. +20 52 0, 1, 2, 4, 16, 65536 (list; graph; refs; listen; history; text; internal format) OFFSET

0,3 COMMENTS

Next term has 19729 digits - Benoit Cloitre, Mar 28 2002

A073924 Smallest power of 2 that is greater than the previous term such that every partial sum (n>1) is a prime. +20 7 1, 2, 4, 16, 128, 65536, 9007199254740992, 73786976294838206464, 205688069665150755269371147819668813122841983204197482918576128

A060656 a(n)=2a(n-1)*a(n-2)/a(n-3) with a(0)=a(1)=1. +20 6 1, 1, 2, 4, 16, 64, 512, 4096, 65536, 1048576, 33554432, 1073741824, 68719476736, 4398046511104, 562949953421312, 72057594037927936, 18446744073709551616, 4722366482869645213696, 2417851639229258349412352

A001901 Successive numerators of Wallis's approximation to pi/2 (reduced). +20 4 1, 2, 4, 16, 64, 128, 256, 2048, 16384, 32768, 65536, 262144, 1048576, 2097152, 4194304, 67108864, 1073741824, 2147483648, 4294967296, 17179869184, 68719476736, 137438953472, 274877906944,

A053038 The first (largest) power of 2 arising in the iteration-sequence when A051593(cototient function) is repeatedly applied starting with n!. +20 3 1, 2, 4, 16, 32, 128, 512, 4096, 16384, 2048, 8192, 65536, 16384, 65536, 8388608, 134217728, 8388608, 8388608, 16777216, 2097152, 8388608, 268435456, 4398046511104, 70368744177664, 67108864, 67108864, 67108864, 68719476736 (list; graph; refs; listen; history; text; internal format)

A073923 a(1)=1, a(n) is the smallest power of 2 not included earlier such that sum(k=1,n,a(n)) is prime. +20 3 1, 2, 4, 16, 8, 4096, 32, 16384, 134217728, 256, 33554432, 1073741824, 2147483648, 68719476736, 8192, 262144, 2722258935367507707706996859454145691648, 1152921504606846976, 10889035741470030830827987437816582766592

A171381 Numbers n such that (3^n + 1)/2 is prime. +20 3 1, 2, 4, 16, 32, 64 (list; graph; refs; listen; history; text; internal format)

A174677 a(n) = 2* a(n-2) *a(n-1) with a(1)=1 and a(2)=2. +20 3 1, 2, 4, 16, 128, 4096, 1048576, 8589934592, 18014398509481984, 309485009821345068724781056, 11150372599265311570767859136324180752990208

A218148 a(n) = 2^((6+5*n+n^3)/6). +20 2 1, 2, 4, 16, 256, 32768, 67108864, 4398046511104, 18446744073709551616, 9903520314283042199192993792, 1361129467683753853853498429727072845824, 95780971304118053647396689196894323976171195136475136

A114641 a(n)=2^(a(n-2)+log_{2}(a(n-1))), a(0)=1, a(1)=2. +20 0 1, 2, 4, 16, 256, 16777216, 1942668892225729070919461906823518906642406839052139521251812409738904285205208498176

A179532 a(n)=2^ceiling(n(n+1)/3). +20 0 1, 2, 4, 16, 128, 1024, 16384, 524288, 16777216, 1073741824, 137438953472, 17592186044416, 4503599627370496, 2305843009213693952, 1180591620717411303424, 1208925819614629174706176,

A094384 Determinant of n X n partial Hadamard matrix with coefficient m(i,j) 1<=i,j<=n (see comment). +10 0 1, -2, 4, 16, -32, -128, -512, 4096, -8192, -32768, -131072, 1048576, 4194304, -33554432, 268435456, 4294967296, -8589934592, -34359738368, -137438953472, 1099511627776, 4398046511104,

so the next smallest terms after 16:

2^16, 2^7, 2^6, 2^7, 2^5, 2^3, 2^5, 2^7, 2^8, 2^8, 2^7, 2^5:

histogram: 2^3: 1 2^5: 3 2^6: 1 2^7: 4 2^8: 2 2^16: 1

this seems to contradict the hypothesis that 2^3, 2^6, 2^8 are more special than 2^5, 2^7.

A014221, the Ackermann function, is the only entry in oeis with 1,2,4,16

if we don't start with 1, we also get

A051285 a(n) = a(n-1)^a(n-2), n >= 2. +20 1 2, 2, 4, 16, 65536, 115792089237316195423570985008687907853269984665640564039457584007913129639936

if we search oeis for all sequences starting with 2,4,16, excluding those for which this sequence shows up far from the start, we also get:

A019279 Superperfect numbers: sigma(sigma(n)) = 2n where sigma is the sum-of-divisors function A000203. +20 60 2, 4, 16, 64, 4096, 65536, 262144, 1073741824, 1152921504606846976, 309485009821345068724781056, 81129638414606681695789005144064, 85070591730234615865843651857942052864

A061652 Even superperfect numbers: 2^(p-1) where 2^p-1 is a Mersenne prime (A000043 and A000668). +20 56 2, 4, 16, 64, 4096, 65536, 262144, 1073741824, 1152921504606846976, 309485009821345068724781056, 81129638414606681695789005144064, 85070591730234615865843651857942052864 (list; graph; refs; listen; history; text; internal format)

A001146 2^(2^n). (Formerly M1297 N0497) +20 39 2, 4, 16, 256, 65536, 4294967296, 18446744073709551616, 340282366920938463463374607431768211456, 115792089237316195423570985008687907853269984665640564039457584007913129639936

A061286 Smallest integer for which the number of divisors is the n-th prime. +20 26 2, 4, 16, 64, 1024, 4096, 65536, 262144, 4194304, 268435456, 1073741824, 68719476736, 1099511627776, 4398046511104, 70368744177664, 4503599627370496, 288230376151711744, 1152921504606846976

A101926 2^A101925(n). +20 7 2, 4, 16, 32, 256, 512, 2048, 4096, 65536, 131072, 524288, 1048576, 8388608, 16777216, 67108864, 134217728, 4294967296, 8589934592, 34359738368, 68719476736, 549755813888, 1099511627776, 4398046511104, 8796093022208 (

A112535 Number of truth tables generated by 3CNF expressions of n variables. +20 4 2, 4, 16, 256, 43146, 120510132, 4977694100656

A063401 a(n)=a(n-1)*a(n-2)*a(n-3) with a(0)=1, a(1)=2, a(2)=2. +20 3 1, 2, 2, 4, 16, 128, 8192, 16777216, 17592186044416, 2417851639229258349412352, 713623846352979940529142984724747568191373312, 30354201441027016733116592294117482916287606860189680019559568902170379456331382784

A155543 a(n)=2*A081294(n). +20 3 2, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864, 268435456, 1073741824, 4294967296, 17179869184, 68719476736, 274877906944, 1099511627776, 4398046511104, 17592186044416, 70368744177664

A171381 Numbers n such that (3^n + 1)/2 is prime. +20 3 1, 2, 4, 16, 32, 64

A186108 Denominator of the frequency of the n-th dropping time in the Collatz iteration. +20 3 2, 4, 16, 16, 128, 256, 256, 2048, 8192, 32768, 16384, 262144, 262144, 2097152, 8388608, 16777216, 33554432, 134217728, 536870912, 1073741824, 4294967296, 2147483648, 34359738368, 137438953472, 274877906944, 274877906944, 2199023255552, 4398046511104,

A165420 a(1) = 1, a(2) = 2, a(n) = product of the previous terms for n >= 3. +20 2 1, 2, 2, 4, 16, 256, 65536, 4294967296, 18446744073709551616, 340282366920938463463374607431768211456, 115792089237316195423570985008687907853269984665640564039457584007913129639936

A186110 Denominator of the cumulative frequency of the dropping time in the Collatz iteration. +20 2 2, 4, 16, 8, 128, 256, 16, 2048, 8192, 32768, 32768, 262144, 16384, 2097152, 8388608, 16777216, 33554432, 134217728, 536870912, 1073741824, 4294967296, 4294967296, 34359738368, 137438953472, 274877906944, 137438953472, 2199023255552, 4398046511104, 4398046511104,

A051285 a(n) = a(n-1)^a(n-2), n >= 2. +20 1 2, 2, 4, 16, 65536, 115792089237316195423570985008687907853269984665640564039457584007913129639936

A079556 a(1) = a(2) = 2; for n>2, a(n) = a(n-2)^a(n-1). +20 1 2, 2, 4, 16, 4294967296

A135569 a(n) = S2(n)*2^n; where S2(n) is digit sum of n, n in binary notation. +20 1 0, 2, 4, 16, 16, 64, 128, 384, 256, 1024, 2048, 6144, 8192, 24576, 49152, 131072, 65536, 262144, 524288, 1572864, 2097152, 6291456, 12582912, 33554432, 33554432, 100663296, 201326592, 536870912, 805306368, 2147483648, 4294967296

A217994 a(n) = 2^((2+n+n^2)/2). +20 1 2, 4, 16, 128, 2048, 65536, 4194304, 536870912, 137438953472, 70368744177664, 72057594037927936, 147573952589676412928, 604462909807314587353088

A168088 2^tetranacci(n) +20 0 1, 1, 1, 2, 2, 4, 16, 256, 32768, 536870912, 72057594037927936, 324518553658426726783156020576256, 411376139330301510538742295639337626245683966408394965837152256

A168089 2^pentanacci(n) +20 0 0, 0, 0, 0, 2, 2, 4, 16, 256, 65536, 2147483648, 2305843009213693952, 1329227995784915872903807060280344576, 110427941548649020598956093796432407239217743554726184882600387580788736

A171163 Number of children at height n in a doubly logarithmic tree. Leaves are at height 0. +20 0 0, 2, 2, 4, 16, 256, 65536, 4294967296, 18446744073709551616

next terms 2^:

16, 6, 6, 8, 6, 5, 8, 7, 6, 5, 7, 8, 7, 16, 32, 6, 7, 8, 8, 8,

histogram: 5: 2 6: 5 7: 4 8: 6 16: 2 32: 1

combined histogram

2^3: 1 2^5: 5 2^6: 6 2^7: 8 2^8: 8 2^16: 3 2^32: 1

this still seems to contradict the hypothesis that 2^3, 2^6, 2^8 are more special than 2^5, 2^7. although both lines of reasoning point to 2^8 as being somewhat special

so we are left with just 1,2,4,16 as the specialest, and maybe 2^8 = 16*16 = 256 and 2^16 = 65536 also.

not sure if this numerology is useful for anything, but perhaps it suggests bit widths for instruction components, memory sizes for massively parallel machines, number of registers, amounts of parallelism, word sizes, etc.

e.g. maybe instruction components should be 1,2,4, or 16 bits, and instruction sizes should be 4 or 16 bits, machines should have 4 or 16 registers, machines should have word sizes of 4 or 16, machines should address 64k of memory, machines should have 64k processors.

otoh in the real world we have to deal with ipv6 and 64-bit machines, so we have a 2^6 and 2^7 in there already. the closest 'maybe special' to 64 is either 16 or 256, and to 128 is 256. So maybe we should make use of 256 bit widths for some things.

e.g. maybe instruction components should be 1,2,4, or 16 bits, and instruction sizes should be 4 or 16 bits, machines should have 4 or 16 or 256 registers, machines should have word sizes of 4 or 16 bits, machines should address 64k bits (8k bytes) or possibly 2^256 bits of networked memory, machines should have 64k processors.

hmm.. 2^256 is a big number.. perhaps we should give up on addressing every bit on the network and only have 'learned routes' with no absolute network addresses? also i see no need for 256 registers in BYTES but if this is in BITS that's 32 bytes of registers which seems reasonable. Otoh 16 registers with word size 16 bits also seems reasonable.

so i'm thinking:

16 bit instruction size
instruction fields of 1,2,4 bits (note: so i guess we have 2^4 = 32 basic opcodes)
4-bit or 16-bit word size? i'm leaning towards 4 actually..
16 registers (if 4-bit word size, that's 8 bytes in the registers, or 4 16-bit words)
64k words in each processor's local memory (that's 32k bytes because we have 4-bit words, or 16k 16-bit words), addressed with 16 bits
64k local processors
256 bit addresses to combine network address with local addresses (so we have room for 128-bit IPv6 addrs + 64-bit addressed local addresses, with 64 bits to spare, so we can interoperate fine; in our system we only need 16 bits for local addresses, plus 16 bits to specify the processor, so we have 224 bits to spare. Assuming we use 128 bits for network addresses, we have 128-16 = 96 bits to spare)

alternately with 1-bit 'words':

16 bit instruction size
instruction fields of 1,2,4 bits (note: so i guess we have 2^4 = 32 basic opcodes)
16 1-bit registers (that's only 2 bytes in the registers, or 1 16-bit word)
64k bits in each processor's local memory (that's 8k bytes), addressed with 16 bits
64k local processors
256 bit addresses to combine network address with local addresses (so we have room for 128-bit IPv6 addrs + 64-bit addressed local addresses, with 64 bits to spare, so we can interoperate fine; in our system we only need 16 bits for local addresses, plus 16 bits to specify the processor, so we have 224 bits to spare. Assuming we use 128 bits for network addresses, we have 128-16 = 96 bits to spare)

that seems more sensible actually; the 16-bit instructions could contain fields allowing operations to treat things as 4-bit or 16-bit or 256-bit words

so there's a massively parallel architecture for you. each node has 64k local processors and each processor has 8k memory, so each node has 2^29 bytes of memory, about half a gigabyte.

Each processor is like a neuron with 1024 afferents and 64 bits of information at each synapse.

that seems too much memory per neuron.. how about 4 bits per synapse? we're off by a factor of 2^4. But the next step down from 64k bits local memory is 256 bits, which is a step down of 2^8. Let's try it though:

16 bit instruction size
instruction fields of 1,2,4 bits (note: so i guess we have 2^4 = 32 basic opcodes)
16 1-bit registers (that's only 2 bytes in the registers, or 1 16-bit word)
256 bits in each processor's local memory (that's 8 bytes), addressed with 8 bits
64k local processors
256 bit addresses to combine network address with local addresses (so we have room for 128-bit IPv6 addrs + 64-bit addressed local addresses, with 64 bits to spare, so we can interoperate fine; in our system we only need 8 bits for local addresses, plus 16 bits to specify the processor, so we have 232 bits to spare. Assuming we use 128 bits for network addresses, we have 128-16 = 104 bits to spare)

now each node has about 524k bytes of memory. If each processor is a neuron with 256 afferents, it can store 1 bit per afferent. Not enough.

History shows that computing likes to increase bit widths, and also individual neurons have megabytes to gigabytes of DNA which is at least ROM and may also set the scale for an epigenetic RAM system, so so let's err on the larger size, with 8k cache-like local memory per processor.

as for the instruction breakdown:

if we're doing register-to-register we have 4 bits for each of two operands and 4 bits for an opcode, that's 12 bits already so we only have 4 left. that's enough for 2 bits of addressing mode for each operand.

If we have 64k words in memory then it would take all of the 16 bits in the instruction to address local memory at the level of a bit. But we could have a few bits for scale, plus the address. E.g. if we had 4 bits to address, and we used 2 bits to divide by 4, then 16/4 = 4 so we'd still be short 2 bits. Hmm..

maybe we can't have immediate mode addresses then? that seems bad..

also we need some bits for scaling (choosing a word size per-instruction) if we don't have a set word size. and we wanted to have stack-ish options in addition to registers. and it would be nice to have a third operand sometimes!

well that sucks. no wonder everyone uses words and accumulators and 32-bit+ variable length instructions. PIC has weird accumulator stuff. AVR has 16-bit instruction but also has some instructions which take an additional 16 bits. The 16-bit ARM thumb instruction set has weird variable encodings and i think only 8 registers: https://ece.uwaterloo.ca/~ece222/ARM/ARM7-TDMI-manual-pt3.pdf

ok for each 'scale bit' that we take, we can remove one bit from EACH operand, since the scale is set for the whole instruction. So with two scale bits we only need two bits for each operand, plus two addressing mode bits, and so we have two bits left over. The fourth addressing mode can be stack-like (immediate, register, register indirect, stack-like). But if registers are only 4 bits then we can't yet address 64k of memory, right? Unless scale applies twice.. then with both scale bits set, we divide by (1,2,4,8 = 8) and we have two registers with 1 byte each, and we have 64k bits/8 = 8192 bytes to address, so by using both registers as one 16-bit address we can get there. But in that case shouldn't the scale be 16?

What would be great would be if we could just dispense with registers and address memory directly. Can we get there?

We have 16 bits of memory so we need at least 3 scale bits, giving us scales of 1,2,4,8,16,32,64,256. If we had just 8 more bits to address, 2^16/256 = 2^8, and we could reach everything, although beyond 16k we could only address using 256-bit (8-byte) word sizes!

alternately with 2 scale bits we could have scales of 1,2,4,16. Then we'd need 12 bits to address the 4096 words in memory.

also note that we want to reserve half the opcodes for custom, so if we only have 4 bits we only get to define 16 of them. And we want to reserve half the addressing modes (which i guess are also like types) for custom, so if we have 2 bits per operand we need to add another bit.

So with only one operand we already have 4 bits opcode + 3 bits addressing mode = 7 bits, plus 2 scale bits = 9 bits, and that's without the operand itself (which would then be at most 7 bits if we want to hit 16 total). If we had two operands that's 4 + 2*3 + 2 = 12 bits before the operands themselves (now 2 bits are left per operand). And the 3 bits for the addressing mode are odd, which doesn't fit our numerology of 1,2,4,16.

So we could go the variable length route, and have 16 bits be the length for the instruction, with all semantic modifiers (addressing modes, scales, and whatnot) but not the operands themselves, each of which could be another 16 bits. Now we can have 3 full-length operands again, and there is no need (from an instruction coding point of view) to have a few registers rather than just accessing memory.

So now we have 16 bits to play with, and we've spent 12. We have 4 bits left. We could add 2 more scale bits, and multiply their representation, giving us 1,2,4,16,32,64,256. And we could add another bit to each operand's addressing mode to make it even.

Or we could add a third operand. 4 + 3*3 + 2 = 15 bits, 1 bit left over.

if we have only 2 scale bits, with 1,2,4,16, and we have 64k bits of memory to address, we still need only 12 bits to address it in 16-word mode (and only 1/16th, that is, 4096 bits, are available for single bit addressing). So if each of the operands is a 16-bit word, we have 4 bits left over in each operand. We can put the 2 bits of addressing mode plus the one bit custom addressing mode flag in there, and still have one bit per operand left.

So now the first word only have 4 bits for the opcode plus 2 bits for the scale. We now have 10 bits left there!

Some of those bits could be a second scale to allow us to use less than 12 bits per operand. If we only have 2 operands and we only use 4 bits per operand plus 2 bit of addr mode plus 1 bit custom addr mode plus 1 bit reserved as of now, then two operands cost (4 + 4)*2 = 16 bits. So in this case we only need 2 16-bit words instead of 3. Choosing between 4 and 16 bits per operand costs at least 1 bit. If we have 2 scale bits, then the operands could be 1,2,4, or 16 bits each. If they are 2 bits ea. then we still need 12 bits for two operands (4 + 2)*2 = 12. If they are 1 bit each then we still need 10 bits. So these options don't help as much. So let's just have 1 more scale bit.

So now we have 9 bits left in the first instruction word.

Otoh what if we have 3 operands? if they are 1 bit apiece that's (4 + 1)*3 = 15 bits. So in this case we can fit everything into just 32 bits! This is like only having two registers, but what the hey, i guess that's like using the top 2 items on the stack. So maybe we do want a second scale bit. Can we use the other option? 2 bits apiece? (4 + 2)*3 = 18. If we didn't use the other reserved operand bit that would be 15 again though and we'd be golden.

So now we have 8 bits left in the first instruction word, and the outstanding question of what to do with a 2 in the second scale reg.

Now what if we put some operands into those 8 bits? we'd have:

(4+4)*2 = 16 - 8 = 8 (4+2)*2 = 12 - 8 = 4 (4+1)*2 = 10 - 8 = 2

(4+4)*3 = 24 - 8 = 16 (4+2)*3 = 18 - 8 = 10 (4+1)*3 = 15 - 8 = 7

So if we use the last 8 bits for operands then we could fit 3 4-bit operands into two 16-bit words. And even if we don't, if we have two scale bits and use 1-bit operands, we can fit 3 1-bit operands into two 16-bit words.

btw note that a hypercube, as noted in the Connection Machine, might be a good network topology for 2^16 processors; each processor would have 16 immediate neighbors, and the graph (diameter? max hops) would be 16.

this thing with the 4 bits suggests 16 memory-mapped registers..

also we can get rid of the useless '2' scale option by using variable length scale..

note: the 8 KiB? of memory indicated above (via 2^16 bits) is already 8x less than the 256 KiB? in a Cell processor SPE (there are 8 SPEs per cell), and less than the L1 instruction cache on many processors (which is often at least 16 KiB?)

but 4x more than the RAM in a Parallax Propeller Cog (there are 8 Cogs per propeller), which is 512 32-bit words, or 2 KiB?

and of course we are going to have 65536 processors, not 8, an increase by a factor of 8k

and what if you said, 65536 = 2^(2^(2^2)), but instead of having 65536 bits, since we know 16 is special (2^2^2), we'll have 65536 16-bit words? that's

(2^2^2) * 2^(2^(2^2))

to complete the pattern,

2 * (2^2) * (2^2^2) * 2^(2^(2^2))

2 * 4 * 16 * 65536

2^(1 + 2 + 4 + 16)

2^23

8388608

2^23 bits is 2^20 bytes, which is one mebibyte.

It's not clear to me that this makes things better, though. 23-bit addressing?

---

so, what are the 'most special' powers of 2?

as noted above, we have the sequence with initial condition 1 and inductive step 2^(previous member), which starts:

1, 2, 4, 16, 65536

after that, we have powers of 2 whose exponent is itself a power of two, which starts:

2, 4, 16, 256, 65536 (adding only 256 to the previous sequence)

next we have everything of the form 2^(n*k), where n is a power of 2 (ie of the form 2^a), and k is one of 0,1,2,3 (or maybe 1,2,3,4; you get the same answer either way, except for 1) (the point is that k ranges over numbers up to a power of two, in fact, up to the first power of two after 2, although one could imagine that in another, further step, we would add in numbers of the same form except where k ranged up to 2^3=8 or maybe 2^4=16). When k is 0 you get 1, when k is 1,2, or 4, then n*k a power of two (so we just get 2^n where n is a power of two, which we've covered previously; 2,4,16,256,65536); so the only new case is when k is 3. So we get 2^(2*3) = 2^6, 2^(2^2*3) = 2^12, 2^(2^3*3) = 2^24, 2^48, 2^96, with the "most special" of those being where n was a power of a power of two, that is, 2^6, 2^12, 2^48:

64, 4096, 281474976710656 (or about 281.5e12, or about 281.5T; note that 4096 = 64^2, and 281474976710656 = 4096^4)

in summary, the most special (thru 2^16) are:

2, 4, 16, 65536

and then 256

and then 64, 4096, 12

Where is the 8 found in the ubiquitous 8-bit "byte"? We do see 8 in the above by starting with 256 and taking log_2(256) = 8. But this method also gives us log_2(64) = 6 and log_2(4096) = 12. 6, especially, does not seem very "2-ish". My opinion is that 8 is actually not that special (as a power of two, that is; the fact that it is a cubic number is of interest in itself, but for different reasons. It has other interesting properties, such as being a bad size for committees: http://arxiv.org/pdf/0804.2202v1.pdf ; ). In fact, old computers used to have other preferred bit-lengths. My guess is that 8 was settled on because (a) 256 values is about right for holding something like ASCII (all alphanumeric latin characters, then some punctuation), plus leaving half the values unassigned; (b) 255 is bigger than most small integers you want to do arithmetic on, big enough to hold the size of small lists you might want, and big enough to fit instruction sets in comfortably.

4: 4 seems to the number of assembly-language-computer-like stuff; 4 1-bit flags, etc.

12 is interesting as the result of 16 - 4; eg if you have a 16-bit bitfield, and you need to use 4 of those bits for something, then you have 12 bits left to use for something else.

In powers-of-two-land, 16 is an elaboration of 4 (2^4); also very powers-of-two-sy/computersy

4096 is often found in computers, eg in memory page sizes. 4096 (and sometimes, 256) seem to be regarded as 'the smallest chunk of memory in which you could put something useful'.

2^16 = 65536 is the smallest power of two bitwidth that can really hold a decent amount, the length of lists that aren't 'small' (that is to say, the smallest number of the form 2^(2^n) such that the resulting number of instructions can hold a decently-sized computer program). 65536 to me seems like the powers-of-two equivalent of how some human languages/cultures use specific largish numbers as a placeholders for 'arbitrarily large' (but not necessarily infinite) (eg 10,000: http://www.merriam-webster.com/dictionary/myriad ; various examples of 10,000 in classical Chinese literature). 65536 is the first power of two bit-length that is a reasonable size for the number of memory locations in a general-purpose computer (although if you don't care if the bit-length is a power of 2, 2^12 = 4096 is a reasonable choice).

(this sort of reasoning makes you wonder why Unicode didn't make characters fit within 2 bytes; shouldn't 65536 be enough to express ourselves? I'm not very knowledgable about this but my impression is that they initially tried; a Unicode encoding called UCS-2, which uses two bytes, used to be popular; but politically, Unicode had to take the union of all of the characters of all large or powerful human communities, which led to a demand for more than 65536 code points; note that this is different from saying that 65536 isnt enought to form an alphabet, this is saying that 65536 isnt enough to encode all alphabets that through historical accident have been created so far and still seem important today (note: some ideogram languages do have more than 65536 ideograms, but perhaps not much more (i heard 70,000 for Chinese once), and i bet many of these are not in active use outside of proper names). To this day, the "BMP" or Basic Multilingual Plane, which is the first 65536 unicode code points, is often talked about.)

It is interesting to consider what the corresponding numbers for some non-binary number systems would be. For 3: 3^(3^1) = 27 (so a trinary number in 3 trinary memory locations, or 'trits', can hold 27 distinct values); 3^(3^2) = 19683. 3^(3^3) = 7625597484987. 3^(3^(3^3)) = too large to work with. So, the equivalent of the formula for 65536 ((2^(2^(2^2))) = 2^(2^(4)) = 2^16; the number of locations that can be addressed by (the number of bits that can be addressed by (the number of bits that can be addressed by (the number of bits that can be addressed by (one bit))))) is too large to work with. Even 3^(3^3) is in the trillions. 3^(3^2) = 3^9 is the one that is on the order of 65536; the smallest 'special number' for base 3 that denotes a number of instructions that could hold a decently-sized computer program; this is the number of locations that can be addressed by (the number of trits that can be addressed by (two trits)). For 5: 5^5 = 3125; 5^(5^2) = 298023223876953125, which is one the order of 298023 trillion (roughly on the order of 2^64). For 12, 12^12 = 8916100448256, roughly 9 trillion, or a little less than 2^48.

---

proj-oot-old-150618-ootNumerology

powers of 2

2 * 4 * 16 * 65536

2^(1 + 2 + 4 + 16)

2^23

8388608