notes-cog-ai-compressionIdeas

http://en.wikipedia.org/wiki/Hutter_Prize

Have a very short token that means 'your first guess is correct', i.e. "I'm not going to explicitly say what goes here; compute the most probable guess, and that's it". Then have an almost-as-short token that means "your second guess is correct", and so on, up to some small ceiling such as 7 (in theory the choice of ceiling can itself be optimized). E.g. if you see "at the end _ the", you can infer that _ should be "of".

(see http://rsif.royalsocietypublishing.org/content/early/2012/07/23/rsif.2012.0491.full for example phrases)
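A minimal sketch of this in Python, assuming access to some predictive model behind a hypothetical predict_ranked(context) that returns candidate tokens ordered from most to least probable; ranks below the ceiling (7 here) get a single small symbol, anything else gets an escape symbol plus the literal token:

    ESCAPE = 7  # ceiling: ranks 0..6 get short symbols; 7 means "literal token follows"

    def encode(tokens, predict_ranked):
        # predict_ranked(context) -> candidate tokens, most probable first
        # (hypothetical model interface, not a real library call)
        out, context = [], []
        for tok in tokens:
            guesses = predict_ranked(context)[:ESCAPE]
            if tok in guesses:
                out.append(guesses.index(tok))   # "my Nth guess was correct"
            else:
                out.append(ESCAPE)               # escape...
                out.append(tok)                  # ...followed by the literal token
            context.append(tok)
        return out

    def decode(symbols, predict_ranked):
        out, it = [], iter(symbols)
        for sym in it:
            tok = next(it) if sym == ESCAPE else predict_ranked(out)[:ESCAPE][sym]
            out.append(tok)
        return out

An entropy coder over the rank symbols would then give rank 0 the shortest bitstring, so a model that usually guesses right makes most of the stream nearly free.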

Construct a hierarchical ontology. Make assumptions about the statistics of the branching nature of the ontology (power law?).
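A toy Python sketch just to make the branching-statistics assumption concrete (the power-law exponent and the cap on children are arbitrary illustration values, not derived from data):

    import random

    def powerlaw_branching(alpha=2.0, max_children=20):
        # sample a branching factor k with P(k) proportional to k**-alpha
        ks = list(range(1, max_children + 1))
        return random.choices(ks, weights=[k ** -alpha for k in ks])[0]

    def sample_ontology(depth):
        # random tree (nested lists) whose node fan-out follows the power law above
        if depth == 0:
            return []
        return [sample_ontology(depth - 1) for _ in range(powerlaw_branching())]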

---

also, it seems to me like one of the best extant compression algorithms, http://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Markov_chain_algorithm (the one xz uses, as LZMA2), doesn't seem to do any wavelet-transform-style decorrelation at all. That would explain its poor performance on my image raster .mat files.

Shouldn't it at least do trend subtraction? I understand that a general-purpose compressor doesn't know how the numbers are encoded, but surely something could be done that works at least somewhat with the numeric encodings in common use today.

Often even non-image data has repeated packets/records of a fixed length; a wavelet-style approach could help there too.
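A rough sketch of the trend-subtraction point in Python, assuming (as a guess about the encoding) that the data is a flat array of little-endian 16-bit samples: take first differences so slowly varying signals become mostly small numbers, then hand the result to a generic compressor like LZMA.

    import lzma
    import numpy as np

    def compress_samples(raw: bytes) -> bytes:
        samples = np.frombuffer(raw, dtype='<i2').astype(np.int32)
        deltas = np.diff(samples, prepend=0)   # prepend 0 so the first delta is the first sample
        return lzma.compress(deltas.astype('<i4').tobytes())

    def decompress_samples(blob: bytes) -> bytes:
        deltas = np.frombuffer(lzma.decompress(blob), dtype='<i4')
        return np.cumsum(deltas).astype('<i2').tobytes()   # cumulative sum undoes the differencing

For fixed-length records the same trick applies per field (subtract each record from the previous one) before the generic compressor ever sees the bytes.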

---

combining those two, we have a spatial template of certainty/uncertainty probability

generalizing beyond space, instead of a template parameterized by space, we get one parameterized by anything

applying my many-linked-state-machine-estimators idea, we have many bagged estimators

and also apply my idea to link them with spreading activation

and then we also have an abstraction operation that allows some of those estimators to orchestrate sequences of others
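A very loose Python sketch of just the 'many bagged estimators' part (the spreading-activation linking and the abstraction/orchestration operation are not attempted here); each estimator is assumed to expose a hypothetical predict(context) returning a dict of symbol probabilities, and the weights are where spreading activation would plug in:

    def mix_predictions(estimators, weights, context):
        # weighted mixture of per-symbol probability estimates from many estimators
        combined, total = {}, sum(weights)
        for est, w in zip(estimators, weights):
            for symbol, p in est.predict(context).items():   # hypothetical interface
                combined[symbol] = combined.get(symbol, 0.0) + (w / total) * p
        return combined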

---

how would you compress an image of snow on a TV screen, where the viewer doesn't care about the exact picture, but only about the general feeling/texture?
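One hedged way to make this concrete in Python/numpy: store no pixels at all, only a few texture statistics plus a PRNG seed, and regenerate noise with the same 'feeling' on decompression (lossy in the exact pixels, faithful to the texture):

    import numpy as np

    def describe_snow(frame: np.ndarray, seed: int = 0):
        # "compress" a noise-like frame down to a handful of texture parameters
        return {"shape": frame.shape, "mean": float(frame.mean()),
                "std": float(frame.std()), "seed": seed}

    def resynthesize_snow(params):
        # generate a different but perceptually similar noise frame
        rng = np.random.default_rng(params["seed"])
        frame = rng.normal(params["mean"], params["std"], size=params["shape"])
        return np.clip(frame, 0, 255).astype(np.uint8)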

---

apply typical stock market strategies to compression

---

https://en.wikipedia.org/wiki/DFA_minimization

http://blog.burntsushi.net/transducers/

https://news.ycombinator.com/item?id=10551280

---

when simplifying algebra expressions, there is an alternation of expansion and contraction phases.

similarly, when designing something, there is an alternation of expansion (oo let's think of 5000 cool ideas) and contraction (let's reduce complexity by unifying things and also by eliminating features that aren't worth their complexity cost) phases.

perhaps a similar alternation could be useful in compression, esp. with regard to using (or building via unsupervised learning) a 'commonsense reasoning' logical framework to help compress
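The algebra half of that analogy, concretely (Python/sympy), just to pin down what the expansion/contraction alternation means; whether an analogous expand-then-unify pass actually helps a compressor is the open question:

    from sympy import symbols, expand, factor

    x = symbols('x')
    e = (x + 1)**2 * (x - 1) + (x + 1)

    expanded = expand(e)           # expansion phase: blow the expression up into monomials
    contracted = factor(expanded)  # contraction phase: pull shared structure back out

    print(expanded)    # x**3 + x**2
    print(contracted)  # x**2*(x + 1) -- simpler than the original form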

---

http://countercomplex.blogspot.com/2011/10/algorithmic-symphonies-from-one-line-of.html https://www.youtube.com/watch?v=tCRPUv8V22o&t=41s

---

treat (part of) data as N-dimensional array, select a block size, then use as context the (previously specified? or all?) blocks that are adjacent to the current block (this is a generalization of what some existing image compression formats do)
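A Python/numpy sketch of the block-context idea, assuming blocks are visited in raster order so only the already-decoded neighbor along each axis is usable as context (block size and dimensionality are arbitrary):

    import numpy as np

    def block_context(data: np.ndarray, block_index: tuple, block_size: tuple):
        # return the previously visited blocks adjacent to block_index along each axis --
        # the N-dimensional generalization of conditioning on the pixels above/left
        context = []
        for axis in range(data.ndim):
            idx = list(block_index)
            if idx[axis] == 0:
                continue              # no earlier neighbor along this axis
            idx[axis] -= 1            # step back one block along this axis
            slices = tuple(slice(i * b, (i + 1) * b) for i, b in zip(idx, block_size))
            context.append(data[slices])
        return context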

---

have a number of compression methods; data is divided into variable-sized blocks; each block begins with a selector byte that selects the compression method (the method then controls when it is finished, so the length of blocks may be fixed for some compression methods, and variable but specified in various ways for other methods).

each method can hold state across invocations (this allows for, eg, a 'dictionary' method that is given a custom dictionary in one invocation, and then called to decompress data using the previously given custom dictionary in another invocation). Each method also gets the file position as input, as well as all previously decompressed data.

Some methods are 'iterative' methods that don't yield a full decompress on the first call, because they require as input some bytes that have not yet been decompressed. So the decompressor runs through the entire file, then goes back and again calls each block that is not yet fully 'solved', repeating until all blocks have been solved (or until the state no longer changes). Perhaps, to maximally 'soften' this, each byte position in the output gets a probability distribution over all 256 possible values (and a 'solved' byte is one where a single value has 100% probability, so all the other values have 0%).
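A Python skeleton of the selector-byte scheme, with two made-up methods just to show the shape of it: each method decides how much of the stream it consumes, keeps state across invocations, and is handed the output so far (the iterative/'soft' probability-distribution part is omitted):

    class RawCopy:
        # method 0: fixed-size block, copied through verbatim (illustrative only)
        BLOCK = 64
        def decode(self, stream, pos, output):
            chunk = stream[pos:pos + self.BLOCK]
            return chunk, pos + len(chunk)

    class DictionaryMethod:
        # method 1: op 0 defines a dictionary entry, op 1 emits a previously defined one;
        # the entry list is state that persists across invocations
        def __init__(self):
            self.entries = []
        def decode(self, stream, pos, output):
            op, pos = stream[pos], pos + 1
            if op == 0:                                    # define: length byte + literal bytes
                n, pos = stream[pos], pos + 1
                self.entries.append(stream[pos:pos + n])
                return b"", pos + n
            return self.entries[stream[pos]], pos + 1      # reference: index byte

    METHODS = [RawCopy(), DictionaryMethod()]              # the selector byte indexes this table

    def decompress(stream: bytes) -> bytes:
        output, pos = bytearray(), 0
        while pos < len(stream):
            selector, pos = stream[pos], pos + 1
            chunk, pos = METHODS[selector].decode(stream, pos, bytes(output))
            output += chunk
        return bytes(output)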

some ideas for methods:

---

good idea:

could we do something like this for Wikipedia? Maybe do the opposite: instead of sending the gist, expanding it via AI, and then filling in the details, encode the details and then fill in the gist. This is because even if the AI 'understands' the knowledge that a Wikipedia page is trying to convey, there are many ways to put that knowledge into words, and unlike images these will have a high edit distance between them at the level of words or letters (so adding a final 'correction' to make the result lossless would have to contain almost as much information as just compressing the whole thing without using AI). So maybe better would be to do something like mad-libs: present the inessential sentence structure and words, and then use something like GPT to fill in the blanks.
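A hedged Python sketch of the mad-libs direction, with hypothetical should_blank and fill_blank functions standing in for the 'which words are predictable' heuristic and the GPT-style filler: keep the inessential structure verbatim, blank out the predictable words, and store explicit corrections only where the model's (deterministic) fill would be wrong, so decompression stays lossless:

    BLANK = "_"   # placeholder marker for a blanked-out word

    def encode(words, should_blank, fill_blank):
        # should_blank(i, words) and fill_blank(skeleton, i) are hypothetical stand-ins
        skeleton = [BLANK if should_blank(i, words) else w for i, w in enumerate(words)]
        corrections = {i: w for i, w in enumerate(words)
                       if skeleton[i] == BLANK and fill_blank(skeleton, i) != w}
        return skeleton, corrections      # only the blanks the model gets wrong cost anything

    def decode(skeleton, corrections, fill_blank):
        return [corrections.get(i, fill_blank(skeleton, i)) if w == BLANK else w
                for i, w in enumerate(skeleton)]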

---