notes-ai-workingMemoryAsCacheCoherency

Short-term working memory can hold only about 7 +- 2 items. This is very small. It contrasts both with the massive parallelism found in neural architecture and in pre-attentional processes in cognitive psychology, and also with the larger number of registers found in today's CPUs.

Another CPU architecture, the stack architecture, has no general-purpose registers; instead there is a stack, and instructions take as operands the top few items on the stack. This seems to encourage using fewer 'working memory' items; but the items in human working memory don't seem to have a total ordering, as the items on a stack do.
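
To make the contrast concrete, here is a minimal stack-machine sketch (in Python, with a made-up instruction format); the operands come implicitly from the top of the stack instead of from named registers:

    # Tiny stack-machine sketch: operands come implicitly from the top of the
    # stack rather than from named general-purpose registers.
    def run(program):
        stack = []
        for instr in program:
            op = instr[0]
            if op == "PUSH":
                stack.append(instr[1])
            elif op == "ADD":
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == "MUL":
                b, a = stack.pop(), stack.pop()
                stack.append(a * b)
        return stack[-1]

    # (2 + 3) * 4 -- the intermediate result 5 lives only on the stack
    print(run([("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]))  # prints 20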

Here's another idea. First, take the idea from my friend RF that although our neural architecture is mostly parallel, our higher functions must be serial, because that's the way deductions in formal logic work; but empirically we see that these serial processes are much slower (it takes us a relatively long time to think through even a short axiomatic deduction, compared to all the computation our brains must be doing to, e.g., catch a ball thrown at us).

Combine this with the narrative theory of consciousness, which posits that consciousness is related to the phenomenon of constructing and remembering a story or narrative about our life history. A narrative has a total ordering in time, so, as with database transactions, this would mean that we are taking intrinsically unordered (semantically if not physically) events/transactions and imposing a consistent total ordering on them.
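
As a toy illustration of what imposing a total order means here (the event names are invented), a single serial log can assign each incoming event a definite position, the way a transaction log serializes concurrent transactions:

    # Sketch: a single serial log imposes one consistent total order on events
    # that arrive from independent processes in no intrinsic order.
    import itertools

    sequence = itertools.count(1)            # the single "narrative" counter
    log = []

    def commit(event):
        log.append((next(sequence), event))  # assign the next position in the order

    for event in ["smelled coffee", "phone rang", "recalled appointment"]:
        commit(event)

    print(log)  # [(1, 'smelled coffee'), (2, 'phone rang'), (3, 'recalled appointment')]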

As we see with database transactions, there is the possibility of contention (two autonomous, asynchronous, concurrent processes trying to change the same items in memory at the same time in an inconsistent way), and the ordering mechanism must resolve this contention. As we see when trying to scale up databases for large, active websites, resolving this sort of contention can easily become the primary bottleneck in massively parallel systems.
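
Here is a minimal sketch of that kind of contention and its resolution, with a lock standing in for whatever the real ordering mechanism is; many workers write the same shared item, the lock forces the writes into one order, and with enough workers the waiting at the lock is where the time goes:

    # Write contention sketch: concurrent workers updating one shared item.
    # The lock serializes the writes; waiting on it is where the cost shows up.
    import threading

    balance = 0
    lock = threading.Lock()

    def worker(n_updates):
        global balance
        for _ in range(n_updates):
            with lock:                # contention resolved here, one writer at a time
                balance += 1

    threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(balance)                    # 80000: no updates were lost, thanks to the lock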

One resolution to this problem in websites is 'eventual consistency', in which changes made to the database by one webserver are not immediately seen by other webservers, and in which the data in the database may temporarily be in inconsistent states, but eventually a consistent state is guaranteed to come about.
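
A toy model of that idea (last-writer-wins merging by timestamp is just one simple rule, chosen here for brevity): each replica applies writes locally, the replicas may disagree for a while, and a later merge round brings them back to one consistent state:

    # Toy eventual consistency: replicas apply writes locally, then converge
    # when they exchange updates (a gossip / anti-entropy round).
    class Replica:
        def __init__(self):
            self.data = {}                  # key -> (timestamp, value)

        def write(self, key, value, ts):
            self.data[key] = (ts, value)

        def merge(self, other):
            for key, (ts, value) in other.data.items():
                if key not in self.data or ts > self.data[key][0]:
                    self.data[key] = (ts, value)

    a, b = Replica(), Replica()
    a.write("cart", ["book"], ts=1)         # seen only by replica a for now
    b.write("cart", ["book", "pen"], ts=2)  # a and b are temporarily inconsistent
    a.merge(b); b.merge(a)                  # updates eventually propagate
    print(a.data == b.data)                 # True: the replicas have converged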

Combine this with the global workspace theory of consciousness, which says that there are a zillion concurrent processes running in our brain, and that conscious awareness is a broadcast medium, sort of a way of selecting a few data items and broadcasting them to all of the other processes (the spotlight/theatre metaphor of attention). But the reason it's called a workspace is that these items can be changed. So the global workspace is really a set of data items (btw, note that the idea of these items being just static data may be misleading) such that when a change is made to them, all other brain processes immediately see the change.
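
A rough sketch of that picture (the process names and slot contents are invented): a small set of slots plus a broadcast, so that any write to a slot is immediately pushed to every registered process:

    # Global-workspace sketch: a few writable slots whose changes are
    # broadcast to every registered process.
    class Workspace:
        def __init__(self, n_slots):
            self.slots = [None] * n_slots
            self.subscribers = []

        def subscribe(self, callback):
            self.subscribers.append(callback)

        def write(self, slot, item):
            self.slots[slot] = item
            for notify in self.subscribers:   # every process sees the change at once
                notify(slot, item)

    ws = Workspace(n_slots=4)
    ws.subscribe(lambda s, x: print(f"language process sees slot {s} = {x!r}"))
    ws.subscribe(lambda s, x: print(f"motor planning sees slot {s} = {x!r}"))
    ws.write(0, "red ball, upper left")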

So now we see the connection with cache coherency. In multiprocessor CPU architectures, each processor has its own cache of part of main memory. This cache is transparent to the programmer: from the programmer's point of view, the program is accessing main memory, but the hardware sometimes secretly replaces reads and writes to main memory with reads and writes to the cache. In uniform shared-memory architectures, the semantics presented to the programmer is supposed to be that any change to main memory is immediately seen by any later access to main memory from another processor. So we have the problem of cache coherency: if the same part of main memory is currently mapped into the caches of multiple processors, and one of those processors writes to that memory, the system must act as if the write were immediately conveyed to the other caches. A looser condition is (I think) that the other processors don't immediately observe the change, but that the history of memory can be construed as a total ordering on changes, and all processors observe the same total ordering (see http://en.wikipedia.org/wiki/Cache_coherence ).
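
A heavily abridged sketch in the spirit of write-invalidate protocols like MSI/MESI (not a faithful model of any real protocol): before a cache completes a write, every other cache holding that line is invalidated, so no processor can keep reading a stale copy:

    # Simplified write-invalidate coherence: a write to a line invalidates
    # every other cache's copy before it takes effect.
    class Bus:
        def __init__(self):
            self.memory = {}                  # "main memory"
            self.caches = []

        def invalidate(self, addr, except_for):
            for cache in self.caches:
                if cache is not except_for:
                    cache.lines.pop(addr, None)

    class Cache:
        def __init__(self, bus):
            self.lines = {}                   # address -> cached value
            self.bus = bus
            bus.caches.append(self)

        def read(self, addr):
            if addr not in self.lines:        # miss: fetch from memory via the bus
                self.lines[addr] = self.bus.memory.get(addr)
            return self.lines[addr]

        def write(self, addr, value):
            self.bus.invalidate(addr, except_for=self)   # coherence traffic
            self.lines[addr] = value
            self.bus.memory[addr] = value

    bus = Bus()
    c0, c1 = Cache(bus), Cache(bus)
    c0.write(0x10, 42)
    print(c1.read(0x10))   # 42: c1 misses and fetches the value c0 just wrote
    c1.write(0x10, 99)     # c0's cached copy is invalidated by the bus
    print(c0.read(0x10))   # 99: c0 re-fetches rather than reading a stale 42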

Just as database contention is often the bottleneck for concurrent processing in large-scale websites, cache coherency is often the bottleneck for multiprocessor architectures with many processors.

One resolution to this problem in multiprocessors is to give up on cache coherency, for example non-cache-coherent NUMA (Non-Uniform Memory Access). In NUMA, there is not one shared memory, but rather many memory banks, which may have special relationships with a subset of the processors. An extreme form of non-cache-coherent NUMA is distributed memory, in which each processor has its own private memory.
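
A sketch of the distributed-memory extreme (node names and messages are illustrative): nothing is kept coherent automatically, so any sharing has to happen by explicit message passing:

    # Distributed memory sketch: each node has private memory; sharing only
    # happens via explicit messages, not via any coherence mechanism.
    from queue import Queue

    class Node:
        def __init__(self, name):
            self.name = name
            self.memory = {}                # private; no other node can read it
            self.inbox = Queue()

        def send(self, other, key, value):
            other.inbox.put((key, value))   # explicit communication replaces coherence

        def receive_all(self):
            while not self.inbox.empty():
                key, value = self.inbox.get()
                self.memory[key] = value

    n0, n1 = Node("n0"), Node("n1")
    n0.memory["x"] = 7                      # invisible to n1 until explicitly sent
    n0.send(n1, "x", n0.memory["x"])
    n1.receive_all()
    print(n1.memory["x"])                   # 7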

You can see where I am going with this. We see that in computer systems, serialization/synchronization/resolution of contention over concurrent process writes to shared data often becomes the bottleneck. I propose that the reason for the surprisingly tiny capacity of human working memory is that the small number of items in working memory are being given special attention by a serialization/coherency system that allows writes to these few items by the zillions of concurrent processes in the brain to be serialized into a total ordering and to be almost immediately observed by all of the other concurrent processes accessing the same item.
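
As a capstone sketch of the proposal (all numbers and names are illustrative, and real neural 'writes' are surely not discrete events like this): a tiny fixed number of slots whose writes are funneled through one serialization point, logged in a total order, and broadcast to every other registered process:

    # Proposal sketch: few slots, serialized writes, immediate broadcast.
    import threading

    N_SLOTS = 7                               # the famously small capacity

    class WorkingMemory:
        def __init__(self):
            self.slots = [None] * N_SLOTS
            self.log = []                     # the agreed-upon total order of writes
            self.observers = []               # stand-ins for the zillions of processes
            self.lock = threading.Lock()      # the serialization point -- and the bottleneck

        def write(self, process_name, slot, item):
            with self.lock:                   # concurrent writes forced into one order
                self.slots[slot] = item
                self.log.append((process_name, slot, item))
                for notify in self.observers:
                    notify(slot, item)        # all other processes see the change at once

    wm = WorkingMemory()
    wm.observers.append(lambda slot, item: None)   # a silent observer process

    def process(name):
        for i in range(1000):
            wm.write(name, i % N_SLOTS, f"{name}-{i}")

    threads = [threading.Thread(target=process, args=(f"p{i}",)) for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(wm.log))                        # 4000 writes in one consistent total order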

I am suggesting that increasing working memory capacity would probably carry either a geometrically scaling cost as a function of the number of items, or a cost proportional to the number of concurrent processor units times the number of items, in terms of an increase in neural wiring or processing power or some other constrained resource, and that this is why evolution stopped at 7 +- 2; this capacity is probably already consuming a significant fraction of some constrained resource, meaning that it could only be increased with a significant increase in the size of the brain, or at the cost of taking significant intelligence away from some other brain function.

If the cost is proportional to the number of concurrent processor units, then it is unlikely that the cost is merely in the size of some special-purpose shared memory, for example the hippocampus. Like the interconnects between caches in the cache coherence problem, I think it is more likely that the cost lies either (a) in the connections between a small global shared memory on the one hand and, on the other hand, the processing substrate for each of the zillions of other asynchronously executing processes; or (b) in direct connections between these zillions of other asynchronously executing processes, without any actual globally shared memory. It is unclear whether the cost is in additional wiring or in additional processing time spent resolving contention.
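
A back-of-envelope comparison of (a) and (b), with made-up numbers (these are not neuroanatomical estimates; P and K are just placeholders for the number of processing modules and the number of working-memory slots):

    # Rough wiring counts for layouts (a) and (b) above, under invented numbers.
    P = 1_000_000                      # hypothetical count of concurrent processing modules
    K = 7                              # working-memory slots

    hub_links = P * K                  # (a) every module wired to every shared slot
    pairwise_links = P * (P - 1) // 2  # (b) every module wired to every other module

    print(f"hub layout:      {hub_links:,} links (linear in P and K)")
    print(f"pairwise layout: {pairwise_links:,} links (quadratic in P)")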

Chuck had another idea: another reason there could be a need for each slot in working memory to have one connection to each processing module is to enable each item in working memory to simultaneously possess one value for every possible attribute, assuming that each attribute corresponds to one module (e.g. the orientation at a given pixel, corresponding to a cortical column in V1).
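
A small sketch of that idea (the attribute names are invented): if each slot has a dedicated link to every attribute module, then a single slot can hold one value per attribute at the same time:

    # One working-memory slot holding a value for each attribute module at once.
    ATTRIBUTE_MODULES = ["color", "orientation", "location", "pitch"]

    def make_slot():
        return {attribute: None for attribute in ATTRIBUTE_MODULES}

    working_memory = [make_slot() for _ in range(7)]
    working_memory[0].update(color="red", orientation="vertical", location="upper left")
    print(working_memory[0])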

If the cost is geometric or proportional, it seems unlikely that different individuals of the same species would actually have one more or one fewer slot in their working memory in a hardware sense. Therefore the +- 2 suggests that the system is somehow reconfigurable to accept slightly more items with some sort of tradeoff; perhaps in the memory size/complexity of each item, or of the interconnections between the items, or perhaps some probabilistic tradeoff, or perhaps some tradeoff with how securely/successfully items can be transferred between working memory and longer-term memory, or with write latency or bandwidth.