notes-computer-jasper-jasperBrain

one of the original goals was to make a programming language which mirrored as much as possible the sort of computational architecture that the brain seems to have ("neural architecture" or "neurally plausible architecture"), in order to allow programmers to develop intuitions about how the brain might work.

how can this be done?

massive parallelism

there are as many CPUs as there are data elements, so 'active data structure' styles are feasible (see the sketch below)

the 100-step rule: neurons take on the order of a millisecond to fire, yet people complete recognition tasks in a few hundred milliseconds, so any core brain computation must finish in roughly 100 serial steps.
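
a toy sketch of the 'active data structure' style, in python purely for illustration (nothing here is jasper syntax): one simulated processor per array cell, and no CPU ever scans the whole array.

    # a toy 'active data structure': one (simulated) processor per cell.
    # on each step every cell simultaneously takes the max of itself and
    # its neighbors; after about `diameter` steps every cell holds the
    # global max, with no single CPU ever scanning the whole array.
    def parallel_max(cells):
        while True:
            nxt = [max(cells[max(i - 1, 0)], cells[i],
                       cells[min(i + 1, len(cells) - 1)])
                   for i in range(len(cells))]
            if nxt == cells:             # converged: all cells agree
                return cells[0]
            cells = nxt

    print(parallel_max([3, 1, 4, 1, 5, 9, 2, 6]))   # -> 9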

memory mixed in with CPUs

no von Neumann bottleneck

the Connection Machine did this.

see http://en.wikipedia.org/wiki/Computational_RAM

labeled line hypothesis, labeled line variable hypothesis

the 'labeled line' hypothesis states that the output of specific neurons (CPU/memory computing elements) has specific semantics.

combined with massive parallelism and memory mixed in with CPUs, it seems likely that each instance of a semantically important 'variable' in a program would usually be bound to specific computing elements (neurons or groups of neurons).

spurn long serial computations

100-step rule

(although serial computation is available if really needed, e.g. people can think through a proof, slowly)

local memory/non-uniform memory hierarchy

e.g. the various CPUs have their own memory and do not have to wait on each other when they want to update something

distributed storage of object attributes

e.g. sometimes you can remember someone's face and various facts about them, but not their name

estimation of unknown, and robustness to invalid/unavailable inputs

if you see a lion and start running, and then look again while your eyes are blinking, or you are mid-saccade, or there is noise in your visual input, you don't stop running because the lion-detector halted with an error; instead, the lion-detector maintains an estimate of its result even while it is temporarily blinded by nonsense inputs
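
a minimal sketch of such a detector, assuming a made-up confidence-decay scheme (the 0.9 decay factor and the None/NaN convention for 'blinded' input are arbitrary choices of this sketch):

    import math

    class RobustDetector:
        """holds a best estimate; nonsense input decays confidence
        instead of crashing the consumer of the output."""
        def __init__(self):
            self.estimate = 0.0      # P(lion), last believed value
            self.confidence = 0.0

        def observe(self, reading):
            if reading is None or math.isnan(reading):   # blink/saccade/noise
                self.confidence *= 0.9   # decay, but keep the old estimate
            else:
                self.estimate = reading
                self.confidence = 1.0
            return self.estimate, self.confidence

    d = RobustDetector()
    print(d.observe(0.95))   # lion seen: (0.95, 1.0) -> run!
    print(d.observe(None))   # blink: (0.95, 0.9); keep running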

high fan-in, fan-out

polychrony

signal processing

auto-associative memory?

What would be the correct 'indirect addressing modes' for a cognitive architecture? Does content-addressable or auto-associative memory fit in here? Are there primitives relating to traversal of a linked semantic network?

see [1].

semantic networks? content-addressable memory?
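
one concrete candidate primitive: a Hopfield-style auto-associative memory. a minimal numpy sketch, with standard Hebbian outer-product learning (nothing jasper-specific is assumed):

    import numpy as np

    def train(patterns):
        """Hebbian outer-product learning over +/-1 patterns."""
        p = np.array(patterns, dtype=float)
        w = p.T @ p / len(p)
        np.fill_diagonal(w, 0)           # no self-connections
        return w

    def recall(w, cue, steps=10):
        """content-addressed lookup: present a partial/noisy pattern
        and settle toward the nearest stored pattern."""
        x = np.array(cue, dtype=float)
        for _ in range(steps):
            x = np.where(w @ x >= 0, 1.0, -1.0)
        return x

    w = train([[1, -1, 1, -1, 1, -1],
               [1, 1, 1, -1, -1, -1]])
    print(recall(w, [1, -1, 1, -1, 1, 1]))   # noisy cue -> first stored pattern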

working memory can hold 7 ± 2 items

this is very small. what does it correspond to in a computer architecture?

see [2].

maybe: dataflow

like cells in a spreadsheet, various small modules in the brain (e.g. cortical columns) seem to recompute the same function as different input comes in
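
a minimal sketch of that spreadsheet-like discipline; the Cell/Source objects are hypothetical stand-ins for cortical columns:

    class Cell:
        """a spreadsheet-style dataflow cell: recomputes its function
        whenever any of its input cells changes (speculatively, what a
        cortical column might be doing with its afferents)."""
        def __init__(self, fn, *inputs):
            self.fn, self.inputs, self.listeners = fn, list(inputs), []
            for i in inputs:
                i.listeners.append(self)
            self.recompute()

        def recompute(self):
            self.value = self.fn(*[i.value for i in self.inputs])
            for l in self.listeners:
                l.recompute()

    class Source:
        """an input cell whose value is set from outside."""
        def __init__(self, value):
            self.listeners, self.value = [], value
        def set(self, v):
            self.value = v
            for l in self.listeners:
                l.recompute()

    a, b = Source(1), Source(2)
    s = Cell(lambda x, y: x + y, a, b)   # s always holds a + b
    a.set(10)
    print(s.value)                        # -> 12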

maybe: eventual consistency

this would seem to follow from having zillions of CPUs that do not wait on each other

maybe: SIMD

MIMD/multithreading is definitely possible in the brain. Maybe submodules, e.g. V1, also do something expressively equivalent to SIMD (although maybe not synchronized as precisely)

maybe: sparse firing patterns

robustness

it should be possible to write programs which don't crash upon an error, but which merely behave suboptimally (however, in many cases this is not the desired behavior for computers, so it should also be possible to write programs that do crash)

observable while loops

hypothesis: when a subprocedure in the brain runs a 'while loop', instead of running the entire loop and then doing some other stuff and then returning a result, the result is almost always continuously recomputed based on the value from the last iteration of the 'while loop'. In other words, the hypothesis is that while loops are almost always only used for iterative refinement calculations.

this somewhat follows from the labeled line variable hypothesis
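
a sketch of the hypothesized discipline, with a made-up publish() callback standing in for the labeled output line; the loop's variable is its result, readable at every iteration:

    def iterative_sqrt(x, publish, iters=20):
        """Newton's method written as an 'observable while loop': the
        current estimate is published on every iteration, so a consumer
        can read a usable (if rough) answer at any time instead of
        waiting for the loop to terminate."""
        est = max(x, 1.0)                 # any positive starting guess
        for _ in range(iters):
            est = 0.5 * (est + x / est)   # iterative refinement step
            publish(est)                  # the labeled line always holds
        return est                        # the current best estimate

    iterative_sqrt(2.0, publish=lambda e: print(round(e, 6)))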

robustness to noise and injury

maybe: robustness through redundancy

different attributes of the same object represented in different modules

e.g. if you are thinking about a dog, the look of the dog goes in the visual areas, but its sound goes in the auditory areas; in contrast to, say, a C struct, where all of the attributes go next to each other in memory (see the sketch below)

if you imagine a visual shape, V2 is activated

there is a 'sensory binding problem'.

note that this may make detailed representations expensive; see Chuck's idea in [3]
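
a sketch contrasting the two storage layouts (the 'module' dictionaries are hypothetical stand-ins for visual/auditory areas):

    # C-struct style: all attributes of 'dog' contiguous in one place.
    dog = {"look": "furry quadruped", "sound": "bark", "name": "Rex"}

    # brain style: each module stores its own modality's attribute for
    # every object; an object is just a key that appears in many modules.
    visual_module   = {"dog": "furry quadruped", "lion": "big cat"}
    auditory_module = {"dog": "bark",            "lion": "roar"}
    name_module     = {"dog": "Rex"}

    # losing one module loses one attribute of every object (you forget
    # the name but still remember the face), not one whole object:
    del name_module["dog"]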

few items in working memory

in humans, we can only hold 7 ± 2 separate items in working memory at once (we can, however, recall arbitrarily many things if we link them together into a network)

one idea is to provide only 7 or so separately and concurrently transactable active processes in global shared memory, e.g. a small global shared memory, readable by all CPUs, with 7 CPUs cache-coherently maintaining it/writing to it, and with a direct connection from each other normal CPU (there are zillions) to one of those 7; or possibly 7 separate such shared memories, one per special 'working memory' CPU. note, however, that in humans the number is not fixed at exactly 7 but can presumably be reconfigured to be slightly higher or lower with some tradeoff: see [4]
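
a minimal sketch of that idea; the slot count, the locking discipline, and the one-maintainer-per-slot rule are all assumptions of this sketch:

    import threading

    N_SLOTS = 7   # not hard-fixed in humans; presumably reconfigurable (see [4])

    class WorkingMemory:
        """a small global shared memory with N_SLOTS concurrently
        transactable slots: readable by every CPU, but each slot has a
        single dedicated maintainer CPU and its own lock, so updates to
        different slots never contend with each other."""
        def __init__(self):
            self.slots = [None] * N_SLOTS
            self.locks = [threading.Lock() for _ in range(N_SLOTS)]

        def read(self, i):               # callable by any of the zillions of CPUs
            return self.slots[i]

        def write(self, i, item):        # only slot i's maintainer CPU calls this
            with self.locks[i]:
                self.slots[i] = item

    wm = WorkingMemory()
    wm.write(0, "lion at 2 o'clock")
    print(wm.read(0))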

hmm.. that idea might make more sense if you combined it in some way with Chuck's idea above.. hmm... the key is to find something that forces every normal CPU either to periodically do a little bit of work for each item, or to have a direct connection to each item, or both.

.. if these 7 items have their 'actual content' stored in the normal (distal) modules (and remember that we can access arbitrarily large networks of structures), then maybe the 'working memory' itself is just something like an index; and then, aha, perhaps the 7 special CPUs are the routers

..otoh if we assume that the individual modules have the capacity to be in many states, but are only in a given state when they are actually representing that attribute value, then maybe the working memory items are like master CPUs telling the other modules which representations to load..

.. aha, if the current values of the items in working memory are being stored in the distal modules, and there are 7 of them, that means that each module must be 'thinking of' 7 things at once; so right there we get at least a linear scaling requirement with the number of items in working memory.. but that's probably not enough, because mice have much, much less than 1/7th of our brain but they're pretty clever

.. and we can make that geometric if we say that the computational resources of each distal module must increase with the number of states that it can represent (rather than with the log of the number of states, which is the typical thing).. so then if, e.g., attribute 387 takes 1 bit to represent, then 7 items in working memory give 2^7 joint states, so we need 2^7 computational resources.. if these resources are not just 'bits' but also 'CPU time', then that explains it.. this would imply that we're solving some NP-hard problem which can be solved independently for each attribute

...Jensen's model: time-division multiplexing, where each CPU must cycle through each item in each period (e.g. there are 'cycles' of, say, 14ms each, and each CPU must break these 14ms cycles into 7 2ms subcycles, one for each multiplexed item)..
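
a sketch of Jensen-style time-division multiplexing using the numbers above (14ms cycles split into 2ms subcycles); the work() callback is hypothetical:

    def run_cycles(items, work, n_cycles, cycle_ms=14):
        """time-division multiplexing: each cycle is split into one
        subcycle per working-memory item, so per-item attention shrinks
        as the number of multiplexed items grows."""
        sub = cycle_ms // len(items)     # 7 items in a 14 ms cycle -> 2 ms each
        t = 0
        for _ in range(n_cycles):
            for item in items:           # one subcycle per item per cycle
                work(item, t)
                t += sub

    run_cycles([f"item{i}" for i in range(7)],
               lambda item, t: print(f"{t:3d} ms: attending to {item}"),
               n_cycles=1)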

..it seems like focusing on the inter-module connections needed for Chuck's idea might be a better approach..

or the routing. perhaps the cost of providing an effective connection between two modules is superlinear with a completely decentralized routing system. e.g. in order to route to ten times as many distal nodes, the computational machinery within the router must be expanded, but then you have to route within the router, and this addition of another layer itself requires more routing, etc., which leads to geometric cost. but it seems like decentralized routing could be done more efficiently than this. perhaps if the function is routing mixed in with learning mixed in with computation in some flexible way, it's more costly. in this case the 7 routers/working memory CPUs would not be actual CPUs/modules, but rather emergent/virtual CPUs/modules that arise out of the network fabric
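
a back-of-the-envelope sketch of the superlinear-routing argument; the model here (a hard fanout limit, a per-layer overhead multiplier) and its constants are inventions of this sketch, purely to make the shape of the argument concrete:

    import math

    def routing_cost(n_targets, fanout=10, overhead=1.5):
        """assumed model: a router can directly address only `fanout`
        next hops, so reaching n_targets needs ceil(log_fanout(n))
        layers of routing-within-the-router, and each layer multiplies
        per-message cost by `overhead` (a toy constant). cost per
        target therefore grows geometrically with depth, i.e.
        superlinearly in n_targets overall."""
        layers = max(1, math.ceil(math.log(n_targets, fanout)))
        return n_targets * overhead ** layers

    for n in (10, 100, 1000, 10000):
        print(n, routing_cost(n))        # cost/target: 1.5, 2.25, 3.375, ...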

the stuff Doug found about foreign-modality information being available all over the brain is consistent with this model (it is in the middle of being routed), and also with a broadcast model in which every part of the brain has to listen to a broadcast of all of the attributes of each item in working memory.

which makes me realize that the NP-hard thingee seemed needed just because we need proportional CPU time, not # of bits, to increase with the # of items; e.g. 'we need 7% of your time to be spent listening to this broadcast' is less scalable than 'you'll need 7 bits to store this data' (but why would it be a fixed proportion? it must be that the proportion is not fixed, but that efficiencies in it are only gained with geometric increases in computing power). a broadcast model seems good in that it lets us opportunistically make new creative connections.

or mb memory ordering: http://en.cppreference.com/w/c/atomic/memory_order

i wonder if the 7 ± 2 working memory items could be the only items on which you are allowed to memory fence, i.e. apply sequentially consistent ordering to? or, more restrictively, the only processes that can demand non-relaxed orderings / the only items for which a demand can be made for non-relaxed orderings on anything?

in the former case,

if thread A emits a,b, then any other thread C will observe a,b. but if thread A emits a and then thread B emits b, another thread C might observe a then b, whereas D might observe b then a. the special working memory items, however, are sequentially consistent, i.e. all changes to any attribute of any of the globally visible working memory items are totally ordered.

in the latter case,

if thread A emits a,b, then any other thread C might observe b,a. Only the special working memory items are ordered at all (and i guess they are still totally ordered).
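
a sketch of the former case; python has no user-visible relaxed atomics, so a single global lock stands in for sequential consistency on the working memory items, and plain unsynchronized writes stand in for relaxed ordering (the API names are invented):

    import threading

    # only the ~7 working memory items get sequentially consistent
    # ordering, simulated here with one global lock producing a single
    # total order that every observer agrees on.
    wm_lock = threading.Lock()
    wm_history = []                  # the one total order, visible to all

    def wm_write(slot, value):       # WM items: fenced, totally ordered
        with wm_lock:
            wm_history.append((slot, value))

    ordinary = {}                    # everything else: relaxed, unfenced;
    def relaxed_write(key, value):   # observers may disagree on the order
        ordinary[key] = value        # in which these writes became visible

    wm_write(0, "lion"); wm_write(1, "exit")
    relaxed_write("leaf_color", "green")
    print(wm_history)                # [(0, 'lion'), (1, 'exit')], same for everyone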

mb jasper shouldn't bake this stuff in.. but it should make it ez to model these sorts of hypotheses..

autoassociative memory

e.g. you can say "that website that had a picture of a yellow car" and recall a link to a memory of an event, from which you can look at other attributes of that event and get the URL
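
a sketch of that kind of query over ordinary records (the event data and URLs are invented; the Hopfield sketch earlier does the same thing sub-symbolically):

    events = [
        {"kind": "website", "cues": {"picture of a yellow car"},
         "url": "http://example.com/cars"},
        {"kind": "website", "cues": {"recipe for pie"},
         "url": "http://example.com/pie"},
    ]

    def recall_by_content(cue):
        """content-addressed recall: match an event by a partial set of
        attributes, then read off its other attributes (e.g. the URL)."""
        for e in events:
            if cue <= e["cues"]:         # the cue is a subset of the memory
                return e
        return None

    print(recall_by_content({"picture of a yellow car"})["url"])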

one stream of consciousness

a human has many preconscious concurrent processes, and ~7 working memory items, but only one conscious narrative. certain preconscious processes are also limited, e.g. there's some sort of visual counting task whose speed is doubled in people who have had the cortical connection between their hemispheres (the corpus callosum) severed (they still have subcortical connectivity)

pointers

what is the equivalent of pointers in this model? the Parallax Propeller's lack of an indirect addressing mode shows the necessity of either pointers or something else to make up for them (in its case, self-modifying code).
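
a sketch of the Propeller-style workaround in a toy register machine: there is deliberately no indirect-addressing opcode, so the LOAD instruction's operand field is itself rewritten to walk an array (the instruction set is invented for illustration):

    # toy machine: the program is a list of [opcode, operand] pairs;
    # memory is a flat list. there is no 'load indirect' opcode, so the
    # 'pointer' lives inside the code itself, as LOAD's operand field.
    mem = [10, 20, 30, 40]

    program = [
        ["LOAD", 0],     # 0: acc = mem[operand]   <-- operand gets patched
        ["PRINT", None], # 1: print acc
        ["INCOP", 0],    # 2: self-modify: program[0][1] += 1
        ["JLT", 4],      # 3: jump to 0 while program[0][1] < operand
    ]

    pc, acc = 0, 0
    while pc < len(program):
        op, arg = program[pc]
        if   op == "LOAD":  acc = mem[arg]
        elif op == "PRINT": print(acc)
        elif op == "INCOP": program[0][1] += 1        # patch the LOAD
        elif op == "JLT":
            pc = -1 if program[0][1] < arg else pc    # -1 + 1 == 0
        pc += 1
    # prints 10, 20, 30, 40: iteration via self-modifying code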

relation to routing in the brain?