Bayle Shanks's website: proj-oot-ootBrainNotes1

custom multiple flavors of indirect addressing

like the types of meta

connection to the RDF verb/hypergraphs?

subsume the two fundamental hypergraph gets (normal and reified) into one operation with multiple flavors?

---

(lack of) call stacks in the brain; what do we have instead?

are call stacks just an abstraction to make it easier to write stuff down? yes, when they are used in a lambda-calculus-y (with no continuations) way; but no, when 'metaprogrammy' stuff like 'break' within a while loop is used, or when 'return' is used in more than one place in the fn, or when exceptions are used for non-exceptional control flow; in these cases, they are usually being used to inject complex sequential/imperative control flow without writing it out explicitly

what does the brain have instead of call stacks? well, for ordinary (lambda-calc-y) use, it just has substructure of neurons... but perhaps the real answer is dataflow variables (labeled lines), since the 'caller' isn't sitting around waiting for the 'callee' to return, it just continues on with the computation with the current (old) value

---

so i guess parts of the paradigm of brainlike computation are:

there is no program counter; instructions are nodes that are all running in parallel (of course, for convenience, in Oot we'd provide PC-like sequencing operations anyways)
each node produces an output value which is duplicated and sent to one or more consumers; eg each node acts as both an instruction and a variable
variables are dataflow variables; when a variable's value changes, all nodes that take that variable as input immediately (or with some delay) recompute based on the new input
it's not clear if there are guaranteed delays/sequencing between when one node sees an update and when another one does. i guess in the brain there could be some circuits for which the delays that introduce sequencing in a certain order where that order is important, so we should provide that option; but in lieu of it, i guess the default is just that, when one variable updates, each of its consumers 'eventually' sees the update, with no guaranteed ordering between consumers
conditionals are generally implemented by running both branches at once, then having a 'gate' node at the end which looks at the condition and uses that to select the output from one or from the other branch
but since we do want this to be a useful programming language, you can abstract any set of nodes into a function, which then can be called multiple times from various other functions, as if each call was a separate duplicate of all of the nodes in the function.
We also permit recursive calls without predetermined call depth limits, although i dunno how the brain would implement that (seems like it would have to be a loop with a PC, right?).
also, dunno how 'routing in the brain' works, of course

---

ericjang 1 day ago

I wasn't aware that Danny Hillis was a student of Minsky's and that the Connection Machine mainframe series was created with brain simulation in mind. That's really cool.

When thinking about implementing the network topology for a large-scale brain simulation, the network topology should reflect the 3D spatial local-ness of the real brain (to avoid redundant N x N communication between units). One seems to either arrive at a fat-tree CM5 architecture or a 3D lattice of asynchronous processors (but this is not very general-purpose).

https://www.youtube.com/watch?v=blvC0DA96dI

versteegen 18 hours ago

But a 3D lattice is actually quite bad in the sense of having a very high graph diameter for the same degree (6) as the number of nodes goes to infinity, compared to other designs, though probably not so bad for low n. Still far better than a square lattice, of course. Wiring of processors in clusters and supercomputers is one of the main uses of graphs with low degree and high diameter (see [1]). I also know that twisted hypercubes were used for connection graph of nodes in supercomputers, because they have a number of nodes which is a power of 2, fairly simple and uniform routing rules (in general other graphs don't), large bisection bandwidth, low diameter, high (maximal?) fault tolerance, and more.

[1] https://en.wikipedia.org/wiki/Table_of_the_largest_known_graphs_of_a_given_diameter_and_maximal_degree reply

---

ah, here's someone else doing massively parallel 'brainlike' computing!

http://apt.cs.manchester.ac.uk/projects/SpiNNaker/architecture/

http://apt.cs.manchester.ac.uk/projects/SpiNNaker/software/

---

is 'Arc Computation' fundamentally different from ordinary ANNs eg 'Node Computation' (where the state is stored in node labels)? Consider a Node A which has 4 neighbors, one of which is Node B. In Node Computation, the states are at A and B, and the relationship between the state in Node A and the state in Node B is 'neighbor'. In Arc Computation, the state is stored in the four edges; the relationship between these edges is not just 'neighbor', because a different node intervenes between each edge and its 'neighboring' edges. So we can't directly transform the arc computation graph into a 'dual' graph where the nodes become arcs and the arcs become nodes (note: in some contexts, 'dual' graph refers not to this, but to interchanging vertices (nodes) and faces).

But perhaps we could just treat this as if the edges in the dual graph are labeled. Eg say that the node representing the edge A-B is 'adjacent' to the node representing the edge A-C via an edge labeled 'A'. One thing to note is that this is not a regular lattice/celluar automata; the label 'A' is not found tesselated throughout the graph, it is unique for those 'edges' representing node A in the original graph.

However, we're losing the information of what the edge directions are, and perhaps the constraint that each edge connects two nodes; eg the mapping from graphs to their node-edge duals may not be 'one-to-one' (is that correct?); otoh we may gain information in that perhaps the mapping is not 'onto' either (is it?)

Another similar transformation is to transform the original graph to a hypergraph, in which nodes are hyperedges. This transformation is exact (i think). So is Node Computation and Arc Computation the same in the hypergraph context?

---

clockless circuits:

so apparently a circuit with arbitrary delays (that computes a known output) is:

https://en.wikipedia.org/wiki/Delay_insensitive_circuit

and apparently that cannot be Turing-complete.

So in order to be able to build Turing-complete stuff, we must make some assumptions. One of the 'least' assumptions we can make is that we have access to 'isochronic forks', which are forks in which if one endpoint of the fork sees a signal, they can assume that the other endpoint has already seen a signal too. The model making this assumption is called Quasi-Delay-Insensitive (QDI).

"An asynchronous circuit in which all forks are assumed isochronic corresponds to what has been called a speed- independent circuit, which is a circuit in which the delays in the interconnects (wires and forks) are negligible compared to the delays in the gates. " [1]

my note: seems to me that in the brain, we are often in the opposite condition from speed-independent; almost all the delays are in the wires and forks (axons), not in the gates (cell bodies) (not sure about this for local axons though). What does the brain do about this?

My guess is that the brain uses plasticity to explore which forks are longer and shorter, and then assumes that this timing relation will not change (or not change too rapidly, too frequently -- it can be re-running the 'calibration plasticity' constantly, and tolerate some errors). I wonder what that model is called in electrical engineering?

according to those wiki pages and the paper [2], computation in QDI circuits proceeds like you might expect; you send some additional signals to sort of indicate "i'm done transitioning now" and then you have handshake protocols between each computing element that ensure that they don't compute mid-transition.

so i would guess that we would see such handshake protocols in the brain.

---

great-looking paper: http://www.async.caltech.edu/Pubs/PDF/2006_ieee_soc.pdf (already added to to-reads) one cool thing about it is that it uses a CSP-like language called CHP (Communicating Hardware Processes)

---

" All circuits presented in the previous sections have stable and noninterfering PR sets. But in order to implement selections in which the conditions are not mutually exclusive (nondeterministic choice), for instance, to select between an external interrupt signal and the internal B next-instruction [ signal in a microprocessor, or for synchronization between the clocked components of a GALS system, at least one operator must provide a non- deterministic choice between two true guards. Since stability and noninterference are equivalent to determinism, a nondeterministic computation does not admit stable and noninterfering circuit implementation. Therefore, any implementation of a nondeterministic computation contains nonmonotonic transitions that require special nondigital solutions. Those special circuits are encapsulated inside two primitive building blocks: the arbiter and the synchronizer . A fundamental, and long misunderstood, issue related to implementing a nondeterministic choice is that of metastability . In classical physics, it is impossible to put an upper bound on the time it takes for a device to make a nondeterministic decision between two alternatives. When the system starts in a state where the physical parameters are such that either alternative can be selected, thedevicemayenterametastablestateinwhichitmay stay an arbitrary length of time before deciding one way or the other [6]. "

my note: if the issue was really just that nondeterminism couldn't be done fast, then since in most circuit-related cases we don't actually need randomness, we could have a buffer of PRNG'd numbers that are used to make the decision (or, in most cases, such as the example given of an interrupt competing with a next instruction, we'd probably just want round-robin: 0101010101). So i suspect that i am misunderstanding, or the paper is misexplaining, the real source of the problem here.

---

so if we have stuff like eg a million object-specific recoginizers running in parallel, then need 'map' to run the recognizers plus a 'reduce' that specifically does something like a select/epoll or winner-take-all to focus on (select) one of the recognizers that thinks it recognized

brain (at a low level) probably has no program counter (PC), and branches are probably implemented by computing both branches then having post-select gates (although mb not; mb the nonlinear dynamic system of a neuron cluster can use the same neurons to compute various different computations depending on what state it is in, or upon its inputs)

routing in the brain?

need to have mini map/reduces (divergence and convergence) within one 'map' step. Probably each 'map' step is written in a dataflow-like manner.

---

So, the brain is very concurrent. How do we model its concurrent computational algorithms? Many higher-level programming language constructs seem to be architecturally unsuited to the brain, which presumably isn't well-modeled by a set of CPUs running programs step-by-step via a program counter, all with access to, and communicating via, a shared memory.

So, you might think, ok, look at EE stuff relating to computer architecture. Models of circuits have elements that 'run' simultaneously and take finite time to send signals to each other.

But, this isn't quite suited either; at a low level it seems to match, but the conventional architecture on top of that doesn't:

clocked circuits
circuit elements with small indegree and outdegree (compare to neurons, which often have very many afferents)
strict separation between CPUs and memory (compare to the brain, which appears to mix them)
a few fast CPUs (compare to the brain, which appears to support many slow CPUs)
statically known circuits with reliable parts (compare to the brain; the exact connections of every brain is different (in higher animals) and in fact even evolves over time; and neurons are relatively unreliable, in fact many individual neurons may even die without noticibly impacting function)

(by virtue of assuming that evolution optimizes, and by noticing how smart humans are, should we conclude that the brain's architecture is better, at least for running whatever sort of algorithms human brains run, and if robustness is desired? Not necessarily; in making brains, evolution operated under a constraint that don't apply to computers, namely, each human brain had to be grown by a human body, without using a large amount of power or rare nutrients; and the human body that grew it itself had to be grown in a similar manner. By contrast, computers can be centrally manufactored in huge factories with expensive hard-to-produce equipment that uses a lot of power. For example of what this might entail, it may be hard to grow a large brain with every connection specified by the design, but it may be easier to manufacture a computer chip with that property. )

This suggests that the distributed systems model (that i talked about in plPartConcurrency; namely, a set of interactive Turing machines (Turing machines + I/O), which can also interact with each other (send messages to each other using I/O), and in which sometimes messages are lost in transit or corrupted, and sometimes the individual machines fail.

but we also need timing delays...

---

how about a:

slowly varying timing model

You can assume that there is a total ordering on how long each connection takes to transmit. This ordering is not constant but changes over time; but it changes relatively slowly, making it possible for your system to constantly recalibrate itself and then only be wrong sometimes. In addition, say that the ordering is not 100% reliable; sometimes the connections transmit significantly faster or slower than you expect (more to the point, express this stuff in terms of pairwise inequalities; if you looked at the entire total ordering, then yes it would change in some respect on almost every tick; but for any pair of connections, the question of which of them is slower is unlikely to change over the timescale of their latency).

I like this because it seems feasible for the brain (short-term and long-term plasticity could do the 'calibration'), and it reminds me of two things:

my memory consistency paper idea, which derived various memory consistency models from fixed relative timing assumptions
quasi-delay-insensitive (QDI) clockless circuits with asymmetric isochronic forks [3]

So, we have a distributed systems model:

set of turing machines
+ lossy channels
+ slowly varying timing models for some channels
+ arbitrary timing for other channels (these channels model over-the-network communication)

---

biologically, concurrency primitives are assessed in terms of 'consensus number', suggesting that the consensus problem is fundamental. Note also the similarity between the 'consensus problem' and EE's "arbiters", demonstrated especially with "sticky bits", a concurrency primitive with infinite consensus number. Winner-take-all networks probably solve this in the brain.

in fact there is a 'univerality of consensus proof' in the original paper (summary at the end of [4]