Bayle Shanks's website: ideas-computer-jasper-jasperCoreNotes

i like Parrot's custom bytecodes

---

types of memory barriers

load/load, load/store, store/load, store/store

"Sparc V8 has a “membar” instruction that takes a 4-element bit vector. The four categories of barrier can be specified individually"

also http://denovo.cs.illinois.edu/Pubs/10-cacm-memory-models.pdf wants a way to specify a barrier that only applies to labeled 'sync operations'

so varieties of memory barriers:

{load, store, *}/{load, store, *}, as sync or as data, and relative to only sync or only data. i guess each load/store must then mark whether it is sync or data. i guess each barrier should also say whether or not mmio (memory-mapped I/O) should be affected ( http://lwn.net/Articles/283776/ ), or whether ONLY mmio should be affected. so, for example, a memory barrier might say "load-sync/*-{sync*data}, cpu+mmio", meaning "don't move any loads or stores (sync or data) after this instruction to before any sync loads before this instruction, for both cpu memory and mmio"
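a toy python model of that descriptor, just to pin down what each field would mean. everything here is hypothetical: the `Op`/`Barrier` names, and the assumption that a barrier only applies when both accesses fall inside its domains. it models which (earlier, later) pairs a barrier keeps ordered, not any real ISA.

```python
from dataclasses import dataclass

KINDS = {"load", "store"}
CLASSES = {"sync", "data"}


@dataclass(frozen=True)
class Op:
    kind: str    # "load" or "store"
    cls: str     # "sync" or "data": every access carries this mark
    domain: str  # "cpu" or "mmio"


@dataclass(frozen=True)
class Barrier:
    before_kinds: frozenset    # which earlier accesses the barrier covers
    before_classes: frozenset
    after_kinds: frozenset     # which later accesses the barrier covers
    after_classes: frozenset
    domains: frozenset         # {"cpu"}, {"mmio"} (mmio only), or both

    def orders(self, earlier: Op, later: Op) -> bool:
        """True if `later` may NOT be moved before `earlier` across this barrier."""
        return (
            earlier.kind in self.before_kinds
            and earlier.cls in self.before_classes
            and later.kind in self.after_kinds
            and later.cls in self.after_classes
            and earlier.domain in self.domains
            and later.domain in self.domains
        )


# "load-sync/*-{sync*data}, cpu+mmio"
b = Barrier(
    before_kinds=frozenset({"load"}),
    before_classes=frozenset({"sync"}),
    after_kinds=frozenset(KINDS),
    after_classes=frozenset(CLASSES),
    domains=frozenset({"cpu", "mmio"}),
)

print(b.orders(Op("load", "sync", "cpu"), Op("store", "data", "mmio")))  # True
print(b.orders(Op("load", "data", "cpu"), Op("store", "sync", "cpu")))   # False
```

the plain four-way barriers are the special case where both class sets are {sync, data} and the domain is just {cpu}.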

isync (PowerPC) -- wait for all previous instructions to complete, as is needed before you perform a context switch

instruction cache flush, for self-modifying code on machines whose instruction cache isn't coherent with data writes

data dependency barrier (orders loads that depend on the value returned by an earlier load; only needed on Alpha)

command to wait until a write is 'visible' to all other CPUs, and command to wait until all writes from other CPUs are visible here; vs. commands to do the previous barriers but only w/r/t local reordering. these two 'wait' commands are similar to CRF's Commit and Reconcile, respectively, i guess. note that in my formulation, the sender must commit and the receiver must reconcile to ensure transmission; otherwise you only get eventual consistency.
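a toy python model of the "sender commits, receiver reconciles" point, loosely after CRF's semantic caches (the `CPU` class and its method names are my own, not CRF's actual rules): each CPU has a private cache in front of shared memory; commit writes a dirty cached value back, and reconcile drops a clean cached copy so the next load refetches.

```python
class CPU:
    def __init__(self, shared: dict):
        self.shared = shared  # memory every CPU can eventually see
        self.cache = {}       # this CPU's private copies
        self.dirty = set()    # addresses written locally but not yet committed

    def store(self, addr, value):
        self.cache[addr] = value
        self.dirty.add(addr)

    def load(self, addr):
        if addr not in self.cache:  # miss: fetch from shared memory
            self.cache[addr] = self.shared.get(addr, 0)
        return self.cache[addr]

    def commit(self, addr):
        """Make our write to addr visible to everyone (write it back)."""
        if addr in self.dirty:
            self.shared[addr] = self.cache[addr]
            self.dirty.discard(addr)

    def reconcile(self, addr):
        """Drop a clean cached copy so later loads see others' committed writes."""
        if addr not in self.dirty:
            self.cache.pop(addr, None)


shared = {}
sender, receiver = CPU(shared), CPU(shared)

receiver.load("x")         # receiver caches the initial value 0
sender.store("x", 42)
print(receiver.load("x"))  # 0: sender hasn't committed
sender.commit("x")
print(receiver.load("x"))  # 0: receiver still reads its stale copy
receiver.reconcile("x")
print(receiver.load("x"))  # 42: commit + reconcile together transmit the value
```

without the reconcile, the receiver keeps seeing 0 until its copy happens to get evicted, which is the eventual-consistency case.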

i'll define eventual consistency to mean that FIFO consistency may be transiently violated, but if you wait long enough and the sender stops changing the value, you'll eventually see their latest value, and keep seeing it forever

general marker for sections of code dealing with threadlocal vs. shared memory? or is this part of the load/store stuff?

note: acquire/release can be implemented in terms of the above:

" Acquire semantics is a property which can only apply to operations which read from shared memory, whether they are read-modify-write operations or plain loads. The operation is then considered a read-acquire. Acquire semantics prevent memory reordering of the read-acquire with any read or write operation which follows it in program order.

    Release semantics is a property which can only apply to operations which write to shared memory, whether they are read-modify-write operations or plain stores. The operation is then considered a write-release. Release semantics prevent memory reordering of the write-release with any read or write operation which precedes it in program order." -- http://preshing.com/20120913/acquire-and-release-semantics/#comment-20810

a release is load/store + store/store, and an acquire is load/load + load/store.
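to check that recipe, a toy python model: a program is a list of "load"/"store" accesses with fences (sets of the four basic barrier kinds) mixed in, and two accesses may swap unless some fence between them covers that (earlier, later) pair. i'm assuming the usual placement (release fence right before the releasing store, acquire fence right after the acquiring load); the model is only a sketch.

```python
# The four basic barrier kinds, as (earlier access, later access) pairs kept in order.
LOAD_LOAD = ("load", "load")
LOAD_STORE = ("load", "store")
STORE_LOAD = ("store", "load")  # not needed by either acquire or release
STORE_STORE = ("store", "store")

RELEASE = {LOAD_STORE, STORE_STORE}  # placed right BEFORE the releasing store
ACQUIRE = {LOAD_LOAD, LOAD_STORE}    # placed right AFTER the acquiring load


def may_reorder(program, i, j):
    """True if access program[j] may be performed before access program[i] (i < j).

    `program` mixes accesses ("load"/"store") and fences (sets of kinds).
    """
    earlier, later = program[i], program[j]
    fences = [x for x in program[i + 1 : j] if isinstance(x, set)]
    return not any((earlier, later) in f for f in fences)


# writer: store data; release fence; store flag (the release); some later load
writer = ["store", RELEASE, "store", "load"]
print(may_reorder(writer, 0, 2))  # False: the data store stays before the release store
print(may_reorder(writer, 2, 3))  # True: later accesses may still move above a release

# reader: load flag (the acquire); acquire fence; load data; some later store
reader = ["load", ACQUIRE, "load", "store"]
print(may_reorder(reader, 0, 2))  # False: the data load stays after the acquire load
print(may_reorder(reader, 0, 3))  # False: so does the later store
```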

http://fileadmin.cs.lth.se/cs/Education/EDAN25/F06.pdf 56/70 "Example of Cumulative Ordering"; on Power, there is also an option for some barriers to be 'cumulative' and others not, which just means transitive (e.g. causal transitivity)

???

" And a couple of implicit varieties:

 (5) LOCK operations.

     This acts as a one-way permeable barrier.  It guarantees that all memory
     operations after the LOCK operation will appear to happen after the LOCK
     operation with respect to the other components of the system.

     Memory operations that occur before a LOCK operation may appear to happen
     after it completes.

     A LOCK operation should almost always be paired with an UNLOCK operation.

 (6) UNLOCK operations.

     This also acts as a one-way permeable barrier.  It guarantees that all
     memory operations before the UNLOCK operation will appear to happen before
     the UNLOCK operation with respect to the other components of the system.

     Memory operations that occur after an UNLOCK operation may appear to
     happen before it completes.

     LOCK and UNLOCK operations are guaranteed to appear with respect to each
     other strictly in the order specified.

     The use of LOCK and UNLOCK operations generally precludes the need for
     other sorts of memory barrier (but note the exceptions mentioned in the
     subsection "MMIO write barrier").

" -- https://www.kernel.org/doc/Documentation/memory-barriers.txt
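the one-way rule is easy to write down directly. a minimal python sketch with hypothetical names, modeling only whether a single access may cross a single LOCK or UNLOCK (it's the same one-way shape as acquire/release):

```python
def may_move_past(program, op, barrier):
    """Can the access at program[op] be reordered to the other side of program[barrier]?

    LOCK: later accesses must stay after it; earlier ones may sink below it.
    UNLOCK: earlier accesses must stay before it; later ones may rise above it.
    """
    kind = program[barrier]
    if kind == "LOCK":
        return op < barrier
    if kind == "UNLOCK":
        return op > barrier
    raise ValueError(f"program[{barrier}] is not a LOCK or UNLOCK")


prog = ["a = 1", "LOCK", "b = 2", "UNLOCK", "c = 3"]
print(may_move_past(prog, 0, 1))  # True: "a = 1" may leak into the critical section
print(may_move_past(prog, 2, 1))  # False: "b = 2" can't escape above the LOCK
print(may_move_past(prog, 2, 3))  # False: nor below the UNLOCK
print(may_move_past(prog, 4, 3))  # True: "c = 3" may leak in from after
```

so accesses can only move into the critical section, never out of it.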

" (*) There is no guarantee that a CPU will see the correct order of effects
     from a second CPU's accesses, even _if_ the second CPU uses a memory
     barrier, unless the first CPU _also_ uses a matching memory barrier (see
     the subsection on "SMP Barrier Pairing")." -- https://www.kernel.org/doc/Documentation/memory-barriers.txt
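a tiny litmus-test enumerator in python to see the pairing rule. simplifying assumption (mine): a barrier keeps a CPU's two accesses in program order, without one either order is possible, and the two CPUs' accesses then interleave sequentially consistently. the writer stores data then flag; the reader loads flag then data:

```python
WRITER = [("store", "data", 1), ("store", "flag", 1)]
READER = [("load", "flag"), ("load", "data")]


def interleavings(a, b):
    """Every merge of lists a and b that keeps each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def outcomes(writer_barrier, reader_barrier):
    """Set of (flag, data) pairs the reader can observe."""
    writer_orders = [WRITER] if writer_barrier else [WRITER, WRITER[::-1]]
    reader_orders = [READER] if reader_barrier else [READER, READER[::-1]]
    seen = set()
    for w in writer_orders:
        for r in reader_orders:
            for trace in interleavings(w, r):
                mem = {"data": 0, "flag": 0}
                got = {}
                for op in trace:
                    if op[0] == "store":
                        mem[op[1]] = op[2]
                    else:
                        got[op[1]] = mem[op[1]]
                seen.add((got["flag"], got["data"]))
    return seen


for w_bar in (False, True):
    for r_bar in (False, True):
        bad = (1, 0) in outcomes(w_bar, r_bar)
        print(f"writer barrier={w_bar}, reader barrier={r_bar}: flag=1,data=0 possible? {bad}")
```

the bad outcome (flag seen, stale data) only disappears when both sides use a barrier.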

some interesting ideas from various weird x86 features:

https://en.wikipedia.org/wiki/Control_register#CR3

virtual memory mapping/address translation
nested virtualization
trusted computing
https://en.wikipedia.org/wiki/Ring_%28computer_security%29
- flags about what you have to be in a more privileged ring (lower ring number on x86) to do: access data from that ring, execute code from that ring, profiling, get the time
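a sketch of the 'flags' idea, with hypothetical names and values: a table giving, per action, the least privileged ring still allowed to do it (x86 numbering, ring 0 most privileged). x86's CR4.TSD bit is a real example of such a flag: when set, RDTSC only works at ring 0.

```python
from enum import Enum


class Action(Enum):
    READ_RING_DATA = "access data from that ring"
    EXEC_RING_CODE = "execute code from that ring"
    PROFILE = "profiling"
    GET_TIME = "get the time"


# Hypothetical per-action flags: the least privileged ring still allowed to do it.
# x86 numbering: ring 0 is the most privileged, ring 3 the least.
least_privileged_ring = {
    Action.READ_RING_DATA: 0,
    Action.EXEC_RING_CODE: 0,
    Action.PROFILE: 3,
    Action.GET_TIME: 3,
}


def allowed(current_ring: int, action: Action) -> bool:
    return current_ring <= least_privileged_ring[action]


print(allowed(3, Action.GET_TIME))          # True: user code may read the clock
least_privileged_ring[Action.GET_TIME] = 0  # like setting CR4.TSD: RDTSC at ring 0 only
print(allowed(3, Action.GET_TIME))          # False
print(allowed(0, Action.GET_TIME))          # True
```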

---

x86 has a 'REP' (repeat) prefix for some instructions: http://faydoc.tripod.com/cpu/movsb.htm , which i guess is like a for loop whose body is a single instruction
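a toy python model of `REP MOVSB`, to show the 'loop around one instruction' reading: each iteration copies one byte from [RSI] to [RDI], steps both pointers (up when the direction flag is clear, down when set), and decrements RCX until it reaches zero. memory is just a bytearray and segments are ignored; this is a sketch, not an emulator.

```python
def rep_movsb(mem: bytearray, rsi: int, rdi: int, rcx: int, df: int = 0):
    """Repeat MOVSB while RCX != 0; returns the final (rsi, rdi, rcx)."""
    step = -1 if df else 1
    while rcx != 0:          # the REP prefix: loop until the count hits zero
        mem[rdi] = mem[rsi]  # the loop body: one MOVSB (copy one byte)
        rsi += step
        rdi += step
        rcx -= 1
    return rsi, rdi, rcx


mem = bytearray(b"hello.....")
print(rep_movsb(mem, rsi=0, rdi=5, rcx=5))  # (5, 10, 0)
print(mem)                                  # bytearray(b'hellohello')
```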

http://www.cc.gatech.edu/~rama/Beehive/papers/git.cc.91.51.pdf table 1: hardware primitives (paraphrased):

- read without coherence
- write without coherence
- read-global: read data from shared memory, bypassing cache
- write-global: globally perform the write
- read-update: read data from shared memory and request future updates
- reset-update: cancel the request for updates from shared memory
- flush-buffer: stall this processor until all requests in the write-buffer are globally performed
- read-lock: request a shared lock for a memory block
- write-lock: request an exclusive lock for a memory block
- unlock: release the lock
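a small python sketch of how a few of these primitives might relate. this is my interpretation, not the paper's: plain writes sit in a per-processor write buffer, `write_global` performs a write immediately, and `flush_buffer` drains the buffer. the `Processor` class is hypothetical.

```python
from collections import deque


class Processor:
    def __init__(self, shared: dict):
        self.shared = shared         # globally visible memory
        self.write_buffer = deque()  # pending writes, not yet globally performed

    def write(self, addr, value):
        self.write_buffer.append((addr, value))  # buffered: others can't see it yet

    def write_global(self, addr, value):
        self.shared[addr] = value  # globally perform this write now

    def flush_buffer(self):
        # stall until every buffered write has been globally performed, in FIFO order
        while self.write_buffer:
            addr, value = self.write_buffer.popleft()
            self.shared[addr] = value


shared = {}
p = Processor(shared)
p.write("a", 1)
p.write("b", 2)
print(shared)  # {}: both writes still sitting in p's write buffer
p.flush_buffer()
print(shared)  # {'a': 1, 'b': 2}
```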

---

" Popek and Goldberg summarized the concept in 1974 ("Formal Requirements for Virtualizable Third Generation Architectures", Communications of the ACM 17):
- Equivalence/Fidelity: A program running under the hypervisor should exhibit a behaviour essentially identical to that demonstrated when running on an equivalent machine directly.
- Resource control/Safety: The hypervisor should be in complete control of the virtualized resources.
- Efficiency/Performance: A statistically dominant fraction of machine instructions must be executed without hypervisor intervention. " -- ARMv7-A Architecture Overview, David Brash, Architecture Program Manager, ARM Ltd. www.linaro.org/documents/download/d7fe510d8eb46775afc3953d217b15224fbb93086598a (good read, slides with various details about ARM virtualization)
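a python sketch of trap-and-emulate with a syndrome record, illustrating the efficiency requirement: ordinary instructions run directly, and only the sensitive one (here a hypothetical `read_timer`) traps to the hypervisor, which emulates it with a virtualized clock. the instruction set, the `Syndrome` record, and the `Hypervisor` class are all made up for illustration.

```python
from dataclasses import dataclass

SENSITIVE = {"read_timer"}  # hypothetical: instructions the hypervisor traps


@dataclass
class Syndrome:
    """Hypothetical trap record: which instruction trapped and its operand."""
    instr: str
    operand: int


class Hypervisor:
    def __init__(self, virtual_time: int):
        self.virtual_time = virtual_time  # the clock the guest is shown
        self.traps = 0

    def handle(self, s: Syndrome, regs: list) -> None:
        self.traps += 1
        if s.instr == "read_timer":
            regs[s.operand] = self.virtual_time  # emulate using the virtual clock


def run_guest(program, hyp: Hypervisor) -> list:
    regs = [0] * 4
    for instr, operand in program:
        if instr in SENSITIVE:
            hyp.handle(Syndrome(instr, operand), regs)  # trap, then emulate
        elif instr == "inc":
            regs[operand] += 1  # executes directly, no hypervisor involvement
        else:
            raise ValueError(f"unknown instruction {instr!r}")
    return regs


hyp = Hypervisor(virtual_time=1000)
regs = run_guest([("inc", 0), ("inc", 0), ("read_timer", 1), ("inc", 0)], hyp)
print(regs, hyp.traps)  # [3, 1000, 0, 0] 1
```

three of the four instructions never reach the hypervisor, which is the "statistically dominant fraction" idea in miniature.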

notes:

ARM has 2 level MMU lookup table (guest OS-controlled, and then hypervisor controlled)
we could have a similar setup; note that nested hypervisors could save time if they're willing to compose the hypervisor lookup part
a way to trap instructions
load/store auto-emulated to use the MMU
register saving between modes
ARM has virtualized interrupts but i dunno what that corresponds to for us
note from x86 that you want to virtualize clocks/timers too
ARM allows a "rich set of trap options (TLB/cache ops, ID groups, instructions)" for choosing which instructions to trap, and "Syndrome register support". Syndrome register seems to be a register that holds an exception especially for the hypervisor? Perhaps for 'exceptions' generated by trapping instructions? "Emulation of trapped instructions through syndromes"
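a toy python sketch of two-stage translation (guest table, then hypervisor table) and of the composition idea for nested hypervisors: folding the nested hypervisor's table into the outer one ahead of time gives the same answer with one fewer lookup per access. page tables are flat dicts of page numbers, and the 4096-byte page size is an assumption.

```python
PAGE = 4096  # assumed page size


def translate(table: dict, addr: int) -> int:
    """One translation stage: look up the page number, keep the offset."""
    page, offset = divmod(addr, PAGE)
    return table[page] * PAGE + offset


def compose(first: dict, second: dict) -> dict:
    """Fold two stages into one table; pages unmapped in either stage are dropped."""
    return {page: second[mid] for page, mid in first.items() if mid in second}


guest_stage1 = {0: 5, 1: 7}      # nested guest OS: virtual page -> its "physical" page
inner_stage2 = {5: 20, 7: 21}    # nested hypervisor: guest page -> page it manages
outer_stage2 = {20: 42, 21: 43}  # real hypervisor: -> actual physical page

va = 1 * PAGE + 0x10
three = translate(outer_stage2, translate(inner_stage2, translate(guest_stage1, va)))

combined = compose(inner_stage2, outer_stage2)  # built once, reused on every access
two = translate(combined, translate(guest_stage1, va))
print(hex(three), hex(two))  # 0x2b010 0x2b010
```

the non-nested case is just `guest_stage1` followed by a single stage-2 table, like ARM's two-level setup.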