proj-oot-ootAssemblyNotes25

(June 30 2019)

So i'm mostly happy with Boot right now (even though it's not fully implemented or specified, and what is implemented and specified is not quite in agreement with my latest plans, especially regarding redoing the jump label stuff to make things compilable; i do plan to fix that sometime soon), and i've stopped focusing on it and moved up to thinking about OVM.

I still have two issues/decisions to make about potentially changing Boot, though. I'm trying to think if we could make things even simpler.

Both of these potential changes come out of the same line of thinking:

Now that i'm thinking about OVM, Boot's function as an easy-to-implement portability layer is coming into focus. If people want something really easy to implement, they will implement Boot, not BootX. And if they want to do more work to get better interop or performance, they will implement OVMlow, not BootX. So really, the purpose of BootX is some sort of project-internal efficiency or readability thing, not something that porters will really be concerned with. Except that right now, if they want to add floating point functionality, or 64-bit int functionality, or many of the syscalls, they have to switch to BootX. Which is why i'm thinking of (a) making Boot an all-inclusive, one-stop shop for porters, so that if they implement Boot, they're done; and (b) making Boot even simpler by switching to the 32-bit representation.

In essence, 'Boot' would be its own layer, and BootX (although still separately specified and probably separately useful for other projects) would, from the point of view of Oot porters, be thought of as 'part of that OVM layer stuff that we don't have to think about'.

Otoh, there's also something to be said for the current setup.

Either way though, i'm pretty happy with Boot. Which is why i'm thinking more about OVM.

---

For source code without any atomics/synchronization/fences (for example "a = 3; b = a + 2; print(b)"), i'd like to compile this code in the most obvious way on each platform (for example, on many platforms the previous example could compile to placing a constant value into a location, adding another constant value to that location, and then making a function call with the location as argument, with no fencing/memory barriers/synchronization at all). But if my language's memory ordering consistency model is stronger than that of some target platform, then in order to deliver the guarantees of my language's memory ordering consistency model, in some cases i may have to insert some sort of synchronization into the compiled code on that target platform.
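To make that concrete, here's a minimal C sketch (my own illustration; the function names are hypothetical and nothing here is actual Boot/OVM output). The first function is the 'obvious' compilation; the second shows the kind of extra synchronization a compiler might be forced to emit if the language promised stronger ordering than the target delivers:

```c
#include <stdio.h>
#include <stdatomic.h>

int a, b;  /* plain (non-atomic) locations */

/* The "obvious" compilation: a store, a load/add/store, a call; no fences. */
void obvious(void) {
    a = 3;
    b = a + 2;
    printf("%d\n", b);
}

/* If the language's model were stronger than the target's, the compiler
   might instead have to insert something fence-like, e.g.: */
void with_forced_ordering(void) {
    a = 3;
    atomic_thread_fence(memory_order_seq_cst);  /* hypothetical inserted fence */
    b = a + 2;
    atomic_thread_fence(memory_order_seq_cst);
    printf("%d\n", b);
}
```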

To avoid having to think about that when porting to a new platform, i'd like, if possible, to choose a sufficiently weak memory ordering consistency model for my language, so that on all of the likely target platforms the model's guarantees are delivered without me having to further understand the target platform's memory consistency model, at least for source code without synchronization. (In fact, the previous sentence is a little optimistic; the pragmatic reality is that i won't understand the target platform's memory consistency model very well, and so i will just compile all unsynchronized code 'in the obvious way' without any thought at all, regardless of whether or not that fits the guarantees of the language's model; i'm just trying to choose a memory consistency model for the language that justifies this behavior, so that users of the language aren't surprised. Of course i could just say "the ordering of events (as observed by any thread except the thread that generated the events) is completely undefined in the absence of explicit synchronization", but that seems a bit much.)

Some major target platforms are RISC-V, C, ARM, x86-64.

So i figure that what i should do is make my language's memory consistency model something close to the strongest thing that is still weak enough such that meeting the guarantees of any of those major target platforms' memory consistency models implies that the guarantees of my language's memory consistency model are met.

So which one is weakest? Rather, are any of those platforms' memory ordering consistency models weaker/stronger than any of the others for code without atomics, fences, memory barriers, or synchronization? And more precisely: given an arbitrary piece of unsynchronized code (perhaps barring pathologically platform-specific code), if the same code were translated to each of these platforms, then certain guarantees on ordering would apply on each platform, depending on which memory ordering consistency model that platform offers -- are there any pairs of platforms in this list such that one member of the pair will always satisfy not just its own guarantees, but also all of the guarantees promised by the other member of the pair?

eh, i think that's not QUITE my question, although it mostly is.

First, i'm willing to accept a not-completely-relaxed memory model provided that it seems like it's something that "everyone's doing" (that is, it's common enough that if i end up porting to a platform with a weaker model then i'll probably know, because the fact that this platform has such a weak model will be so unusual that people comment on it).

[1] suggests that ARM has Release Consistency (RC), and that link and also [2] suggest that x86 has something 'close to TSO'. And i sorta recall from the RISC-V discussion that TSO is stronger than RC. And https://github.com/riscv/riscv-isa-manual/blob/master/src/memory.tex says that RISC-V's memory model is called RVWMO and is a variant of release consistency.

So that suggests that, yeah, RISC-V's memory model is the weakish one that i seek.

The current RISC-V spec draft (https://github.com/riscv/riscv-isa-manual/releases/download/draft-20190626-9f0e234/riscv-spec.pdf) says:

"although the ISA does not currently contain native load-acquire or store-release instructions, nor RCpc variants thereof, the RVWMO model itself is designed to be forwards-compatible with the potential addition of any or all of the above into the ISA in a future extension."

so is it really even an RC model?

The RISC-V FENCE instruction talks about whether reads or writes are being ordered, but not about acquire vs release.

Which C11/C++11 memory order corresponds to RC? You'd think it would be memory_order_acquire and memory_order_release, or maybe memory_order_acq_rel.
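For reference, here's the standard C11 acquire/release idiom (a minimal sketch; the names `payload` and `ready` are mine):

```c
#include <stdatomic.h>

static int payload;        /* plain, non-atomic data */
static atomic_int ready;   /* flag, initially 0 */

void producer(void) {
    payload = 42;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

void consumer(void) {
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;  /* spin until the flag is set */
    /* the release store synchronizes-with the acquire load, so this read
       is guaranteed to see 42 */
    int x = payload;
    (void)x;
}
```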

Is a RISC-V FENCE a C11 memory_order_acq_rel?
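My tentative reading, expressed via the C11 fence API (the RISC-V instructions in the comments follow my understanding of the spec's suggested C/C++ mapping table; treat them as unverified assumptions):

```c
#include <stdatomic.h>

void fences(void) {
    atomic_thread_fence(memory_order_acquire);  /* ~ fence r, rw  (prior reads before later reads+writes) */
    atomic_thread_fence(memory_order_release);  /* ~ fence rw, w  (prior reads+writes before later writes) */
    atomic_thread_fence(memory_order_acq_rel);  /* ~ the union of the two above */
    atomic_thread_fence(memory_order_seq_cst);  /* ~ fence rw, rw (the full fence) */
}
```

If that reading is right, then a full `fence rw,rw` is at least as strong as a C11 acq_rel fence, and the finer-grained FENCE variants carve out weaker pieces of it.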

The RISC-V memory model also lets you choose between the PC (processor consistency) and SC (sequential consistency) variants of RC (that is, are the RC atomics ordered with respect to each other using PC or SC?).
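The classic litmus test for that PC-vs-SC distinction is IRIW ("independent reads of independent writes"); a hedged C11 sketch (run each function in its own thread, with x and y initially 0):

```c
#include <stdatomic.h>

atomic_int x, y;         /* both initially 0 */
int r1, r2, r3, r4;

void writer_x(void) { atomic_store(&x, 1); }  /* seq_cst by default */
void writer_y(void) { atomic_store(&y, 1); }

void reader_a(void) { r1 = atomic_load(&x); r2 = atomic_load(&y); }
void reader_b(void) { r3 = atomic_load(&y); r4 = atomic_load(&x); }

/* Under seq_cst (the SC-ish flavor), the outcome r1==1, r2==0, r3==1, r4==0
   is forbidden: both readers must agree on a single order of the two writes.
   With only acquire loads and release stores (the PC-ish flavor), that
   outcome is allowed. */
```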

Confusingly, in https://stackoverflow.com/questions/16179938/c11-memory-order-acquire-and-memory-order-release-semantics , two people think that the C folks totally got the meaning of the RC memory model wrong; but looking at https://en.cppreference.com/w/cpp/atomic/memory_order , it looks like the wording they are complaining about is not there, so perhaps it was somehow fixed already?

Also, https://en.cppreference.com/w/cpp/atomic/memory_order#Release-Consume_ordering states that Release-Consume ordering (dependency ordering) is implicit "On all mainstream CPUs other than DEC Alpha". Does this mean that it holds between all memory accesses, even ones in code with no synchronization? (My guess is: yes, it does mean that.) Does RISC-V share this property?
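For concreteness, dependency ordering is what the classic pointer-publication idiom relies on (a sketch with made-up names; note that in practice current compilers promote memory_order_consume to memory_order_acquire anyway):

```c
#include <stdatomic.h>
#include <stdlib.h>

typedef struct { int value; } node;
_Atomic(node *) head;  /* initially NULL */

void publish(void) {
    node *n = malloc(sizeof *n);
    n->value = 42;
    atomic_store_explicit(&head, n, memory_order_release);
}

void read_side(void) {
    node *n = atomic_load_explicit(&head, memory_order_consume);
    if (n != NULL) {
        /* the read of n->value carries a data dependency on the load of
           head, so on every mainstream CPU except Alpha it is ordered
           without any fence */
        int v = n->value;
        (void)v;
    }
}
```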

So some other questions (todo read back over this section and add more):

(a little later): ok i researched some more and got some more info: